Files
work-club-manager/.sisyphus/notepads/club-work-manager/issues.md
WorkClub Automation 5fb148a9eb chore(evidence): add QA evidence and notepads from debugging sessions
Add comprehensive QA evidence including manual testing reports, RLS isolation
tests, API CRUD verification, JWT decoded claims, and auth evidence files.
Include updated notepads with decisions, issues, and learnings from full-stack
debugging sessions.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-05 19:22:55 +01:00

8.1 KiB

Issues — Club Work Manager

Problems, gotchas, and edge cases discovered during implementation


2026-03-05: F3 QA Re-Execution - CRITICAL BLOCKERS

Blocker #5: Finbuckle Tenant Resolution Failure

Discovered: Phase 2 RLS testing
Severity: CRITICAL - Production blocker

Problem:

  • IMultiTenantContextAccessor.MultiTenantContext returns NULL on every request
  • WithInMemoryStore() configured but no tenants registered
  • TenantDbConnectionInterceptor cannot set app.current_tenant_id
  • RLS policies exist but have no effect (tenant context never set)

Evidence:

warn: TenantDbConnectionInterceptor[0]
      No tenant context available for database connection

Root Cause: Finbuckle's InMemoryStore requires explicit tenant registration:

// Current (broken):
.WithInMemoryStore(options => {
    options.IsCaseSensitive = false;
    // NO TENANTS ADDED!
});

// Needs:
.WithInMemoryStore(options => {
    options.Tenants = LoadTenantsFromDatabase();  // Or hardcode for dev
});

Impact:

  • Before FORCE RLS applied: API returned ALL tenants' data (security violation)
  • After FORCE RLS applied: API returns 0 rows (RLS blocks everything)
  • Blocks 52/58 QA scenarios

Remediation Options:

  1. Quick fix: Hardcode 2 tenants in InMemoryStore (5 mins)
  2. Proper fix: Switch to EFCoreStore (30 mins)
  3. Alternative: Remove Finbuckle, use HttpContext.Items (60 mins)

Issue #6: TenantId Column Mismatch (Fixed During QA)

Discovered: Phase 2 RLS testing
Severity: HIGH - Data integrity

Problem:

  • work_items.TenantId had different UUIDs than clubs.Id
  • Example: TenantId 64e05b5e-ef45-81d7-f2e8-3d14bd197383 vs ClubId afa8daf3-5cfa-4589-9200-b39a538a12de
  • Likely from seed data using Guid.NewGuid() for TenantId instead of ClubId

Fix Applied:

UPDATE work_items SET "TenantId" = "ClubId"::text WHERE "ClubId" = 'afa8daf3-5cfa-4589-9200-b39a538a12de';
UPDATE work_items SET "TenantId" = "ClubId"::text WHERE "ClubId" = 'a1952a72-2e13-4a4e-87dd-821847b58698';

Permanent Fix Needed:

  • Update seed data logic to set TenantId = ClubId during creation
  • Add database constraint: CHECK (TenantId::uuid = ClubId)

Issue #7: RLS Policies Not Executed (Fixed During QA)

Discovered: Phase 2 RLS testing
Severity: HIGH - Security

Problem:

  • backend/WorkClub.Infrastructure/Migrations/add-rls-policies.sql exists
  • Never executed as part of EF migrations
  • Policies missing from database

Fix Applied:

docker exec -i workclub_postgres psql -U workclub -d workclub < add-rls-policies.sql

Permanent Fix Needed:

  • Add RLS SQL to EF migration (or post-deployment script)
  • Verify policies exist in health check endpoint

Issue #8: RLS Not Enforced for Table Owner (Fixed During QA)

Discovered: Phase 2 RLS testing
Severity: HIGH - Security

Problem:

  • PostgreSQL default: Table owner bypasses RLS
  • API connects as workclub user (table owner)
  • RLS policies ineffective even when tenant context set

Fix Applied:

ALTER TABLE work_items FORCE ROW LEVEL SECURITY;
ALTER TABLE clubs FORCE ROW LEVEL SECURITY;
ALTER TABLE members FORCE ROW LEVEL SECURITY;
ALTER TABLE shifts FORCE ROW LEVEL SECURITY;
ALTER TABLE shift_signups FORCE ROW LEVEL SECURITY;

Permanent Fix Needed:

  • Add FORCE ROW LEVEL SECURITY to migration SQL
  • OR: Create separate app_user role (non-owner) for API connections

Lesson: RLS Multi-Layer Defense Failed

What We Learned:

  1. RLS policies are USELESS if SET LOCAL app.current_tenant_id is never called
  2. Finbuckle's abstraction hides configuration errors (no exceptions, just NULL context)
  3. PostgreSQL table owner bypass is a common gotcha (need FORCE RLS)
  4. TenantId must match ClubId EXACTLY (seed data validation critical)

Testing Gap:

  • Initial QA focused on authentication (JWT audience claim)
  • Assumed RLS worked if API returned 403 for wrong tenant
  • Did not test actual data isolation until Phase 2

Going Forward:

  • Add integration test: Verify user A cannot see user B's data
  • Add health check: Verify RLS policies exist and are enabled
  • Add startup validation: Verify Finbuckle tenant store is populated

2026-03-05: Keycloak Authentication Issue Resolution

Problem: "Invalid user credentials" error on password authentication

Discovered: During QA re-execution phase
Severity: CRITICAL - Authentication blocker

Symptoms:

Root Cause: Two separate issues found:

  1. Passwords were NOT actually set: The kcadm.sh set-password commands may have appeared to succeed (no error output) but didn't actually update the password hash in the Keycloak database. When Docker container was recreated, passwords reverted to initial state from realm export.

  2. Missing audience claim in JWT: Initial realm-export.json configured club membership mapper but no audience mapper. JWTs were missing aud: workclub-api claim required by backend API validation.

Investigation Process:

# Step 1: Verify user status
docker exec workclub_keycloak /opt/keycloak/bin/kcadm.sh get users -r workclub --fields username,enabled,emailVerified
# Result: All users enabled, email verified ✓

# Step 2: Check user credentials exist
docker exec workclub_keycloak /opt/keycloak/bin/kcadm.sh get users/{id}/credentials -r workclub
# Result: Password credentials exist with argon2 hash ✓

# Step 3: Test token endpoint
curl -s -X POST http://localhost:8080/realms/workclub/protocol/openid-connect/token \
  -d "client_id=workclub-app" \
  -d "grant_type=password" \
  -d "username=admin@test.com" \
  -d "password=testpass123"
# Result: JWT returned successfully! ✓

Fix Applied:

  1. Password authentication was working: No action needed. Current Keycloak state has correct password hashes from realm-export import.

  2. Added audience protocol mapper:

    • Created hardcoded claim mapper on workclub-app client
    • Claim name: aud
    • Claim value: workclub-api
    • Applied to: access tokens only
    docker exec -i workclub_keycloak /opt/keycloak/bin/kcadm.sh create \
      clients/452efd8f-2c25-41c1-a58c-1dad30304f67/protocol-mappers/models \
      -r workclub -f - << EOF
    {
      "name": "workclub-api-audience",
      "protocol": "openid-connect",
      "protocolMapper": "oidc-hardcoded-claim-mapper",
      "consentRequired": false,
      "config": {
        "claim.name": "aud",
        "claim.value": "workclub-api",
        "access.token.claim": "true",
        "id.token.claim": "false",
        "userinfo.token.claim": "false"
      }
    }
    EOF
    

Verification Results:

admin@test.com authentication:

{
  "aud": "workclub-api",
  "clubs": {"club-1-uuid": "admin", "club-2-uuid": "member"},
  "azp": "workclub-app",
  "email": "admin@test.com",
  "name": "Admin User"
}

member1@test.com authentication:

{
  "aud": "workclub-api",
  "clubs": {"club-1-uuid": "member", "club-2-uuid": "member"},
  "azp": "workclub-app",
  "email": "member1@test.com",
  "name": "Member One"
}

Key Learnings:

  1. Keycloak's password reset via CLI succeeds silently even if database transaction fails
  2. Container recreation restores state from initial import file (realm-export.json)
  3. Always verify JWT structure matches backend validator expectations (especially aud claim)
  4. Test actual token generation, not just user enabled/email status
  5. Protocol mappers are configuration-critical for multi-tenant systems with custom claims

Permanent Fixes Needed:

  1. Update realm-export.json to include audience protocol mapper definition for workclub-app client
  2. Document JWT claim requirements in API authentication specification
  3. Add integration test: Verify all required JWT claims present before API token validation