Files
work-club-manager/.sisyphus/notepads/club-work-manager/issues.md

259 lines
8.7 KiB
Markdown

# Issues — Club Work Manager
_Problems, gotchas, and edge cases discovered during implementation_
---
## 2026-03-05: F3 QA Re-Execution - CRITICAL BLOCKERS
### Blocker #5: Finbuckle Tenant Resolution Failure
**Discovered**: Phase 2 RLS testing
**Severity**: CRITICAL - Production blocker
**Problem**:
- `IMultiTenantContextAccessor.MultiTenantContext` returns NULL on every request
- `WithInMemoryStore()` configured but no tenants registered
- `TenantDbConnectionInterceptor` cannot set `app.current_tenant_id`
- RLS policies exist but have no effect (tenant context never set)
**Evidence**:
```
warn: TenantDbConnectionInterceptor[0]
No tenant context available for database connection
```
**Root Cause**:
Finbuckle's InMemoryStore requires explicit tenant registration:
```csharp
// Current (broken):
.WithInMemoryStore(options => {
options.IsCaseSensitive = false;
// NO TENANTS ADDED!
});
// Needs:
.WithInMemoryStore(options => {
options.Tenants = LoadTenantsFromDatabase(); // Or hardcode for dev
});
```
**Impact**:
- Before FORCE RLS applied: API returned ALL tenants' data (security violation)
- After FORCE RLS applied: API returns 0 rows (RLS blocks everything)
- Blocks 52/58 QA scenarios
**Remediation Options**:
1. **Quick fix**: Hardcode 2 tenants in InMemoryStore (5 mins)
2. **Proper fix**: Switch to EFCoreStore (30 mins)
3. **Alternative**: Remove Finbuckle, use HttpContext.Items (60 mins)
---
### Issue #6: TenantId Column Mismatch (Fixed During QA)
**Discovered**: Phase 2 RLS testing
**Severity**: HIGH - Data integrity
**Problem**:
- `work_items.TenantId` had different UUIDs than `clubs.Id`
- Example: TenantId `64e05b5e-ef45-81d7-f2e8-3d14bd197383` vs ClubId `afa8daf3-5cfa-4589-9200-b39a538a12de`
- Likely from seed data using `Guid.NewGuid()` for TenantId instead of ClubId
**Fix Applied**:
```sql
UPDATE work_items SET "TenantId" = "ClubId"::text WHERE "ClubId" = 'afa8daf3-5cfa-4589-9200-b39a538a12de';
UPDATE work_items SET "TenantId" = "ClubId"::text WHERE "ClubId" = 'a1952a72-2e13-4a4e-87dd-821847b58698';
```
**Permanent Fix Needed**:
- Update seed data logic to set `TenantId = ClubId` during creation
- Add database constraint: `CHECK (TenantId::uuid = ClubId)`
---
### Issue #7: RLS Policies Not Executed (Fixed During QA)
**Discovered**: Phase 2 RLS testing
**Severity**: HIGH - Security
**Problem**:
- `backend/WorkClub.Infrastructure/Migrations/add-rls-policies.sql` exists
- Never executed as part of EF migrations
- Policies missing from database
**Fix Applied**:
```bash
docker exec -i workclub_postgres psql -U workclub -d workclub < add-rls-policies.sql
```
**Permanent Fix Needed**:
- Add RLS SQL to EF migration (or post-deployment script)
- Verify policies exist in health check endpoint
---
### Issue #8: RLS Not Enforced for Table Owner (Fixed During QA)
**Discovered**: Phase 2 RLS testing
**Severity**: HIGH - Security
**Problem**:
- PostgreSQL default: Table owner bypasses RLS
- API connects as `workclub` user (table owner)
- RLS policies ineffective even when tenant context set
**Fix Applied**:
```sql
ALTER TABLE work_items FORCE ROW LEVEL SECURITY;
ALTER TABLE clubs FORCE ROW LEVEL SECURITY;
ALTER TABLE members FORCE ROW LEVEL SECURITY;
ALTER TABLE shifts FORCE ROW LEVEL SECURITY;
ALTER TABLE shift_signups FORCE ROW LEVEL SECURITY;
```
**Permanent Fix Needed**:
- Add `FORCE ROW LEVEL SECURITY` to migration SQL
- OR: Create separate `app_user` role (non-owner) for API connections
---
### Lesson: RLS Multi-Layer Defense Failed
**What We Learned**:
1. RLS policies are USELESS if `SET LOCAL app.current_tenant_id` is never called
2. Finbuckle's abstraction hides configuration errors (no exceptions, just NULL context)
3. PostgreSQL table owner bypass is a common gotcha (need FORCE RLS)
4. TenantId must match ClubId EXACTLY (seed data validation critical)
**Testing Gap**:
- Initial QA focused on authentication (JWT audience claim)
- Assumed RLS worked if API returned 403 for wrong tenant
- Did not test actual data isolation until Phase 2
**Going Forward**:
- Add integration test: Verify user A cannot see user B's data
- Add health check: Verify RLS policies exist and are enabled
- Add startup validation: Verify Finbuckle tenant store is populated
---
## 2026-03-05: Keycloak Authentication Issue Resolution
### Problem: "Invalid user credentials" error on password authentication
**Discovered**: During QA re-execution phase
**Severity**: CRITICAL - Authentication blocker
**Symptoms**:
- Users existed in Keycloak realm with correct club_uuid attributes
- Passwords were set via `kcadm.sh set-password` without visible errors
- Token endpoint returned: `{"error":"invalid_grant","error_description":"Invalid user credentials"}`
- Affected all users: admin@test.com, manager@test.com, member1@test.com, member2@test.com, viewer@test.com
**Root Cause**:
Two separate issues found:
1. **Passwords were NOT actually set**: The `kcadm.sh set-password` commands may have appeared to succeed (no error output) but didn't actually update the password hash in the Keycloak database. When Docker container was recreated, passwords reverted to initial state from realm export.
2. **Missing audience claim in JWT**: Initial realm-export.json configured club membership mapper but no audience mapper. JWTs were missing `aud: workclub-api` claim required by backend API validation.
**Investigation Process**:
```bash
# Step 1: Verify user status
docker exec workclub_keycloak /opt/keycloak/bin/kcadm.sh get users -r workclub --fields username,enabled,emailVerified
# Result: All users enabled, email verified ✓
# Step 2: Check user credentials exist
docker exec workclub_keycloak /opt/keycloak/bin/kcadm.sh get users/{id}/credentials -r workclub
# Result: Password credentials exist with argon2 hash ✓
# Step 3: Test token endpoint
curl -s -X POST http://localhost:8080/realms/workclub/protocol/openid-connect/token \
-d "client_id=workclub-app" \
-d "grant_type=password" \
-d "username=admin@test.com" \
-d "password=testpass123"
# Result: JWT returned successfully! ✓
```
**Fix Applied**:
1. **Password authentication was working**: No action needed. Current Keycloak state has correct password hashes from realm-export import.
2. **Added audience protocol mapper**:
- Created hardcoded claim mapper on workclub-app client
- Claim name: `aud`
- Claim value: `workclub-api`
- Applied to: access tokens only
```bash
docker exec -i workclub_keycloak /opt/keycloak/bin/kcadm.sh create \
clients/452efd8f-2c25-41c1-a58c-1dad30304f67/protocol-mappers/models \
-r workclub -f - << EOF
{
"name": "workclub-api-audience",
"protocol": "openid-connect",
"protocolMapper": "oidc-hardcoded-claim-mapper",
"consentRequired": false,
"config": {
"claim.name": "aud",
"claim.value": "workclub-api",
"access.token.claim": "true",
"id.token.claim": "false",
"userinfo.token.claim": "false"
}
}
EOF
```
**Verification Results**:
✅ admin@test.com authentication:
```json
{
"aud": "workclub-api",
"clubs": {"club-1-uuid": "admin", "club-2-uuid": "member"},
"azp": "workclub-app",
"email": "admin@test.com",
"name": "Admin User"
}
```
✅ member1@test.com authentication:
```json
{
"aud": "workclub-api",
"clubs": {"club-1-uuid": "member", "club-2-uuid": "member"},
"azp": "workclub-app",
"email": "member1@test.com",
"name": "Member One"
}
```
**Key Learnings**:
1. Keycloak's password reset via CLI succeeds silently even if database transaction fails
2. Container recreation restores state from initial import file (realm-export.json)
3. Always verify JWT structure matches backend validator expectations (especially `aud` claim)
4. Test actual token generation, not just user enabled/email status
5. Protocol mappers are configuration-critical for multi-tenant systems with custom claims
**Permanent Fixes Needed**:
1. Update `realm-export.json` to include audience protocol mapper definition for workclub-app client
2. Document JWT claim requirements in API authentication specification
3. Add integration test: Verify all required JWT claims present before API token validation
## 2026-03-05: RESOLVED - Silent Write Failures Due to Uncommitted Transaction
**Issue:** All write operations (INSERT/UPDATE/DELETE) appeared to succeed but data never persisted to database.
**Symptoms:**
- HTTP 201 Created response with valid data
- GET by ID immediately after POST returns 404
- No error logs, no exceptions
- EF Core SaveChanges returns success
**Root Cause:** TenantDbConnectionInterceptor started transaction for SET LOCAL but never committed it.
**Fix:** Replaced transaction-based approach with command text prepending. See learnings.md 2026-03-05 entry.
**Status:** RESOLVED - All tests pass after fix.