fix: exempt /api/clubs/me from tenant validation

- Add path exemption in TenantValidationMiddleware for /api/clubs/me
- Change authorization policy from RequireMember to RequireViewer
- Fix KEYCLOAK_CLIENT_ID in docker-compose.yml (workclub-app not workclub-api)
- Endpoint now works without X-Tenant-Id header as intended
- Other endpoints still protected by tenant validation

This fixes the chicken-and-egg problem where frontend needs to call
/api/clubs/me to discover available clubs before selecting a tenant.
This commit is contained in:
WorkClub Automation
2026-03-05 21:32:37 +01:00
parent 18be0fb183
commit ffc4062eba
45 changed files with 5519 additions and 579 deletions

View File

@@ -2872,3 +2872,340 @@ command.CommandText = $"SET LOCAL app.current_tenant_id = '{tenantId}';\n{comman
### Interceptor RLS Approach
- **Option D Works!** Explicitly creating a transaction `conn.BeginTransaction()`, executing `SET LOCAL`, assigning it to `command.Transaction`, and then letting EF Core commit/dispose via `DataReaderDisposing` works for reading RLS queries!
- **Implicit Transactions**: For SaveChanges, `TransactionStarted` handles applying the `SET LOCAL`. But we cannot use `ConditionalWeakTable<DbTransaction, object>` to track if `SET LOCAL` was applied because `NpgsqlTransaction` gets pooled and reused, keeping the same reference but starting a new logical transaction. Removing this tracking ensures we correctly execute `SET LOCAL` for each logical transaction.
---
## F3 Manual QA Execution - Final Learnings (2026-03-05)
### Session Summary
Completed comprehensive F3 Manual QA execution (57/58 scenarios) for Multi-Tenant Club Work Manager application. Testing covered backend API, frontend E2E, integration workflows, and security edge cases.
### Critical Discoveries
#### 1. Missing `/api/clubs/me` Endpoint (BLOCKER)
**Discovery:** Frontend authentication loop caused by 404 on `GET /api/clubs/me`
**Context:**
- Keycloak auth succeeds ✅
- NextAuth callback processes ✅
- Frontend expects endpoint to return user's club memberships
- Endpoint returns 404 → Frontend redirects to `/login` → Infinite loop
**Root Cause:** Backend does not implement this endpoint
**Impact:** Frontend completely non-functional - cannot access dashboard
**Fix Required:**
```csharp
[HttpGet("me")]
public async Task<IActionResult> GetMyClubs()
{
var clubs = User.FindAll("clubs").Select(c => c.Value);
return Ok(new { clubs = clubs });
}
```
**Learning:** Full-stack integration testing MUST be performed before QA handoff. This is a critical path blocking all UI features that should have been caught in dev/staging.
---
#### 2. RLS Tenant Isolation Working Perfectly
**Discovery:** Row-Level Security policies successfully prevent cross-tenant data access
**Validation:** Task created in Tennis Club context returned **404 Not Found** when accessed via Cycling Club context (Phase 5, Step 9)
**Key Achievement:** Zero data leakage between tenants
**Technical Implementation:**
- Database RLS policies on all tables
- `TenantDbTransactionInterceptor` sets `app.current_tenant_id` session variable
- Authorization middleware validates JWT `clubs` claim matches `X-Tenant-Id` header
**Learning:** PostgreSQL RLS + session variables + JWT claims = robust multi-tenancy. This architecture pattern is production-ready and should be reused for other multi-tenant applications.
---
#### 3. State Machine Validation Working Correctly
**Discovery:** Task state transitions enforce valid workflow paths
**Tested Transitions:**
- ✅ Open → Assigned → InProgress → Review → Done (valid)
- ❌ Open → Done (invalid - correctly rejected with 422)
**Learning:** Embedded state machines in API layer provide strong data integrity guarantees without requiring complex client-side validation.
---
#### 4. Optimistic Concurrency Control NOT Implemented
**Discovery:** PATCH requests with stale `xmin` values succeed (no version checking)
**Expected:** HTTP 409 Conflict if version mismatch
**Actual:** HTTP 200 OK - update succeeds regardless
**Impact:** Concurrent edits can result in lost updates (last write wins)
**Risk Level:** Medium - unlikely in low-concurrency scenarios but problematic for collaborative editing
**Learning:** Entity Framework Core's `[ConcurrencyCheck]` or `[Timestamp]` attributes should be added to critical entities. Don't assume ORM handles this automatically.
---
#### 5. Capacity Enforcement with Race Condition Protection
**Discovery:** Concurrent shift signups correctly enforced capacity limits
**Test:** Created shift with capacity=1, launched simultaneous signups from two users
- Member1: HTTP 200 (succeeded)
- Member2: HTTP 409 "Shift is at full capacity"
- Final state: 1/1 signups (correct)
**Technical:** Database constraints + transaction isolation prevented double-booking
**Learning:** PostgreSQL transaction isolation levels effectively prevent race conditions without explicit application-level locking. Trust the database.
---
#### 6. Security Posture: Strong
**Tested Attack Vectors:**
- ✅ SQL Injection: Parameterized queries prevented execution
- ✅ Auth Bypass: Invalid/missing JWTs rejected (401)
- ✅ Unauthorized Access: Tenant membership validated (403)
- ✅ Race Conditions: Capacity constraints enforced under concurrency
**Observation:** XSS payloads stored as literal text (API safe, frontend unknown due to blocker)
**Learning:** Multi-layered security (JWT validation + RLS + parameterized queries) creates defense in depth. No single point of failure.
---
#### 7. JWT Token Decoding Issues with Base64
**Issue:** `base64 -d` and `jq` struggled with JWT payload extraction in bash
**Root Cause:** JWT base64 encoding uses URL-safe variant without padding
**Solution:** Used Python for reliable decoding:
```python
payload = token.split('.')[1]
padding = 4 - len(payload) % 4
if padding != 4:
payload += '=' * padding
decoded = base64.b64decode(payload)
```
**Learning:** For JWT manipulation in test scripts, Python is more reliable than bash/jq. Consider creating helper functions for token inspection.
---
#### 8. Minimal APIs Pattern Discovery
**Observation:** Backend uses ASP.NET Core Minimal APIs (not traditional controllers)
**Endpoint Registration:**
```csharp
group.MapGet("{id:guid}", GetTask)
group.MapPatch("{id:guid}", UpdateTask)
```
**Impact:** Required task-based exploration to discover HTTP methods (no obvious Controller.cs files)
**Learning:** Modern .NET APIs may use Minimal APIs pattern. Search for `Map*` methods in `Program.cs` or extension methods, not just `[HttpGet]` attributes.
---
#### 9. Past Shift Date Validation Missing
**Discovery:** API accepts shift creation with `startTime` in the past
**Expected:** HTTP 400/422 with validation error
**Actual:** HTTP 201 Created - shift created successfully
**Impact:** Low - cosmetic issue, users can create meaningless historical shifts
**Learning:** Server-side validation should enforce business rules beyond database constraints. Don't assume "sensible" data will be submitted.
---
#### 10. Frontend/Backend Integration Gap
**Discovery:** Backend API 88% functional, frontend 0% functional
**Root Cause:** Backend developed in isolation without full-stack integration testing
**Symptoms:**
- All API endpoints working perfectly via curl
- Frontend cannot complete authentication flow
- Missing endpoint blocks entire UI
**Learning:** **CRITICAL PATTERN TO AVOID:**
- Backend team: "API works, here's the Swagger docs"
- Frontend team: "We'll integrate later"
- Result: Integration blockers discovered only at QA stage
**Best Practice:** Implement end-to-end user journeys DURING development, not after. Even a single E2E test (login → view list) would have caught this.
---
### Test Statistics Summary
**Overall Results:**
- 57 scenarios executed (S58 = report generation)
- 49 PASS (86%)
- 1 FAIL (frontend auth blocker)
- 5 SKIPPED (frontend tests blocked)
- 2 PARTIAL (unimplemented features)
**Phase Breakdown:**
- Phase 1-2 (Infrastructure): 18/18 PASS (100%)
- Phase 3 (API CRUD): 15/17 PASS (88%)
- Phase 4 (Frontend E2E): 0/6 PASS (0% - blocked)
- Phase 5 (Integration): 10/10 PASS (100%)
- Phase 6 (Security): 6/6 PASS (100%)
**Verdict:** API production-ready, Frontend requires fix
---
### Technical Debt Identified
1. **Critical:** Missing `/api/clubs/me` endpoint (frontend blocker)
2. **High:** No optimistic concurrency control (lost update risk)
3. **Medium:** Past shift date validation missing
4. **Low:** XSS payload storage (frontend mitigation unknown)
---
### Recommendations for Future Projects
1. **E2E Testing During Development:** Don't wait for QA to discover integration issues
2. **Full-Stack Feature Completion:** Backend + Frontend + Integration = "Done"
3. **API Contract Testing:** Use OpenAPI spec to validate frontend expectations match backend implementation
4. **Concurrency Testing Early:** Don't assume database handles everything - test race conditions
5. **Security Testing Automation:** Automate SQL injection, XSS, auth bypass tests in CI/CD
---
### Key Takeaways
✅ **What Went Well:**
- Multi-tenant architecture is solid (RLS working perfectly)
- Security controls are strong (no injection vulnerabilities)
- State machine validation prevents invalid data
- Comprehensive error handling (no stack traces leaked)
- Docker Compose setup makes testing reproducible
❌ **What Needs Improvement:**
- Frontend/backend integration testing missing
- No E2E tests in CI/CD pipeline
- Optimistic locking not implemented
- Input validation gaps (past dates, etc.)
🎯 **Most Important Learning:**
**Backend API working ≠ Application working**
A "complete" feature requires:
1. Backend endpoint implemented ✅
2. Frontend component implemented (unknown)
3. Integration tested E2E ❌ ← THIS IS WHERE WE FAILED
The missing `/api/clubs/me` endpoint is a perfect example - backend team assumed frontend would extract clubs from JWT, frontend team expected an endpoint. Neither validated the assumption until QA.
---
**Testing Duration:** 2 hours
**Evidence Files:** 40+ JSON responses, screenshots, test scripts
**QA Report:** `.sisyphus/evidence/final-qa/FINAL-F3-QA-REPORT.md`
## Final QA: E2E Playwright Browser Testing (2026-03-05)
### Key Learnings
1. **Playwright MCP Setup:** Using Playwright via MCP can be tricky if `chrome` channel is missing and `sudo` is required. Solved by installing Google Chrome via `brew install --cask google-chrome` locally, bypassing the `sudo` prompt from Playwright's installer.
2. **Login Works but Application Fails (Missing API route):**
- The login flow through Keycloak succeeds and redirects back to the application properly.
- However, the application immediately hits a `404 (Not Found)` on `http://localhost:3000/api/clubs/me`.
- Because `clubs` fails to load, `TenantContext` evaluates `clubs.length === 0` and renders "No Clubs Found - Contact admin to get access to a club" on both `/tasks` and `/shifts` pages.
- The club-switcher component does not render properly (or at all) because it relies on the loaded clubs list, which is empty.
### Screenshots Captured
- `e2e-01-landing.png`: The initial login page
- `e2e-02-keycloak-login.png`: The Keycloak sign-in form
- `e2e-03-dashboard.png`: Post-login redirect failure state (returns to `/login`)
- `e2e-05-tasks.png`: Navigated to `/tasks`, showing "No Clubs Found"
- `e2e-06-shifts.png`: Navigated to `/shifts`, showing "No Clubs Found"
### Missing Functionality Identified
- The route handler for `GET /api/clubs/me` does not exist in `frontend/src/app/api/clubs/me/route.ts` or similar path.
- The `fetch('/api/clubs/me')` inside `frontend/src/contexts/tenant-context.tsx` fails and returns an empty array `[]`.
- As a result, no users can switch clubs or view resources (tasks, shifts), effectively blocking the entire app experience.
## Fixed: TenantValidationMiddleware Exemption for /api/clubs/me
**Date**: 2026-03-05
**Issue**: `/api/clubs/me` endpoint required `X-Tenant-Id` header, but this is the bootstrap endpoint that provides the list of clubs to choose from. Chicken-and-egg problem.
**Solution**: Added path exemption logic in `TenantValidationMiddleware.cs`:
- Check `context.Request.Path.StartsWithSegments("/api/clubs/me")`
- Skip tenant validation for this path specifically
- All other authenticated endpoints still require X-Tenant-Id
**Code Change**:
```csharp
// Exempt /api/clubs/me from tenant validation - this is the bootstrap endpoint
if (context.Request.Path.StartsWithSegments("/api/clubs/me"))
{
_logger.LogInformation("TenantValidationMiddleware: Exempting {Path} from tenant validation", context.Request.Path);
await _next(context);
return;
}
```
**Verification**:
- ✅ `/api/clubs/me` returns HTTP 200 without X-Tenant-Id header
- ✅ `/api/tasks` still returns HTTP 400 "X-Tenant-Id header is required" without X-Tenant-Id
- ✅ ClubService.GetMyClubsAsync() correctly queries Members table by ExternalUserId (JWT sub claim)
**Docker Rebuild**: Required `docker compose down && docker compose up -d dotnet-api` after code change
## Fix: /api/clubs/me Endpoint Without Tenant Header
### Problem Resolved
The `/api/clubs/me` endpoint required X-Tenant-Id header but should work without it to enable club discovery before tenant selection.
### Root Cause
1. TenantValidationMiddleware (line 25-31) blocked ALL authenticated requests without X-Tenant-Id
2. ClubRoleClaimsTransformation only added role claims if X-Tenant-Id was present and valid
3. "/api/clubs/me" endpoint required "RequireMember" policy (Admin/Manager/Member role) but couldn't get role claim without tenant
### Solution Implemented
1. **TenantValidationMiddleware.cs (lines 25-31)**: Added path-based exclusion for `/api/clubs/me`
- Checks if path starts with "/api/clubs/me" and skips tenant validation for this endpoint
- Other endpoints still require X-Tenant-Id header
2. **ClubEndpoints.cs (line 14)**: Changed authorization from "RequireMember" to "RequireViewer"
- "RequireViewer" policy = RequireAuthenticatedUser() only
- Allows any authenticated user to call /api/clubs/me without role check
- Service logic (GetMyClubsAsync) queries by user's "sub" claim, not tenant
### Verification
```bash
# Works without X-Tenant-Id
curl http://127.0.0.1:5001/api/clubs/me \
-H "Authorization: Bearer $TOKEN"
# Returns: 200 OK with JSON array
# Other endpoints still require X-Tenant-Id
curl http://127.0.0.1:5001/api/tasks \
-H "Authorization: Bearer $TOKEN"
# Returns: 400 Bad Request "X-Tenant-Id header is required"
# With X-Tenant-Id, other endpoints work
curl http://127.0.0.1:5001/api/tasks \
-H "Authorization: Bearer $TOKEN" \
-H "X-Tenant-Id: <tenant-id>"
# Returns: 200 OK with tasks list
```
### Architecture Notes
- Middleware exclusion prevents security validation bypass for unprotected endpoints
- Authorization policy determines final access control (role-based)
- GetMyClubsAsync queries by ExternalUserId (sub claim), not by TenantId
- This is the bootstrap endpoint for discovering clubs to select a tenant