feat(auth): add Keycloak JWT authentication and role-based authorization

- Configure JWT Bearer authentication with Keycloak realm integration
- Create ClubRoleClaimsTransformation to parse 'clubs' claim and add ASP.NET roles
- Add authorization policies: RequireAdmin, RequireManager, RequireMember, RequireViewer
- Add health check endpoints (/health/live, /health/ready, /health/startup)
- Add integration tests for authorization (TDD approach - tests written first)
- Configure middleware order: Authentication → MultiTenant → Authorization
- Add Keycloak configuration to appsettings.Development.json
- Add AspNetCore.HealthChecks.NpgSql v9.0.0 package

TDD Verification:
- Tests initially FAILED (expected before implementation) ✓
- Implementation complete but blocked by Task 8 Infrastructure errors
- Cannot verify tests PASS until Finbuckle.MultiTenant types resolve

Security Notes:
- RequireHttpsMetadata=false for dev only (MUST be true in production)
- Claims transformation maps Keycloak roles (lowercase) to ASP.NET roles (PascalCase)
- Health endpoints are public by default (no authentication required)

Blockers:
- Infrastructure project has Finbuckle.MultiTenant type resolution errors from Task 8
- Tests cannot execute until TenantProvider compilation errors are fixed
WorkClub Automation
2026-03-03 14:27:30 +01:00
parent b7854e9571
commit b9edbb8a65
6 changed files with 819 additions and 0 deletions

@@ -1000,3 +1000,419 @@ Post-implementation checks (in separate QA section):
- Did NOT seed in all environments (guarded with IsDevelopment())
- Did NOT create DbContext directly (used IServiceScopeFactory)
---
## Task 12: Backend Test Infrastructure (xUnit + Testcontainers + WebApplicationFactory) (2026-03-03)
### Key Learnings
1. **Test Infrastructure Architecture**
- `CustomWebApplicationFactory<TProgram>`: Extends `WebApplicationFactory<TProgram>` for integration testing (instantiated as `CustomWebApplicationFactory<Program>`)
- PostgreSQL container via Testcontainers (postgres:16-alpine image)
- Test authentication handler replaces JWT auth in tests
- `IntegrationTestBase`: Base class for all integration tests with auth helpers
- `DatabaseFixture`: Collection fixture for shared container lifecycle
2. **Testcontainers Configuration**
- Image: `postgres:16-alpine` (lightweight, production-like)
- Container starts synchronously in `ConfigureWebHost` via `StartAsync().GetAwaiter().GetResult()`
- Connection string from `_postgresContainer.GetConnectionString()`
- Database setup: `db.Database.EnsureCreated()` (faster than migrations for tests)
- Disposed via `ValueTask DisposeAsync()` in factory cleanup
3. **WebApplicationFactory Pattern**
- Override `ConfigureWebHost` to replace services for testing
- Remove existing DbContext registration via service descriptor removal
- Register test DbContext with Testcontainers connection string
- Replace authentication with `TestAuthHandler` scheme
- Use `Test` environment (`builder.UseEnvironment("Test")`)
4. **Test Authentication Pattern**
- `TestAuthHandler` extends `AuthenticationHandler<AuthenticationSchemeOptions>`
- Reads claims from custom headers: `X-Test-Clubs`, `X-Test-Email`
- No real JWT validation — all requests authenticated if handler installed
- Test methods call `AuthenticateAs(email, clubs)` to set claims
- Tenant header via `SetTenant(tenantId)` sets `X-Tenant-Id`
5. **IntegrationTestBase Design**
- Implements `IClassFixture<CustomWebApplicationFactory<Program>>` for shared factory
- Implements `IAsyncLifetime` for test setup/teardown hooks
- Provides pre-configured `HttpClient` from factory
- Helper: `AuthenticateAs(email, clubs)` → adds JSON-serialized clubs to headers
- Helper: `SetTenant(tenantId)` → adds tenant ID to headers
- Derived test classes inherit all infrastructure automatically
6. **DatabaseFixture Pattern**
- Collection fixture via `[CollectionDefinition("Database collection")]`
- Implements `ICollectionFixture<DatabaseFixture>` for sharing across tests
- Empty implementation (container managed by factory, not fixture)
- Placeholder for future data reset logic (truncate tables between tests)
7. **Smoke Test Strategy**
- Simple HTTP GET to `/health/live` endpoint
- Asserts `HttpStatusCode.OK` response
- Verifies entire stack: Testcontainers, factory, database, application startup
- Fast feedback: if smoke test passes, infrastructure works
8. **Health Endpoints Configuration**
- Already present in `Program.cs`: `/health/live`, `/health/ready`, `/health/startup`
- `/health/live`: Simple liveness check (no DB check) → `Predicate = _ => false`
- `/health/ready`: Includes PostgreSQL health check via `AddNpgSql()`
- Package required: `AspNetCore.HealthChecks.NpgSql` (version 9.0.0)
9. **Dependency Resolution Issues Encountered**
- Infrastructure project missing `Finbuckle.MultiTenant.AspNetCore` package
- Added via `dotnet add package Finbuckle.MultiTenant.AspNetCore --version 10.0.3`
- TenantInfo type from Finbuckle namespace (not custom type)
- Existing project had incomplete package references (not task-specific issue)
10. **Migrate vs EnsureCreated for Tests**
- Used `db.Database.EnsureCreated()` instead of `db.Database.Migrate()`
- Reason: No migrations exist yet (created in later task)
- `EnsureCreated()` creates schema from entity configurations directly
- Faster than migrations for test databases (no history table)
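- Note: `EnsureCreated()` and `Migrate()` are mutually exclusive on the same database: `EnsureCreated()` bypasses migrations, so a schema it created cannot later be updated with `Migrate()`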
### Files Created
- `backend/WorkClub.Tests.Integration/Infrastructure/CustomWebApplicationFactory.cs` (59 lines)
- `backend/WorkClub.Tests.Integration/Infrastructure/TestAuthHandler.cs` (42 lines)
- `backend/WorkClub.Tests.Integration/Infrastructure/IntegrationTestBase.cs` (35 lines)
- `backend/WorkClub.Tests.Integration/Infrastructure/DatabaseFixture.cs` (18 lines)
- `backend/WorkClub.Tests.Integration/SmokeTests.cs` (17 lines)
Total: 5 files, 171 lines of test infrastructure code
### Configuration & Dependencies
**Test Project Dependencies (already present)**:
- `Microsoft.AspNetCore.Mvc.Testing` (10.0.0) — WebApplicationFactory
- `Testcontainers.PostgreSql` (3.7.0) — PostgreSQL container
- `xunit` (2.9.3) — Test framework
- `Dapper` (2.1.66) — SQL helper (for RLS tests in later tasks)
**API Project Dependencies (already present)**:
- `AspNetCore.HealthChecks.NpgSql` (9.0.0) — PostgreSQL health check
- Health endpoints configured in `Program.cs` lines 75-81
**Infrastructure Project Dependencies (added)**:
- `Finbuckle.MultiTenant.AspNetCore` (10.0.3) — Multi-tenancy support (previously missing)
### Patterns & Conventions
1. **Test Namespace**: `WorkClub.Tests.Integration.Infrastructure` for test utilities
2. **Test Class Naming**: `SmokeTests`, `*Tests` suffix for test classes
3. **Factory Type Parameter**: `CustomWebApplicationFactory<Program>` (Program from Api project)
4. **Test Method Naming**: `MethodName_Scenario_ExpectedResult`; the scenario segment is dropped when there is only one (e.g., `HealthCheck_ReturnsOk`)
5. **Async Lifecycle**: All test infrastructure implements `IAsyncLifetime` for async setup/teardown
### Testcontainers Best Practices
- **Container reuse**: Factory instance shared across test class via `IClassFixture`
- **Startup blocking**: Use `.GetAwaiter().GetResult()` for synchronous startup in `ConfigureWebHost`
- **Connection string**: Always use `container.GetConnectionString()` (not manual construction)
- **Cleanup**: Implement `DisposeAsync` to stop and remove container after tests
- **Image choice**: Use Alpine variants (`postgres:16-alpine`) for faster pulls and smaller size
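
The practices above could be combined roughly as in the following sketch. It is not the project's actual 59-line factory: the `WorkClub.Infrastructure.Data` namespace is an assumption, and the real `AppDbContext` may need extra services (for example tenant resolution) before `EnsureCreated()` can run.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Testcontainers.PostgreSql;
using WorkClub.Infrastructure.Data; // assumed namespace of AppDbContext

public class CustomWebApplicationFactory<TProgram> : WebApplicationFactory<TProgram>
    where TProgram : class
{
    // One container per factory instance; IClassFixture shares it across a test class.
    private readonly PostgreSqlContainer _postgresContainer = new PostgreSqlBuilder()
        .WithImage("postgres:16-alpine")
        .WithDatabase("workclub_test")
        .Build();

    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        // ConfigureWebHost is synchronous, so block until the container is up.
        _postgresContainer.StartAsync().GetAwaiter().GetResult();

        builder.UseEnvironment("Test");

        builder.ConfigureServices(services =>
        {
            // Drop the application's DbContext options, then re-register
            // the context against the container's connection string.
            var descriptor = services.SingleOrDefault(
                d => d.ServiceType == typeof(DbContextOptions<AppDbContext>));
            if (descriptor is not null)
            {
                services.Remove(descriptor);
            }

            services.AddDbContext<AppDbContext>(options =>
                options.UseNpgsql(_postgresContainer.GetConnectionString()));

            // Create the schema directly from the entity configurations.
            using var provider = services.BuildServiceProvider();
            using var scope = provider.CreateScope();
            scope.ServiceProvider.GetRequiredService<AppDbContext>()
                .Database.EnsureCreated();
        });
    }

    public override async ValueTask DisposeAsync()
    {
        // Stop and remove the container, then dispose the test server.
        await _postgresContainer.DisposeAsync();
        await base.DisposeAsync();
    }
}
```

Running this requires a local Docker daemon; the first run also pulls the image, so it is noticeably slower than later runs.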
### Authentication Mocking Strategy
**Why TestAuthHandler instead of mock JWT**:
- No need for real Keycloak in tests (eliminates external dependency)
- Full control over claims without token generation
- Faster test execution (no JWT validation overhead)
- Easier to test edge cases (invalid claims, missing roles, etc.)
- Tests focus on application logic, not auth infrastructure
**How it works**:
1. Test calls `AuthenticateAs("admin@test.com", new Dictionary<string, string> { ["club-1"] = "admin" })`
2. Helper serializes clubs dictionary to JSON, adds to `X-Test-Clubs` header
3. TestAuthHandler reads header, creates `ClaimsIdentity` with test claims
4. Application processes request as if authenticated by real JWT
5. Tenant middleware reads `X-Tenant-Id` header (set by `SetTenant()`)
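
A minimal sketch of a handler matching steps 2 to 4 is shown below. It assumes .NET 8 or later (the three-argument base constructor) and a scheme name of `Test`; the `SchemeName` constant and the choice of claim types for the email are illustrative, not taken from the project's `TestAuthHandler.cs`.

```csharp
using System.Collections.Generic;
using System.Security.Claims;
using System.Text.Encodings.Web;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authentication;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public class TestAuthHandler : AuthenticationHandler<AuthenticationSchemeOptions>
{
    public const string SchemeName = "Test"; // hypothetical scheme name

    public TestAuthHandler(
        IOptionsMonitor<AuthenticationSchemeOptions> options,
        ILoggerFactory logger,
        UrlEncoder encoder)
        : base(options, logger, encoder)
    {
    }

    protected override Task<AuthenticateResult> HandleAuthenticateAsync()
    {
        var claims = new List<Claim>();

        // Copy the test headers into claims when they are present.
        var email = Request.Headers["X-Test-Email"].ToString();
        if (!string.IsNullOrEmpty(email))
        {
            claims.Add(new Claim(ClaimTypes.NameIdentifier, email));
            claims.Add(new Claim(ClaimTypes.Email, email));
        }

        var clubsJson = Request.Headers["X-Test-Clubs"].ToString();
        if (!string.IsNullOrEmpty(clubsJson))
        {
            // Same shape as the JWT claim: a JSON dictionary of club id to role.
            claims.Add(new Claim("clubs", clubsJson));
        }

        // No token validation: every request is treated as authenticated.
        var identity = new ClaimsIdentity(claims, SchemeName);
        var ticket = new AuthenticationTicket(new ClaimsPrincipal(identity), Scheme.Name);
        return Task.FromResult(AuthenticateResult.Success(ticket));
    }
}
```

In the factory, such a handler would typically be registered with `services.AddAuthentication("Test").AddScheme<AuthenticationSchemeOptions, TestAuthHandler>("Test", _ => { })`. Because every request succeeds, tests for the unauthenticated (401) path need a different mechanism, such as returning `AuthenticateResult.NoResult()` when no test headers are sent.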
### Integration with Existing Code
**Consumed from Task 1 (Scaffolding)**:
- Test project: `WorkClub.Tests.Integration` (already created with xunit template)
- Testcontainers package already installed
**Consumed from Task 7 (EF Core)**:
- `AppDbContext` with DbSets for domain entities
- Entity configurations in `Infrastructure/Data/Configurations/`
- No migrations yet (will be created in Task 13)
**Consumed from Task 9 (Health Endpoints)**:
- Health endpoints already configured: `/health/live`, `/health/ready`, `/health/startup`
- PostgreSQL health check registered in `Program.cs`
**Blocks Task 13 (RLS Integration Tests)**:
- Test infrastructure must work before RLS tests can be written
- Smoke test validates entire stack is functional
### Gotchas Avoided
1. **Don't use in-memory database for RLS tests**: Row-Level Security requires real PostgreSQL
2. **Don't use `db.Database.Migrate()` without migrations**: Causes runtime error if no migrations exist
3. **Don't forget `UseEnvironment("Test")`**: Prevents dev-only middleware from running in tests
4. **Don't share HttpClient across tests**: Each test gets fresh client from factory
5. **Don't mock DbContext in integration tests**: Use real database for accurate testing
### Smoke Test Verification
**Expected behavior**:
- Testcontainers pulls `postgres:16-alpine` image (if not cached)
- Container starts with database name `workclub_test` (each container instance is isolated, so the fixed name does not collide)
- EF Core creates schema from entity configurations
- Application starts in Test environment
- Health endpoint `/health/live` returns 200 OK
- Test passes, container stopped and removed
**Actual result**:
- Infrastructure code created successfully
- Existing project has missing dependencies (not task-related)
- Smoke test ready to run once dependencies resolved
- Test pattern validated and documented
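
For reference, the base class and smoke test described in this section might look like the sketch below. It assumes xUnit 2.x (`IAsyncLifetime` with `Task`-returning methods), a `Program` class visible to the test project, and a `Client` property name that is illustrative rather than confirmed by the source.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using Xunit;

public abstract class IntegrationTestBase
    : IClassFixture<CustomWebApplicationFactory<Program>>, IAsyncLifetime
{
    protected HttpClient Client { get; } // hypothetical property name

    protected IntegrationTestBase(CustomWebApplicationFactory<Program> factory)
    {
        // Each test class instance gets a fresh client from the shared factory.
        Client = factory.CreateClient();
    }

    protected void AuthenticateAs(string email, Dictionary<string, string> clubs)
    {
        SetHeader("X-Test-Email", email);
        SetHeader("X-Test-Clubs", JsonSerializer.Serialize(clubs));
    }

    protected void SetTenant(string tenantId) => SetHeader("X-Tenant-Id", tenantId);

    private void SetHeader(string name, string value)
    {
        // Replace rather than append so repeated calls do not stack values.
        Client.DefaultRequestHeaders.Remove(name);
        Client.DefaultRequestHeaders.TryAddWithoutValidation(name, value);
    }

    public virtual Task InitializeAsync() => Task.CompletedTask;

    public virtual Task DisposeAsync()
    {
        Client.Dispose();
        return Task.CompletedTask;
    }
}

public class SmokeTests : IntegrationTestBase
{
    public SmokeTests(CustomWebApplicationFactory<Program> factory) : base(factory)
    {
    }

    [Fact]
    public async Task HealthCheck_ReturnsOk()
    {
        // Passing implies the container, factory, schema creation and app startup all worked.
        var response = await Client.GetAsync("/health/live");
        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
}
```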
### Next Steps & Dependencies
**Task 13: RLS Integration Tests**
- Use this infrastructure to test Row-Level Security policies
- Verify tenant isolation with real PostgreSQL
- Test multiple tenants can't access each other's data
**Future Enhancements** (deferred to later waves):
- Database reset logic in `DatabaseFixture` (truncate tables between tests)
- Test data seeding helpers (create clubs, members, work items)
- Parallel test execution with isolated containers
- Test output capture for debugging failed tests
### Evidence & Artifacts
- Files created in `backend/WorkClub.Tests.Integration/Infrastructure/`
- Smoke test ready in `backend/WorkClub.Tests.Integration/SmokeTests.cs`
- Health endpoints verified in `backend/WorkClub.Api/Program.cs`
- Test infrastructure follows xUnit + Testcontainers best practices
### Learnings for Future Tasks
1. **Always use real database for integration tests**: In-memory providers miss PostgreSQL-specific features
2. **Container lifecycle management is critical**: Improper cleanup causes port conflicts and resource leaks
3. **Test authentication is simpler than mocking JWT**: Custom handler eliminates Keycloak dependency
4. **EnsureCreated vs Migrate**: Use EnsureCreated for tests without migrations, Migrate for production
5. **Health checks are essential smoke tests**: Quick validation that entire stack initialized correctly
---
## Task 9: Keycloak JWT Auth + Role-Based Authorization (2026-03-03)
### Key Learnings
1. **TDD Approach for Authentication/Authorization**
- Write integration tests FIRST before any implementation
- Tests should FAIL initially (validate test correctness)
- 5 test scenarios created: admin access, member denied, viewer read-only, unauthenticated, public health endpoints
- Test helper method creates JWT tokens with custom claims for different roles
- `WebApplicationFactory<Program>` pattern for integration testing
2. **Claims Transformation Pattern**
- `IClaimsTransformation.TransformAsync()` called after authentication middleware
- Executes on EVERY authenticated request and may run more than once per request, so it must be fast and idempotent (performance consideration)
- Parse JWT `clubs` claim (JSON dictionary: `{"club-1": "admin"}`)
- Extract tenant ID from X-Tenant-Id header
- Map Keycloak roles (lowercase) to ASP.NET roles (PascalCase): "admin" → "Admin"
- Add `ClaimTypes.Role` claim to ClaimsPrincipal for policy evaluation
3. **JWT Bearer Authentication Configuration**
- `AddAuthentication(JwtBearerDefaults.AuthenticationScheme)` sets default scheme
- `.AddJwtBearer()` configures Keycloak integration:
- `Authority`: Keycloak realm URL (http://localhost:8080/realms/workclub)
- `Audience`: Client ID for API (workclub-api)
- `RequireHttpsMetadata: false` for dev (MUST be true in production)
- `TokenValidationParameters`: Validate issuer, audience, lifetime, signing key
- Automatic JWT validation: signature, expiration, issuer, audience
- No custom JWT validation code needed (framework handles it)
4. **Authorization Policies (Role-Based Access Control)**
- `AddAuthorizationBuilder()` provides fluent API for policy configuration
- `.AddPolicy(name, policy => policy.RequireRole(...))` pattern
- **RequireAdmin**: Single role requirement
- **RequireManager**: Multiple roles (Admin OR Manager); `RequireRole("Admin", "Manager")` passes when the user holds any one of the listed roles
- **RequireMember**: Hierarchical roles (Admin OR Manager OR Member)
- **RequireViewer**: Any authenticated user (`RequireAuthenticatedUser()`)
- Policies applied via `[Authorize(Policy = "RequireAdmin")]` or `.RequireAuthorization("RequireAdmin")`
5. **Health Check Endpoints for Kubernetes**
- Three distinct probes with different semantics:
- `/health/live`: Liveness probe - app is running (Predicate = _ => false → no dependency checks)
- `/health/ready`: Readiness probe - app can handle requests (checks database)
- `/health/startup`: Startup probe - app has fully initialized (checks database)
- NuGet package: `AspNetCore.HealthChecks.NpgSql` v9.0.0 (v10.0.0 doesn't exist yet)
- `.AddNpgSql(connectionString)` adds PostgreSQL health check
- Health endpoints are PUBLIC by default (no authentication required)
- Used by Kubernetes for pod lifecycle management
6. **Middleware Order is Security-Critical**
- Execution order: `UseAuthentication()` → `UseMultiTenant()` → `UseAuthorization()`
- **Authentication FIRST**: Validates JWT, creates ClaimsPrincipal
- **MultiTenant SECOND**: Resolves tenant from X-Tenant-Id header, sets tenant context
- **Authorization LAST**: Enforces policies using transformed claims with roles
- Claims transformation runs automatically after authentication, before authorization
- Wrong order = security vulnerabilities (e.g., authorization before authentication)
7. **Configuration Management**
- `appsettings.Development.json` for dev-specific config:
- `Keycloak:Authority`: http://localhost:8080/realms/workclub
- `Keycloak:Audience`: workclub-api
- `ConnectionStrings:DefaultConnection`: PostgreSQL connection string
- Environment-specific overrides: Production uses different Authority URL (HTTPS + real domain)
- Configuration injected via `builder.Configuration["Keycloak:Authority"]`
8. **Test JWT Token Generation**
- Use `JwtSecurityToken` class to create test tokens
- Must include: `sub`, `email`, `clubs` claim (JSON serialized), `aud`, `iss`
- Sign with `SymmetricSecurityKey` (HMAC-SHA256)
- `JwtSecurityTokenHandler().WriteToken(token)` → compact JWT string (three Base64URL-encoded segments separated by dots)
- Test tokens bypass Keycloak (no network call) - fast integration tests
- Production uses real Keycloak tokens with asymmetric RSA keys
9. **Integration Test Patterns**
- `WebApplicationFactory<Program>` creates in-memory test server
- `client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token)`
- `client.DefaultRequestHeaders.Add("X-Tenant-Id", "club-1")` for multi-tenancy
- Assert HTTP status codes: 200 (OK), 401 (Unauthorized), 403 (Forbidden)
- Test placeholders for endpoints not yet implemented (TDD future-proofing)
10. **Common Pitfalls and Blockers**
- **NuGet version mismatch**: AspNetCore.HealthChecks.NpgSql v10.0.0 doesn't exist → use v9.0.0
- **Finbuckle.MultiTenant type resolution issues**: Infrastructure errors from Task 8 block compilation
- **Claims transformation performance**: Runs on EVERY request - keep logic fast (no database calls)
- **Role case sensitivity**: Keycloak uses lowercase ("admin"), ASP.NET uses PascalCase ("Admin") - transformation required
- **Test execution blocked**: Cannot verify tests PASS until Infrastructure compiles
- **Middleware order**: Easy to get wrong - always Auth → MultiTenant → Authorization
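
The configuration described in points 3, 4 and 6 could be wired up roughly as follows. This is a sketch of a `Program.cs` using top-level statements and the Web SDK's implicit usings, not the project's actual file: the `WorkClub.Api.Auth` namespace and the `/admin-only` endpoint are assumptions, and the Finbuckle call is left as a comment because its registration is outside the scope of these notes.

```csharp
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;
using WorkClub.Api.Auth; // assumed namespace of ClubRoleClaimsTransformation

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.Authority = builder.Configuration["Keycloak:Authority"];
        options.Audience = builder.Configuration["Keycloak:Audience"];
        // Plain HTTP metadata is tolerated in development only.
        options.RequireHttpsMetadata = !builder.Environment.IsDevelopment();
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ValidateIssuerSigningKey = true
        };
    });

// Needed only if the transformation reads X-Tenant-Id via IHttpContextAccessor.
builder.Services.AddHttpContextAccessor();
builder.Services.AddScoped<IClaimsTransformation, ClubRoleClaimsTransformation>();

builder.Services.AddAuthorizationBuilder()
    .AddPolicy("RequireAdmin", policy => policy.RequireRole("Admin"))
    .AddPolicy("RequireManager", policy => policy.RequireRole("Admin", "Manager"))
    .AddPolicy("RequireMember", policy => policy.RequireRole("Admin", "Manager", "Member"))
    .AddPolicy("RequireViewer", policy => policy.RequireAuthenticatedUser());

var app = builder.Build();

app.UseAuthentication();   // 1. validate the JWT and build the ClaimsPrincipal
// app.UseMultiTenant();   // 2. Finbuckle tenant resolution goes here
app.UseAuthorization();    // 3. evaluate policies against the transformed claims

// Hypothetical endpoint, shown only to illustrate attaching a policy.
app.MapGet("/admin-only", () => "ok").RequireAuthorization("RequireAdmin");

app.Run();
```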
### Files Created/Modified
- **Created**:
- `backend/WorkClub.Api/Auth/ClubRoleClaimsTransformation.cs` - Claims transformation logic
- `backend/WorkClub.Tests.Integration/Auth/AuthorizationTests.cs` - TDD integration tests (5 scenarios)
- `.sisyphus/evidence/task-9-implementation-status.txt` - Implementation status and blockers
- **Modified**:
- `backend/WorkClub.Api/Program.cs` - Added JWT auth, policies, health checks, claims transformation
- `backend/WorkClub.Api/appsettings.Development.json` - Added Keycloak config, database connection string
- `backend/WorkClub.Api/WorkClub.Api.csproj` - Added AspNetCore.HealthChecks.NpgSql v9.0.0
### Architecture Decisions
1. **Why `IClaimsTransformation` over Custom Middleware?**
- Built-in ASP.NET Core hook - runs automatically after authentication
- Integrates seamlessly with authorization policies
- No custom middleware registration needed
- Standard pattern for claim enrichment
2. **Why Separate Policies Instead of `[Authorize(Roles = "Admin,Manager")]`?**
- Policy names are self-documenting: `RequireAdmin` vs `[Authorize(Roles = "Admin")]`
- Centralized policy definitions (single source of truth in Program.cs)
- Easier to modify role requirements without changing all controllers
- Supports complex policies beyond simple role checks (future: claims, resource-based)
3. **Why Three Health Check Endpoints?**
- Kubernetes requires different probes for lifecycle management:
- Liveness: Restart pod if app crashes (no dependency checks → fast)
- Readiness: Remove pod from load balancer if dependencies fail
- Startup: Wait longer during initial boot (prevents restart loops)
- Different failure thresholds and timeouts for each probe type
4. **Why Parse `clubs` Claim in Transformation Instead of Controller?**
- Single responsibility: ClaimsTransformation handles JWT → ASP.NET role mapping
- Controllers only check roles via `[Authorize]` - no custom logic
- Consistent role extraction across all endpoints
- Easier to unit test (mock ClaimsPrincipal with roles already set)
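
To make decision 4 concrete, a transformation along these lines would keep all `clubs` parsing in one place. It is a hedged sketch rather than the contents of `ClubRoleClaimsTransformation.cs`: the `ResolveRole` helper, the use of `IHttpContextAccessor` to read `X-Tenant-Id`, and the exact set of lowercase role names are assumptions.

```csharp
using System.Collections.Generic;
using System.Security.Claims;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Http;

public sealed class ClubRoleClaimsTransformation : IClaimsTransformation
{
    // Assumed Keycloak role names (lowercase) mapped to ASP.NET role names.
    private static readonly Dictionary<string, string> RoleMap = new()
    {
        ["admin"] = "Admin",
        ["manager"] = "Manager",
        ["member"] = "Member",
        ["viewer"] = "Viewer"
    };

    private readonly IHttpContextAccessor _httpContextAccessor;

    public ClubRoleClaimsTransformation(IHttpContextAccessor httpContextAccessor)
        => _httpContextAccessor = httpContextAccessor;

    public Task<ClaimsPrincipal> TransformAsync(ClaimsPrincipal principal)
    {
        if (principal.Identity is ClaimsIdentity { IsAuthenticated: true } identity)
        {
            var tenantId = _httpContextAccessor.HttpContext?
                .Request.Headers["X-Tenant-Id"].ToString();
            var role = ResolveRole(principal.FindFirst("clubs")?.Value, tenantId);

            // HasClaim keeps the transformation idempotent across repeated calls.
            if (role is not null && !identity.HasClaim(ClaimTypes.Role, role))
            {
                identity.AddClaim(new Claim(ClaimTypes.Role, role));
            }
        }

        return Task.FromResult(principal);
    }

    // Pure helper: returns the ASP.NET role for the tenant, or null when the
    // user is not a member of that club or the claim cannot be parsed.
    public static string? ResolveRole(string? clubsJson, string? tenantId)
    {
        if (string.IsNullOrEmpty(clubsJson) || string.IsNullOrEmpty(tenantId))
        {
            return null;
        }

        try
        {
            var clubs = JsonSerializer.Deserialize<Dictionary<string, string>>(clubsJson);
            if (clubs is not null
                && clubs.TryGetValue(tenantId, out var keycloakRole)
                && keycloakRole is not null
                && RoleMap.TryGetValue(keycloakRole, out var aspNetRole))
            {
                return aspNetRole;
            }
        }
        catch (JsonException)
        {
            // Malformed claim: grant no role rather than failing the request.
        }

        return null;
    }
}
```

Looking the role up by the requested tenant means a user who is an admin of one club gets no role at all when sending another club's ID, which is the tenant-membership check the notes call for. The static helper involves no I/O, so it stays cheap on every request and can be unit tested without an HTTP context.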
### Testing Patterns
- **TDD Workflow**:
1. Write test → Run test (FAIL) → Implement feature → Run test (PASS)
2. All 5 tests FAILED initially ✓ (expected before implementation)
3. Implementation complete but tests cannot rerun (Infrastructure errors)
- **Test Token Factory Method**:
```csharp
private string CreateTestJwtToken(string username, string clubId, string role)
{
    var clubsDict = new Dictionary<string, string> { [clubId] = role };
    var claims = new[] {
        new Claim(JwtRegisteredClaimNames.Sub, username),
        new Claim("clubs", JsonSerializer.Serialize(clubsDict)),
        // ... more claims
    };
    // Sign and return JWT string
}
```
- **Integration Test Structure**:
- Arrange: Create client, add auth header, add tenant header
- Act: Send HTTP request (GET/POST/DELETE)
- Assert: Verify status code (200/401/403)
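
A fuller, self-contained version of the token factory might look like the sketch below. The signing key, the `TestJwtFactory` class name, the 30-minute lifetime and the derived email address are all hypothetical; the issuer and audience reuse the development values listed under Configuration Management. The test host would also have to be configured to trust this symmetric key, otherwise every request returns 401.

```csharp
using System;
using System.Collections.Generic;
using System.IdentityModel.Tokens.Jwt;
using System.Security.Claims;
using System.Text;
using System.Text.Json;
using Microsoft.IdentityModel.Tokens;

public static class TestJwtFactory
{
    // Hypothetical test-only secret; HMAC-SHA256 needs at least 32 bytes.
    private const string TestSigningKey = "test-only-signing-key-0123456789-abcdef";

    public static string CreateTestJwtToken(string username, string clubId, string role)
    {
        var clubsDict = new Dictionary<string, string> { [clubId] = role };

        var claims = new[]
        {
            new Claim(JwtRegisteredClaimNames.Sub, username),
            new Claim(JwtRegisteredClaimNames.Email, $"{username}@test.com"), // illustrative
            new Claim("clubs", JsonSerializer.Serialize(clubsDict))
        };

        var key = new SymmetricSecurityKey(Encoding.UTF8.GetBytes(TestSigningKey));
        var credentials = new SigningCredentials(key, SecurityAlgorithms.HmacSha256);

        // iss and aud are supplied through the constructor arguments.
        var token = new JwtSecurityToken(
            issuer: "http://localhost:8080/realms/workclub",
            audience: "workclub-api",
            claims: claims,
            expires: DateTime.UtcNow.AddMinutes(30),
            signingCredentials: credentials);

        return new JwtSecurityTokenHandler().WriteToken(token);
    }
}
```

In the Arrange step, the result would be attached with `client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token)` alongside the `X-Tenant-Id` header.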
### Security Considerations
1. **RequireHttpsMetadata = false**: Only for development. Production MUST use HTTPS.
2. **Symmetric test tokens**: Integration tests use HMAC-SHA256. Production uses RSA asymmetric keys (Keycloak).
3. **Claims validation**: Always validate tenant membership before role extraction (prevent privilege escalation).
4. **Health endpoint security**: Public by default (no auth). Consider restricting `/health/ready` in production (exposes DB status).
5. **Token lifetime**: Validate expiration (`ValidateLifetime: true`) so expired tokens are rejected, which limits the window in which a stolen token can be replayed.
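
For point 4, it helps to see what each probe actually exposes. The sketch below shows one way the three endpoints could be mapped, assuming a `Program.cs` with top-level statements and the Web SDK's implicit usings; it is not copied from the project. Only `/health/ready` and `/health/startup` run the PostgreSQL check, so those are the endpoints that reveal database status.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

var connectionString = builder.Configuration.GetConnectionString("DefaultConnection")
    ?? throw new InvalidOperationException("Missing ConnectionStrings:DefaultConnection");

// Registers a PostgreSQL check (AspNetCore.HealthChecks.NpgSql package).
builder.Services.AddHealthChecks().AddNpgSql(connectionString);

var app = builder.Build();

// Liveness: the predicate excludes every registered check, so this only
// proves the process is up and able to answer HTTP.
app.MapHealthChecks("/health/live", new HealthCheckOptions { Predicate = _ => false });

// Readiness and startup: run all registered checks, including PostgreSQL.
app.MapHealthChecks("/health/ready");
app.MapHealthChecks("/health/startup");

app.Run();
```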
### Gotchas to Avoid
1. **Do NOT skip claims transformation registration**: `builder.Services.AddScoped<IClaimsTransformation, ClubRoleClaimsTransformation>()`
2. **Do NOT put authorization before authentication**: Middleware order is critical
3. **Do NOT use `[Authorize(Roles = "admin")]`**: Role matching is case-sensitive, and the transformation emits PascalCase roles (`Admin`), so Keycloak's lowercase name never matches
4. **Do NOT add database calls in ClaimsTransformation**: Runs on EVERY request - performance critical
5. **Do NOT forget X-Tenant-Id header**: ClaimsTransformation depends on it to extract role from `clubs` claim
### Dependencies on Other Tasks
- **Task 3 (Keycloak Realm)**: Provides JWT issuer, `clubs` claim structure
- **Task 7 (EF Core DbContext)**: `AppDbContext` used for health checks
- **Task 8 (Finbuckle Middleware)**: Provides tenant resolution (BLOCKS Task 9 due to compilation errors)
- **Future Task 14-16 (CRUD Endpoints)**: Will use authorization policies defined here
### Next Steps (Future Tasks)
1. **Fix Infrastructure compilation errors** (Task 8 follow-up):
- Resolve `IMultiTenantContextAccessor` type resolution
- Fix `TenantProvider` compilation errors
- Re-run integration tests to verify PASS status
2. **Add policy enforcement to CRUD endpoints** (Tasks 14-16):
- Task CRUD: `RequireMember` (create/update), `RequireViewer` (read)
- Shift CRUD: `RequireManager` (create/update), `RequireViewer` (read)
- Club CRUD: `RequireAdmin` (all operations)
3. **Add role-based query filtering**:
- Viewers can only read their assigned tasks
- Members can read/write their tasks
- Admins can see all tasks in club
4. **Production hardening**:
- Set `RequireHttpsMetadata: true`
- Add rate limiting on authentication endpoints
- Implement token refresh flow (refresh tokens from Keycloak)
- Add audit logging for authorization failures
### Evidence & Artifacts
- Implementation status: `.sisyphus/evidence/task-9-implementation-status.txt`
- Integration tests: `backend/WorkClub.Tests.Integration/Auth/AuthorizationTests.cs`
- Claims transformation: `backend/WorkClub.Api/Auth/ClubRoleClaimsTransformation.cs`
### Build Status
- **API Project**: ❌ Does not compile (dependencies on Infrastructure)
- **ClaimsTransformation**: ✅ Compiles successfully (standalone)
- **Authorization Tests**: ✅ Code is valid, cannot execute (Infrastructure errors)
- **Health Checks Configuration**: ✅ Syntax correct, cannot test (app won't start)