chore(evidence): add QA evidence and notepads from debugging sessions
Add comprehensive QA evidence including manual testing reports, RLS isolation tests, API CRUD verification, JWT decoded claims, and auth evidence files. Include updated notepads with decisions, issues, and learnings from full-stack debugging sessions. Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
This commit is contained in:
@@ -2144,3 +2144,477 @@ echo $TOKEN | cut -d'.' -f2 | base64 -d | jq '.clubs'
|
||||
- Status: ✅ RESOLVED
|
||||
- Impact: Unblocks 46 remaining QA scenarios
|
||||
- Date: 2026-03-05
|
||||
|
||||
## 2026-03-05: QA Session Learnings
|
||||
|
||||
### Finbuckle Multi-Tenancy Gotchas
|
||||
|
||||
**Lesson 1: InMemoryStore Requires Explicit Registration**
|
||||
```csharp
|
||||
// WRONG (silently fails - no exception, just NULL context):
|
||||
.WithInMemoryStore(options => {
|
||||
options.IsCaseSensitive = false;
|
||||
});
|
||||
|
||||
// CORRECT:
|
||||
.WithInMemoryStore(options => {
|
||||
options.Tenants = new List<TenantInfo> {
|
||||
new() { Id = "uuid", Identifier = "uuid", Name = "Club Name" }
|
||||
};
|
||||
});
|
||||
```
|
||||
|
||||
**Why This Matters**:
|
||||
- Finbuckle reads `X-Tenant-Id` header correctly
|
||||
- Looks up tenant in store
|
||||
- Returns NULL if not found (no 404, no exception)
|
||||
- `IMultiTenantContextAccessor.MultiTenantContext` is NULL
|
||||
- Downstream code (like RLS interceptor) silently degrades
|
||||
|
||||
**Detection**:
|
||||
- Log warnings: "No tenant context available"
|
||||
- API works but returns wrong data (or no data with RLS)
|
||||
- Hard to debug because no errors thrown
|
||||
|
||||
---
|
||||
|
||||
### PostgreSQL RLS Enforcement Levels
|
||||
|
||||
**Level 1: RLS Enabled (Not Enough for Owner)**
|
||||
```sql
|
||||
ALTER TABLE work_items ENABLE ROW LEVEL SECURITY;
|
||||
CREATE POLICY tenant_isolation ON work_items USING (TenantId = current_setting('app.current_tenant_id', true)::text);
|
||||
```
|
||||
- Table owner (`workclub` user) **bypasses RLS**
|
||||
- Other users respect policies
|
||||
|
||||
**Level 2: FORCE RLS (Required for API)**
|
||||
```sql
|
||||
ALTER TABLE work_items FORCE ROW LEVEL SECURITY;
|
||||
```
|
||||
- Table owner **subject to RLS**
|
||||
- All users respect policies
|
||||
|
||||
**Why This Matters**:
|
||||
- ASP.NET Core connection string uses table owner for connection pooling
|
||||
- Without FORCE, RLS is decorative (no actual enforcement)
|
||||
|
||||
**Detection**:
|
||||
- Direct SQL: `SELECT usesuper, usebypassrls FROM pg_user WHERE usename = 'workclub';`
|
||||
- Both should be `f` (false)
|
||||
- Query: `SELECT relrowsecurity, relforcerowsecurity FROM pg_class WHERE relname = 'work_items';`
|
||||
- `relforcerowsecurity` must be `t` (true)
|
||||
|
||||
---
|
||||
|
||||
### RLS Tenant Context Propagation
|
||||
|
||||
**Critical Path**:
|
||||
1. HTTP Request arrives with `X-Tenant-Id` header
|
||||
2. Finbuckle middleware resolves tenant from store
|
||||
3. Sets `IMultiTenantContextAccessor.MultiTenantContext`
|
||||
4. EF Core opens database connection
|
||||
5. `TenantDbConnectionInterceptor.ConnectionOpened()` fires
|
||||
6. Reads `_tenantAccessor.MultiTenantContext?.TenantInfo?.Identifier`
|
||||
7. Executes `SET LOCAL app.current_tenant_id = '{tenantId}'`
|
||||
8. All queries in transaction respect RLS policies
|
||||
|
||||
**Break at any step → RLS ineffective**
|
||||
|
||||
**Common Failure Points**:
|
||||
- Step 2: Tenant not in Finbuckle store (NULL context)
|
||||
- Step 7: SQL injection risk (use parameterized queries or sanitize)
|
||||
- Connection pooling: Ensure `SET LOCAL` (transaction-scoped, not session-scoped)
|
||||
|
||||
---
|
||||
|
||||
### TenantId vs ClubId Alignment
|
||||
|
||||
**Schema Design**:
|
||||
```sql
|
||||
CREATE TABLE work_items (
|
||||
"Id" uuid PRIMARY KEY,
|
||||
"TenantId" varchar(200) NOT NULL, -- For RLS filtering
|
||||
"ClubId" uuid NOT NULL, -- For business logic
|
||||
...
|
||||
);
|
||||
```
|
||||
|
||||
**Golden Rule**: `TenantId MUST equal ClubId` (as string)
|
||||
|
||||
**Why Two Columns?**
|
||||
- Finbuckle uses `TenantId` (string, supports non-UUID identifiers)
|
||||
- Domain model uses `ClubId` (uuid, foreign key to clubs table)
|
||||
- RLS policies filter on `TenantId`
|
||||
|
||||
**Validation**:
|
||||
```sql
|
||||
-- Check for mismatches:
|
||||
SELECT "Id", "TenantId", "ClubId"
|
||||
FROM work_items
|
||||
WHERE "TenantId" != "ClubId"::text;
|
||||
|
||||
-- Should return 0 rows
|
||||
```
|
||||
|
||||
**Seed Data Best Practice**:
|
||||
```csharp
|
||||
// WRONG:
|
||||
new WorkItem {
|
||||
TenantId = Guid.NewGuid().ToString(), // Random UUID
|
||||
ClubId = clubId // Different UUID
|
||||
};
|
||||
|
||||
// CORRECT:
|
||||
new WorkItem {
|
||||
TenantId = clubId.ToString(), // Same as ClubId
|
||||
ClubId = clubId
|
||||
};
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### QA Test Strategy for Multi-Tenancy
|
||||
|
||||
**Test Pyramid**:
|
||||
|
||||
1. **Unit Tests** (TenantDbConnectionInterceptor)
|
||||
- Mock `IMultiTenantContextAccessor` with valid/NULL tenant
|
||||
- Verify `SET LOCAL` command generated
|
||||
- Verify no SQL injection with malicious tenant IDs
|
||||
|
||||
2. **Integration Tests** (RLS Isolation)
|
||||
- Seed 2+ clubs with distinct data
|
||||
- Query as Club A → Verify only Club A data returned
|
||||
- Query as Club B → Verify Club A data NOT visible
|
||||
- Query without tenant context → Verify 0 rows (or exception)
|
||||
|
||||
3. **E2E Tests** (API Layer)
|
||||
- Login as user in Club A
|
||||
- Request `/api/tasks` with `X-Tenant-Id` for Club A → Expect Club A tasks
|
||||
- Request `/api/tasks` with `X-Tenant-Id` for Club B → Expect 403 Forbidden
|
||||
- Request without `X-Tenant-Id` → Expect 400 Bad Request
|
||||
|
||||
4. **Security Tests** (Penetration)
|
||||
- SQL injection in `X-Tenant-Id` header
|
||||
- UUID guessing attacks (valid UUID format, not user's club)
|
||||
- JWT tampering (change `clubs` claim)
|
||||
- Concurrent requests (connection pooling state leak)
|
||||
|
||||
**Critical Assertion**:
|
||||
```csharp
|
||||
// In RLS integration test:
|
||||
var club1Tasks = await GetTasks(club1TenantId);
|
||||
var club2Tasks = await GetTasks(club2TenantId);
|
||||
|
||||
Assert.Empty(club1Tasks.Intersect(club2Tasks)); // NO OVERLAP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Debugging RLS Issues
|
||||
|
||||
**Step 1: Verify Policies Exist**
|
||||
```sql
|
||||
SELECT tablename, policyname, permissive, roles, qual
|
||||
FROM pg_policies
|
||||
WHERE tablename = 'work_items';
|
||||
```
|
||||
|
||||
**Step 2: Verify FORCE RLS Enabled**
|
||||
```sql
|
||||
SELECT relname, relrowsecurity, relforcerowsecurity
|
||||
FROM pg_class
|
||||
WHERE relname = 'work_items';
|
||||
```
|
||||
|
||||
**Step 3: Test Manually**
|
||||
```sql
|
||||
BEGIN;
|
||||
SET LOCAL app.current_tenant_id = 'afa8daf3-5cfa-4589-9200-b39a538a12de';
|
||||
SELECT COUNT(*) FROM work_items; -- Should return tenant-specific count
|
||||
ROLLBACK;
|
||||
```
|
||||
|
||||
**Step 4: Check API Logs**
|
||||
```bash
|
||||
docker logs workclub_api 2>&1 | grep -i "tenant context"
|
||||
```
|
||||
- Should see: `"Set tenant context for database connection: {TenantId}"`
|
||||
- Red flag: `"No tenant context available for database connection"`
|
||||
|
||||
**Step 5: Verify Finbuckle Store**
|
||||
```csharp
|
||||
// Add to health check endpoint:
|
||||
var store = services.GetRequiredService<IMultiTenantStore<TenantInfo>>();
|
||||
var tenants = await store.GetAllAsync();
|
||||
return Ok(new { TenantCount = tenants.Count() });
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Key Takeaways
|
||||
|
||||
1. **Authentication ≠ Authorization ≠ Data Isolation**
|
||||
- Phase 1 QA verified JWT validation (authentication)
|
||||
- Phase 2 QA revealed RLS broken (data isolation)
|
||||
- All 3 layers must work for secure multi-tenancy
|
||||
|
||||
2. **RLS is Defense-in-Depth, Not Primary**
|
||||
- Application code MUST filter by TenantId (primary defense)
|
||||
- RLS prevents accidental leaks (defense-in-depth)
|
||||
- If RLS is primary filter → Application logic bypassed (bad design)
|
||||
|
||||
3. **Finbuckle Requires Active Configuration**
|
||||
- `WithInMemoryStore()` is not "automatic" - must populate
|
||||
- `WithEFCoreStore()` is better for dynamic tenants
|
||||
- Tenant resolution failure is SILENT (no exceptions)
|
||||
|
||||
4. **PostgreSQL Owner Bypass is Default**
|
||||
- Always use `FORCE ROW LEVEL SECURITY` for app tables
|
||||
- OR: Use non-owner role for API connections
|
||||
|
||||
5. **QA Must Test Isolation, Not Just Auth**
|
||||
- Positive test: User A sees their data
|
||||
- **Negative test**: User A does NOT see User B's data (critical!)
|
||||
|
||||
|
||||
## Task 3: ClubRoleClaimsTransformation - Comma-Separated Clubs Support (2026-03-05)
|
||||
|
||||
### Key Learnings
|
||||
|
||||
1. **ClubRole Claims Architecture**
|
||||
- Keycloak sends clubs as comma-separated UUIDs: `"uuid1,uuid2,uuid3"`
|
||||
- Originally code expected JSON dictionary format (legacy)
|
||||
- Both TenantValidationMiddleware AND ClubRoleClaimsTransformation needed fixing
|
||||
|
||||
2. **Database Role Lookup Pattern**
|
||||
- Member entity stores ExternalUserId (from Keycloak "sub" claim)
|
||||
- Role is stored as ClubRole enum in database, not in the JWT claim
|
||||
- Pattern: Query Members table by ExternalUserId + TenantId to get role
|
||||
- Use FirstOrDefault() synchronously in IClaimsTransformation (avoid async issues with hot reload)
|
||||
|
||||
3. **IClaimsTransformation Constraints**
|
||||
- Must return Task<ClaimsPrincipal> (interface requirement)
|
||||
- Should NOT make method async - use Task.FromResult() instead
|
||||
- Hot reload fails when making synchronous method async (ENC0098 error)
|
||||
- Synchronous database queries with try/catch are safe fallback
|
||||
|
||||
4. **Dependency Injection in Auth Services**
|
||||
- IClaimsTransformation registered as Scoped service
|
||||
- AppDbContext is also Scoped - dependency injection works correctly
|
||||
- Constructor injection in auth transforms: `IHttpContextAccessor` and `AppDbContext`
|
||||
|
||||
5. **Claim Name Mapping**
|
||||
- Keycloak "sub" claim = ExternalUserId in database
|
||||
- "clubs" claim = comma-separated UUIDs (after our fix)
|
||||
- "X-Tenant-Id" header = requested tenant from client
|
||||
- Map ClubRole enum to ASP.NET role strings (Admin, Manager, Member, Viewer)
|
||||
|
||||
### Code Pattern for Claims Transformation
|
||||
|
||||
```csharp
|
||||
// Inject dependencies in constructor
|
||||
public ClubRoleClaimsTransformation(
|
||||
IHttpContextAccessor httpContextAccessor,
|
||||
AppDbContext context)
|
||||
|
||||
// Return Task.FromResult() instead of using async/await
|
||||
public Task<ClaimsPrincipal> TransformAsync(ClaimsPrincipal principal)
|
||||
{
|
||||
// Parse comma-separated claims
|
||||
var clubIds = clubsClaim.Split(',', StringSplitOptions.RemoveEmptyEntries)
|
||||
.Select(id => id.Trim())
|
||||
.ToArray();
|
||||
|
||||
// Synchronous database query
|
||||
var member = _context.Members
|
||||
.FirstOrDefault(m => m.ExternalUserId == userIdClaim && m.TenantId == tenantId);
|
||||
|
||||
// Map enum to string
|
||||
var mappedRole = MapClubRoleToAspNetRole(member.Role);
|
||||
identity.AddClaim(new Claim(ClaimTypes.Role, mappedRole));
|
||||
|
||||
return Task.FromResult(principal);
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
## Task 4: TenantDbConnectionInterceptor - Connection State Fix (2026-03-05)
|
||||
|
||||
### Key Learnings
|
||||
|
||||
1. **Entity Framework Interceptor Lifecycle**
|
||||
- `ConnectionOpeningAsync`: Called BEFORE connection opens (connection still closed)
|
||||
- `ConnectionOpened`: Called AFTER connection is fully open and ready
|
||||
- Attempting SQL execution in ConnectionOpeningAsync fails with "Connection is not open"
|
||||
|
||||
2. **PostgreSQL SET LOCAL Command Requirements**
|
||||
- `SET LOCAL` must execute on an OPEN connection
|
||||
- Must use synchronous `.ExecuteNonQuery()` in ConnectionOpened (which is not async)
|
||||
- Cannot use async/await in ConnectionOpened callback
|
||||
|
||||
3. **Interceptor Design Pattern for Tenant Context**
|
||||
- Separate concerns: opening phase vs opened phase
|
||||
- ConnectionOpeningAsync: Just validation/logging (no command execution)
|
||||
- ConnectionOpened: Execute tenant context SQL command synchronously
|
||||
- Use try/catch with logging for error handling
|
||||
|
||||
4. **Testing Database State**
|
||||
- Remember to query actual database tables for TenantId values
|
||||
- JWT claims may have different UUIDs than database records
|
||||
- Database is source of truth for member-tenant relationships
|
||||
|
||||
### Code Pattern for Connection Interceptors
|
||||
|
||||
```csharp
|
||||
// Phase 1: ConnectionOpeningAsync - Connection NOT open yet
|
||||
public override async ValueTask<InterceptionResult> ConnectionOpeningAsync(
|
||||
DbConnection connection, ConnectionEventData eventData,
|
||||
InterceptionResult result, CancellationToken cancellationToken = default)
|
||||
{
|
||||
await base.ConnectionOpeningAsync(connection, eventData, result, cancellationToken);
|
||||
|
||||
var tenantId = _httpContextAccessor.HttpContext?.Items["TenantId"] as string;
|
||||
|
||||
if (string.IsNullOrWhiteSpace(tenantId))
|
||||
{
|
||||
_logger.LogWarning("No tenant context available");
|
||||
}
|
||||
|
||||
// DO NOT execute SQL here - connection not open
|
||||
return result;
|
||||
}
|
||||
|
||||
// Phase 2: ConnectionOpened - Connection is open
|
||||
public override void ConnectionOpened(DbConnection connection, ConnectionEndEventData eventData)
|
||||
{
|
||||
base.ConnectionOpened(connection, eventData);
|
||||
|
||||
var tenantId = _httpContextAccessor.HttpContext?.Items["TenantId"] as string;
|
||||
|
||||
if (string.IsNullOrWhiteSpace(tenantId))
|
||||
{
|
||||
_logger.LogWarning("No tenant context available");
|
||||
return;
|
||||
}
|
||||
|
||||
// Safe to execute SQL now - connection is open
|
||||
if (connection is NpgsqlConnection npgsqlConnection)
|
||||
{
|
||||
using var command = npgsqlConnection.CreateCommand();
|
||||
command.CommandText = $"SET LOCAL app.current_tenant_id = '{tenantId}'";
|
||||
|
||||
try
|
||||
{
|
||||
command.ExecuteNonQuery(); // Synchronous, connection open
|
||||
_logger.LogDebug("Set tenant context: {TenantId}", tenantId);
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
_logger.LogError(ex, "Failed to set tenant context");
|
||||
throw;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
## 2025-03-05: Fixed dotnet-api Docker build failure (NETSDK1064)
|
||||
|
||||
### Problem
|
||||
`dotnet watch --no-restore` failed with NETSDK1064 errors when volume mount `/app` overwrote the container's `obj/project.assets.json` files generated during `docker build`.
|
||||
|
||||
### Solution Applied
|
||||
Removed `--no-restore` flag from `backend/Dockerfile.dev` line 31:
|
||||
- **Before**: `ENTRYPOINT ["dotnet", "watch", "run", "--project", "WorkClub.Api/WorkClub.Api.csproj", "--no-restore"]`
|
||||
- **After**: `ENTRYPOINT ["dotnet", "watch", "run", "--project", "WorkClub.Api/WorkClub.Api.csproj"]`
|
||||
|
||||
### Result
|
||||
✅ Container rebuilds successfully
|
||||
✅ `dotnet watch` runs without NETSDK1064 errors
|
||||
✅ NuGet packages are automatically restored at runtime
|
||||
✅ Hot reload functionality preserved
|
||||
|
||||
### Why This Works
|
||||
- The `RUN dotnet restore WorkClub.slnx` in Dockerfile.dev (line 22) caches the package cache
|
||||
- Removing `--no-restore` allows `dotnet watch` to restore missing `project.assets.json` files before building
|
||||
- The NuGet package cache at `/root/.nuget/packages/` is intact and accessible inside the container
|
||||
- Volume mount still works for hot reload (no architectural change)
|
||||
|
||||
### Downstream Issue (Out of Scope)
|
||||
Application crashes during startup due to missing PostgreSQL role "app_admin", which is a database initialization issue, not a Docker build issue.
|
||||
|
||||
## RLS Setup Integration (2026-03-05)
|
||||
|
||||
**Problem**: API crashed on startup with "role app_admin does not exist" error when SeedDataService tried to `SET LOCAL ROLE app_admin`.
|
||||
|
||||
**Solution**:
|
||||
1. **PostgreSQL init.sh**: Added app_admin role creation after workclub database is created:
|
||||
```bash
|
||||
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "workclub" <<-EOSQL
|
||||
CREATE ROLE app_admin;
|
||||
GRANT app_admin TO workclub;
|
||||
EOSQL
|
||||
```
|
||||
|
||||
2. **SeedDataService.cs**: Added RLS setup after `MigrateAsync()` ensures tables exist:
|
||||
- Run `context.Database.MigrateAsync()` first
|
||||
- Then `SET LOCAL ROLE app_admin`
|
||||
- Then enable RLS + FORCE on all 5 tables
|
||||
- Then create idempotent tenant_isolation_policy + bypass_rls_policy for each table
|
||||
|
||||
**Key learnings**:
|
||||
- `GRANT app_admin TO workclub` allows workclub user to `SET LOCAL ROLE app_admin`
|
||||
- RLS policies MUST be applied AFTER tables exist (after migrations)
|
||||
- `ALTER TABLE ENABLE/FORCE ROW LEVEL SECURITY` is idempotent (safe to re-run)
|
||||
- `CREATE POLICY` is NOT idempotent — requires `IF NOT EXISTS` check via DO $$ block
|
||||
- Order: init.sh creates role → migrations create tables → SeedDataService applies RLS → seeds data
|
||||
- All 5 tables now have FORCE enabled, preventing owner bypass
|
||||
|
||||
**Verification commands**:
|
||||
```bash
|
||||
# Check policies exist
|
||||
docker exec workclub_postgres psql -U workclub -d workclub -c "SELECT tablename, policyname FROM pg_policies WHERE schemaname='public' ORDER BY tablename, policyname"
|
||||
|
||||
# Check FORCE is enabled
|
||||
docker exec workclub_postgres psql -U workclub -d workclub -c "SELECT relname, relrowsecurity, relforcerowsecurity FROM pg_class WHERE relname IN ('clubs', 'members', 'work_items', 'shifts', 'shift_signups')"
|
||||
```
|
||||
|
||||
## Keycloak Realm Export Password Configuration (2026-03-05)
|
||||
|
||||
Successfully fixed Keycloak realm import with working passwords for all test users.
|
||||
|
||||
### Key Findings:
|
||||
1. **Password format for realm imports**: Use `"value": "testpass123"` in credentials block, NOT `hashedSaltedValue`
|
||||
- Keycloak auto-hashes the plaintext password on import
|
||||
- This is the standard approach for development realm exports
|
||||
|
||||
2. **Protocol mapper JSON type for String attributes**: Must use `"jsonType.label": "String"` not `"JSON"`
|
||||
- Using "JSON" causes runtime error: "cannot map type for token claim"
|
||||
- The `clubs` attribute is stored as comma-separated UUIDs (String), not JSON object
|
||||
|
||||
3. **Deterministic GUIDs match Python MD5 calculation**:
|
||||
- Sunrise Tennis Club: `5e5be064-45ef-d781-f2e8-3d14bd197383`
|
||||
- Valley Cycling Club: `fafc4a3b-5213-c78f-b497-8ab52a0d5fda`
|
||||
- Generated with: `uuid.UUID(bytes=hashlib.md5(name.encode()).digest()[:16])`
|
||||
|
||||
4. **Protocol mapper configuration**:
|
||||
- Audience mapper uses `oidc-hardcoded-claim-mapper` type
|
||||
- Sub claim mapper uses `oidc-sub-mapper` type (built-in)
|
||||
- Both must have complete JSON structure with name, protocol, protocolMapper, config fields
|
||||
|
||||
### Verified Working Configuration:
|
||||
- All 5 users authenticate with password `testpass123`
|
||||
- JWT contains `aud: "workclub-api"` claim
|
||||
- JWT contains `sub` claim (user UUID from Keycloak)
|
||||
- JWT contains `clubs` claim with correct comma-separated tenant UUIDs
|
||||
- `sslRequired: "none"` allows HTTP token requests from localhost
|
||||
|
||||
### User-to-Club Mappings:
|
||||
- admin@test.com: Both clubs (Tennis + Cycling)
|
||||
- manager@test.com: Tennis only
|
||||
- member1@test.com: Both clubs (Tennis + Cycling)
|
||||
- member2@test.com: Tennis only
|
||||
- viewer@test.com: Tennis only
|
||||
|
||||
|
||||
Reference in New Issue
Block a user