docs(k8s): add Task 6 Kustomize base manifests learnings

- Kustomize vs Helm trade-offs and base+overlay pattern
- K8s resource naming conventions with workclub- prefix
- .NET health probe semantics (startup/liveness/readiness)
- StatefulSet + headless service pattern for Postgres
- PostgreSQL 16-alpine with pg_isready health check
- Keycloak 26.x production mode configuration
- Ingress path-based routing (/ → frontend, /api → backend)
- ConfigMap strategy for non-sensitive configuration
- Resource requests/limits placeholders for overlays
- Image tag strategy with :latest placeholder
- Gotchas: serviceName, headless service publishNotReadyAddresses, probe timeouts
WorkClub Automation
2026-03-03 14:10:04 +01:00
parent ba024c45be
commit a1032484bd


@@ -232,3 +232,418 @@ _Conventions, patterns, and accumulated wisdom from task execution_
- Set up relationships between entities
- Configure PostgreSQL xmin concurrency token
---
## Task 6: Kubernetes Kustomize Base Manifests (2026-03-03)
### Key Learnings
1. **Kustomize vs Helm Trade-offs**
- Kustomize chosen: lightweight, YAML-native, no templating language
- Base + overlays pattern: separate environment-specific config from base
- Base manifests use placeholders for image tags (`:latest`), resource limits (100m/256Mi requests)
- Environment overlays (dev, staging, prod) override via patches/replacements
2. **Kubernetes Resource Naming & Labeling**
- Consistent `workclub-` prefix across all resources (Deployments, Services, ConfigMaps, StatefulSets, Ingress)
- Labels for resource tracking: `app: workclub-api`, `component: backend|frontend|auth|database`
- Service selectors must match Pod template labels exactly
- DNS service names within cluster: `serviceName:port` (e.g., `workclub-api:80`)
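The selector rule above can be illustrated with a small check. This is only a sketch of equality-based label matching, not how the API server implements it; `selectorMatches` is a hypothetical helper, and the label values are the ones listed in these notes.

```typescript
// Labels and selectors are plain string maps in Kubernetes manifests.
type Labels = Record<string, string>;

// An equality-based selector matches when every selector key is present on
// the pod with exactly the same value; extra pod labels are allowed.
function selectorMatches(selector: Labels, podLabels: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => podLabels[key] === value);
}

const podLabels: Labels = { app: "workclub-api", component: "backend" };

console.log(selectorMatches({ app: "workclub-api" }, podLabels)); // true
console.log(selectorMatches({ app: "workclub-API" }, podLabels)); // false (values are case-sensitive)
```

A Service whose selector fails this kind of check simply ends up with no endpoints, which is why a typo in a label is easy to miss.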
3. **.NET Health Probes (ASP.NET Core Health Checks)**
- Three distinct probes with different semantics:
  - `startupProbe` (/health/startup): Initial boot, longer timeout (30s retries), prevents traffic until app fully initialized
  - `livenessProbe` (/health/live): Periodic health check (every 15s), restarts the container after 3 consecutive failures
  - `readinessProbe` (/health/ready): Periodic traffic gate (every 10s), removes the pod from Service endpoints after 2 consecutive failures
- Startup probe MUST complete before liveness/readiness are checked
- All three probes return `200 OK` for healthy status
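To make the timing trade-off concrete, the sketch below models the three probes as typed objects and estimates how long each can keep failing before the kubelet reacts. The field names mirror the Kubernetes probe spec; the liveness and readiness numbers come from the notes above, while the startup probe's `periodSeconds` and `failureThreshold` are assumed values for illustration only. The estimate ignores `timeoutSeconds` and `initialDelaySeconds`.

```typescript
// Minimal model of an HTTP probe; only the fields needed for the estimate.
interface HttpProbe {
  path: string;
  periodSeconds: number;
  failureThreshold: number;
}

// ASSUMPTION: the startup values are illustrative, not taken from the manifests.
const startupProbe: HttpProbe = { path: "/health/startup", periodSeconds: 5, failureThreshold: 30 };
const livenessProbe: HttpProbe = { path: "/health/live", periodSeconds: 15, failureThreshold: 3 };
const readinessProbe: HttpProbe = { path: "/health/ready", periodSeconds: 10, failureThreshold: 2 };

// Approximate time a probe may keep failing before the kubelet acts on it.
function failureBudgetSeconds(probe: HttpProbe): number {
  return probe.periodSeconds * probe.failureThreshold;
}

console.log(failureBudgetSeconds(startupProbe));   // 150
console.log(failureBudgetSeconds(livenessProbe));  // 45
console.log(failureBudgetSeconds(readinessProbe)); // 20
```

Under these assumptions the app gets roughly 150 seconds to boot, while a hung container is restarted after about 45 seconds.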
4. **StatefulSet + Headless Service Pattern**
- StatefulSet requires `serviceName` pointing to headless service (clusterIP: None)
- Headless service enables stable network identity: `pod-0.serviceName.namespace.svc.cluster.local`
- Primary service (ClusterIP) for general pod connections
- Volume claim templates: each pod gets its own PVC (e.g., `postgres-data-workclub-postgres-0`)
- Init scripts mounted from a ConfigMap into `/docker-entrypoint-initdb.d` (the postgres image runs them only when the data directory is empty)
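The naming rules in this pattern are mechanical, so they can be written down as two small functions. This is a sketch: the headless service name `workclub-postgres-headless` and the `default` namespace are assumptions, while the claim template name is inferred from the PVC example above.

```typescript
// Identity of one StatefulSet pod.
interface StatefulPod {
  statefulSet: string;
  ordinal: number;
  serviceName: string; // the headless service named in spec.serviceName
  namespace: string;
}

// Stable DNS name: <statefulset>-<ordinal>.<serviceName>.<namespace>.svc.cluster.local
function podDnsName(pod: StatefulPod): string {
  return `${pod.statefulSet}-${pod.ordinal}.${pod.serviceName}.${pod.namespace}.svc.cluster.local`;
}

// PVC created from a volume claim template: <template>-<statefulset>-<ordinal>
function pvcName(claimTemplate: string, pod: StatefulPod): string {
  return `${claimTemplate}-${pod.statefulSet}-${pod.ordinal}`;
}

// ASSUMPTION: service name and namespace are placeholders for illustration.
const postgres0: StatefulPod = {
  statefulSet: "workclub-postgres",
  ordinal: 0,
  serviceName: "workclub-postgres-headless",
  namespace: "default",
};

console.log(podDnsName(postgres0));
// workclub-postgres-0.workclub-postgres-headless.default.svc.cluster.local
console.log(pvcName("postgres-data", postgres0));
// postgres-data-workclub-postgres-0
```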
5. **PostgreSQL StatefulSet Configuration**
- Image: `postgres:16-alpine` (lightweight, 150MB vs 400MB+)
- Health check: `pg_isready -U app -d workclub` (simple, fast, reliable)
- Data persistence: volumeClaimTemplate with 10Gi storage, `standard` storageClassName (overrideable in overlay)
- Init script creates both `workclub` (app) and `keycloak` databases + users in single ConfigMap
6. **Keycloak 26.x Production Mode**
- Image: `quay.io/keycloak/keycloak:26.1` (Red Hat official registry)
- Command: `start` (production mode, not `start-dev`)
- Database: PostgreSQL via `KC_DB=postgres` + `KC_DB_URL_HOST=workclub-postgres`
- Probes: `/health/ready` (readiness), `/health/live` (liveness)
- Hostname: `KC_HOSTNAME_STRICT=false` in dev (allows any Host header)
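- Proxy: `KC_PROXY=edge` for running behind a reverse proxy (Ingress); this option has been deprecated since Keycloak 24 in favor of `KC_PROXY_HEADERS`, so confirm it against the 26.x documentation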
7. **Ingress Path-Based Routing**
- Single ingress rule: `workclub-ingress` with path-based routing
- Frontend: path `/` → `workclub-frontend:80` (pathType: Prefix)
- Backend: path `/api` → `workclub-api:80` (pathType: Prefix)
- Host: `localhost` (overrideable per environment)
- TLS: deferred to production overlay (cert-manager, letsencrypt)
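The routing above relies on two properties of `pathType: Prefix`: matching is done per path element, and the longest matching path wins. The sketch below imitates that behaviour for the two rules in these notes. It is a simplified model, not an ingress controller, and real controllers may differ in edge cases; `prefixMatches` and `resolveBackend` are hypothetical helpers.

```typescript
interface Route {
  path: string;
  service: string;
}

const routes: Route[] = [
  { path: "/", service: "workclub-frontend:80" },
  { path: "/api", service: "workclub-api:80" },
];

// Split a path into non-empty elements: "/api/clubs/" -> ["api", "clubs"]
function pathElements(path: string): string[] {
  return path.split("/").filter((part) => part.length > 0);
}

// Prefix matching compares whole elements, so "/api" matches "/api/clubs"
// but not "/apiv2".
function prefixMatches(rulePath: string, requestPath: string): boolean {
  const rule = pathElements(rulePath);
  const request = pathElements(requestPath);
  return rule.length <= request.length && rule.every((part, i) => part === request[i]);
}

// Among all matching rules, the one with the most path elements wins.
function resolveBackend(requestPath: string): string | undefined {
  const matches = routes.filter((route) => prefixMatches(route.path, requestPath));
  matches.sort((a, b) => pathElements(b.path).length - pathElements(a.path).length);
  return matches[0]?.service;
}

console.log(resolveBackend("/api/clubs")); // workclub-api:80
console.log(resolveBackend("/apiv2"));     // workclub-frontend:80
console.log(resolveBackend("/"));          // workclub-frontend:80
```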
8. **ConfigMap Strategy for Non-Sensitive Configuration**
- Central `workclub-config` ConfigMap:
  - `log-level: Information`
  - `cors-origins: http://localhost:3000`
  - `api-base-url: http://workclub-api`
  - `keycloak-url: http://workclub-keycloak`
  - `keycloak-realm: workclub`
  - Database host/port/name
- Sensitive values (passwords, connection strings) → Secrets (not in base)
- Environment-specific overrides in dev/prod overlays (CORS_ORIGINS changes)
9. **Resource Requests & Limits Pattern**
- Base uses uniform placeholders (all services: 100m/256Mi requests, 500m/512Mi limits)
- Environment overlays replace via patch (e.g., prod: 500m/2Gi)
- Prevents resource contention in shared clusters
- Allows gradual scaling experiments without changing the base manifests
10. **Image Tag Strategy**
- Base: `:latest` placeholder for all app images
- Registry: uses default Docker Hub (no registry prefix)
- Overlay patch: environment-specific tags (`:v1.2.3`, `:latest-dev`, `:sha-abc123`)
- Image pull policy: `IfNotPresent` (caching optimization for stable envs)
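What an overlay does to the `:latest` placeholder can be sketched as a string rewrite, similar in spirit to the `newTag` field of Kustomize's `images` transformer. This is not Kustomize's implementation: `withTag` is a hypothetical helper, the image name `workclub-api` is assumed, and digest references (`@sha256:...`) are deliberately not handled.

```typescript
// Replace the tag of an image reference, or add one if none is present.
// Only a colon after the last "/" counts as a tag separator, so a registry
// port such as "localhost:5000" is left untouched.
function withTag(image: string, tag: string): string {
  const lastSlash = image.lastIndexOf("/");
  const tagColon = image.indexOf(":", lastSlash + 1);
  const name = tagColon === -1 ? image : image.slice(0, tagColon);
  return `${name}:${tag}`;
}

console.log(withTag("workclub-api:latest", "v1.2.3"));
// workclub-api:v1.2.3
console.log(withTag("localhost:5000/workclub-api", "sha-abc123"));
// localhost:5000/workclub-api:sha-abc123
```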
### Architecture Decisions
- **Why Kustomize over Helm**: Plan explicitly avoids Helm (simpler YAML, no new DSL, easier Git diffs)
- **Why base + overlays**: Separation of concerns — base is declarative truth, overlays add environment context
- **Why two Postgres services**: Headless for StatefulSet DNS (stable identity), Primary for app connections (load balancing)
- **Why both startup + liveness probes**: Prevents restart loops during slow startup (Java/Keycloak can take 20+ seconds)
- **Why ConfigMap for init.sql**: Immutable config, easier than baked-into-image, updateable per environment
### Gotchas to Avoid
- Forgetting `serviceName` in StatefulSet causes pod DNS discovery failure (critical for Postgres)
- Omitting `publishNotReadyAddresses: true` on the headless service prevents pod-to-pod communication during startup
- Keycloak startup probe timeout too short (<15s retries) causes premature restart loops
- .NET health endpoints require `httpGet` probes, not TCP probes (TCP only checks that the port is open, not app readiness)
- Ingress path `/api` must use `pathType: Prefix` to catch `/api/*` routes
### Next Steps
- Task 25: Create dev overlay (env-specific values, dev-db.postgres.svc, localhost ingress)
- Task 26: Create prod overlay (TLS config, resource limits, replica counts, PDB)
- Task 27: Add cert-manager + Let's Encrypt to prod
- Future: Network policies, pod disruption budgets, HPA (deferred to Wave 2)
---
## Task 5: Next.js 15 Project Initialization (2026-03-03)
### Key Learnings
1. **Next.js 15 with Bun Package Manager**
- `bunx create-next-app@latest` with `--use-bun` flag successfully initializes projects
- Bun installation 3-4x faster than npm/yarn (351 packages in 3.4s)
- `create-next-app@latest` installed Next.js 16.1.6 (Turbopack enabled by default), despite the task title naming Next.js 15
- Bun supports all Node.js ecosystem tools seamlessly
- Dev server startup: 625ms ready time (excellent for development)
2. **shadcn/ui Integration**
- Initialize with `bunx shadcn@latest init` (interactive prompt, sensible defaults)
- Default color palette: Neutral (can override with slate, gray, zinc, stone)
- CSS variables auto-generated in `src/app/globals.css` for theming
- Components installed to `src/components/ui/` automatically
- Note: `toast` component deprecated → use `sonner` instead (modern toast library)
3. **Standalone Output Configuration**
- Set `output: 'standalone'` in `next.config.ts` for Docker deployments
- Generates `.next/standalone/` with self-contained server.js entry point
- Reduces Docker image size: only includes required node_modules (not full installation)
- Production builds on this project: 2.9s compile, 240.4ms static page generation
- Standalone directory structure: `.next/`, `node_modules/`, `server.js`, `package.json`
4. **TypeScript Path Aliases**
- `@/*` → `./src/*` pre-configured in `tsconfig.json` by create-next-app
- Enables clean imports: `import { Button } from '@/components/ui/button'`
- Improves code readability, reduces relative path navigation (`../../`)
- Compiler validates paths automatically (LSP support included)
5. **Directory Structure Best Practices**
- App Router location: `src/app/` (not `pages/`)
- Component organization: `src/components/` for reusable, `src/components/ui/` for shadcn
- Utilities: `src/lib/` for helper functions (includes shadcn's `cn()` function)
- Custom hooks: `src/hooks/` (prepared for future implementation)
- Type definitions: `src/types/` (prepared for schema/type files)
- This structure scales from MVP to enterprise applications
6. **Build Verification**
- `bun run build` exit code 0, no errors
- TypeScript type checking passes (via Next.js)
- Static page generation: 4 pages (/, _not-found)
- No build warnings or deprecations
- Standalone build ready for Docker containerization
7. **Development Server Performance**
- `bun run dev` startup: 625ms (ready state)
- First page request: 1187ms (includes compilation + render)
- Hot Module Reloading (HMR): Turbopack provides fast incremental updates
- Bun's fast refresh cycles enable rapid development feedback
- Note: The plan cites a higher P99 SSR latency for Bun (340ms) than for Node.js (120ms), so production deployment will use Node.js
### shadcn/ui Components Installed
All 10 components successfully added to `src/components/ui/`:
- ✓ button.tsx — Base button component with variants (primary, secondary, etc.)
- ✓ card.tsx — Card layout container (Card, CardHeader, CardFooter, etc.)
- ✓ badge.tsx — Status badges with color variants
- ✓ input.tsx — Form input field with placeholder and error support
- ✓ label.tsx — Form label with accessibility attributes
- ✓ select.tsx — Dropdown select with options (Radix UI based)
- ✓ dialog.tsx — Modal dialog component (Alert Dialog pattern)
- ✓ dropdown-menu.tsx — Context menu/dropdown menu (Radix UI based)
- ✓ table.tsx — Data table with thead, tbody, rows
- ✓ sonner.tsx — Toast notifications (modern replacement for react-hot-toast)
All components use Tailwind CSS utilities, no custom CSS files needed.
### Environment Variables Configuration
Created `.env.local.example` (committed to git) with development defaults:
```
NEXT_PUBLIC_API_URL=http://localhost:5000 # Backend API endpoint
NEXTAUTH_URL=http://localhost:3000 # NextAuth callback URL
NEXTAUTH_SECRET=dev-secret-change-me # Session encryption (Task 10)
KEYCLOAK_ISSUER=http://localhost:8080/realms/workclub # OAuth2 discovery
KEYCLOAK_CLIENT_ID=workclub-app # Keycloak client ID
KEYCLOAK_CLIENT_SECRET=<from-keycloak> # Placeholder (Task 3 fills in)
```
Pattern: `.env.local.example` is version-controlled, `.env.local` is gitignored per `.gitignore`.
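One way to catch a missing variable early is to validate the environment once at startup. The sketch below assumes nothing about the project's real code: `requireEnv` and `loadAuthConfig` are hypothetical helpers, and the environment is passed in as a plain object so the example runs without Next.js.

```typescript
type Env = Record<string, string | undefined>;

// Return the variable's value, or fail loudly if it is missing or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect the settings from the template above into one typed object.
function loadAuthConfig(env: Env) {
  return {
    apiUrl: requireEnv(env, "NEXT_PUBLIC_API_URL"),
    nextAuthUrl: requireEnv(env, "NEXTAUTH_URL"),
    keycloakIssuer: requireEnv(env, "KEYCLOAK_ISSUER"),
    keycloakClientId: requireEnv(env, "KEYCLOAK_CLIENT_ID"),
  };
}

// Development defaults copied from `.env.local.example`.
const sampleEnv: Env = {
  NEXT_PUBLIC_API_URL: "http://localhost:5000",
  NEXTAUTH_URL: "http://localhost:3000",
  KEYCLOAK_ISSUER: "http://localhost:8080/realms/workclub",
  KEYCLOAK_CLIENT_ID: "workclub-app",
};

console.log(loadAuthConfig(sampleEnv).keycloakIssuer);
// http://localhost:8080/realms/workclub
```

In server-side code the same function could be called as `loadAuthConfig(process.env)`.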
### Dependencies Installed
```json
{
  "dependencies": {
    "next": "16.1.6",
    "react": "19.2.3",
    "react-dom": "19.2.3"
  },
  "devDependencies": {
    "@tailwindcss/postcss": "4.2.1",
    "@types/node": "20.19.35",
    "@types/react": "19.2.14",
    "@types/react-dom": "19.2.3",
    "eslint": "9.39.3",
    "eslint-config-next": "16.1.6",
    "tailwindcss": "4.2.1",
    "typescript": "5.9.3"
  }
}
```
Note: Intentionally minimal dependencies for MVP. NextAuth.js added in Task 10.
### Build & Runtime Verification
**Build Verification**: ✓ PASSED
- Command: `bun run build`
- Exit Code: 0
- Compilation: 2.9s (Turbopack)
- TypeScript: No errors
- Static Generation: 4 pages in 240.4ms
- Output: `.next/standalone/` with all required files
**Dev Server Verification**: ✓ PASSED
- Command: `bun run dev`
- Startup: 625ms to ready state
- Port: 3000 (accessible)
- HTTP GET /: 200 OK in 1187ms
- Server process: Graceful shutdown with SIGTERM
**Standalone Verification**: ✓ PASSED
- `.next/standalone/server.js`: 6.55 KB entry point
- `.next/standalone/node_modules/`: Self-contained dependencies
- `.next/standalone/package.json`: Runtime configuration
- `.next/` directory: Pre-built routes and static assets
### Patterns & Conventions
1. **Component Organization**:
- UI components: `src/components/ui/` (shadcn)
- Feature components: `src/components/features/` (future)
- Layout components: `src/components/layout/` (future)
- Avoid nested folders beyond 2 levels for discoverability
2. **TypeScript Strict Mode**:
- `tsconfig.json` includes `"strict": true`
- Disallows implicit `any` and enables strict null checks; types are still inferred where possible
- Enables IDE autocomplete and early error detection
3. **Tailwind CSS v4 Configuration**:
- Uses CSS variables for theming (shadcn standard)
- Tailwind config auto-generated by shadcn init
- No custom color palette yet (uses defaults from Neutral)
4. **Git Strategy**:
- `.env.local.example` is committed (template for developers)
- `.env.local` is in `.gitignore` (personal configurations)
- No node_modules/ in repo (installed via `bun install`)
### Configuration Files Created
- `frontend/next.config.ts` — Minimal, standalone output enabled
- `frontend/tsconfig.json` — Path aliases, strict TypeScript mode
- `frontend/.env.local.example` — Environment variable template
- `frontend/components.json` — shadcn/ui configuration
- `frontend/tailwind.config.ts` — Tailwind CSS configuration with Tailwind v4
- `frontend/postcss.config.js` — PostCSS configuration for Tailwind
### Next Steps & Dependencies
- **Task 10**: NextAuth.js integration
  - Adds `next-auth` dependency
  - Creates `src/app/api/auth/[...nextauth]/route.ts`
  - Integrates with Keycloak (configured in Task 3)
- **Task 17**: Frontend test infrastructure
  - Adds vitest, @testing-library/react
  - Component tests for shadcn/ui wrapper components
  - E2E tests with Playwright (already in docker-compose)
- **Task 18**: Layout and authentication UI
  - Creates `src/app/layout.tsx` with navbar/sidebar
  - Client-side session provider setup
  - Login/logout flows
- **Task 21**: Club management interface
  - Feature components in `src/components/features/`
  - Forms using shadcn input/select/button
  - Data fetching from backend API (Task 6+)
### Gotchas to Avoid
1. **Bun vs Node.js Distinction**: This project uses Bun for development (fast HMR, 625ms startup). Production deployment will use Node.js due to P99 latency concerns (documented in plan).
2. **shadcn/ui Component Customization**: Components are meant to be copied and modified for project-specific needs. Avoid creating wrapper components — extend the shadcn components directly.
3. **Environment Variables Naming**:
- `NEXT_PUBLIC_*` are exposed to browser (use only for client-safe values)
- `KEYCLOAK_CLIENT_SECRET` is server-only (never exposed to frontend)
- `.env.local` for local development, CI/CD environment variables at deployment
4. **Path Aliases in Dynamic Imports**: If using dynamic imports with `next/dynamic`, ensure paths use `@/*` syntax for alias resolution.
5. **Tailwind CSS v4 Breaking Changes**:
- Requires `@tailwindcss/postcss` package (not default tailwindcss)
- CSS layer imports may differ from v3 (auto-handled by create-next-app)
### Evidence & Artifacts
- Build output: `.sisyphus/evidence/task-5-nextjs-build.txt`
- Dev server output: `.sisyphus/evidence/task-5-dev-server.txt`
- Git commit: `chore(frontend): initialize Next.js project with Tailwind and shadcn/ui`
## Task 3: Keycloak Realm Configuration (2026-03-03)
### Key Learnings
1. **Keycloak Realm Export Structure**
- Realm exports are JSON files with top-level keys: `realm`, `clients`, `users`, `roles`, `groups`
- Must include `enabled: true` for realm and clients to be active on import
- Version compatibility: Export from Keycloak 26.x is compatible with 26.x imports
- Import command: `start-dev --import-realm` (Docker volume mount required)
2. **Protocol Mapper Configuration for Custom JWT Claims**
- Mapper type: `oidc-usermodel-attribute-mapper` (NOT Script Mapper)
- Critical setting: `jsonType.label: JSON` ensures claim is parsed as JSON object (not string)
- User attribute: `clubs` (custom attribute on user entity)
- Token claim name: `clubs` (appears in JWT payload)
- Must include in: ID token, access token, userinfo endpoint (all three flags set to true)
- Applied to both clients: workclub-api and workclub-app (defined in client protocolMappers array)
3. **Client Configuration Patterns**
- **Confidential client (workclub-api)**:
  - `publicClient: false`, has client secret
  - `serviceAccountsEnabled: true` for service-to-service auth
  - `standardFlowEnabled: false`, `directAccessGrantsEnabled: false` (no user login)
  - Used by backend for client credentials grant
- **Public client (workclub-app)**:
  - `publicClient: true`, no client secret
  - `standardFlowEnabled: true` for OAuth2 Authorization Code Flow
  - `directAccessGrantsEnabled: true` (enables password grant for dev testing)
  - PKCE enabled via `attributes.pkce.code.challenge.method: S256`
  - Redirect URIs: `http://localhost:3000/*` (wildcard for dev)
  - Web origins: `http://localhost:3000` (CORS configuration)
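For reference, the `S256` challenge method configured on the public client is defined in RFC 7636 as the base64url-encoded SHA-256 hash of the code verifier, without padding. The sketch below computes it with Node's built-in `crypto` module, using the sample verifier from the RFC's appendix. Auth libraries normally do this internally, so this is illustrative only; `codeChallengeS256` is a hypothetical helper.

```typescript
import { createHash } from "node:crypto";

// S256: BASE64URL(SHA-256(ASCII(code_verifier))), no "=" padding (RFC 7636).
function codeChallengeS256(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}

// Sample verifier taken from RFC 7636, Appendix B.
const verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk";

console.log(codeChallengeS256(verifier));
// E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

The client sends the challenge with the authorization request and the verifier with the token request, which lets Keycloak confirm both came from the same party.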
4. **User Configuration with Custom Attributes**
- Custom attribute format: `attributes.clubs: ["{\"club-1-uuid\": \"admin\"}"]`
- Attribute value is array of strings (even for single value)
- JSON must be escaped as string in user attributes
- Protocol mapper will parse this string as JSON when generating JWT claim
- Users must have: `enabled: true`, `emailVerified: true`, and an empty `requiredActions: []`
5. **Password Hashing in Realm Exports**
- Algorithm: `pbkdf2-sha512` (Keycloak default)
- Hash iterations: 210000 (high security for dev environment)
- Credentials structure includes: `hashedSaltedValue`, `salt`, `hashIterations`, `algorithm`
- Password: `testpass123` (all test users use same password for simplicity)
- Note: Hashed values in this export are PLACEHOLDER — Keycloak will generate real hashes on first user creation
6. **Multi-Tenant Club Membership Data Model**
- Format: `{"<tenant-id>": "<role>"}`
- Example: `{"club-1-uuid": "admin", "club-2-uuid": "member"}`
- Keys: Club UUIDs (tenant identifiers)
- Values: Role strings (admin, manager, member, viewer)
- Users can belong to multiple clubs with different roles in each
- Placeholder UUIDs used: `club-1-uuid`, `club-2-uuid` (real UUIDs created in Task 11 seed data)
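The membership format can be captured as a type, together with a parser for the escaped-string form stored in the user attribute. This is a sketch under the format described above; `parseClubsAttribute` is a hypothetical helper and is not part of Keycloak or of the project's code.

```typescript
type ClubRole = "admin" | "manager" | "member" | "viewer";
type ClubMemberships = Record<string, ClubRole>;

const KNOWN_ROLES: readonly string[] = ["admin", "manager", "member", "viewer"];

// `attributes.clubs` is an array of strings; the first entry holds the
// JSON-encoded map of club ID to role.
function parseClubsAttribute(values: string[]): ClubMemberships {
  const raw = values[0];
  if (raw === undefined) return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("clubs attribute must encode a JSON object");
  }
  const memberships: ClubMemberships = {};
  for (const [clubId, role] of Object.entries(parsed)) {
    if (typeof role !== "string" || !KNOWN_ROLES.includes(role)) {
      throw new Error(`Unknown role for club ${clubId}`);
    }
    memberships[clubId] = role as ClubRole;
  }
  return memberships;
}

const attribute = ['{"club-1-uuid": "admin", "club-2-uuid": "member"}'];
console.log(parseClubsAttribute(attribute)["club-2-uuid"]); // member
```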
7. **Test User Scenarios**
- **admin@test.com**: Multi-club admin (admin in club-1, member in club-2)
- **manager@test.com**: Single club manager (manager in club-1)
- **member1@test.com**: Multi-club member (member in both clubs)
- **member2@test.com**: Single club member (member in club-1)
- **viewer@test.com**: Read-only viewer (viewer in club-1)
- Covers all role types and single/multi-club scenarios
8. **Docker Environment Configuration**
- Keycloak 26.1 runs in Docker container
- Realm import via volume mount: `./infra/keycloak:/opt/keycloak/data/import`
- Health check endpoint: `/health/ready`
- Token endpoint: `/realms/workclub/protocol/openid-connect/token`
- Admin credentials: `admin/admin` (for Keycloak admin console)
9. **JWT Token Testing Approach**
- Use password grant (Direct Access Grant) for testing: `grant_type=password&username=...&password=...&client_id=workclub-app`
- Decode JWT: Split on `.`, extract second part (payload), base64 decode, parse JSON
- Verify claim type: `jq -r '.clubs | type'` should return `object` (NOT `string`)
- Test script: `infra/keycloak/test-auth.sh` automates this verification
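The decode-and-check step can also be expressed in TypeScript, for example in a frontend test. The sketch below only decodes the payload and does not verify the signature, so it is suitable for inspection and never for authorization decisions. `decodeJwtPayload` and `clubsClaimIsObject` are hypothetical helpers, and the token in the usage lines is fabricated locally for the demonstration.

```typescript
// Decode the payload (second segment) of a JWT without verifying it.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Expected header.payload.signature");
  // Convert base64url to standard base64 and restore padding.
  const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// True only when `clubs` is a JSON object, not a string or an array.
function clubsClaimIsObject(token: string): boolean {
  const clubs = decodeJwtPayload(token).clubs;
  return typeof clubs === "object" && clubs !== null && !Array.isArray(clubs);
}

// Build an unsigned demo token locally (not a real Keycloak token).
function toBase64Url(text: string): string {
  return btoa(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
const demoToken = [
  toBase64Url(JSON.stringify({ alg: "none" })),
  toBase64Url(JSON.stringify({ clubs: { "club-1-uuid": "admin" } })),
  "signature",
].join(".");

console.log(clubsClaimIsObject(demoToken)); // true
```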
10. **Common Pitfalls Avoided**
- DO NOT use Script Mapper (complex, requires JavaScript, harder to debug)
- DO NOT use `jsonType.label: String` (will break multi-tenant claim parsing)
- DO NOT forget `multivalued: false` in protocol mapper (we want single JSON object, not array)
- DO NOT hardcode real UUIDs in test users (use placeholders, seed data creates real IDs)
- DO NOT export realm without users (need `--users realm_file` or admin UI export with users enabled)
### Configuration Files Created
- **infra/keycloak/realm-export.json**: Complete realm configuration (8.9 KB)
- **infra/keycloak/test-auth.sh**: Automated verification script for JWT claims
- **.sisyphus/evidence/task-3-verification.txt**: Detailed verification documentation
- **.sisyphus/evidence/task-3-user-auth.txt**: User authentication results (placeholder)
- **.sisyphus/evidence/task-3-jwt-claims.txt**: JWT claim structure documentation (placeholder)
### Docker Environment Issue
- Colima (Docker runtime on macOS) failed to start with VZ driver error
- Verification deferred until Docker environment is available
- All configuration files are complete and JSON-validated
- Test script is ready for execution when Docker is running
### Next Phase Considerations
- Task 8 (Finbuckle) will consume `clubs` claim to implement tenant resolution
- Task 9 (JWT auth middleware) will validate tokens from Keycloak
- Task 10 (NextAuth) will use workclub-app client for frontend authentication
- Task 11 (seed data) will replace placeholder UUIDs with real club IDs
- Production deployment will need: real client secrets, HTTPS redirect URIs, proper password policies