diff --git a/docs/superpowers/specs/2026-04-26-k8s-deployment-design.md b/docs/superpowers/specs/2026-04-26-k8s-deployment-design.md
new file mode 100644
index 0000000..488ea2c
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-26-k8s-deployment-design.md
@@ -0,0 +1,308 @@
+# EZSCALE Website — Production Docker + Kubernetes Deployment
+
+**Date:** 2026-04-26
+**Status:** Approved
+**Scope:** Production-ready container build and Helm chart for deploying the EZSCALE website Laravel app to the existing K3s cluster. Two environments: `local` (a developer's k3d/minikube cluster) and `us-prod` (the existing US K3s cluster, namespace `ezscale`).
+
+## Goals
+
+- Ship a Helm chart that mirrors the sister `ezscale_api` chart's shape, so cluster-side deploy/rollback/ops scripts work without modification.
+- Bake source, vendor, and built assets into immutable images. No host bind-mounts in production.
+- Reuse the cluster's existing infrastructure: mariadb-operator, Longhorn PVCs, Traefik IngressRoute, cert-manager `letsencrypt` ClusterIssuer, Gitea Container Registry, Storj for object storage.
+- Preserve the encryption-critical state (`APP_KEY`, Passport keys) across deploys without ever regenerating it.
+- Keep the existing `docker-compose.yml` dev stack untouched — production is a separate, additional path.
+
+## Non-goals
+
+- Cloudflare Zero Trust for the admin panel (deferred — initial chart ships plain Let's Encrypt TLS).
+- EU region deployment (single region for now).
+- ExternalDNS automation (DNS managed manually in Cloudflare Terraform for now).
+- Sealed Secrets / External Secrets Operator (raw Secret applied via `kubeseal` by hand, matching sister chart).
+- WHMCS migration tooling (separate concern).
+
+## Cluster context (assumed prerequisites)
+
+The `us-prod` deployment depends on these already being installed in the K3s cluster — they all are, today, per `infrastructure/kubernetes/`:
+
+- **mariadb-operator** — provides `MariaDB`, `Database`, `User`, `Grant` CRDs (`k8s.mariadb.com/v1alpha1`).
+- A replicated `MariaDB` CR named `mariadb` in the `ezscale` namespace, fronted by **MaxScale** for read/write splitting and autofailover, backed by Longhorn PVCs with daily backup CronJobs.
+- **cert-manager** with a ClusterIssuer named `letsencrypt`.
+- **Traefik** with the `cloudflarewarp` middleware (`kube-system` namespace) for client IP restoration from `CF-Connecting-IP`.
+- **Gitea Container Registry** at `git.ezscale.cloud` and an image-pull `Secret` named `gitea-registry` in the target namespace.
+- A Storj account with an S3-compatible bucket reserved for the website's user uploads and PDF cache.
+
+For `local`, the developer installs mariadb-operator into their k3d/minikube cluster (one-liner: `helm install mariadb-operator -n mariadb-operator --create-namespace mariadb-operator/mariadb-operator`). Cert-manager and Traefik are not strictly required locally — the chart's IngressRoute and Certificate templates are toggleable.
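+
+A hypothetical `values-local.yaml` excerpt showing those toggles. Only `mariadb.enabled` and `valkey.enabled` are named in this spec; the `ingressRoute.enabled` and `certificate.enabled` keys are assumed names used for illustration.
+
+```yaml
+# values-local.yaml (illustrative excerpt, key names partly assumed)
+mariadb:
+  enabled: true        # render the 1-replica MariaDB CR in this release
+valkey:
+  enabled: true
+ingressRoute:
+  enabled: false       # assumed key: skip the Traefik IngressRoute locally
+certificate:
+  enabled: false       # assumed key: skip the cert-manager Certificate locally
+```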
+
+## Repository layout
+
+```
+website/
+├── docker/                          # existing dev compose stuff (unchanged)
+├── docker-compose.yml               # existing dev stack (unchanged)
+├── Dockerfile                       # NEW: production multi-stage
+├── helm/
+│   └── ezscale-website/
+│       ├── Chart.yaml
+│       ├── values.yaml              # safe defaults, no secrets
+│       ├── values-local.yaml        # k3d/minikube — everything in-cluster
+│       ├── values-us-prod.yaml      # uses existing ezscale-namespace MariaDB + Storj
+│       └── templates/
+│           ├── _helpers.tpl
+│           ├── configmap.yaml           # APP_ENV, non-secret env vars
+│           ├── secret.yaml              # placeholder; only renders if values provided
+│           ├── deployment-app.yaml      # nginx + php-fpm sidecar
+│           ├── deployment-horizon.yaml
+│           ├── deployment-scheduler.yaml
+│           ├── service.yaml
+│           ├── ingressroute.yaml        # Traefik CRD, three hosts → one Service
+│           ├── certificate.yaml         # cert-manager Certificate
+│           ├── job-migrate.yaml         # Helm hook: pre-install + pre-upgrade
+│           ├── hpa-app.yaml             # autoscale web pods on CPU
+│           ├── mariadb-database.yaml    # operator custom resources
+│           ├── mariadb-user.yaml
+│           ├── mariadb-grant.yaml
+│           ├── mariadb-instance.yaml    # only renders when mariadb.enabled=true
+│           └── statefulset-valkey.yaml  # only renders when valkey.enabled=true
+└── .gitea/
+    └── workflows/
+        └── release.yml              # NEW: build + push on v* tags
+```
+
+Chart name `ezscale-website` mirrors sister chart's `ezscale-api`.
+
+## Production Dockerfile (multi-stage)
+
+A single `Dockerfile` at the repo root with three named build targets that share common base layers:
+
+| Stage | Base | Purpose |
+|-------|------|---------|
+| `composer-deps` | `composer:2` | `composer install --no-dev --no-scripts --prefer-dist` → `vendor/` |
+| `node-build` | `node:24-alpine` | `npm ci && npm run build` → `public/build/` |
+| `runtime-base` | `php:8.3-fpm-bookworm` | PHP extensions (pdo_mysql, intl, bcmath, gd, zip, pcntl, posix, exif, sockets, opcache, redis), opcache config, www-data UID, copies vendor + source + built assets |
+| `app` (target) | `runtime-base` | CMD: `php-fpm`. Pairs with nginx sidecar in the Deployment. |
+| `horizon` (target) | `runtime-base` | CMD: `php artisan horizon`. Shuts down on SIGTERM with a 60s grace period. |
+| `scheduler` (target) | `runtime-base` | CMD: `php artisan schedule:work`. |
+
+Images are published as `git.ezscale.cloud/ezscale/website:{role}-{version}` and `:{role}-latest`. The chart's `image.tag` value selects the version; the role prefix (`app`/`horizon`/`scheduler`) is prepended in each Deployment template via `_helpers.tpl`.
+
+**Why three targets sharing one Dockerfile, not one image with a parameterized command?** Image immutability and security. The horizon/scheduler images don't need nginx config or a php-fpm pool, and they're long-lived — separate targets let us trim each one to its minimum.
+
+## Web pod shape
+
+One `Deployment` named `ezscale-website-app` with **two containers** in a single pod:
+
+- `nginx` — `nginx:1.30-alpine`, ConfigMap-mounted vhost serving `/var/www/html/public`, fastcgi → `127.0.0.1:9000`. Listens on `:80`.
+- `app` — the `app` Dockerfile target. php-fpm on `:9000`.
+
+The two containers share the source via an `emptyDir` populated by an init container that runs `cp -a /var/www/html/. /shared/` from the app image. This pattern is copied verbatim from the sister chart and lets us update nginx config without rebuilding the app image.
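+
+A minimal sketch of that pod layout follows. It is not the sister chart's actual template: the volume names, the init container name `copy-source`, the ConfigMap name `ezscale-website-nginx`, and the image tag `app-v1.0.0` are hypothetical placeholders.
+
+```yaml
+# deployment-app.yaml (pod spec excerpt, illustrative only)
+spec:
+  template:
+    spec:
+      imagePullSecrets:
+        - name: gitea-registry
+      volumes:
+        - name: app-source                  # hypothetical name; shared source tree
+          emptyDir: {}
+        - name: nginx-vhost                 # hypothetical name
+          configMap:
+            name: ezscale-website-nginx     # hypothetical ConfigMap holding the vhost
+      initContainers:
+        - name: copy-source
+          image: git.ezscale.cloud/ezscale/website:app-v1.0.0   # placeholder tag
+          command: ["sh", "-c", "cp -a /var/www/html/. /shared/"]
+          volumeMounts:
+            - name: app-source
+              mountPath: /shared
+      containers:
+        - name: nginx
+          image: nginx:1.30-alpine
+          ports:
+            - containerPort: 80
+          volumeMounts:
+            - name: app-source
+              mountPath: /var/www/html
+              readOnly: true
+            - name: nginx-vhost
+              mountPath: /etc/nginx/conf.d
+        - name: app
+          image: git.ezscale.cloud/ezscale/website:app-v1.0.0   # placeholder tag
+          ports:
+            - containerPort: 9000
+          volumeMounts:
+            - name: app-source              # same path in both containers so
+              mountPath: /var/www/html      # SCRIPT_FILENAME resolves for php-fpm
+```
+
+Mounting the copy at the same path in both containers keeps the nginx document root and the php-fpm script path identical, which is the usual reason for this layout.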
+
+**Health probes:**
+- Liveness: HTTP `GET /up` on nginx (Laravel's built-in health endpoint).
+- Readiness: same path, with `failureThreshold: 3`.
+- Startup probe: `GET /up` with a generous failure threshold, so a slow warmup (for example, while migrations are still finishing) does not trigger restarts.
+
+**HPA:** `1 → 8` replicas on 70% CPU, matching the sister chart's prod values.
+
+## Subdomain routing
+
+Three subdomains → one Service. Laravel's `Route::domain()` in `bootstrap/app.php` handles per-subdomain dispatch in-pod.
+
+```yaml
+# ingressroute.yaml (simplified)
+spec:
+  entryPoints: [websecure]
+  routes:
+    - match: Host(`ezscale.cloud`) || Host(`account.ezscale.cloud`) || Host(`admin.ezscale.cloud`)
+      middlewares:
+        - name: cloudflarewarp
+          namespace: kube-system
+      services:
+        - name: ezscale-website
+          port: 80
+  tls:
+    secretName: ezscale-website-tls
+```
+
+A second IngressRoute on entryPoint `web` redirects HTTP → HTTPS via the `kube-system/http-to-https` middleware (matches sister pattern).
+
+One `Certificate` resource covers all three SAN names. cert-manager solves HTTP-01 via Traefik on `:80`.
+
+Cloudflare Zero Trust for the admin host is **deferred**. When ready, layer Access on by adding an annotation to the IngressRoute or splitting the admin host into its own IngressRoute with a Cloudflare Tunnel sidecar.
+
+## File storage
+
+Web/horizon/scheduler pods are stateless. All filesystem reads/writes go to Laravel's `s3` disk in prod:
+
+- `values-us-prod.yaml` sets `FILESYSTEM_DISK=s3`.
+- Storj credentials (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_BUCKET`, `AWS_ENDPOINT`, `AWS_DEFAULT_REGION`, `AWS_USE_PATH_STYLE_ENDPOINT=true`) live in the chart's `Secret`.
+- User uploads (avatars, KB images, ticket attachments) and cached invoice PDFs all go to Storj.
+
+`local` defaults to the standard `local` disk on an `emptyDir` — fine for dev.
+
+No PVCs on the app/horizon/scheduler Deployments.
+
+## Persistent state inventory
+
+| Data | Storage | Persistence guarantee |
+|------|---------|-----------------------|
+| **Application DB** | Existing `mariadb` CR in `ezscale` ns | Longhorn replicated PVCs + existing backup CronJob to Storj |
+| **Sessions** | Valkey StatefulSet (this chart, 1 replica) | Valkey AOF on a Longhorn PVC. AOF survives pod restart. If Valkey is destroyed, users get logged out — acceptable. |
+| **Cache** | Same Valkey | Ephemeral by design — anything in `cache:` is regenerable |
+| **Queue (Horizon)** | Same Valkey | Important — losing the queue loses pending jobs. Same AOF-backed PVC. |
+| **User uploads + cached PDFs** | Storj S3 | Bucket versioning + Storj's intrinsic replication |
+| **`APP_KEY`** | k8s Secret `ezscale-website-secrets` | **Bootstrap once, never regenerated.** Decrypts `users.two_factor_secret`, encrypted credentials, encrypted cookies. |
+| **Passport keys** (`oauth-private.key`, `oauth-public.key`) | Same Secret | Same constraint — bootstrapped once, never overwritten. Used to sign OAuth access tokens. |
+
+### `APP_KEY` and Passport key bootstrap procedure
+
+The chart's `templates/secret.yaml` only renders if `secret.create=true` AND a value is supplied. Default for prod is `secret.create=false` — the chart assumes a Secret named `ezscale-website-secrets` already exists in the namespace and references it by name.
+
+First-time bootstrap (one-time, manual):
+1. Generate `APP_KEY` locally: `php artisan key:generate --show`.
+2. Generate Passport keys locally: `php artisan passport:keys` (writes to `storage/oauth-{public,private}.key`).
+3. Create the Secret: `kubectl create secret generic ezscale-website-secrets -n ezscale --from-literal=APP_KEY=... --from-file=oauth-private.key=... --from-file=oauth-public.key=... --from-literal=DB_PASSWORD=... --from-literal=AWS_SECRET_ACCESS_KEY=... --from-literal=STRIPE_SECRET=... ...` etc.
+4. (Optional) Re-run that command with `--dry-run=client -o yaml`, pipe the output through `kubeseal`, and check the resulting `SealedSecret` into `infrastructure/`.
+
+Subsequent `helm upgrade` invocations never touch this Secret. The Deployments consume it via `envFrom: secretRef:` and the entrypoint copies the OAuth keys into `storage/`.
+
+### Why this matters
+
+If the chart ever regenerates `APP_KEY`, every encrypted value in the database becomes garbage — 2FA secrets, encrypted gateway credentials, encrypted session payloads. Same for Passport keys: regenerating them invalidates every issued access token at once. The chart's secret-handling MUST treat both values as immutable post-bootstrap.
+
+## Database wiring (operator-managed)
+
+For `us-prod`, the chart creates three custom resources in the `ezscale` namespace, all referencing the existing `mariadb` instance:
+
+```yaml
+# mariadb-database.yaml
+apiVersion: k8s.mariadb.com/v1alpha1
+kind: Database
+metadata:
+  name: ezscale-billing
+  namespace: ezscale
+spec:
+  mariaDbRef: { name: mariadb }
+  characterSet: utf8mb4
+  collate: utf8mb4_unicode_ci
+  name: ezscale_billing
+```
+
+```yaml
+# mariadb-user.yaml
+apiVersion: k8s.mariadb.com/v1alpha1
+kind: User
+metadata:
+  name: ezscale-website-app
+  namespace: ezscale
+spec:
+  mariaDbRef: { name: mariadb }
+  passwordSecretKeyRef:
+    name: ezscale-website-secrets
+    key: DB_PASSWORD
+  host: "%"
+  maxUserConnections: 50
+```
+
+```yaml
+# mariadb-grant.yaml
+apiVersion: k8s.mariadb.com/v1alpha1
+kind: Grant
+metadata:
+  name: ezscale-website-app-grant
+  namespace: ezscale
+spec:
+  mariaDbRef: { name: mariadb }
+  username: ezscale-website-app
+  host: "%"
+  privileges: ["ALL PRIVILEGES"]
+  database: ezscale_billing
+  table: "*"
+```
+
+Pods connect via the MaxScale router service (read/write split) at `mariadb-maxscale.ezscale.svc.cluster.local:3306` (the name and port are provisional and must be verified against the existing MaxScale Service).
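+
+As a sketch, the matching non-secret connection settings in the chart's ConfigMap might look like the following. The ConfigMap name `ezscale-website-config` is hypothetical, and the host and port are the unverified values quoted above; `DB_PASSWORD` stays in `ezscale-website-secrets`.
+
+```yaml
+# configmap.yaml (rendered excerpt, illustrative only)
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ezscale-website-config     # hypothetical name
+  namespace: ezscale
+data:
+  DB_HOST: mariadb-maxscale.ezscale.svc.cluster.local   # to be verified
+  DB_PORT: "3306"                                       # to be verified
+  DB_DATABASE: ezscale_billing     # matches the Database CR's spec.name
+  DB_USERNAME: ezscale-website-app # matches the User CR's metadata.name
+```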
+
+For `local`, an additional `mariadb-instance.yaml` template renders a 1-replica `MariaDB` CR in the same chart release, plus a root-password Secret. `Database`/`User`/`Grant` reference that local instance instead.
+
+## Valkey
+
+`templates/statefulset-valkey.yaml` (toggleable via `valkey.enabled`):
+
+- 1 replica, `valkey/valkey:9-alpine`
+- Command: `valkey-server --appendonly yes --maxmemory 1gb --maxmemory-policy allkeys-lru` (LRU is fine because cache and sessions can be evicted; queue uses dedicated keys but Horizon will retry lost jobs).
+- 5Gi PVC on Longhorn (prod) / local-path (local)
+- ClusterIP Service on `:6379`
+- No password in `local`. In `us-prod`, password from the Secret.
+
+Both envs default to `valkey.enabled=true`. There's no current need for an external Redis in prod — running per-app Valkey matches sister API and infrastructure/petro patterns.
+
+## Migrations
+
+`templates/job-migrate.yaml` — Helm hook:
+
+```yaml
+metadata:
+  annotations:
+    "helm.sh/hook": pre-upgrade,pre-install
+    "helm.sh/hook-weight": "0"
+    "helm.sh/hook-delete-policy": before-hook-creation
+```
+
+Runs `php artisan migrate --force --no-interaction`. Optional second step (`php artisan db:seed --class=ProductionSeeder --force`) toggleable via `migrate.seed=true`. Image: same as the `app` target.
+
+If the Job fails, `helm upgrade` aborts before any pod rolls. The previous ReplicaSet stays serving traffic.
+
+For emergency manual deploys: `--set migrate.enabled=false`.
+
+## Scheduler
+
+A `Deployment` (1 replica, no autoscale) running `php artisan schedule:work`. This long-running command checks for due tasks every minute and spawns them as subprocesses. No runs are missed while the pod is up; tasks that come due while the pod is restarting are skipped rather than replayed.
+
+We chose this over a `CronJob` running `schedule:run` every minute because:
+- Logs land in one place (the Deployment), easier to tail.
+- No per-minute pod-creation overhead.
+- Matches the dev compose pattern, easier mental model.
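+
+A sketch of the corresponding Deployment, under stated assumptions: the `Recreate` strategy is not part of this spec and is added here so a rollout never runs an old and a new scheduler pod side by side; the name `ezscale-website-scheduler`, the label, and the image tag are placeholders.
+
+```yaml
+# deployment-scheduler.yaml (rendered excerpt, illustrative only)
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ezscale-website-scheduler          # hypothetical name
+  namespace: ezscale
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate                         # assumption: avoid overlap during rollouts
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: scheduler
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/component: scheduler
+    spec:
+      imagePullSecrets:
+        - name: gitea-registry
+      containers:
+        - name: scheduler
+          image: git.ezscale.cloud/ezscale/website:scheduler-v1.0.0   # placeholder tag
+          command: ["php", "artisan", "schedule:work"]
+          envFrom:
+            - secretRef:
+                name: ezscale-website-secrets
+```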
+
+Single replica is intentional — running two `schedule:work` instances would double-fire scheduled tasks.
+
+## Image registry, CI, deploy
+
+`.gitea/workflows/release.yml` mirrors sister API:
+
+- Trigger: `push` of `v*` tags
+- Build & push three images (`app`, `horizon`, `scheduler`) tagged `:{role}-{version}` and `:{role}-latest`
+- Login: `git.ezscale.cloud` with `${{ secrets.CI_TOKEN }}`
+- After build: `helm upgrade --install ezscale-website helm/ezscale-website -n ezscale -f helm/ezscale-website/values-us-prod.yaml --set image.tag=v{X.Y.Z}` (executed against the cluster via a self-hosted runner with kubeconfig).
+
+Pull secret: `gitea-registry` (already exists in the `ezscale` namespace).
+
+Existing CI (tests, Pint) stays in `.gitea/workflows/ci.yml` if present, or is added separately — out of scope for this spec.
+
+## Open questions / TBD during implementation
+
+- Verify the exact MaxScale Service name and port in `infrastructure/kubernetes/ezscale/mysql/`. The chart's default `DB_HOST` should match what MaxScale exposes.
+- Confirm the cluster's StorageClass name for production (Longhorn vs local-path) by inspecting the existing `mariadb` CR's PVCs.
+- Confirm the exact Storj bucket name to use in `us-prod` (proposal: `ezscale-website-prod`). Local doesn't need one — it uses the `local` disk on `emptyDir`.
+
+## Out of scope (separate spec needed before adding)
+
+- Cloudflare Zero Trust for the admin host
+- EU region deployment + DB replication topology
+- Backup verification / restore drills
+- Multi-tenancy (Kasm) — see `KASM_AND_MULTITENANCY.md`
+- WHMCS migration runbook
+
+## Implementation order (for the plan that follows)
+
+1. Production `Dockerfile` (build the three targets locally, smoke-test via `docker run`).
+2. Helm chart skeleton (`Chart.yaml`, `values.yaml`, `_helpers.tpl`).
+3. Core templates: `configmap`, `secret` (placeholder), `deployment-app`, `service`.
+4. Database custom resources (`mariadb-database`, `-user`, `-grant`, `-instance` for local).
+5. `statefulset-valkey`.
+6. `deployment-horizon`, `deployment-scheduler`.
+7. `job-migrate` (Helm hook).
+8. `ingressroute`, `certificate`.
+9. `hpa-app`.
+10. `values-local.yaml` and `values-us-prod.yaml`.
+11. `.gitea/workflows/release.yml`.
+12. Local end-to-end test in k3d.
+13. Documentation: `helm/ezscale-website/README.md` covering bootstrap procedure for `APP_KEY` / Passport keys.