Andrew bd7d99b8d1 docs(spec): k8s deployment design (Helm chart + production Dockerfile)

Locks in the production deployment shape: Helm chart matching sister
ezscale-api pattern, multi-stage Dockerfile with three targets
(app/horizon/scheduler), operator-managed MariaDB CRDs that plug into
the existing ezscale-namespace MariaDB instance, per-app Valkey,
Traefik IngressRoute + cert-manager TLS, Storj for file storage.

Critical invariant captured: APP_KEY and Passport keys are bootstrapped
once and never regenerated by the chart.

Two environments: local (k3d/minikube) and us-prod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-26 22:23:56 -04:00

EZSCALE Website — Production Docker + Kubernetes Deployment

Date: 2026-04-26
Status: Approved
Scope: Production-ready container build and Helm chart for deploying the EZSCALE website Laravel app to the existing K3s cluster. Two environments: local (a developer's k3d/minikube cluster) and us-prod (the existing US K3s cluster, namespace ezscale).

Goals

Ship a Helm chart that mirrors the shape of the sister ezscale-api chart, so cluster-side deploy/rollback/ops scripts work without modification.
Bake source, vendor, and built assets into immutable images. No host bind-mounts in production.
Reuse the cluster's existing infrastructure: mariadb-operator, Longhorn PVCs, Traefik IngressRoute, cert-manager letsencrypt ClusterIssuer, Gitea Container Registry, Storj for object storage.
Preserve the encryption-critical state (APP_KEY, Passport keys) across deploys without ever regenerating it.
Keep the existing docker-compose.yml dev stack untouched — production is a separate, additional path.

Non-goals

Cloudflare Zero Trust for the admin panel (deferred — initial chart ships plain Let's Encrypt TLS).
EU region deployment (single region for now).
ExternalDNS automation (DNS managed manually in Cloudflare Terraform for now).
Sealed Secrets / External Secrets Operator automation in the chart (the Secret is created by hand, optionally sealed with kubeseal, matching the sister chart).
WHMCS migration tooling (separate concern).

Cluster context (assumed prerequisites)

The us-prod deployment depends on these already being installed in the K3s cluster — they all are, today, per infrastructure/kubernetes/:

mariadb-operator — provides MariaDB, Database, User, Grant CRDs (k8s.mariadb.com/v1alpha1).
A replicated MariaDB CR named mariadb in the ezscale namespace, fronted by MaxScale for read/write splitting and autofailover, backed by Longhorn PVCs with daily backup CronJobs.
cert-manager with a ClusterIssuer named letsencrypt.
Traefik with the cloudflarewarp middleware (kube-system namespace) for client IP restoration from CF-Connecting-IP.
Gitea Container Registry at git.ezscale.cloud and an image-pull Secret named gitea-registry in the target namespace.
A Storj account with an S3-compatible bucket reserved for the website's user uploads and PDF cache.

For local, the developer installs mariadb-operator into their k3d/minikube cluster (after adding the operator's Helm repository: helm install mariadb-operator -n mariadb-operator --create-namespace mariadb-operator/mariadb-operator; recent operator releases may also require installing the separate CRDs chart first, so check the operator's install docs). cert-manager and Traefik are not strictly required locally: the chart's IngressRoute and Certificate templates are toggleable.
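
As a rough illustration of those toggles, a values-local.yaml might look like the sketch below. mariadb.enabled and valkey.enabled come from the template notes in this spec; the ingressRoute.enabled and certificate.enabled key names are assumptions made for illustration, not confirmed chart values.

```yaml
# values-local.yaml (sketch; only mariadb.enabled and valkey.enabled
# are named in this spec, the other keys are hypothetical)
mariadb:
  enabled: true      # render the 1-replica MariaDB CR inside this release
valkey:
  enabled: true      # per-app Valkey StatefulSet
ingressRoute:
  enabled: false     # assumed toggle: skip the Traefik IngressRoute locally
certificate:
  enabled: false     # assumed toggle: skip the cert-manager Certificate locally
```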

Repository layout

website/
├── docker/                              # existing dev compose stuff (unchanged)
├── docker-compose.yml                   # existing dev stack (unchanged)
├── Dockerfile                           # NEW: production multi-stage
├── helm/
│   └── ezscale-website/
│       ├── Chart.yaml
│       ├── values.yaml                  # safe defaults, no secrets
│       ├── values-local.yaml            # k3d/minikube — everything in-cluster
│       ├── values-us-prod.yaml          # uses existing ezscale-namespace MariaDB + Storj
│       └── templates/
│           ├── _helpers.tpl
│           ├── configmap.yaml           # APP_ENV, non-secret env vars
│           ├── secret.yaml              # placeholder; only renders if values provided
│           ├── deployment-app.yaml      # nginx + php-fpm sidecar
│           ├── deployment-horizon.yaml
│           ├── deployment-scheduler.yaml
│           ├── service.yaml
│           ├── ingressroute.yaml        # Traefik CRD, three hosts → one Service
│           ├── certificate.yaml         # cert-manager Certificate
│           ├── job-migrate.yaml         # Helm hook: pre-install + pre-upgrade
│           ├── hpa-app.yaml             # autoscale web pods on CPU
│           ├── mariadb-database.yaml    # operator CRDs
│           ├── mariadb-user.yaml
│           ├── mariadb-grant.yaml
│           ├── mariadb-instance.yaml    # only renders when mariadb.enabled=true
│           └── statefulset-valkey.yaml  # only renders when valkey.enabled=true
└── .gitea/
    └── workflows/
        └── release.yml                  # NEW: build + push on v* tags

The chart name ezscale-website mirrors the sister chart's name, ezscale-api.

Production Dockerfile (multi-stage)

A single Dockerfile at the repo root with three named build targets that share common base layers:

| Stage | Base | Purpose |
| --- | --- | --- |
| `composer-deps` | `composer:2` | `composer install --no-dev --no-scripts --prefer-dist` → `vendor/` |
| `node-build` | `node:24-alpine` | `npm ci && npm run build` → `public/build/` |
| `runtime-base` | `php:8.3-fpm-bookworm` | PHP extensions (pdo_mysql, intl, bcmath, gd, zip, pcntl, posix, exif, sockets, opcache, redis), opcache config, www-data UID, copies vendor + source + built assets |
| `app` (target) | `runtime-base` | CMD: `php-fpm`. Pairs with nginx sidecar in the Deployment. |
| `horizon` (target) | `runtime-base` | CMD: `php artisan horizon`. Stops gracefully on SIGTERM; the pod gets a 60s termination grace period so in-flight jobs can finish. |
| `scheduler` (target) | `runtime-base` | CMD: `php artisan schedule:work`. |

Image tags published to git.ezscale.cloud/ezscale/website:{role}-{version} and :{role}-latest. The chart's image.tag value selects the version; the role suffix (app/horizon/scheduler) is appended in each Deployment template via _helpers.tpl.
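
To make the tag composition concrete, here is a small sketch of how a values entry could map to the rendered image references. image.tag is the value named in this spec; the image.repository key and the version v1.2.3 are hypothetical.

```yaml
# values-us-prod.yaml excerpt (sketch)
image:
  repository: git.ezscale.cloud/ezscale/website   # assumed key name
  tag: v1.2.3                                      # hypothetical version

# With the role prefix added by _helpers.tpl, each Deployment would render:
#   deployment-app.yaml       -> git.ezscale.cloud/ezscale/website:app-v1.2.3
#   deployment-horizon.yaml   -> git.ezscale.cloud/ezscale/website:horizon-v1.2.3
#   deployment-scheduler.yaml -> git.ezscale.cloud/ezscale/website:scheduler-v1.2.3
```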

Why three targets sharing one Dockerfile, not one image with a parameterized command? Image immutability and security. The horizon/scheduler images don't need nginx config or a php-fpm pool, and they're long-lived — separate targets let us trim each one to its minimum.

Web pod shape

One Deployment named ezscale-website-app with two containers in a single pod:

nginx — nginx:1.30-alpine, ConfigMap-mounted vhost serving /var/www/html/public, fastcgi → 127.0.0.1:9000. Listens on :80.
app — the app Dockerfile target. php-fpm on :9000.

The two containers share the source via an emptyDir populated by an init container that runs cp -a /var/www/html/. /shared/ from the app image. This pattern is copied verbatim from the sister chart and lets us update nginx config without rebuilding the app image.
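
The shared-source pattern could be sketched roughly as follows. The container names nginx and app, the images, and the cp command come from this spec; the volume name, init container name, and image tag are hypothetical, and the nginx vhost ConfigMap mount is left out for brevity.

```yaml
# deployment-app.yaml, pod spec excerpt (sketch)
spec:
  template:
    spec:
      imagePullSecrets:
        - name: gitea-registry
      volumes:
        - name: app-source            # hypothetical volume name
          emptyDir: {}
      initContainers:
        - name: copy-source           # hypothetical container name
          image: git.ezscale.cloud/ezscale/website:app-v1.2.3   # hypothetical tag
          # Copy the baked-in source into the shared emptyDir.
          command: ["sh", "-c", "cp -a /var/www/html/. /shared/"]
          volumeMounts:
            - name: app-source
              mountPath: /shared
      containers:
        - name: nginx
          image: nginx:1.30-alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: app-source
              mountPath: /var/www/html
              readOnly: true          # nginx only serves static files from public/
        - name: app
          image: git.ezscale.cloud/ezscale/website:app-v1.2.3   # hypothetical tag
          ports:
            - containerPort: 9000     # php-fpm, reached by nginx over 127.0.0.1
          volumeMounts:
            - name: app-source
              mountPath: /var/www/html
```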

Health probes:

Liveness: HTTP GET /up on nginx (Laravel's built-in health endpoint).
Readiness: same path, with failureThreshold: 3.
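Startup probe: GET /up with a generous failureThreshold, so a slow first boot is not killed by the liveness probe. Migrations have already finished by then, since they run in the pre-install/pre-upgrade hook.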
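
A minimal sketch of the three probes on the nginx container is shown below. The /up path, port 80, and the readiness failureThreshold of 3 come from this spec; every period and the startup threshold are assumed values to be tuned.

```yaml
# nginx container excerpt in deployment-app.yaml (sketch)
livenessProbe:
  httpGet: { path: /up, port: 80 }
  periodSeconds: 10          # assumed
readinessProbe:
  httpGet: { path: /up, port: 80 }
  periodSeconds: 5           # assumed
  failureThreshold: 3        # from this spec
startupProbe:
  httpGet: { path: /up, port: 80 }
  periodSeconds: 5           # assumed
  failureThreshold: 30       # assumed: roughly 150s of startup budget
```

Liveness and readiness checks only begin once the startup probe has succeeded, which is what makes the generous startup threshold safe.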

HPA: scales from 1 to 8 replicas at a 70% CPU utilization target, matching the sister chart's prod values.
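
Expressed as a manifest, the autoscaling rule above might look like this sketch. The replica bounds, CPU target, and Deployment name come from this spec; the HPA's own name is an assumption.

```yaml
# hpa-app.yaml (sketch)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ezscale-website-app      # assumed to match the Deployment name
  namespace: ezscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ezscale-website-app
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is measured relative to the containers' CPU requests, so the app Deployment needs resources.requests.cpu set for this to work.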

Subdomain routing

Three subdomains → one Service. Laravel's Route::domain() in bootstrap/app.php handles per-subdomain dispatch in-pod.

# ingressroute.yaml (simplified)
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`ezscale.cloud`) || Host(`account.ezscale.cloud`) || Host(`admin.ezscale.cloud`)
      middlewares:
        - name: cloudflarewarp
          namespace: kube-system
      services:
        - name: ezscale-website
          port: 80
  tls:
    secretName: ezscale-website-tls

A second IngressRoute on entryPoint web redirects HTTP → HTTPS via the kube-system/http-to-https middleware (matches sister pattern).
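
The redirect route might be sketched as below. The web entryPoint, the hosts, and the kube-system/http-to-https middleware come from this spec; the resource name is hypothetical, and the apiVersion assumes a recent Traefik (older installs use traefik.containo.us/v1alpha1).

```yaml
# ingressroute.yaml, HTTP redirect (sketch)
apiVersion: traefik.io/v1alpha1       # assumption: depends on the Traefik version
kind: IngressRoute
metadata:
  name: ezscale-website-redirect      # hypothetical name
  namespace: ezscale
spec:
  entryPoints: [web]
  routes:
    - kind: Rule
      match: Host(`ezscale.cloud`) || Host(`account.ezscale.cloud`) || Host(`admin.ezscale.cloud`)
      middlewares:
        - name: http-to-https
          namespace: kube-system
      services:
        # Required by the CRD; the middleware redirects before traffic reaches it.
        - name: ezscale-website
          port: 80
```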

One Certificate resource covers all three SAN names. cert-manager solves HTTP-01 via Traefik on :80.
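
A Certificate along these lines would cover the three hosts. The issuer name, the TLS Secret name, and the hostnames come from this spec; the Certificate's own name is an assumption.

```yaml
# certificate.yaml (sketch)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ezscale-website-tls      # assumed name
  namespace: ezscale
spec:
  secretName: ezscale-website-tls   # referenced by the IngressRoute's tls block
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
  dnsNames:
    - ezscale.cloud
    - account.ezscale.cloud
    - admin.ezscale.cloud
```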

Cloudflare Zero Trust for the admin host is deferred. When ready, layer Access on by adding an annotation to the IngressRoute or splitting the admin host into its own IngressRoute with a Cloudflare Tunnel sidecar.

File storage

Web/horizon/scheduler pods are stateless. All persistent file reads/writes go to Laravel's s3 disk in prod:

values-us-prod.yaml sets FILESYSTEM_DISK=s3.
Storj credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_BUCKET, AWS_ENDPOINT, AWS_DEFAULT_REGION, AWS_USE_PATH_STYLE_ENDPOINT=true) live in the chart's Secret.
User uploads (avatars, KB images, ticket attachments) and cached invoice PDFs all go to Storj.

local defaults to the standard local disk on an emptyDir — fine for dev.

No PVCs on the app/horizon/scheduler Deployments.

Persistent state inventory

| Data | Storage | Persistence guarantee |
| --- | --- | --- |
| Application DB | Existing `mariadb` CR in `ezscale` ns | Longhorn replicated PVCs + existing backup CronJob to Storj |
| Sessions | Valkey StatefulSet (this chart, 1 replica) | Valkey AOF on a Longhorn PVC. AOF survives pod restart. If Valkey is destroyed, users get logged out, which is acceptable. |
| Cache | Same Valkey | Ephemeral by design: anything in `cache:` is regenerable |
| Queue (Horizon) | Same Valkey | Important: losing the queue loses pending jobs. Same AOF-backed PVC. |
| User uploads + cached PDFs | Storj S3 | Bucket versioning + Storj's intrinsic replication |
| `APP_KEY` | k8s Secret `ezscale-website-secrets` | Bootstrap once, never regenerated. Decrypts `users.two_factor_secret`, encrypted credentials, encrypted cookies. |
| Passport keys (`oauth-private.key`, `oauth-public.key`) | Same Secret | Same constraint: bootstrapped once, never overwritten. Used to sign OAuth access tokens. |

`APP_KEY` and Passport key bootstrap procedure

The chart's templates/secret.yaml only renders if secret.create=true AND a value is supplied. Default for prod is secret.create=false — the chart assumes a Secret named ezscale-website-secrets already exists in the namespace and references it by name.

First-time bootstrap (one-time, manual):

1. Generate APP_KEY locally: php artisan key:generate --show.
2. Generate Passport keys locally: php artisan passport:keys (writes storage/oauth-private.key and storage/oauth-public.key).
3. Create the Secret: kubectl create secret generic ezscale-website-secrets -n ezscale --from-literal=APP_KEY=... --from-file=oauth-private.key=... --from-file=oauth-public.key=... --from-literal=DB_PASSWORD=... --from-literal=AWS_SECRET_ACCESS_KEY=... --from-literal=STRIPE_SECRET=... ... etc.
4. (Optional) Instead of applying the Secret directly, run the same command with --dry-run=client -o yaml, pipe the resulting manifest through kubeseal, and check the resulting SealedSecret into infrastructure/.

Subsequent helm upgrade invocations never touch this Secret. The Deployments mount it via envFrom: secretRef: and the entrypoint copies the OAuth keys into storage/.
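
The spec does not say how the key files reach the entrypoint. One possible arrangement, sketched below, pairs envFrom for the plain variables with a Secret volume for the two key files; depending on the Kubernetes version, keys such as oauth-private.key may not be accepted as environment variable names, which is why a file mount is used here. The ConfigMap name, volume name, and mount path are all hypothetical.

```yaml
# deployment-app.yaml, pod spec excerpt (sketch, not the confirmed wiring)
spec:
  template:
    spec:
      volumes:
        - name: passport-keys                  # hypothetical volume name
          secret:
            secretName: ezscale-website-secrets
            items:                             # expose only the two key files
              - key: oauth-private.key
                path: oauth-private.key
              - key: oauth-public.key
                path: oauth-public.key
      containers:
        - name: app
          envFrom:
            - configMapRef:
                name: ezscale-website-config   # hypothetical ConfigMap name
            - secretRef:
                name: ezscale-website-secrets  # APP_KEY, DB_PASSWORD, ...
          volumeMounts:
            - name: passport-keys
              mountPath: /run/secrets/passport # hypothetical path the entrypoint copies from
              readOnly: true
```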

Why this matters

If the chart ever regenerates APP_KEY, every encrypted value in the database becomes garbage — 2FA secrets, encrypted gateway credentials, encrypted session payloads. Same for Passport keys: regenerating them invalidates every issued access token at once. The chart's secret-handling MUST treat both values as immutable post-bootstrap.

Database wiring (operator-managed)

For us-prod, the chart creates three custom resources (Database, User, Grant) in the ezscale namespace, all referencing the existing mariadb instance:

# mariadb-database.yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Database
metadata:
  name: ezscale-billing
  namespace: ezscale
spec:
  mariaDbRef: { name: mariadb }
  characterSet: utf8mb4
  collate: utf8mb4_unicode_ci
  name: ezscale_billing

# mariadb-user.yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: User
metadata:
  name: ezscale-website-app
  namespace: ezscale
spec:
  mariaDbRef: { name: mariadb }
  passwordSecretKeyRef:
    name: ezscale-website-secrets
    key: DB_PASSWORD
  host: "%"
  maxUserConnections: 50

# mariadb-grant.yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: Grant
metadata:
  name: ezscale-website-app-grant
  namespace: ezscale
spec:
  mariaDbRef: { name: mariadb }
  username: ezscale-website-app
  host: "%"
  privileges: ["ALL PRIVILEGES"]
  database: ezscale_billing
  table: "*"

Pods connect via the MaxScale router service (read/write split) at mariadb-maxscale.ezscale.svc.cluster.local:3306. The Service name and port are still to be verified against the existing MaxScale Service.

For local, an additional mariadb-instance.yaml template renders a 1-replica MariaDB CR in the same chart release, plus a root-password Secret. Database/User/Grant reference that local instance instead.
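
A minimal local instance could look like the sketch below. The single replica and the root-password Secret are from this spec; the Secret name, its key, and the storage size are assumptions.

```yaml
# mariadb-instance.yaml, rendered only when mariadb.enabled=true (sketch)
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb                 # so Database/User/Grant can keep mariaDbRef: { name: mariadb }
spec:
  replicas: 1
  rootPasswordSecretKeyRef:
    name: ezscale-website-mariadb-root   # hypothetical Secret name
    key: password                        # hypothetical key
  storage:
    size: 1Gi                            # assumed; small is enough for dev
```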

Valkey

templates/statefulset-valkey.yaml (toggleable via valkey.enabled):

1 replica, valkey/valkey:9-alpine
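Command: valkey-server --appendonly yes --maxmemory 1gb --maxmemory-policy allkeys-lru. LRU eviction is fine for cache and sessions, which can be regenerated or re-established. Note that allkeys-lru can also evict queue keys under memory pressure, and Horizon cannot retry a job whose payload has been evicted, so size maxmemory with headroom, or consider volatile-lru (which only evicts keys that carry a TTL) if queue loss is unacceptable.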
5Gi PVC on Longhorn (prod) / local-path (local)
ClusterIP Service on :6379
No password in local. In us-prod, password from the Secret.
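
Put together, the bullets above could translate into a StatefulSet like this sketch (the local, password-less variant). The image, command, port, and PVC size come from this spec; the resource name, labels, and StorageClass name are assumptions.

```yaml
# statefulset-valkey.yaml (sketch)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ezscale-website-valkey          # hypothetical name
spec:
  serviceName: ezscale-website-valkey
  replicas: 1
  selector:
    matchLabels: { app: ezscale-website-valkey }
  template:
    metadata:
      labels: { app: ezscale-website-valkey }
    spec:
      containers:
        - name: valkey
          image: valkey/valkey:9-alpine
          command:
            - valkey-server
            - --appendonly
            - "yes"
            - --maxmemory
            - 1gb
            - --maxmemory-policy
            - allkeys-lru
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data          # the image's working directory, where the AOF is written
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-path    # assumed for local; Longhorn's class in prod
        resources:
          requests:
            storage: 5Gi
```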

Both envs default to valkey.enabled=true. There's no current need for an external Redis in prod — running per-app Valkey matches sister API and infrastructure/petro patterns.

Migrations

templates/job-migrate.yaml — Helm hook:

metadata:
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation

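Runs php artisan migrate --force --no-interaction. An optional second step, php artisan db:seed --class=ProductionSeeder --force, is toggleable via migrate.seed=true (the --class option belongs to db:seed; migrate itself takes --seed and --seeder=). Image: same as the app target.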

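If the Job fails, helm upgrade aborts before any pod rolls, and the previous ReplicaSet keeps serving traffic. One caveat for first installs: Helm runs pre-install hooks before it creates the release's regular resources, so the ConfigMap and the Database/User/Grant resources the Job depends on must already exist or be created by earlier hooks.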

For emergency manual deploys: --set migrate.enabled=false.
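
For orientation, a complete hook Job might look like the following sketch. The annotations, command, and Secret name come from this spec; the Job name, image tag, and backoffLimit are assumptions.

```yaml
# job-migrate.yaml (sketch)
apiVersion: batch/v1
kind: Job
metadata:
  name: ezscale-website-migrate          # hypothetical name
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 0                        # assumed: fail fast instead of retrying a bad migration
  template:
    spec:
      restartPolicy: Never
      imagePullSecrets:
        - name: gitea-registry
      containers:
        - name: migrate
          image: git.ezscale.cloud/ezscale/website:app-v1.2.3   # hypothetical tag
          command: ["php", "artisan", "migrate", "--force", "--no-interaction"]
          envFrom:
            - secretRef:
                name: ezscale-website-secrets   # DB_PASSWORD, APP_KEY, ...
```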

Scheduler

A Deployment (1 replica, no autoscale) running php artisan schedule:work. This long-running command checks for due tasks every minute and runs them as subprocesses. Tasks that come due while the pod is down are skipped rather than replayed; after a restart, scheduling resumes at the next minute.

We chose this over a CronJob running schedule:run every minute because:

Logs land in one place (the Deployment), easier to tail.
No per-minute pod-creation overhead.
Matches the dev compose pattern, easier mental model.

Single replica is intentional — running two schedule:work instances would double-fire scheduled tasks.
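
One detail worth sketching: with the default RollingUpdate strategy, a Deployment briefly runs the old and new pod side by side during an upgrade, which would reintroduce the double-fire risk. The sketch below uses the Recreate strategy to avoid that overlap; the strategy choice, resource name, labels, and image tag are assumptions, not decisions recorded in this spec.

```yaml
# deployment-scheduler.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ezscale-website-scheduler        # hypothetical name
  namespace: ezscale
spec:
  replicas: 1
  strategy:
    type: Recreate                       # assumed: stop the old pod before starting the new one
  selector:
    matchLabels: { app: ezscale-website-scheduler }
  template:
    metadata:
      labels: { app: ezscale-website-scheduler }
    spec:
      imagePullSecrets:
        - name: gitea-registry
      containers:
        - name: scheduler
          # The scheduler target's CMD already runs `php artisan schedule:work`.
          image: git.ezscale.cloud/ezscale/website:scheduler-v1.2.3   # hypothetical tag
          envFrom:
            - secretRef:
                name: ezscale-website-secrets
```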

Image registry, CI, deploy

.gitea/workflows/release.yml mirrors sister API:

Trigger: push of v* tags
Build & push three images (app, horizon, scheduler) tagged :{role}-{version} and :{role}-latest
Login: git.ezscale.cloud with ${{ secrets.CI_TOKEN }}
After build: helm upgrade --install ezscale-website helm/ezscale-website -n ezscale -f helm/ezscale-website/values-us-prod.yaml --set image.tag=v{X.Y.Z} (executed against the cluster via a self-hosted runner with kubeconfig).
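
The workflow steps above could be sketched as follows. The trigger, registry, CI_TOKEN secret, tag scheme, and helm command come from this spec; the runner labels, the registry username, and the choice of the docker/* actions are assumptions about how the sister workflow is built.

```yaml
# .gitea/workflows/release.yml (sketch)
name: release
on:
  push:
    tags: ["v*"]
jobs:
  build:
    runs-on: ubuntu-latest                 # assumed runner label
    strategy:
      matrix:
        role: [app, horizon, scheduler]    # one image per Dockerfile target
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: git.ezscale.cloud
          username: ${{ github.actor }}    # assumption: token owner's username
          password: ${{ secrets.CI_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          target: ${{ matrix.role }}
          push: true
          tags: |
            git.ezscale.cloud/ezscale/website:${{ matrix.role }}-${{ github.ref_name }}
            git.ezscale.cloud/ezscale/website:${{ matrix.role }}-latest
  deploy:
    needs: build
    runs-on: self-hosted                   # assumed label for the runner with kubeconfig
    steps:
      - uses: actions/checkout@v4
      - run: >
          helm upgrade --install ezscale-website helm/ezscale-website
          -n ezscale
          -f helm/ezscale-website/values-us-prod.yaml
          --set image.tag=${{ github.ref_name }}
```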

Pull secret: gitea-registry (already exists in the ezscale namespace).

Existing CI (tests, Pint) stays in .gitea/workflows/ci.yml if present, or is added separately — out of scope for this spec.

Open questions / TBD during implementation

Verify the exact MaxScale Service name and port in infrastructure/kubernetes/ezscale/mysql/. The chart's default DB_HOST should match what MaxScale exposes.
Confirm the cluster's StorageClass name for production (Longhorn vs local-path) by inspecting the existing mariadb CR's PVCs.
Confirm the exact Storj bucket name to use in us-prod (proposal: ezscale-website-prod). Local doesn't need one — it uses the local disk on emptyDir.

Out of scope (separate spec needed before adding)

Cloudflare Zero Trust for the admin host
EU region deployment + DB replication topology
Backup verification / restore drills
Multi-tenancy (Kasm) — see KASM_AND_MULTITENANCY.md
WHMCS migration runbook

Implementation order (for the plan that follows)

1. Production Dockerfile (build the three targets locally, smoke-test via docker run).
2. Helm chart skeleton (Chart.yaml, values.yaml, _helpers.tpl).
3. Core templates: configmap, secret (placeholder), deployment-app, service.
4. Database custom resources (mariadb-database, -user, -grant, -instance for local).
5. statefulset-valkey.
6. deployment-horizon, deployment-scheduler.
7. job-migrate (Helm hook).
8. ingressroute, certificate.
9. hpa-app.
10. values-local.yaml and values-us-prod.yaml.
11. .gitea/workflows/release.yml.
12. Local end-to-end test in k3d.
13. Documentation: helm/ezscale-website/README.md covering bootstrap procedure for APP_KEY / Passport keys.
