Production Deployment — Cloudflare + Railway
End-to-end production deployment using Cloudflare Pages for the frontend and Railway for server, worker, Postgres, and Redis — selection rationale, env var wiring, migration hooks, and pitfall checklist
What this doc answers
After reading this you can take Zapvol to production from scratch, and you’ll understand why each choice is the way it is. This is not a platform comparison; see Architecture Overview for that.
Key parameters
| Item | Value |
|---|---|
| Frontend hosting | Cloudflare Pages |
| Backend hosting | Railway |
| Database | Railway Postgres plugin |
| Queue / Cache | Railway Redis plugin |
| File storage | Cloudflare R2 |
| Log backend | Grafana Cloud Loki |
| Server image | Single Dockerfile, shared by server / worker |
| Starting monthly cost | $5–15 |
Why Cloudflare + Railway
After elimination:
- Vercel doesn’t fit the server: @hono/node-ws needs persistent WebSocket connections, and BullMQ workers need long-lived processes. Vercel Functions is a serverless model, so neither runs there.
- Cloudflare Workers doesn’t fit the server: @hono/node-server / ioredis / postgres aren’t compatible with the Workers runtime. Rewriting all of them isn’t free.
- Fly.io recommends Upstash Redis, and Upstash has connection-count and per-command pricing constraints that fight BullMQ’s blocking-read + persistent-TCP workload. Not a great match.
- Railway gives real Postgres + real Redis containers, with server / worker / DB / cache all sharing a private network, *.railway.internal. Zero egress charges, low latency, and monorepo multi-service support.
Why not Railway for the frontend too — Cloudflare Pages has a far denser CDN edge and serves static assets free.
Topology
Four Railway services + two CF Pages projects:
| Service | Platform | Role | Public ingress |
|---|---|---|---|
| marketing | Cloudflare Pages | Marketing site (Astro) | zapvol.com |
| web | Cloudflare Pages | Web app (Vite SPA) | app.zapvol.com |
| server | Railway | API + WebSocket (Hono) | api.zapvol.com |
| worker | Railway | BullMQ queue consumer | none |
| postgres | Railway plugin | Primary database | private only |
| redis | Railway plugin | BullMQ + cache | private only |
External dependencies: Anthropic / OpenAI (agent inference), Cloudflare R2 (file storage), Grafana Cloud Loki (logs).
Server / worker share one image
All deployment artifacts live under docker/:
docker/
├─ Dockerfile multi-stage build
├─ docker-compose.yaml self-hosted deploy + local prod-mirror (pulls ghcr image)
├─ docker-compose.build.yaml override — build the image locally instead of pulling
├─ railway.server.json Railway server service config
└─ railway.worker.json Railway worker service config
docker/Dockerfile produces one image; two services reuse it with different startCommand:
docker/Dockerfile (build context = repo root)
├─ Stage 1-3: install + build (pnpm workspaces, tsup)
├─ Stage 4: prod-only deps
└─ Stage 5: production image
└─ CMD ["node", "apps/server/dist/index.mjs"] ← default: server
Railway service "server" → CMD: node apps/server/dist/index.mjs
Railway service "worker" → CMD: node apps/server/dist/worker.mjs
Why not run both processes in one CMD: a worker crash takes the server down with it; scaling granularity is also coarser. Two services with independent restart policies and scaling tiers is the better shape.
Two Railway configs:
docker/railway.server.json ← server service
docker/railway.worker.json ← worker service
Server config — key fields:
{
  "build": { "builder": "DOCKERFILE", "dockerfilePath": "docker/Dockerfile" },
  "deploy": {
    "startCommand": "node apps/server/dist/index.mjs",
    "preDeployCommand": "node apps/server/dist/db-migrate.mjs",
    "healthcheckPath": "/health",
    "healthcheckTimeout": 30,
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 5
  }
}
preDeployCommand runs once before the new release starts, dedicated to Drizzle migrations. The worker config does
not have one — running DDL from two processes simultaneously creates lock contention; migrations belong on the
server side only.
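The server-only migration rule can be checked mechanically before a push. The following is a minimal sketch, not part of the repo: the RailwayConfig interface and the checkMigrationOwnership function are hypothetical names, and the interface models only the two fields the check reads.

```typescript
// Hypothetical shape: only the fields this check needs, not the full Railway schema.
interface RailwayConfig {
  deploy: {
    startCommand: string;
    preDeployCommand?: string;
  };
}

// Returns a list of problems; an empty list means migrations are wired to the server only.
function checkMigrationOwnership(server: RailwayConfig, worker: RailwayConfig): string[] {
  const problems: string[] = [];
  if (!server.deploy.preDeployCommand) {
    problems.push("server config has no preDeployCommand, so migrations never run");
  }
  if (worker.deploy.preDeployCommand) {
    problems.push("worker config defines preDeployCommand, so migrations would run twice");
  }
  return problems;
}

// Values mirror the start commands documented above.
const serverConfig: RailwayConfig = {
  deploy: {
    startCommand: "node apps/server/dist/index.mjs",
    preDeployCommand: "node apps/server/dist/db-migrate.mjs",
  },
};
const workerConfig: RailwayConfig = {
  deploy: { startCommand: "node apps/server/dist/worker.mjs" },
};

console.log(checkMigrationOwnership(serverConfig, workerConfig)); // []
```

Swapping the two arguments reports both problems, which is the cross-wiring case the two config files are meant to prevent.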
Local prod-mirror (run this before pushing to Railway)
docker/docker-compose.yaml replicates Railway’s service topology (postgres + redis + migrate + server + worker) and is
the same compose used for self-hosted deploys — not a separate “test-only” config.
By default it pulls ghcr.io/zapvol/zapvol-server:${VERSION:-latest}. To test code that isn’t published yet, overlay
docker-compose.build.yaml so Compose builds the image from docker/Dockerfile on the spot:
# From the repo root — local-build (the usual dev flow)
docker compose -f docker/docker-compose.yaml -f docker/docker-compose.build.yaml up -d --build
# Or pull from ghcr (verifying a published version)
docker compose -f docker/docker-compose.yaml up -d
# Healthcheck
curl http://localhost:8001/health # → {"ok":true}
Startup ordering is enforced by depends_on:
postgres → migrate (one-shot, runs Drizzle migrations and exits) → server / worker. This matches the semantics of Railway’s
preDeployCommand → server start sequence.
Credentials (AI / R2 / OAuth) live in the repo-root .env (gitignored); Compose loads them automatically. At minimum
set BETTER_AUTH_SECRET and one AI provider key — otherwise the server boots fine but agent calls return 401. The full
variable list is in .env.example.
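Because a missing key only shows up later as a 401, a startup check can surface it earlier. This is a sketch under assumptions: missingLocalEnv is a hypothetical helper, and the two provider key names are taken from the server variable list in this doc.

```typescript
// Provider key names come from the server variable list in this doc.
const AI_PROVIDER_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: reports what the local minimum setup is still missing.
function missingLocalEnv(env: Env): string[] {
  const missing: string[] = [];
  if (!env["BETTER_AUTH_SECRET"]) {
    missing.push("BETTER_AUTH_SECRET");
  }
  // One provider key is enough; report both names only when neither is set.
  const hasProvider = AI_PROVIDER_KEYS.some((key) => Boolean(env[key]));
  if (!hasProvider) {
    missing.push(AI_PROVIDER_KEYS.join(" or "));
  }
  return missing;
}

console.log(missingLocalEnv({ BETTER_AUTH_SECRET: "dev-secret", OPENAI_API_KEY: "sk-example" })); // []
```

In a real entrypoint the argument would presumably be process.env, with a non-empty result logged or treated as fatal.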
Deploy timeline
A single git push triggers the full pipeline. CF Pages and Railway are fired in parallel (the same push webhook
reaches both); the diagram serializes them for readability.
Robustness comes from three things: (1) migration runs in exactly one place and any failure rolls back the whole
release; (2) server only takes traffic after /health passes; (3) worker only restarts after server has succeeded.
Step-by-step deploy
First-time onboarding
1. Create the Railway project
# In Railway dashboard:
# 1. New Project → Deploy from GitHub repo → pick zapvol
# 2. Inside project → Add Service → Database → PostgreSQL (plugin)
# 3. Add Service → Database → Redis (plugin)
2. Configure the server service
The repo is already linked, so Railway auto-detects the Dockerfile. Manual tweaks:
- Settings → Source → Config-as-code Path: docker/railway.server.json
- Settings → Source → Watch Paths: apps/server/**, packages/backend/**, packages/common/**, docker/Dockerfile, pnpm-lock.yaml
- Settings → Networking → Public Networking: enable, bind api.zapvol.com
3. Duplicate as the worker service
Railway lets you Duplicate an existing service inside the same project so you don’t redo the GitHub link:
- Add Service → From existing service → server
- Change Config-as-code Path to docker/railway.worker.json
- Leave Networking off: the worker has no public ingress
4. Environment variables
Each service has its own Variables tab. Railway uses ${{ServiceName.VAR}} for cross-service references:
server — required:
NODE_ENV=production
PORT=8001
# Private connection strings — use the PRIVATE variant to avoid egress charges
DATABASE_URL=${{Postgres.DATABASE_PRIVATE_URL}}
REDIS_URL=${{Redis.REDIS_PRIVATE_URL}}
# better-auth
BETTER_AUTH_URL=https://api.zapvol.com
BETTER_AUTH_SECRET=<openssl rand -hex 32>
BETTER_AUTH_COOKIE_DOMAIN=.zapvol.com # so the web subdomain shares sessions
# CORS — every public origin that hits api.zapvol.com
CORS_ORIGINS=https://app.zapvol.com,https://zapvol.com
# Model providers
ANTHROPIC_API_KEY=<...>
OPENAI_API_KEY=<...>
# File storage — R2
R2_ACCESS_KEY_ID=<...>
R2_SECRET_ACCESS_KEY=<...>
R2_BUCKET=zapvol-prod
R2_ENDPOINT=https://<account>.r2.cloudflarestorage.com
# Logs — Grafana Cloud
LOKI_URL=https://logs-prod-xxx.grafana.net
LOKI_USERNAME=<...>
LOKI_PASSWORD=<...>
worker is the same minus PORT / BETTER_AUTH_* / CORS_ORIGINS (no HTTP surface). Everything else — DB, Redis,
AI keys, R2, Loki — is required.
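CORS_ORIGINS in the server list is a comma-separated allowlist of exact origins. How the server parses it is not shown in this doc, so the following is only a sketch of one reasonable approach; parseCorsOrigins and isAllowedOrigin are hypothetical names.

```typescript
// Split a comma-separated CORS_ORIGINS value into a set of exact origins.
function parseCorsOrigins(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  );
}

// Exact-match check: scheme, host, and port must all match an allowlisted origin.
function isAllowedOrigin(origin: string, allowlist: Set<string>): boolean {
  return allowlist.has(origin);
}

const allowlist = parseCorsOrigins("https://app.zapvol.com,https://zapvol.com");
console.log(isAllowedOrigin("https://app.zapvol.com", allowlist)); // true
console.log(isAllowedOrigin("https://www.zapvol.com", allowlist)); // false
```

Exact matching means every public origin has to be listed explicitly; a subdomain that is not in the value is rejected.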
5. CF Pages
Two projects, both linked to the same GitHub repo:
| CF Pages project | Repo path | Build command | Output dir | Custom domain |
|---|---|---|---|---|
| zapvol-marketing | apps/marketing | pnpm install && pnpm --filter=marketing build | apps/marketing/dist | zapvol.com |
| zapvol-web | apps/web | pnpm install && pnpm --filter=web build | apps/web/dist | app.zapvol.com |
The web SPA needs an apps/web/public/_redirects for client-side routing fallback:
/* /index.html 200
Without this, refreshing any non-root path 404s.
Environment variables (web only):
VITE_API_BASE_URL=https://api.zapvol.com
VITE_WS_URL=wss://api.zapvol.com/ws
6. DNS
In Cloudflare DNS:
zapvol.com → CNAME zapvol-marketing.pages.dev (proxied)
app.zapvol.com → CNAME zapvol-web.pages.dev (proxied)
api.zapvol.com → CNAME <railway-server>.up.railway.app (DNS only — important)
api.zapvol.com must be set to “DNS only”, not proxied. The server depends heavily on WS, and on the free tier Cloudflare’s proxy can cut long-lived WebSocket connections.
Going direct to Railway’s edge gives more stable long-lived connections.
Subsequent pushes
git push origin main triggers the full pipeline. Both CF Pages and Railway auto-deploy — no manual steps.
Rollback: Railway → service → Deployments → previous version → Redeploy. One click. Same on CF Pages.
What this setup does NOT do
- No hand-rolled CI/CD: no GitHub Actions, ArgoCD, Jenkins. CF + Railway handle it.
- No container orchestration: no Kubernetes, and no docker-compose on the Railway production path (the compose file serves self-hosted deploys and the local prod-mirror). Until traffic hits hundreds of thousands of QPS, Railway’s service model is enough.
- No multi-region: the server is single-region. The frontend already gets global reach via CF’s edge; the complexity of cross-region replication (consistency, write routing) outweighs the win at this stage.
- No serverless: see “Why Cloudflare + Railway” above.
- No self-hosted Postgres / Redis in the same container as the app. Use the Railway plugins. Backups, version upgrades, disk growth, and HA are someone else’s problem.
Pitfall checklist
Each item below corresponds to a real failure mode:
- CF proxying WebSocket: covered above. api.zapvol.com must be DNS only. If WS connections drop ~30 s in, or wss:// handshakes return 200 and then close, check the CF proxy first.
- Postgres public vs private URL: DATABASE_URL defaults to the public hostname, routes through the public internet, and counts against egress. Use DATABASE_PRIVATE_URL, which resolves to *.railway.internal.
- preDeployCommand on the wrong service: workers must not run migrations. Don’t cross-wire the two railway.*.json files.
- /health goes through the full middleware stack: currently it traverses pino-logger + cors + requestContext, so every healthcheck writes a log line. Loki cardinality is fine, but it’s noisy. If that bothers you, register /health before those middlewares.
- Drizzle migrations folder missing from the image: Stage 5 of the Dockerfile must explicitly COPY --from=build /app/apps/server/drizzle/ apps/server/drizzle/. Without it, db-migrate fails to find migration files, the pre-deploy hook errors, and the entire release rolls back.
- better-auth cookie domain: if BETTER_AUTH_COOKIE_DOMAIN isn’t .zapvol.com, the web app (app.zapvol.com) can’t read session cookies set by the server (api.zapvol.com).
- CORS_ORIGINS missing marketing: if marketing has a “Try now” button calling api.zapvol.com directly, https://zapvol.com must also be in the allowlist.
- Watch Paths too broad: Railway watches the whole repo by default. A marketing tweak then triggers a server rebuild and burns build minutes. Narrow Watch Paths to the entries listed in step 2.
- Healthcheck timeout: the server config sets it to 30 s. If cold starts load large skill packages via initToolRegistry() and exceed that, bump healthcheckTimeout to 60 s.
- db-migrate is idempotent: it’s a stateless process that scans drizzle/ and compares against __drizzle_migrations every run, skipping applied ones. Safe to re-trigger.
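The idempotency of db-migrate comes down to a set difference between what is on disk and what is already recorded. The sketch below is a simplified model, not the actual db-migrate code: pendingMigrations is a hypothetical function, the migration tags are invented for illustration, and the real __drizzle_migrations bookkeeping may identify migrations differently.

```typescript
// Simplified model: each migration is identified by a tag (file name without extension).
// This is an assumption for readability; the real bookkeeping table is not shown in this doc.
function pendingMigrations(onDisk: string[], applied: string[]): string[] {
  const done = new Set(applied);
  // Sort a copy so zero-padded tags run in order, then skip anything already applied.
  return [...onDisk].sort().filter((tag) => !done.has(tag));
}

// Hypothetical migration tags, for illustration only.
const migrationTags = ["0000_init", "0001_add_users", "0002_add_jobs"];
const alreadyApplied = ["0000_init"];

// First run: two tags are still pending.
const firstRun = pendingMigrations(migrationTags, alreadyApplied);
console.log(firstRun.length); // 2

// Second run after recording them: nothing left to do.
const secondRun = pendingMigrations(migrationTags, [...alreadyApplied, ...firstRun]);
console.log(secondRun.length); // 0
```

The second run returning nothing is what makes re-triggering the pre-deploy hook safe in this model.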
Cost estimate
For current scale (< 100 DAU, single server / worker instance, single Postgres / Redis):
| Item | Monthly |
|---|---|
| Railway Hobby plan (includes $5 usage credit) | $5 |
| Railway actual usage (server + worker + DB + Redis at 256MB–1GB each) | $0–10 |
| Cloudflare Pages (marketing + web) | $0 |
| Cloudflare R2 (< 10 GB storage, < 1M ops) | $0 |
| Grafana Cloud (free tier) | $0 |
| Total | $5–15 |
Scaling up:
- Each extra server replica ≈ +$3–5/month
- Postgres at 4 GB RAM ≈ +$15/month
- Once traffic really takes off, plan for ECS / Kubernetes — by which time the monthly bill starts in the hundreds.
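The scaling figures above can be combined into a rough range. This is only a back-of-the-envelope sketch: estimateMonthlyCost is a hypothetical helper, it assumes the increments simply add to the $5–15 starting range, and real Railway billing is usage-based.

```typescript
interface CostRange {
  low: number;
  high: number;
}

// Figures from the tables and list above, in USD per month.
const BASE: CostRange = { low: 5, high: 15 };
const PER_EXTRA_REPLICA: CostRange = { low: 3, high: 5 };
const POSTGRES_4GB_UPGRADE = 15;

// Assumption: increments add linearly to the starting range.
function estimateMonthlyCost(extraReplicas: number, postgres4gb: boolean): CostRange {
  const db = postgres4gb ? POSTGRES_4GB_UPGRADE : 0;
  return {
    low: BASE.low + extraReplicas * PER_EXTRA_REPLICA.low + db,
    high: BASE.high + extraReplicas * PER_EXTRA_REPLICA.high + db,
  };
}

// Two extra server replicas plus the 4 GB Postgres tier.
const estimate = estimateMonthlyCost(2, true);
console.log(`$${estimate.low}-${estimate.high} per month`); // $26-40 per month
```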
Further reading
- Observability overview — how pino → Alloy → Loki → Grafana wires up
- Local dev to Grafana Cloud — local logs into the production dashboard
- Observability dashboards — what to watch after launch
- Architecture overview — per-service technical choices in depth