Serverless computing explained

Serverless computing does not mean there are no servers; it means you stop managing them. A cloud provider runs your code in short-lived execution environments that spin up on demand, scale to zero when idle, and bill per invocation rather than per reserved machine hour. The dominant flavor is function-as-a-service (FaaS): upload a handler, wire it to an event source (HTTP request, queue message, file upload, cron schedule), and the platform handles provisioning, patching, and horizontal scaling. AWS Lambda, Google Cloud Functions, Azure Functions, and edge runtimes like Cloudflare Workers all follow this model with different limits and pricing curves.

Serverless is excellent for bursty, stateless glue work (webhooks, image thumbnails, ETL transforms, API backends with unpredictable traffic) and a poor fit for long-running processes, persistent WebSocket sessions, or workloads that cannot tolerate cold-start latency. This guide explains the architecture, the hidden constraints, and how to decide whether serverless belongs in your stack alongside containers and Kubernetes.

What "serverless" actually means

The term bundles two ideas that are often conflated:

  • Operational serverless — you do not SSH into VMs, patch kernels, or right-size instance types. The provider owns the runtime.
  • Billing serverless — you pay for consumed resources (invocations, GB-seconds of memory, outbound data) rather than idle capacity.

FaaS is the clearest example: one function, one responsibility, triggered by events. Backend-as-a-service (BaaS) is the sibling category — managed databases (Firebase Firestore), auth (Auth0, Cognito), and storage (S3) where you also skip server ops but integrate via SDKs instead of uploading functions. A modern "serverless app" usually mixes both: Lambda handlers behind API Gateway, DynamoDB for state, SQS for async work, and CloudFront at the edge.

The mental model is event-driven micro-functions rather than a monolithic process listening on port 8080 forever. Each invocation is isolated, ephemeral, and should complete quickly. That constraint shapes every design decision downstream.
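As a minimal sketch of that model, the handler below takes one event and returns one response, with no server loop of its own. It assumes the Lambda-style `handler(event, context)` signature and an API Gateway proxy event whose request body arrives as a JSON string under "body"; the "name" field is a hypothetical payload.

```python
import json


def handler(event, context):
    """Entry point the platform calls once per event.

    Assumes an API Gateway proxy-style event: the request body is a
    JSON string stored under the "body" key (or missing entirely).
    """
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")  # "name" is a made-up field for illustration

    # No sockets and no listen loop: one event in, one response out.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is a plain function, it can be unit tested by calling it with a dictionary, without deploying anything.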

How FaaS platforms run your code

When an event arrives — say, an HTTP POST to API Gateway — the platform checks whether a warm execution environment already exists for that function version.

Cold starts vs warm invocations

A cold start happens when no idle execution environment is available. The provider must download your deployment package, start a sandbox (often a Firecracker micro-VM or a hardened container), initialize the language runtime, and run your module-level initialization code before executing the handler. Cold starts range from a few milliseconds (Cloudflare Workers on V8 isolates) to multiple seconds (Java Spring Boot on Lambda with a 250 MB package).

A warm invocation reuses the existing environment. Only your handler runs — latency drops sharply. Providers keep environments alive for a few minutes after the last request, but there is no guarantee; traffic dips bring cold starts back. Provisioned concurrency (pre-warmed instances you pay for) and smaller deployment artifacts are the main mitigations.
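The part of a cold start you control is module-level initialization. The sketch below simulates it: an expensive dependency is built once per environment and reused on warm calls. The 50 ms sleep and the `_client` dictionary are stand-ins, and the code cannot observe the package download or sandbox start that the platform performs before your code runs.

```python
import time

# Module-level state survives across warm invocations of one environment.
_client = None


def _get_client():
    """Build the (hypothetical) expensive dependency once per environment."""
    global _client
    if _client is None:
        time.sleep(0.05)  # stand-in for loading config, SDK clients, or models
        _client = {"created_at": time.monotonic()}
    return _client


def handler(event, context):
    was_cold = _client is None
    started = time.perf_counter()
    _get_client()
    init_ms = (time.perf_counter() - started) * 1000
    return {"cold": was_cold, "init_ms": round(init_ms, 1)}
```

Calling `handler` twice in the same process shows the pattern: the first call pays the initialization cost and reports `cold: True`, the second reuses `_client` and returns almost immediately.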

Concurrency and limits

Platforms cap concurrency, the number of simultaneous executions allowed, both account-wide and per function. Burst traffic beyond the limit is throttled (HTTP 429) unless you request a quota increase. Timeouts are hard caps (Lambda defaults to 3 seconds, with a maximum of 15 minutes). Memory allocation is configurable, and on most platforms CPU share scales proportionally with it. Design handlers to finish well inside the timeout and to be idempotent, because retries are common when clients or queues redeliver events.
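One common way to make a handler idempotent is to claim each event ID before doing the work. The sketch below assumes every event carries a stable "id" field and uses an in-memory set as a stand-in for a durable store with an atomic put-if-absent operation (for example, a conditional write in a key-value database). All names here are hypothetical.

```python
class InMemoryDedupeStore:
    """Stand-in for a durable store with atomic put-if-absent semantics."""

    def __init__(self):
        self._seen = set()

    def claim(self, key: str) -> bool:
        """Return True only for the first caller to claim this key."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def release(self, key: str) -> None:
        """Give the key back so a later retry can attempt the work again."""
        self._seen.discard(key)


def process_once(event: dict, store, side_effect) -> str:
    key = event["id"]  # assumption: the event source provides a unique, stable id
    if not store.claim(key):
        return "duplicate"  # redelivered event: skip the side effect
    try:
        side_effect(event)
    except Exception:
        store.release(key)  # failed work should stay retryable
        raise
    return "processed"
```

A redelivered event then returns "duplicate" instead of repeating the side effect, while a failed attempt releases its claim so the platform's retry can succeed.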

Statelessness requirement

Execution environments are recycled without notice. Anything stored in global variables, local disk, or process memory may disappear between invocations. Durable state belongs in external stores: DynamoDB, Redis, S3, PostgreSQL. File writes to /tmp are allowed on some platforms but limited in size and lifetime — treat them as scratch space, not storage.
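The difference between process memory and external state can be shown with a small simulation. `Environment` models one execution environment, and `FakeCounterStore` is an in-memory stand-in for a durable store such as those listed above; both are illustrative names, not platform APIs.

```python
from typing import Protocol


class CounterStore(Protocol):
    def increment(self, key: str) -> int: ...


class FakeCounterStore:
    """In-memory stand-in for a durable store that outlives any environment."""

    def __init__(self):
        self._values = {}

    def increment(self, key: str) -> int:
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]


class Environment:
    """Models one execution environment; the platform may discard it anytime."""

    def __init__(self, store: CounterStore):
        self.local_hits = 0  # process memory: lost when the environment is recycled
        self.store = store   # external state: shared across environments

    def handle(self, event: dict) -> dict:
        self.local_hits += 1
        return {"local": self.local_hits, "durable": self.store.increment("hits")}


store = FakeCounterStore()
env_a = Environment(store)
env_a.handle({})
env_a.handle({})
env_b = Environment(store)  # simulates a recycle: fresh memory, same store
result = env_b.handle({})   # local count restarts at 1, durable count reaches 3
```

After the simulated recycle, the in-memory counter has silently reset while the external one is still correct, which is why durable state has to live outside the function.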

Common trigger patterns

Serverless shines when work arrives as discrete events rather than as a steady stream best served by one long-lived server:

  • HTTP APIs: API Gateway or Function URLs map routes to handlers. Good for REST backends with spiky traffic; pair with rate limiting at the edge.
  • Queue consumers: SQS, Pub/Sub, or Kafka triggers process messages one batch at a time. Natural fit for event-driven pipelines and decoupled workers.
  • Scheduled jobs: Amazon EventBridge (formerly CloudWatch Events) rules or cron triggers replace an always-on cron daemon for nightly reports or cache warming.
  • Storage events: S3 object-created notifications kick off image resizing, virus scanning, or log indexing.
  • Stream processing: DynamoDB Streams or Kinesis shards invoke functions per record batch for near-real-time aggregation.
  • Webhooks: payment providers, GitHub, and SaaS integrations POST to a function URL; verify signatures, enqueue work, return 200 fast.
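The webhook pattern (verify the signature, enqueue, return quickly) might look like the sketch below. It assumes the provider signs the raw body with HMAC-SHA256 and sends the hex digest in an "X-Signature" header; the header name, the secret, and the `enqueue` callable are all hypothetical, and real providers each document their own scheme.

```python
import hashlib
import hmac
import json

SECRET = b"replace-me"  # hypothetical shared secret; load it from a secret manager


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(event: dict, enqueue) -> dict:
    body = (event.get("body") or "").encode("utf-8")
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    provided = headers.get("x-signature", "")  # assumed header name

    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(body).encode(), provided.encode()):
        return {"statusCode": 401, "body": "invalid signature"}

    try:
        payload = json.loads(body)
    except ValueError:
        return {"statusCode": 400, "body": "body is not valid JSON"}

    enqueue(payload)  # hand off the heavy work; do not process inline
    return {"statusCode": 200, "body": "ok"}
```

Passing `enqueue` in as a parameter keeps the handler testable with a plain list and leaves the choice of queue to the deployment.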

The anti-pattern is running a traditional web framework that expects persistent connections, in-memory session stores, and background threads — then wondering why cold starts and 15-minute timeouts hurt.

Pricing: when serverless saves money — and when it does not

FaaS pricing is deceptively simple: pay per million invocations plus GB-seconds of memory duration. At low or irregular volume, that beats renting a $50/month VPS that sits idle 90% of the time. A webhook that fires 10,000 times a month at 128 MB for 200 ms each costs less than a cent in compute at typical list prices, before any free tier.
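The webhook arithmetic can be checked in a few lines. The two rates below are assumptions modeled on commonly published Lambda list prices ($0.20 per million requests and $0.0000166667 per GB-second); confirm current pricing for your provider and region, and note that free tiers and data transfer are ignored here.

```python
# Assumed list prices in USD; verify against your provider's pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_cost(invocations: int, memory_mb: int, avg_duration_ms: float) -> float:
    """Request charge plus compute charge (memory in GB times seconds run)."""
    gb_seconds = invocations * (memory_mb / 1024) * (avg_duration_ms / 1000)
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


# 10,000 invocations x 0.125 GB x 0.2 s = 250 GB-seconds for the month.
cost = monthly_cost(10_000, 128, 200)
print(f"${cost:.4f} per month")  # about $0.0062 under the assumed rates
```

At these rates the compute charge (about $0.0042) is roughly twice the request charge ($0.002), and both together stay well under one cent.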

Costs climb when:

  • Traffic is steady and high: an always-on container on Kubernetes or a small VM often wins on unit economics above a few hundred sustained requests per second.
  • Functions are memory-heavy or slow: GB-seconds multiply; a 3 GB Python ML inference at 2 seconds per call adds up fast.
  • Outbound data transfer: moving large payloads out of the cloud region is billed separately and often ignored in back-of-envelope math.
  • Provisioned concurrency: eliminating cold starts reintroduces reserved capacity you pay for whether or not requests arrive.

Run the arithmetic with real p50/p95 duration and monthly invocation counts before committing. Many teams start serverless for MVP speed and migrate hot paths to containers when the bill crosses the break-even line.
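A rough break-even estimate follows from the same pricing model: divide the fixed monthly cost of the alternative by the FaaS cost of one invocation. The default rates are assumed list prices, and the comparison ignores data transfer, free tiers, operations time, and whether the fixed-price machine can actually carry the load, so treat the result as a starting point only.

```python
def cost_per_invocation(
    memory_mb: int,
    duration_ms: float,
    price_per_million: float = 0.20,           # assumed USD list price
    price_per_gb_second: float = 0.0000166667,  # assumed USD list price
) -> float:
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return price_per_million / 1_000_000 + gb_seconds * price_per_gb_second


def break_even_invocations(fixed_monthly_usd: float, memory_mb: int, duration_ms: float) -> float:
    """Monthly invocations at which FaaS spend equals a fixed-price alternative."""
    return fixed_monthly_usd / cost_per_invocation(memory_mb, duration_ms)


# Hypothetical comparison: a $50/month machine vs. a 512 MB function.
for label, duration_ms in [("p50", 100), ("p95", 400)]:
    n = break_even_invocations(50, 512, duration_ms)
    per_second = n / (30 * 24 * 3600)
    print(f"{label}: {n:,.0f} invocations/month (~{per_second:.1f}/s sustained)")
```

Running it with both p50 and p95 durations shows how sensitive the crossover is to slow invocations: the longer the function runs, the sooner a fixed-price option becomes cheaper.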

Serverless vs containers vs Kubernetes

These are not mutually exclusive — mature platforms use all three for different tiers:

  • Serverless (FaaS) — fastest path to deploy a single function; zero cluster ops; best for event glue and variable load; worst for long-lived connections and tight latency SLAs on cold paths.
  • Containers (Docker on a VM or managed service) — full control of runtime, arbitrary process lifetimes, WebSockets, and custom binaries; you still manage scaling unless you use a platform like ECS Fargate or Cloud Run that blurs the line toward serverless containers.
  • Kubernetes — maximum flexibility for multi-service apps, stateful workloads with operators, custom networking, and teams that already operate clusters. Higher baseline complexity; overkill for one cron job.

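Cloud Run and AWS Fargate sit in the middle: container images with managed, serverless-style scaling and usage-based billing (Cloud Run can bill per request, while Fargate bills for the vCPU and memory a task reserves while it runs), trading some FaaS purity for portability of your existing Docker artifacts.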

Edge serverless: Workers and latency-sensitive paths

Cloudflare Workers, Vercel Edge Functions, and Deno Deploy run JavaScript and WebAssembly (and, on some platforms, other languages such as Python) at points of presence close to users. They use V8 isolates instead of full containers, so cold starts are typically sub-10 ms and there is no regional "spin up a VM" phase. Constraints are tighter: CPU time limits per request, restricted Node APIs, and smaller bundle sizes.

Edge functions excel at auth checks, A/B routing, geo-based redirects, HTML personalization, and caching logic — work that must run before the origin responds. They complement a CDN rather than replacing a full backend. Heavy database writes and multi-second computations still belong at a regional origin or in a queue-backed worker.
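Edge logic of this kind is usually a small, pure decision function. The sketch below shows a geo redirect plus deterministic A/B bucketing in platform-neutral Python; the request dictionary, the "DE" rule, and the "new-checkout" experiment name are hypothetical, and a real edge runtime would supply its own request and response objects.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Assign a user to a variant deterministically, with no stored state."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32); dividing maps it to [0, 1).
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    return "treatment" if fraction < treatment_share else "control"


def route(request: dict) -> dict:
    """Decide what to do with a request before it reaches the origin."""
    if request.get("country") == "DE":  # hypothetical geo rule
        return {"action": "redirect", "location": "/de/"}
    variant = ab_bucket(request["user_id"], "new-checkout")
    return {"action": "forward", "origin_path": f"/{variant}{request['path']}"}
```

Hashing the user ID instead of storing an assignment means every point of presence reaches the same answer without a database lookup, which suits the tight CPU and I/O budgets at the edge.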

Production pitfalls teams learn the hard way

  • Cold-start latency on user-facing paths: measure p99 with realistic package sizes; shrink dependencies, use ARM Graviton where cheaper, or add provisioned concurrency for hot routes.
  • Connection storms to databases: thousands of concurrent Lambdas that each open their own Postgres connection can exhaust max_connections. Use connection pooling (RDS Proxy, PgBouncer) or serverless-friendly databases (Aurora Serverless, PlanetScale, DynamoDB).
  • Vendor lock-in: IAM, trigger wiring, and SDK calls are cloud-specific. Abstraction frameworks (Serverless Framework, SST, Pulumi) help but do not eliminate migration cost.
  • Observability gaps: distributed traces across API Gateway, Lambda, and downstream services require deliberate instrumentation. Wire structured logs, metrics, and traces from day one.
  • Local dev friction: emulating cloud triggers on a laptop is imperfect. Invest in integration tests against deployed staging stacks and keep functions small enough to unit test without the full platform.
  • Security sprawl: each function needs its own least-privilege IAM role. A shared "god role" for convenience becomes a blast-radius nightmare.
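For the least-privilege point, a per-function policy can be generated rather than hand-written. The sketch below builds a standard IAM policy document scoped to one table and a named list of actions, and refuses wildcards; the table ARN, account ID, and the read-only "order lookup" function are hypothetical.

```python
import json
from typing import List


def table_policy(table_arn: str, actions: List[str]) -> dict:
    """Build an IAM policy document scoped to one table and named actions."""
    if table_arn == "*" or any("*" in action for action in actions):
        raise ValueError("wildcards defeat least privilege; name resources and actions")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": table_arn}
        ],
    }


# Hypothetical ARN: an order-lookup function that only ever reads.
read_only = table_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    ["dynamodb:GetItem", "dynamodb:Query"],
)
print(json.dumps(read_only, indent=2))
```

Generating one such document per function makes the blast radius of each handler explicit and easy to review in a pull request.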

Production checklist

  1. Confirm the workload is event-shaped: short, stateless, idempotent handlers triggered by HTTP, queues, or schedules.
  2. Measure cold-start p99 with production-sized artifacts; set SLOs before launch, not after user complaints.
  3. Externalize all durable state: no in-memory sessions; use managed stores whose consistency guarantees match your needs.
  4. Pool database connections or choose a serverless-native datastore; load-test concurrency limits.
  5. Return quickly from triggers: offload heavy work to async queues, respond 202, and let a separate consumer do the processing.
  6. Enable structured logging and tracing with correlation IDs across API edge, function, and downstream calls.
  7. Scope IAM per function: minimum permissions for the specific resources that handler touches.
  8. Plan the cost crossover: model monthly spend at 10x projected traffic; know when to move hot paths to containers.
  9. Automate deploys through a CI/CD pipeline with staged aliases and rollback, not manual console uploads.
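Item 6 can start as small as the sketch below: reuse an incoming correlation ID or mint one, attach it to every JSON log line, and hand it back to the caller. The "X-Correlation-Id" header and the "orders" logger name are assumptions; tracing systems generally define their own header conventions.

```python
import json
import logging
import sys
import uuid

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
logger = logging.getLogger("orders")  # hypothetical service name


def log_event(message: str, correlation_id: str, **fields) -> str:
    """Emit one JSON log line so log tooling can filter by correlation_id."""
    line = json.dumps(
        {"message": message, "correlation_id": correlation_id, **fields},
        sort_keys=True,
    )
    logger.info(line)
    return line


def handler(event, context):
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    # Reuse the caller's ID when present so one request stays traceable end to end.
    correlation_id = headers.get("x-correlation-id") or str(uuid.uuid4())
    log_event("request accepted", correlation_id, path=event.get("path"))
    return {
        "statusCode": 202,
        "headers": {"X-Correlation-Id": correlation_id},
        "body": "",
    }
```

Any message the handler places on a queue should carry the same ID, so the consumer's logs can be joined with the API edge's.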

Key takeaways

  • Serverless is an ops and billing model, not a magic scalability button: servers still exist, you just do not patch them.
  • Cold starts and statelessness are the defining constraints; design around them or pay for provisioned capacity.
  • Best for bursty traffic, glue code, and webhooks; poor for long-running jobs, WebSockets, and steady high-QPS services you have not cost-modeled.
  • Combine with queues, managed data, and observability: a function alone is not an architecture.
