Microservices architecture explained
Microservices split a software product into independently deployable services, each owning a slice of business capability and its own data store. Instead of one giant codebase that ships together, teams ship small services on their own cadence — scale the checkout service without redeploying the blog. The trade-off is operational complexity: network calls replace function calls, failures become partial, and debugging spans dozens of logs. This guide explains what microservices actually buy you, how to draw service boundaries, which communication patterns work in production, and the anti-patterns that turn a promising architecture into a slower, more fragile monolith spread across fifteen repositories. For foundational context, see our guides on distributed consistency, message queues, and Kubernetes fundamentals.
Monolith vs microservices — what changes
A monolith is a single deployable unit: one process (or one tightly coupled cluster) running all features — user accounts, catalog, payments, notifications. Developers call functions across module boundaries inside one address space. Transactions span tables in one database. Deployments are all-or-nothing: a typo in the newsletter template can block a critical payment fix from reaching production.
Microservices invert that packaging. Each service is a separate deployable with its own repository (usually), runtime, and release pipeline. Services communicate over the network via HTTP, gRPC, or asynchronous events. The goal is independent deployability and team autonomy, not smaller functions. You can still write bad code inside a microservice; you have simply moved the coupling boundary from compile-time imports to network contracts.
Neither shape is universally superior. Early-stage products benefit from monolith simplicity: one database, one deployment, easy refactors. Microservices pay off when organizational scale — multiple teams, different scaling profiles, regulated components that must ship on separate schedules — exceeds what a well-structured monolith can carry without constant merge conflicts and coordinated releases.
When to split — and when to wait
The most expensive microservices mistake is premature decomposition. Teams split along technical layers or data entities ("database-service", "user-service") instead of business capabilities, then discover that every feature requires coordinated changes across five services: a distributed monolith with all the pain of distribution and none of the autonomy.
Strong signals that microservices may help:
- Different scaling needs — image processing needs GPU workers; auth needs a tiny always-on API. Running both in one process wastes money or starves one workload.
- Independent release cadence — compliance requires a payment service audited quarterly while the marketing site ships daily.
- Technology heterogeneity — a Python ML inference service alongside a TypeScript API without forcing one runtime.
- Fault isolation — a bug in the recommendation engine must not take down checkout.
- Team boundaries — Conway's Law: architecture mirrors communication structure. Two teams stepping on the same monolith module need a seam.
Weak signals — often excuses rather than reasons:
- "Microservices are modern" without organizational pain to solve.
- Splitting before domain boundaries are understood — you will cut along the wrong lines and pay migration tax twice.
- Fewer than ~10 engineers and no production scaling crisis — operational overhead likely exceeds benefit.
A proven path: start with a modular monolith — clear package boundaries, no cross-module database access, interfaces that could become HTTP later — and extract services only when a specific bottleneck or team conflict demands it.
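As a minimal sketch of "interfaces that could become HTTP later", the orders code below depends only on a small interface, so the in-process implementation could be replaced by a network client without touching the caller. The names `PaymentGateway`, `InProcessPayments`, `ChargeResult`, and `checkout` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ChargeResult:
    order_id: str
    approved: bool


class PaymentGateway(Protocol):
    """The only surface other modules may use to reach payments."""

    def charge(self, order_id: str, amount_cents: int) -> ChargeResult: ...


class InProcessPayments:
    """Today's implementation: a plain function call inside the monolith."""

    def charge(self, order_id: str, amount_cents: int) -> ChargeResult:
        # Placeholder rule for this sketch: approve any positive amount.
        return ChargeResult(order_id=order_id, approved=amount_cents > 0)


def checkout(payments: PaymentGateway, order_id: str, amount_cents: int) -> str:
    # The orders module sees the interface only, never payments' tables or
    # internals, so an HTTP-backed client could be swapped in later.
    result = payments.charge(order_id, amount_cents)
    return "confirmed" if result.approved else "payment_failed"
```

If payments is later extracted, a class with the same `charge` method that makes a network call can be passed to `checkout` unchanged; the caller would then also need to handle timeouts and partial failure.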
Drawing service boundaries with domain-driven design
Good boundaries align with business capabilities, not CRUD tables. Domain-driven design (DDD) offers vocabulary: a bounded context is a coherent subdomain where terms have one meaning. "Customer" in billing includes credit limits; in support it includes ticket history. Merging those models into one service creates endless special cases; splitting them lets each evolve independently.
Practical boundary heuristics
- High cohesion inside, loose coupling outside — changes to pricing rules should not require redeploying the email service.
- One owning team per service (ideally): when several teams share ownership of the same service, coordination overhead returns.
- Data ownership — only the Orders service writes to the orders table. Others request data via API or consume events; they do not query foreign databases directly.
- Minimize chatty sync calls — if Service A needs five round trips to Service B per user request, the boundary is probably wrong or needs a read model.
Start by mapping user journeys: place order, process payment, ship package, send receipt. Each step that can fail, scale, or change policy independently is a candidate service — not every step; the ones with real operational divergence.
Communication patterns: sync vs async
Synchronous HTTP and gRPC
REST over HTTP is the default integration style: Service A calls Service B and waits for a response. Simple to debug with curl and standard load balancers. Downsides: temporal coupling (B must be up), latency stacks across call chains, and cascading failures when B slows down.
gRPC uses Protocol Buffers over HTTP/2 — strongly typed contracts, streaming, lower overhead. Common for internal east-west traffic between services in the same cluster. Less friendly to browser clients without a gateway translation layer.
Asynchronous events and messaging
Event-driven patterns decouple producers and consumers in time. OrderPlaced events fan out to inventory, billing, and analytics without the order service knowing every subscriber. Message brokers (Kafka, RabbitMQ, SQS) provide buffering, retries, and at-least-once delivery — but require idempotent consumers and clarity about ordering guarantees.
Use sync when the caller needs an immediate answer (authorization check at checkout). Use async when the operation can complete eventually (send welcome email, update search index). Hybrid systems are normal; the mistake is using sync chains five hops deep for workflows that tolerate seconds of delay.
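Because brokers with at-least-once delivery can hand the same message to a consumer more than once, the consumer has to tolerate duplicates. The sketch below shows one common approach, deduplicating on an event ID. The `Event` fields and the `InventoryConsumer` class are assumptions made for illustration, and the in-memory set stands in for a durable store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # unique per event; used for deduplication
    type: str
    order_id: str
    quantity: int


@dataclass
class InventoryConsumer:
    stock: int
    seen_event_ids: set[str] = field(default_factory=set)

    def handle(self, event: Event) -> bool:
        """Apply the event once; return False for a redelivered duplicate."""
        if event.event_id in self.seen_event_ids:
            return False
        # In a real service the stock update and the "seen" record would be
        # written in the same local transaction; here both live in memory.
        self.stock -= event.quantity
        self.seen_event_ids.add(event.event_id)
        return True
```

Delivering the same `OrderPlaced` event twice changes the stock only once, which is the property a retrying broker requires of its consumers.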
Data: database per service and distributed transactions
The microservices mantra "database per service" means each service owns its schema privately. No shared tables, no cross-service JOINs in SQL. Other services get data through APIs or replicated read models built from events.
This abandons single-database ACID transactions across domains. Placing an order and charging a card cannot be one BEGIN … COMMIT spanning two PostgreSQL instances. Patterns that replace two-phase commit:
- Saga — a sequence of local transactions with compensating actions (reserve inventory, charge card; if charge fails, release inventory).
- Outbox pattern — write business row and outbound event in one local transaction; a relay publishes to the message bus reliably.
- Eventual consistency — accept temporary inconsistency (order confirmed, wallet balance updates a second later) with clear UX for pending states.
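The saga described in the list above can be sketched as an in-process orchestrator: each step pairs a local action with a compensating action, and a failure undoes the completed steps in reverse order. This is a simplified illustration, not a production saga engine; `run_saga` and the step functions are hypothetical names, and a real implementation would persist saga state and make compensations retryable.

```python
from typing import Callable

# (step name, action, compensating action)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return False, log
        completed.append(step)
        log.append(f"{name}: ok")
    return True, log


# Example steps mirroring the text: reserve inventory, then charge the card.
inventory = {"sku-1": 5}


def reserve() -> None:
    inventory["sku-1"] -= 1


def release() -> None:
    inventory["sku-1"] += 1


def charge() -> None:
    raise RuntimeError("card declined")  # simulated failure


def refund() -> None:
    pass  # nothing to undo in this sketch
```

Calling `run_saga([("reserve_inventory", reserve, release), ("charge_card", charge, refund)])` fails at the charge step and releases the reserved unit, leaving the inventory count where it started.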
Read our database transactions guide for ACID basics and consistency models before choosing how strong cross-service guarantees must be.
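The outbox pattern mentioned above can be illustrated with SQLite from the Python standard library. The table layout, the `place_order` and `relay_once` functions, and the `publish` callback are assumptions for this sketch; a real relay would publish to a broker client rather than a Python callable.

```python
import json
import sqlite3
from typing import Callable


def create_schema(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(db: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    # One local transaction: the business row and the outbound event
    # commit together or not at all.
    with db:
        db.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        payload = json.dumps({"type": "OrderPlaced", "order_id": order_id})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_once(db: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish pending outbox rows, marking each only after publish succeeds."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # may raise; the row then stays pending
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run. That gives at-least-once delivery, which is why consumers need to be idempotent.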
Deployment, discovery, and the API gateway
Each microservice runs as one or more container instances managed by a Kubernetes Deployment (or the equivalent in another orchestrator). Services register with DNS or a discovery layer; callers resolve a name such as payments.default.svc.cluster.local instead of hard-coded IPs.
An API gateway sits at the edge: authenticates clients, terminates TLS, routes /api/orders to the order service, applies rate limits, and aggregates responses when needed. A reverse proxy like nginx can serve small setups; dedicated gateways (Kong, Envoy) add policy plugins and observability hooks.
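To make the routing step concrete, here is a small sketch of path-prefix routing as a gateway might perform it. It is not how nginx, Kong, or Envoy are configured; the `ROUTES` table, the `resolve_upstream` function, and the orders hostname are assumptions for illustration.

```python
from typing import Optional

# Hypothetical routing table: public path prefix -> internal service URL.
ROUTES = {
    "/api/orders": "http://orders.default.svc.cluster.local",
    "/api/payments": "http://payments.default.svc.cluster.local",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream base URL for the longest matching path prefix."""
    matches = [
        prefix
        for prefix in routes
        # Match whole path segments only, so "/api/ordersX" is not routed.
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return routes[max(matches, key=len)]
```

A request for `/api/orders/42` resolves to the orders service, while an unknown path returns `None`, which a gateway would typically turn into a 404 response.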
A service mesh (Istio, Linkerd) injects sidecar proxies for mutual TLS, retries, and traffic splitting between services — valuable at dozens of services, heavy for a handful. Start without a mesh; add when plain client libraries cannot enforce policy consistently.
Packaging with Docker, CI/CD pipelines per repo, and feature flags for gradual rollouts are table stakes. Independent deploys only help if deploys are actually independent — shared libraries versioned in lockstep recreate monolith coupling.
Resilience: what breaks in production
Network calls fail. Design for it:
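- Timeouts everywhere: unbounded waits tie up thread pools. Give each outbound call a timeout shorter than the remaining deadline of the request that triggered it, so the caller can still respond or fall back in time.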
- Retries with jitter — retry idempotent reads and safe writes; never blind-retry a payment charge without idempotency keys.
- Circuit breakers — stop hammering a failing dependency; fail fast and degrade gracefully (show cached catalog, disable recommendations).
- Bulkheads — isolate thread pools per dependency so one slow service cannot exhaust all workers.
- Health checks — liveness vs readiness probes so orchestrators route traffic only to instances that can serve.
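Two of the items above, retries with jitter and circuit breakers, can be sketched in a few lines. This is a simplified, single-threaded illustration rather than a replacement for a maintained resilience library; `retry_with_jitter`, `CircuitBreaker`, and their parameters are hypothetical names, and the thresholds are arbitrary defaults.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry an idempotent call with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # random delay spreads out retries
    raise ValueError("attempts must be at least 1")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Cool-down elapsed: allow one trial call through (half-open).
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

The `sleep` and `clock` parameters exist so the behavior can be tested without real waiting. Only wrap calls that are safe to repeat; a payment charge needs an idempotency key before it belongs inside `retry_with_jitter`.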
Observability is non-negotiable: structured logs with correlation IDs propagated across service boundaries, metrics (latency, error rate, saturation), and distributed tracing (OpenTelemetry) to reconstruct call graphs. Without tracing, a 2-second checkout latency spike becomes a week-long guessing game.
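Correlation ID propagation can be sketched with the standard library alone. The header name `X-Correlation-ID` is a common convention assumed here, not something the guide prescribes, and `start_request`, `outgoing_headers`, and `JsonFormatter` are hypothetical helpers.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's correlation ID for this execution context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object that carries the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def start_request(incoming_headers: dict[str, str]) -> str:
    """Reuse the caller's ID if present; otherwise mint a new one at the edge."""
    cid = incoming_headers.get("X-Correlation-ID") or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


def outgoing_headers() -> dict[str, str]:
    """Headers to attach to any downstream call made during this request."""
    return {"X-Correlation-ID": correlation_id.get()}
```

Each service calls `start_request` when a request arrives and adds `outgoing_headers()` to every call it makes, so log lines from different services can be joined on the same ID. Distributed tracing with OpenTelemetry propagates its own trace context and is a separate mechanism from this sketch.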
Common pitfalls and a migration checklist
Pitfalls
- Distributed monolith — shared databases, synchronized releases, circular service dependencies.
- Chatty APIs — N+1 service calls per page load; fix with batch endpoints or materialized views.
- Missing contract tests — consumer-driven contracts (Pact) catch breaking API changes before production.
- Ignoring local dev ergonomics — if engineers need 12 containers to fix a typo, velocity dies. Invest in docker-compose overlays or remote dev environments.
- Shared mutable libraries — a "common" jar that changes weekly forces coordinated deploys across all services.
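The chatty-API pitfall above is easy to see when round trips are counted. In this sketch `CatalogClient` is a hypothetical stand-in for a remote catalog service: every method call represents one network round trip, and `get_prices` plays the role of a batch endpoint.

```python
class CatalogClient:
    """Stand-in for a remote catalog service; counts simulated round trips."""

    def __init__(self, prices: dict[str, int]) -> None:
        self._prices = prices
        self.round_trips = 0

    def get_price(self, sku: str) -> int:
        self.round_trips += 1  # one network call per SKU
        return self._prices[sku]

    def get_prices(self, skus: list[str]) -> dict[str, int]:
        self.round_trips += 1  # one network call for the whole batch
        return {sku: self._prices[sku] for sku in skus}


def cart_total_chatty(client: CatalogClient, skus: list[str]) -> int:
    # N+1 style: the number of calls grows with the size of the cart.
    return sum(client.get_price(sku) for sku in skus)


def cart_total_batched(client: CatalogClient, skus: list[str]) -> int:
    # Batch style: one call regardless of cart size.
    prices = client.get_prices(skus)
    return sum(prices[sku] for sku in skus)
```

Both functions return the same total, but for a three-item cart the chatty version makes three round trips and the batched version makes one. Over a real network each round trip adds latency and another chance of failure.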
Extraction checklist
Before carving a module out of a monolith:
- Define the bounded context and its public API contract.
- Move data ownership — migrate tables, block direct access from the monolith.
- Replace in-process calls with HTTP or events; add integration tests.
- Instrument tracing and dashboards before cutover, not after an outage.
- Run shadow traffic or dual-write validation during migration.
- Document failure modes and on-call runbooks per service.
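The shadow-traffic step in the checklist above can be sketched as a wrapper that always answers from the existing code path while exercising the extracted service on the side. `shadow_call` and its arguments are hypothetical; a real setup would usually run the candidate call asynchronously and sample traffic rather than mirror every request.

```python
from typing import Any, Callable


def shadow_call(
    request: Any,
    primary: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    mismatches: list[dict],
) -> Any:
    """Serve from primary; run candidate on the side and record differences."""
    response = primary(request)
    try:
        shadow = candidate(request)
        if shadow != response:
            mismatches.append(
                {"request": request, "primary": response, "candidate": shadow}
            )
    except Exception as exc:
        # A failing candidate must never affect the user-facing response.
        mismatches.append(
            {"request": request, "primary": response, "error": repr(exc)}
        )
    return response
```

An empty `mismatches` list over a representative window of traffic is one signal that the new service is ready for cutover. Shadowing is only safe for calls without side effects, unless the candidate writes to an isolated store.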
Key takeaways
- Microservices trade single-process simplicity for independent deployability, scaling, and fault isolation — at the cost of distributed operations.
- Split on business boundaries, not technical layers; premature decomposition creates distributed monoliths.
- Database per service eliminates cross-domain JOINs; sagas, outbox, and eventual consistency replace global transactions.
- Mix sync and async — HTTP/gRPC for immediate queries, events for fan-out and background work.
- Resilience and observability are features: timeouts, circuit breakers, correlation IDs, and tracing are as important as the service code.
- Start modular, extract when forced — a well-structured monolith beats a poorly drawn microservices estate.
Related reading
- Distributed systems consistency explained — CAP, eventual vs strong models, and quorum reads
- Event-driven architecture explained — events vs commands, idempotent consumers, delivery guarantees
- Kubernetes fundamentals explained — pods, Deployments, Services, and scaling workloads
- CI/CD pipelines explained — automated deploys, canary releases, and rollback playbooks