Guide

gRPC and Protocol Buffers explained: RPC, schemas, and streaming

gRPC is Google's open-source framework for remote procedure calls: your service calls a function on another machine as if it were local, with a strongly typed contract and a compact binary wire format. The payload language is almost always Protocol Buffers (protobuf) — a schema-first serialization format defined in .proto files and compiled into client and server stubs in Go, Java, Python, Rust, and dozens of other languages. gRPC rides on HTTP/2, so one TCP connection can multiplex many concurrent RPCs, support flow control, and carry metadata headers alongside message bodies. Inside a microservices mesh, gRPC is the default lingua franca for low-latency, high-throughput service-to-service traffic. This guide covers how protobuf schemas work, the four gRPC call types, deadlines and interceptors, versioning strategy, and when REST or GraphQL still wins for browser-facing APIs.

Why gRPC exists

JSON over HTTP/1.1 REST is excellent for public APIs that humans debug with curl. It is less ideal when fifty internal services exchange millions of small messages per second. JSON parsing is CPU-heavy, field names repeat in every message, and an HTTP/1.1 connection handles only one request at a time, which encourages connection sprawl unless you add pooling layers.

gRPC addresses three pain points at once:

  • Contract-first APIs — the .proto file is the source of truth; breaking changes are caught at compile time instead of in production JSON parsing errors.
  • Efficient encoding — protobuf uses field numbers and variable-length integers; payloads are often cited as 3–10x smaller than equivalent JSON (the ratio depends on the shape of the data) and are usually faster to serialize.
  • First-class streaming — server streaming, client streaming, and bidirectional streams are part of the core model, not bolted on with WebSockets or SSE hacks.

The trade-off is ergonomics: protobuf binaries are not human-readable without tooling, browser support requires gRPC-Web proxies, and every consumer must regenerate stubs when schemas change. That is why gRPC dominates backend east-west traffic while REST and GraphQL remain common for north-south (client-to-server) APIs.

Protocol Buffers: schemas and wire format

A protobuf message is a typed struct. You declare it once in a .proto file, then run protoc (the protocol compiler) with a language plugin to generate structs and encode/decode helpers.

syntax = "proto3";

package orders.v1;

message CreateOrderRequest {
  string customer_id = 1;
  repeated LineItem items = 2;
  string idempotency_key = 3;
}

message LineItem {
  string sku = 1;
  int32 quantity = 2;
}

message CreateOrderResponse {
  string order_id = 1;
  int64 created_at_unix = 2;
}

Field numbers are permanent

Each field has a numeric tag (1, 2, 3…) that appears on the wire, not the field name. Never reuse a field number after deletion — old clients would mis-decode data. Reserve removed tags with reserved 3; and add new fields with new numbers. Unknown fields are skipped on read, which is how backward-compatible evolution works: old binaries ignore fields they do not know; new binaries populate optional new fields while old clients still function.
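To make the tag-on-the-wire idea concrete, here is a minimal stdlib-only sketch of the protobuf encoding for varint and length-delimited fields. It is a teaching aid, not the real protobuf library: the helper names (encode_varint, decode_known, OLD_READER_FIELDS) are hypothetical, and only wire types 0 and 2 are handled. It encodes a LineItem, appends a hypothetical field 3 that an old reader has never heard of, and shows the old reader skipping it.

```python
from typing import Dict, Tuple

VARINT, LEN = 0, 2  # the two protobuf wire types this sketch supports


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_int_field(number: int, value: int) -> bytes:
    # The key is (field_number << 3) | wire_type; the field name never appears.
    return encode_varint(number << 3 | VARINT) + encode_varint(value)


def encode_str_field(number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint(number << 3 | LEN) + encode_varint(len(data)) + data


def decode_known(buf: bytes, known: Dict[int, str]) -> dict:
    """Decode fields listed in `known`; silently skip any other field number."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[known[number]] = value
    return fields


# An old reader only knows LineItem fields 1 (sku) and 2 (quantity).
OLD_READER_FIELDS = {1: "sku", 2: "quantity"}
# A newer writer adds a hypothetical field 3; the old reader must ignore it.
new_payload = (encode_str_field(1, "A1") + encode_int_field(2, 2)
               + encode_str_field(3, "gift"))
decoded_by_old_reader = decode_known(new_payload, OLD_READER_FIELDS)
```

The six bytes that carry sku and quantity contain no field names at all, which is why reusing a number for a different meaning silently corrupts what old readers see.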

Scalar types and well-known types

Protobuf provides string, int32, int64, bool, bytes, and floating-point types. Use google.protobuf.Timestamp for instants and google.protobuf.Duration for timeouts instead of rolling your own epoch integers everywhere. For nullable semantics in proto3, use wrapper types like google.protobuf.StringValue or the optional keyword where your compiler version supports it.

Services in the same file

RPC methods are declared alongside messages:

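service OrderService {
  // Unary: one request, one response.
  rpc CreateOrder(CreateOrderRequest) returns (CreateOrderResponse);
  // Server streaming: one request, many responses.
  // OrderEventsRequest and OrderEvent are not shown in this guide; they must be defined like the messages above.
  rpc StreamOrderEvents(OrderEventsRequest) returns (stream OrderEvent);
}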

The compiler generates abstract client interfaces and server base classes. Your server implements CreateOrder; the framework handles framing, compression, and status codes.

The four gRPC call types

Every gRPC method is one of four patterns. Picking the right one avoids polling loops and oversized single responses.

Unary

One request, one response — the gRPC equivalent of a normal function call or REST POST. Most CRUD operations fit here: CreateOrder, GetBalance, ValidateToken.

Server streaming

Client sends one request; server streams many responses. Use for log tailing, large result sets that should arrive incrementally, or live price feeds. The client reads messages until the server closes the stream or returns an error status.

Client streaming

Client streams many requests; server returns one aggregated response. Common for bulk uploads, batched metric ingestion, or file chunk assembly where the server acknowledges once all pieces arrive.

Bidirectional streaming

Both sides send a sequence of messages independently — chat protocols, collaborative editing sync, or game state replication between backend shards. Ordering is per-stream; you design application-level heartbeats and backpressure because either side can stall.
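The four patterns differ only in whether each side sends one message or a sequence. A rough way to see that is to write them as plain Python callables: a single value versus an iterator on each side. This is a sketch with made-up handlers and data (BALANCES, LOG_LINES, and the function names are hypothetical); real generated gRPC handlers also receive a context argument, which is omitted here.

```python
from typing import Iterable, Iterator

BALANCES = {"acct-1": 250}                          # hypothetical data
LOG_LINES = ["api: start", "db: ready", "api: ok"]  # hypothetical data


def get_balance(account_id: str) -> int:
    """Unary: one request in, one response out."""
    return BALANCES.get(account_id, 0)


def tail_log(prefix: str) -> Iterator[str]:
    """Server streaming: one request in, a sequence of responses out."""
    for line in LOG_LINES:
        if line.startswith(prefix):
            yield line


def upload_chunks(chunks: Iterable[bytes]) -> int:
    """Client streaming: a sequence of requests in, one summary response out."""
    return sum(len(chunk) for chunk in chunks)


def chat(messages: Iterable[str]) -> Iterator[str]:
    """Bidirectional: a sequence in, a sequence out (here one ack per message)."""
    for message in messages:
        yield f"ack:{message}"
```

In a real bidirectional stream the two sequences are independent, so the server is free to send zero or many responses per request; the one-ack-per-message loop above is just the simplest case.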

HTTP/2 under the hood

gRPC maps each RPC to an HTTP/2 stream. Request and response metadata travel in HEADERS frames (content-type application/grpc, custom key-value pairs like auth tokens). Message bodies are length-prefixed protobuf blobs in DATA frames. A single HTTP/2 connection between two pods can carry hundreds of concurrent unary calls without opening hundreds of TCP sockets — a major win behind load balancers compared to HTTP/1.1 keep-alive pools.
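The length prefix mentioned above is small enough to sketch by hand. In the gRPC-over-HTTP/2 wire format, each message is preceded by a 1-byte compressed flag and a 4-byte big-endian length. The functions below are hypothetical helpers that illustrate the framing only; they do not touch HTTP/2 and are not part of any gRPC library.

```python
import struct
from typing import Iterator, Tuple

HEADER = struct.Struct(">BI")  # 1-byte compressed flag + 4-byte big-endian length


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the 5-byte gRPC message header."""
    return HEADER.pack(int(compressed), len(payload)) + payload


def read_frames(data: bytes) -> Iterator[Tuple[bool, bytes]]:
    """Yield (compressed, payload) for each complete frame in data."""
    pos = 0
    while pos < len(data):
        if len(data) - pos < HEADER.size:
            raise ValueError("truncated frame header")
        flag, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        if len(data) - pos < length:
            raise ValueError("truncated frame body")
        yield bool(flag), data[pos:pos + length]
        pos += length
```

Because every message announces its own length, a stream can carry many messages back to back in DATA frames without any delimiter inside the protobuf bytes themselves.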

gRPC defines its own status codes, sent in the grpc-status trailer rather than as the HTTP status (which is normally 200 even when the call fails): NOT_FOUND, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE, and so on. Clients should distinguish retryable codes (UNAVAILABLE, ABORTED) from permanent failures (INVALID_ARGUMENT, PERMISSION_DENIED), the same discipline as idempotent retries on REST.
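The retryable-versus-permanent split can be sketched as a small retry wrapper. This is a stdlib-only illustration, not a gRPC client API: RpcError and call_with_retry are hypothetical names, the StatusCode enum lists only a subset of the standard numeric codes, and the backoff parameters are arbitrary.

```python
import enum
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StatusCode(enum.IntEnum):
    # Subset of the standard gRPC status codes with their numeric values.
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    ABORTED = 10
    UNAVAILABLE = 14


# Retry policy taken from the text above; tune per service.
RETRYABLE = frozenset({StatusCode.UNAVAILABLE, StatusCode.ABORTED})


class RpcError(Exception):
    """Hypothetical stand-in for a client library's RPC error type."""

    def __init__(self, code: StatusCode):
        super().__init__(code.name)
        self.code = code


def call_with_retry(rpc: Callable[[], T], max_attempts: int = 3,
                    base_delay_s: float = 0.05,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call rpc(); retry only retryable codes, with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RpcError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise  # permanent failure, or retry budget exhausted
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable: loop either returns or raises")
```

Only wrap RPCs that are safe to repeat; a retried non-idempotent call can apply the same side effect twice.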

Deadlines, cancellation, and interceptors

Every gRPC call should carry a deadline (absolute time) or timeout (relative duration). When the deadline passes, the client cancels the RPC and the server should stop work — propagating the same deadline to downstream calls prevents one slow leaf service from pinning thread pools across the chain. This pairs naturally with circuit breakers at the client stub layer.
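Deadline propagation boils down to carrying one absolute expiry through the call chain and giving each downstream call whatever budget is left. The sketch below shows that bookkeeping with an injectable clock; Deadline, DeadlineExceeded, and downstream_timeout are hypothetical names, not part of a gRPC library, which typically exposes the remaining time through its own context object.

```python
import time
from typing import Callable


class DeadlineExceeded(Exception):
    """Raised when no time budget remains for further work."""


class Deadline:
    """An absolute expiry computed once at the edge and passed down the chain."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left before expiry, never negative."""
        return max(0.0, self._expires_at - self._clock())


def downstream_timeout(deadline: Deadline, cap_s: float) -> float:
    """Timeout to use for the next downstream call.

    Uses the smaller of the caller's remaining budget and a local cap, and
    fails fast instead of starting work that cannot finish in time.
    """
    remaining = deadline.remaining()
    if remaining <= 0.0:
        raise DeadlineExceeded("caller's deadline already passed")
    return min(remaining, cap_s)
```

A handler would call downstream_timeout before each outbound RPC, so a request that has already spent most of its budget gives the leaf service only the remainder rather than a fresh full timeout.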

Interceptors (middleware for gRPC) wrap unary and streaming calls on both client and server. Typical uses: inject trace IDs for distributed tracing, attach JWT validation, log latency histograms, enforce rate limits, and redact sensitive fields. Keep interceptors fast — they run on every RPC.
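Conceptually, an interceptor is a function that takes a handler and returns a wrapped handler. The following sketch models unary server interceptors with plain callables; the Handler signature, chain, require_token, and record_latency are illustrative assumptions, and real gRPC libraries define their own interceptor interfaces.

```python
import time
from typing import Callable, Dict, List

# Simplified model: a handler takes (request, metadata) and returns a response.
Handler = Callable[[dict, Dict[str, str]], dict]
Interceptor = Callable[[Handler], Handler]


def chain(handler: Handler, interceptors: List[Interceptor]) -> Handler:
    """Wrap handler so that interceptors[0] runs outermost."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


def require_token(expected: str) -> Interceptor:
    """Reject calls whose 'authorization' metadata is not the expected bearer token."""
    def interceptor(next_handler: Handler) -> Handler:
        def wrapped(request: dict, metadata: Dict[str, str]) -> dict:
            if metadata.get("authorization") != f"Bearer {expected}":
                raise PermissionError("UNAUTHENTICATED")
            return next_handler(request, metadata)
        return wrapped
    return interceptor


def record_latency(samples: List[float],
                   clock: Callable[[], float] = time.perf_counter) -> Interceptor:
    """Append the duration of every call, successful or not, to samples."""
    def interceptor(next_handler: Handler) -> Handler:
        def wrapped(request: dict, metadata: Dict[str, str]) -> dict:
            start = clock()
            try:
                return next_handler(request, metadata)
            finally:
                samples.append(clock() - start)
        return wrapped
    return interceptor
```

Order matters: placing record_latency before require_token in the list means rejected calls are still measured, which is usually what you want for dashboards.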

Versioning and compatibility

Treat .proto files like database schemas: additive changes are safe, renames and type changes are breaking unless you version the package. Common patterns:

  • Package versioning — orders.v1, orders.v2 as separate services or packages; run both during migration windows.
  • Field addition only — new optional fields with new numbers; never change the type of an existing field.
  • Deprecation annotations — mark fields [deprecated = true] and document removal timelines in changelogs.
  • Buf linting and breaking-change checks (or the older, now-archived prototool) — CI checks that block field-number reuse and enforce style before merge.

Protobuf decoders skip unknown fields, much as most JSON clients ignore unknown keys, but the wire carries only field numbers and types, so server and client must still agree on what each number means. Rolling deploys therefore require backward-compatible schema changes first, then consumer updates, then producer cleanup: the same phased dance as database migrations.
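The compatibility rules in this section are mechanical enough to check in code. Below is a toy checker that compares two versions of a message, each described as a mapping from field number to (name, type). It is only a sketch of the idea under that simplified schema model; breaking_changes is a hypothetical helper, and real tools such as Buf inspect compiled descriptors and cover many more cases.

```python
from typing import Dict, Iterable, List, Tuple

Schema = Dict[int, Tuple[str, str]]  # field number -> (field name, type name)


def breaking_changes(old: Schema, new: Schema,
                     reserved: Iterable[int] = ()) -> List[str]:
    """List changes between two message versions that can break consumers."""
    reserved_numbers = set(reserved)
    problems = []
    for number, (name, type_name) in sorted(old.items()):
        if number not in new:
            if number not in reserved_numbers:
                problems.append(
                    f"field {number} ({name}) removed without reserving its number")
            continue
        new_name, new_type = new[number]
        if new_type != type_name:
            problems.append(
                f"field {number} ({name}) changed type {type_name} -> {new_type}")
        if new_name != name:
            # Wire-compatible, but breaks generated code and JSON mappings.
            problems.append(f"field {number} renamed {name} -> {new_name}")
    for number in sorted(new):
        if number in reserved_numbers:
            problems.append(f"field {number} reuses a reserved number")
    return problems


# CreateOrderRequest as declared earlier in this guide.
V1: Schema = {
    1: ("customer_id", "string"),
    2: ("items", "LineItem"),
    3: ("idempotency_key", "string"),
}
```

Adding a field with a fresh number yields an empty list, while dropping field 3 without a reserved entry or retyping field 2 produces one message per problem.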

gRPC vs REST vs GraphQL

| Concern | gRPC + protobuf | REST + JSON | GraphQL |
| --- | --- | --- | --- |
| Browser clients | Needs gRPC-Web + proxy | Native | Native |
| Payload size / speed | Excellent | Moderate | Moderate (often POST) |
| Streaming | Built-in four modes | SSE / chunked hacks | Subscriptions (server push) |
| Contract enforcement | Strong (codegen) | OpenAPI optional | Schema required |
| Debugging with curl | Hard (use grpcurl) | Easy | Moderate |
| Best fit | Internal microservices | Public HTTP APIs | Flexible client queries |

Many teams expose REST or GraphQL at the edge and translate to gRPC behind an API gateway. That keeps developer experience friendly for mobile and web while preserving efficient binary RPC between core services. Our REST API design guide covers the public-surface patterns gRPC intentionally does not optimize for.

Security and operations

gRPC has built-in TLS support, though most libraries require you to enable it explicitly. Mutual TLS (mTLS) is standard in service meshes like Istio and Linkerd: each workload presents a certificate, and the mesh encrypts east-west traffic without application code changes. For authentication claims, pass bearer tokens in metadata headers and validate them in server interceptors, the same way you would in REST middleware.

Operational gotchas to plan for:

  • L7 load balancing — naive TCP load balancers pin HTTP/2 connections to one backend; use client-side load balancing, service mesh, or proxies that understand gRPC routing.
  • Health checks — implement the standard grpc.health.v1.Health service so orchestrators mark pods ready only when dependencies are warm.
  • Reflection — enable server reflection in dev so grpcurl can discover methods without local .proto files; disable in production unless tightly access-controlled.
  • Message size limits — many implementations cap inbound messages at 4 MB by default; raise the limit deliberately for bulk payloads and enforce limits at ingress.
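The load-balancing gotcha is easy to see in a toy simulation. The sketch below compares a balancer that picks a backend once per connection with one that picks per RPC; the function names and pod names are made up, and both use simple round robin rather than any particular proxy's algorithm.

```python
from collections import Counter
from itertools import cycle
from typing import List


def connection_level(backends: List[str], connections: int,
                     rpcs_per_connection: int) -> Counter:
    """L4-style balancing: a backend is chosen once, when the connection opens."""
    hits: Counter = Counter()
    picker = cycle(backends)
    for _ in range(connections):
        backend = next(picker)
        # Every RPC multiplexed on this connection lands on the same backend.
        hits[backend] += rpcs_per_connection
    return hits


def request_level(backends: List[str], connections: int,
                  rpcs_per_connection: int) -> Counter:
    """L7-style balancing: a backend is chosen for each individual RPC."""
    hits: Counter = Counter()
    picker = cycle(backends)
    for _ in range(connections * rpcs_per_connection):
        hits[next(picker)] += 1
    return hits
```

With three backends and a single long-lived client connection carrying 300 RPCs, connection_level sends all 300 to one pod while request_level sends 100 to each, which is the imbalance that gRPC-aware proxies and client-side balancing are meant to avoid.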

Production checklist

  1. Define services and messages in versioned .proto packages; lint in CI.
  2. Generate stubs in both client and server repos (or a shared schema repo) on every schema merge.
  3. Set per-RPC deadlines; propagate deadlines across downstream gRPC calls.
  4. Add client interceptors for retries (idempotent RPCs only), tracing, and metrics.
  5. Use server interceptors for auth, logging, and panic recovery.
  6. Enable TLS; prefer mTLS for service-to-service traffic in production.
  7. Implement health checks and graceful shutdown (drain in-flight RPCs on SIGTERM).
  8. Load-test streaming paths separately — backpressure bugs show up only under sustained streams.
  9. Document which RPCs are idempotent and safe to retry; use idempotency keys in request messages where needed.
  10. Keep a REST or GraphQL edge if browser clients exist; browsers cannot speak native gRPC, so do not expose it to them without gRPC-Web and a translating proxy.
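Item 9's idempotency keys can be sketched as a server-side lookup that returns the original result when a retried request arrives. This is an in-memory illustration only: OrderStore and its method are hypothetical, it is not thread-safe, and a production service would back it with a durable store and a uniqueness constraint on the key.

```python
from typing import Dict


class OrderStore:
    """Toy order store that deduplicates CreateOrder calls by idempotency key."""

    def __init__(self) -> None:
        self._order_by_key: Dict[str, str] = {}
        self._next_id = 1

    def create_order(self, customer_id: str, idempotency_key: str) -> str:
        """Create an order, or return the existing one for a repeated key."""
        existing = self._order_by_key.get(idempotency_key)
        if existing is not None:
            return existing  # retry of a call that already succeeded
        order_id = f"order-{self._next_id}"
        self._next_id += 1
        self._order_by_key[idempotency_key] = order_id
        return order_id
```

With this in place, a client that retries CreateOrder after an UNAVAILABLE response gets the same order_id back instead of creating a duplicate order.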

Key takeaways

  • gRPC is contract-first RPC over HTTP/2 with protobuf payloads — optimized for fast, typed service-to-service calls.
  • Protobuf schemas use numbered fields; evolve APIs by adding fields, not reusing numbers or changing types.
  • Unary, server-streaming, client-streaming, and bidirectional streaming cover most backend communication patterns without polling.
  • Deadlines, interceptors, and proper status-code retry logic are non-optional for production resilience.
  • Use gRPC internally; keep REST or GraphQL at the public boundary unless you control every client.
