Guide

GraphQL API design explained: schema, queries, and best practices

GraphQL is a query language and runtime that lets clients describe exactly which fields they need from your API in a single HTTP request. Instead of chaining multiple REST calls or over-fetching unused JSON, a mobile app can request user { name orders { id total } } and get a shaped response. That flexibility is powerful — and dangerous without discipline. This guide covers schema-first design, resolvers, the infamous N+1 problem, pagination, errors, security limits, and when GraphQL is the right tool versus REST.

What GraphQL solves (and what it does not)

REST maps resources to URLs; GraphQL maps a typed graph to a single endpoint (usually POST /graphql). Clients send a query document; the server validates it against a published schema and executes resolvers to fetch data. Benefits:

Precise fetching — clients request only the fields they render, reducing payload size on slow networks.
One round trip — nested relationships (user → orders → line items) resolve in a single request instead of three REST calls.
Strong contracts — the schema is the API documentation; breaking changes are visible at compile time in typed clients.
Evolution without versioning — add optional fields; deprecate old ones with @deprecated instead of shipping /v2.
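
As a rough illustration of the single-endpoint model described above, the sketch below builds (but does not send) the HTTP request for a nested user-and-orders query. The endpoint URL, the user ID, the Order fields, and the helper name build_graphql_request are assumptions made for this example; the JSON body with query and variables keys is the shape most GraphQL servers accept over HTTP.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your own server.
GRAPHQL_URL = "https://api.example.com/graphql"

# One document asks for the user and a nested page of orders.
QUERY = """
query UserOrders($id: ID!) {
  user(id: $id) {
    displayName
    orders(first: 5) {
      edges { node { id total } }
    }
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build a POST request carrying the query document and its variables."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY, {"id": "42"})
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Passing values through variables rather than interpolating them into the query string keeps the document static, which also makes it easier to cache or pre-register.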

GraphQL does not replace databases, caching layers, or authorization logic. It is a composition layer on top of them. File uploads, simple CRUD microservices, and public endpoints that rely on CDN caching often stay cleaner in REST. Use GraphQL when many client types need different field shapes from the same backend: dashboards, mobile apps, and partner integrations pulling from shared domain models.

Schema-first design

Start with the schema — a contract written in GraphQL SDL (Schema Definition Language). Types describe your domain; root Query and Mutation types define entry points.

# DateTime is not a built-in scalar, so the schema must declare it.
scalar DateTime

type User {
  id: ID!
  email: String!
  displayName: String
  orders(first: Int = 20, after: String): OrderConnection!
  createdAt: DateTime!
}

type Query {
  user(id: ID!): User
  me: User
}

type Mutation {
  updateProfile(input: UpdateProfileInput!): UpdateProfilePayload!
}

Naming and nullability conventions

Use PascalCase for types (OrderLineItem) and camelCase for fields (displayName).
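! means non-null. Put it on fields that are always present, but avoid a blanket ! on every field: nullable types express optional data honestly, and when a non-null field fails to resolve, the null propagates up to the nearest nullable parent and discards that parent's other fields.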
Prefer enums for fixed sets (enum OrderStatus { PENDING PAID SHIPPED }) over stringly-typed status fields.
Use input types for mutation arguments (UpdateProfileInput) so you can add fields without breaking callers.
Return payload types (UpdateProfilePayload { user, errors }) instead of bare entities. That leaves room to report validation errors as data instead of failing the whole request with an HTTP 400 for a business-rule failure.
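
The payload-type convention in the last item can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the in-memory USERS store, the email pattern, the 50-character limit, and the field/message shape of each error entry are all assumptions chosen for the example.

```python
import re

# Hypothetical in-memory user store standing in for a database.
USERS = {"u1": {"id": "u1", "email": "ada@example.com", "displayName": "Ada"}}

# Deliberately loose email check; real validation rules are an assumption here.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def update_profile(user_id, input_data):
    """Resolver-style function returning an UpdateProfilePayload-shaped dict."""
    errors = []
    email = input_data.get("email")
    if email is not None and not EMAIL_PATTERN.match(email):
        errors.append({"field": "email", "message": "Email address is not valid."})
    display_name = input_data.get("displayName")
    if display_name is not None and len(display_name) > 50:
        errors.append(
            {"field": "displayName", "message": "Display name is limited to 50 characters."}
        )
    if errors:
        # Business-rule failure: report it in the payload and leave the user untouched.
        return {"user": None, "errors": errors}
    user = USERS[user_id]
    user.update({key: value for key, value in input_data.items() if value is not None})
    return {"user": user, "errors": []}
```

Because the failure travels inside the payload, a client can read errors and attach each message to the matching form field without inspecting the transport status.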

Publish the schema in your repo and generate types for server resolvers and client SDKs (GraphQL Code Generator, Apollo, gql.tada). Schema-first means the contract leads implementation, not the other way around.

Queries, mutations, and subscriptions

Queries are read operations. They should be side-effect free and safe to retry. Treat them like HTTP GET — idempotent reads only.

Mutations change state: create orders, update profiles, cancel subscriptions. When one operation contains several top-level mutation fields, the server executes them serially, in document order, whereas sibling query fields may resolve in parallel. Serial execution helps avoid race conditions but adds latency if clients batch unrelated writes into one operation. Design mutations to be atomic per call: one business action per mutation, not a grab-bag of unrelated side effects.
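
The ordering difference between query fields and mutation fields can be illustrated with asyncio. This is a toy model of an executor, not how any particular GraphQL server is written; run_field and the field names are hypothetical stand-ins for real resolvers.

```python
import asyncio


async def run_field(name, log, delay=0.01):
    """Stand-in for a resolver: log start and end around simulated I/O."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"end {name}")
    return name


async def execute_query_fields(names, log):
    # Sibling query fields have no side effects, so they are allowed to overlap.
    return await asyncio.gather(*(run_field(name, log) for name in names))


async def execute_mutation_fields(names, log):
    # Top-level mutation fields run one at a time, in the order they were written.
    results = []
    for name in names:
        results.append(await run_field(name, log))
    return results
```

In the query case both fields start before either finishes; in the mutation case the second field does not start until the first has completed, which is why unrelated writes batched into one operation pay for each other's latency.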

Subscriptions push real-time updates over WebSockets or SSE. They shine for live feeds (order status, chat, price ticks) but add operational complexity: connection scaling and auth on long-lived sockets. If you only need occasional updates, polling or webhooks may be simpler. For browser push without custom sockets, see our guide to WebSockets and SSE.

Resolvers and the N+1 problem

A resolver is a function that fetches data for one field. The default pattern is intuitive: a User.orders resolver loads orders for each user. The trap appears when a query lists 50 users: the server runs one query for the list, then 50 separate queries for orders, one per user. That is the N+1 problem.

Fixes, often combined:

DataLoader — batch and cache loads within a single request. Collect all userId values, issue SELECT * FROM orders WHERE user_id IN (...) once, distribute results.
Join at the parent — if clients almost always want orders with users, fetch both in the root resolver with a SQL join.
Lookahead / query planning — inspect the requested field tree before executing and prefetch related data.
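
The batching idea behind DataLoader can be sketched with asyncio alone. This is a simplified stand-in written for illustration, not the DataLoader library's API: BatchLoader, fetch_orders_by_user_ids, and the in-memory ORDERS rows are all hypothetical, and a real loader would also handle cache invalidation and batch size limits.

```python
import asyncio
from collections import defaultdict

# Hypothetical rows standing in for an orders table.
ORDERS = [
    {"id": "o1", "user_id": "u1", "total": 30},
    {"id": "o2", "user_id": "u1", "total": 12},
    {"id": "o3", "user_id": "u2", "total": 99},
]
BATCH_CALLS = []  # each entry is the key list of one simulated SQL query


async def fetch_orders_by_user_ids(user_ids):
    """Stand-in for SELECT * FROM orders WHERE user_id IN (...)."""
    BATCH_CALLS.append(list(user_ids))
    grouped = defaultdict(list)
    for row in ORDERS:
        grouped[row["user_id"]].append(row)
    # Return one result per key, in the same order as the keys.
    return [grouped.get(user_id, []) for user_id in user_ids]


class BatchLoader:
    """Collects keys requested before the dispatch task runs into one batch."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn
        self._futures = {}  # key -> Future; doubles as the per-request cache
        self._queue = []    # keys not yet passed to batch_fn
        self._task = None   # reference to the scheduled dispatch task

    def load(self, key):
        if key in self._futures:
            return self._futures[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._futures[key] = future
        if not self._queue:
            # First key of a new batch: dispatch once already-scheduled resolvers have run.
            self._task = loop.create_task(self._dispatch())
        self._queue.append(key)
        return future

    async def _dispatch(self):
        keys, self._queue = self._queue, []
        try:
            results = await self._batch_fn(keys)
        except Exception as exc:
            for key in keys:
                self._futures[key].set_exception(exc)
            return
        for key, result in zip(keys, results):
            self._futures[key].set_result(result)


async def resolve_orders(loader, user_id):
    """What a User.orders resolver would do: ask the loader, not the database."""
    return await loader.load(user_id)


async def load_orders_for(user_ids):
    loader = BatchLoader(fetch_orders_by_user_ids)  # one loader per request
    return await asyncio.gather(*(resolve_orders(loader, uid) for uid in user_ids))
```

Calling asyncio.run(load_orders_for(["u1", "u2", "u3", "u1"])) resolves four fields with a single call to the batch function, and the repeated u1 key is served from the loader's cache. Creating the loader per request keeps that cache from leaking data between users.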

Resolver performance is where GraphQL APIs live or die. Every nested field is a potential database round trip. Instrument resolver timings and watch p95 latency under realistic queries — not just { user { id } } smoke tests. Slow list endpoints often trace back to missing database indexes on foreign keys you join in resolvers.
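
One lightweight way to instrument resolver timings is a decorator that records each call's duration per field. The sketch below is an assumption-laden illustration: timed_resolver, RESOLVER_TIMINGS, and the nearest-rank p95 helper are hypothetical names, and a production setup would more likely export these samples to a metrics or tracing system than keep them in a dict.

```python
import time
from collections import defaultdict
from functools import wraps

# Field name -> list of observed durations in seconds.
RESOLVER_TIMINGS = defaultdict(list)


def timed_resolver(field_name):
    """Decorator that records how long each call to a resolver takes."""
    def decorator(resolver):
        @wraps(resolver)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return resolver(*args, **kwargs)
            finally:
                # Record the duration even when the resolver raises.
                RESOLVER_TIMINGS[field_name].append(time.perf_counter() - started)
        return wrapper
    return decorator


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


@timed_resolver("User.orders")
def resolve_user_orders(user_id):
    # Hypothetical resolver body; a real one would query the database.
    return [{"id": "o1", "user_id": user_id}]
```

Keying the samples by field name (here "User.orders") is what lets you rank fields by p95 and find the one resolver that dominates a slow query.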

Pagination: connections and cursors

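Offset pagination (limit 20 offset 40) breaks under concurrent writes: rows shift between pages, so clients see duplicates or skip items. The Relay connection spec is the widely adopted alternative. A forward-only subset (the full spec also defines hasPreviousPage and startCursor on PageInfo) looks like this: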

type OrderConnection {
  edges: [OrderEdge!]!
  pageInfo: PageInfo!
  totalCount: Int
}

type OrderEdge {
  cursor: String!
  node: Order!
}

type PageInfo {
  hasNextPage: Boolean!
  endCursor: String
}

Clients pass first: 20, after: "cursor" to page forward. Cursors are opaque, stable tokens (often base64-encoded sort keys), not raw offsets. Document maximum first values (e.g. cap at 100) and reject larger requests — unbounded list fields are an easy denial-of-service vector.
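
The cursor mechanics can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: rows are already sorted by a unique, ascending id, the cursor is a base64-encoded JSON wrapper around that id, and MAX_PAGE_SIZE, encode_cursor, decode_cursor, and paginate are hypothetical names. A real resolver would push the "id greater than cursor" filter into the database query.

```python
import base64
import json

MAX_PAGE_SIZE = 100  # assumed cap on the `first` argument


def encode_cursor(sort_key):
    """Wrap the sort key in an opaque, URL-safe token."""
    raw = json.dumps({"k": sort_key}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the sort key from a token produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["k"]


def paginate(rows, first=20, after=None):
    """Return a connection-shaped page of rows sorted ascending by unique 'id'."""
    if first < 0 or first > MAX_PAGE_SIZE:
        raise ValueError(f"first must be between 0 and {MAX_PAGE_SIZE}")
    if after is not None:
        last_seen = decode_cursor(after)
        # Resume strictly after the last row the client saw.
        rows = [row for row in rows if row["id"] > last_seen]
    page = rows[:first]
    edges = [{"cursor": encode_cursor(row["id"]), "node": row} for row in page]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": len(rows) > first,
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }
```

Because the cursor encodes a position in the sort order instead of a row count, inserting or deleting earlier rows between requests does not shift the next page.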

Errors and partial results

Over HTTP, GraphQL servers typically return 200 even when parts of a query fail. The response includes both data and an errors array with paths pointing to the failing field. This enables partial success: a dashboard query can return user profile data while a downstream inventory service times out on one nested field.
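
On the client side, handling partial success means reading both keys of the response body. The sketch below assumes an already-decoded JSON body; the inventory field, the UPSTREAM_TIMEOUT code, and the helper names are hypothetical and only serve the example.

```python
# Example response body with both usable data and a field-level error.
RESPONSE = {
    "data": {
        "user": {
            "displayName": "Ada",
            "inventory": None,  # the failed field comes back as null
        }
    },
    "errors": [
        {
            "message": "Inventory service timed out.",
            "path": ["user", "inventory"],
            "extensions": {"code": "UPSTREAM_TIMEOUT"},
        }
    ],
}


def failed_paths(response):
    """Return the dotted path of every field that reported an error."""
    return [
        ".".join(str(part) for part in error.get("path", []))
        for error in response.get("errors") or []
    ]


def read_response(response):
    """Split a response into usable data and a list of failed field paths."""
    data = response.get("data")
    if data is None:
        # Nothing usable came back; treat it as a full failure.
        messages = "; ".join(e["message"] for e in response.get("errors") or [])
        raise RuntimeError("GraphQL request failed: " + messages)
    return data, failed_paths(response)
```

With the failed paths in hand, a dashboard can render the profile section normally and show a retry state only for the widget backed by user.inventory.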

Structure errors consistently:

message — human-readable summary safe to show users.
path — which field failed (["user", "orders", 2, "total"]).
extensions.code — machine-readable code (UNAUTHENTICATED, FORBIDDEN, BAD_USER_INPUT).

Do not leak stack traces or SQL in message. For validation failures on mutations, prefer returning errors inside the payload type (errors: [UserError!]) so clients can display field-level feedback without treating the whole mutation as a transport failure.
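
A server-side formatter is one place to enforce both rules: a consistent error shape and no leaked internals. The following is a sketch under assumptions, not any framework's API: GraphQLUserError is a hypothetical exception class for client-safe errors, and INTERNAL_SERVER_ERROR is an assumed code for everything else.

```python
class GraphQLUserError(Exception):
    """Hypothetical exception type for errors that are safe to show clients."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


def format_error(exc, path):
    """Turn an exception into a response error entry without leaking internals."""
    if isinstance(exc, GraphQLUserError):
        message, code = str(exc), exc.code
    else:
        # Unexpected failure: keep details in server logs, send a generic message.
        message, code = "Internal server error", "INTERNAL_SERVER_ERROR"
    return {"message": message, "path": list(path), "extensions": {"code": code}}
```

Anything that is not explicitly marked as client-safe gets the generic message, so a forgotten try/except around a database call cannot expose SQL text by default.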

Security and abuse prevention

A single GraphQL request can expand into thousands of resolver calls. Production APIs need guardrails:

Query depth limit: reject queries nesting more than N levels (e.g. 10).
Complexity scoring: assign cost per field; reject queries above a budget.
Persisted queries: mobile and web clients send a hash of a pre-registered query instead of an arbitrary query string. If the server accepts only registered hashes, ad-hoc queries (including introspection probes) are rejected.
Disable introspection in production, or restrict it to authenticated developers.
Authorization on every sensitive field: never rely on the client to leave sensitive fields out of a query; enforce row-level access in resolvers, not just at the root.
Rate limiting: apply per API key and per IP. GraphQL makes cost-based limits more important than request-count limits; see our API rate limiting guide.
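
Depth limits and complexity scoring both reduce to a walk over the requested field tree before execution. The sketch below simplifies a parsed query to nested dicts to stay self-contained; a real server would walk the AST from its GraphQL parser. The limits, the per-field costs, and the function names are assumptions, and the cost model ignores list-size arguments such as first, which a production scorer would normally multiply in.

```python
# A parsed selection set, simplified to nested dicts: field name -> sub-selection.
QUERY = {
    "user": {
        "displayName": {},
        "orders": {"edges": {"node": {"id": {}, "total": {}}}},
    }
}

MAX_DEPTH = 10                # assumed nesting limit
MAX_COST = 1000               # assumed cost budget per request
FIELD_COSTS = {"orders": 10}  # assumed per-field costs; unlisted fields cost 1


def query_depth(selection):
    """Number of nesting levels in the selection (0 for an empty selection)."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def query_cost(selection):
    """Sum of field costs over the whole selection tree."""
    return sum(
        FIELD_COSTS.get(name, 1) + query_cost(child)
        for name, child in selection.items()
    )


def check_limits(selection):
    """Raise ValueError if the query exceeds the depth or cost limit."""
    depth, cost = query_depth(selection), query_cost(selection)
    if depth > MAX_DEPTH:
        raise ValueError(f"Query depth {depth} exceeds limit {MAX_DEPTH}")
    if cost > MAX_COST:
        raise ValueError(f"Query cost {cost} exceeds budget {MAX_COST}")
    return depth, cost
```

Running the check before any resolver executes means an abusive query is rejected at validation cost only, and the computed cost can also feed a cost-based rate limiter.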

Browser clients calling GraphQL from SPAs need correct CORS headers on the GraphQL endpoint. Send auth tokens in headers, not query strings: URLs routinely end up in access logs and browser history, while Authorization headers usually do not.

GraphQL vs REST: a decision framework

Choose GraphQL when…	Stay with REST when…
Many clients need different field subsets of the same data	One or two known clients with stable response shapes
Aggregating multiple backend services behind one gateway	Simple CRUD with excellent HTTP caching (CDN-friendly public JSON)
Mobile bandwidth is costly and over-fetching hurts UX	File uploads, binary streams, or webhook callbacks are primary
You can invest in resolver tooling, DataLoader, and query cost limits	Team is small and REST + OpenAPI already ships features quickly

Hybrid architectures are common: REST for payment webhooks and health checks, GraphQL for the product API surface. Solana indexers and portfolio apps often expose GraphQL because wallet dashboards need flexible nested queries across tokens, NFTs, and transactions, while payment verification endpoints stay simple POST handlers.

Common mistakes to avoid

Exposing your raw database schema as GraphQL types — domain types should reflect product concepts, not table names.
Putting business logic only in resolvers with no service layer — resolvers should orchestrate, not contain 200-line SQL blocks.
Ignoring N+1 until production load hits — load-test nested list queries early.
Using mutations for reads (mutation { searchUsers(...) }) — queries are cacheable; mutations are not.
Returning unbounded arrays without pagination — always paginate lists that can grow.
Skipping query cost limits — one malicious client can flatten your database.
Treating HTTP 200 as "no errors" — always check the errors array client-side.

Design checklist

Schema published and version-controlled; types use consistent naming and nullability.
Mutations are atomic; input and payload types separate validation from transport errors.
DataLoader or equivalent batching on every nested relationship.
Cursor-based pagination on all list fields that can exceed a few dozen rows.
Depth and complexity limits enforced; introspection restricted in production.
Auth checked per field for sensitive data, not only at the query root.
Rate limits and observability on resolver latency — pair with metrics and tracing to find slow fields.