Guide

Docker fundamentals explained

Docker is the tool most developers reach for when they want to package an application, and everything it needs to run, into a single portable unit called a container. Instead of emailing a setup guide ("install Node 20, PostgreSQL 16, copy these env vars…"), you ship an image that runs the same way on a laptop, a CI runner, and a production server. Docker builds on Linux kernel features (namespaces and cgroups) and adds a developer-friendly workflow: build, tag, push, pull, run. This guide covers the core concepts, Dockerfile patterns that actually save time, local multi-service setups with Compose, and how Docker fits into a CI/CD pipeline before you ever touch Kubernetes.

Containers vs virtual machines

A virtual machine virtualizes hardware and boots its own guest operating system, so each VM carries its own kernel and userspace, often gigabytes of overhead. A container shares the host kernel and isolates only the process tree, filesystem, and network stack of one application. Startup typically takes well under a second rather than the tens of seconds or more a VM needs to boot, and you can run dozens of containers on hardware that supports a handful of VMs.

That isolation is real but not absolute. Containers are not security boundaries against a determined attacker with kernel exploits — treat them as packaging and dependency isolation, not a substitute for proper network segmentation and least-privilege IAM. For most web backends, APIs, and workers, the trade-off is overwhelmingly favorable: reproducible builds, pinned dependency versions, and no more "works on my machine" disputes.

Images and containers: the core distinction

An image is an immutable, layered filesystem snapshot plus metadata (default command, exposed ports, environment variables). A container is an instance of an image: a writable layer stacked on top of the read-only image layers, plus the processes running in it. You can start many containers from one image; each gets its own process ID namespace and its own writable layer, which is discarded when the container is removed. Data written to an attached volume survives.

Think of the image as a class and the container as an object instance. docker build creates images; docker run creates and starts containers. Tags like myapp:1.4.2 are pointers to image IDs — always pin by digest (myapp@sha256:abc…) in production manifests so a registry retag cannot silently change what you deploy.
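
The difference between a tag and a digest reference can be checked mechanically. The sketch below is a minimal illustration in Python, assuming image references are available as plain strings (for example, extracted from deployment manifests by some other step); the function names are hypothetical and not part of any Docker tooling.

```python
HEX_DIGITS = set("0123456789abcdef")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the reference names an image by sha256 digest."""
    _, separator, digest = image_ref.partition("@sha256:")
    # A sha256 digest is written as exactly 64 lowercase hex characters.
    return bool(separator) and len(digest) == 64 and set(digest) <= HEX_DIGITS


def unpinned(image_refs: list[str]) -> list[str]:
    """List the references that a registry retag could silently change."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]
```

A CI step could fail the build whenever unpinned(...) returns a non-empty list for production manifests.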

Writing a Dockerfile that builds fast

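A Dockerfile is a recipe: a sequence of instructions (FROM, RUN, COPY, CMD) that Docker executes to assemble an image. Instructions that change the filesystem (RUN, COPY, ADD) each produce a new layer; the others only record metadata. Every step is cached: Docker reuses the cached result when the instruction text and, for COPY and ADD, the checksums of the copied files are unchanged. Once one step misses the cache, every step after it is rebuilt, so order matters enormously for build speed.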

Layer caching and dependency installs

Put instructions that change rarely at the top and volatile copies at the bottom. For a Node app, copy package.json and package-lock.json first, run npm ci, then copy source code. Dependency layers stay cached across commits that only touch application logic. The same pattern applies to Python (requirements.txt before app code) and Go (download modules before copying the rest of the tree).
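
To see why the ordering matters, here is a toy model of the build cache in Python. It is a simplification, not Docker's actual implementation: each step's cache key is a hash of the previous key, the instruction text, and the content of any files the step copies. The node:20 tag, the file names, and the file contents are illustrative assumptions.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step, chained so a miss propagates downward."""
    keys, parent = [], ""
    for instruction, copied_paths in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for path in sorted(copied_paths):
            # COPY steps also depend on the content of the files they copy.
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(steps, files_before, files_after):
    """Count leading steps whose cache key is unchanged between two builds."""
    old_keys = layer_keys(steps, files_before)
    new_keys = layer_keys(steps, files_after)
    reused = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        reused += 1
    return reused


MANIFESTS = ["package.json", "package-lock.json"]
EVERYTHING = MANIFESTS + ["src/index.js"]

# Dependencies installed before the source tree is copied.
DEPS_FIRST = [
    ("FROM node:20", []),
    ("COPY package.json package-lock.json ./", MANIFESTS),
    ("RUN npm ci", []),
    ("COPY . .", EVERYTHING),
]
# Whole tree copied before dependencies are installed.
SOURCE_FIRST = [
    ("FROM node:20", []),
    ("COPY . .", EVERYTHING),
    ("RUN npm ci", []),
]

before = {"package.json": "{}", "package-lock.json": "{}", "src/index.js": "v1"}
after = {**before, "src/index.js": "v2"}  # a commit that only touches app code

print(reused_layers(DEPS_FIRST, before, after))    # 3: the npm ci step is reused
print(reused_layers(SOURCE_FIRST, before, after))  # 1: npm ci runs again
```

In the first ordering a source-only change rebuilds just the final copy step; in the second, the changed copy step invalidates the dependency install that follows it.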

Multi-stage builds

A multi-stage Dockerfile uses multiple FROM statements. The first stage compiles your binary or bundles frontend assets with a full SDK image; the final stage copies only the artifact into a minimal runtime image (e.g. distroless or alpine). Production images can shrink from hundreds of megabytes to tens, and fewer packages also mean a smaller attack surface and faster pulls during deploys.

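# Example: Go multi-stage pattern
# Stage 1: compile with the full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /src
# Copy only the module files first so the download layer stays cached
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary that needs no libc in the runtime image
RUN CGO_ENABLED=0 go build -o /bin/api ./cmd/api

# Stage 2: ship only the binary on a minimal runtime image
FROM gcr.io/distroless/static-debian12
COPY --from=builder /bin/api /api
# Run as the unprivileged user provided by the distroless image
USER nonroot:nonroot
ENTRYPOINT ["/api"]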

CMD vs ENTRYPOINT

ENTRYPOINT defines the executable; CMD supplies default arguments that users can override at docker run time. Use ENTRYPOINT for the main process and CMD for its default flags. Run containers as a non-root user via USER; many base images ship with a pre-created account such as node or www-data. If a process running as root breaks out of the container, it is by default root on the host, which is a full compromise.
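
How the two instructions combine can be expressed in a few lines. The Python sketch below models only the exec (JSON array) form and ignores the --entrypoint override; the function name and the --port flag are hypothetical examples, not taken from a real image.

```python
def container_argv(entrypoint, cmd, run_args=()):
    """Model the final command line of a container (exec form only).

    Arguments given after the image name in `docker run IMAGE ...`
    replace CMD entirely, while ENTRYPOINT stays in place.
    """
    arguments = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + arguments


# Image built with ENTRYPOINT ["/api"] and CMD ["--port", "8080"]
default_argv = container_argv(["/api"], ["--port", "8080"])
override_argv = container_argv(["/api"], ["--port", "8080"], ["--port", "9090"])
```

Here default_argv is what runs for a plain docker run of the image, and override_argv is what runs when the user appends --port 9090 after the image name.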

Running containers: ports, env, and lifecycle

docker run -d -p 8080:8080 --name api myapp:1.4.2 starts a detached container, maps host port 8080 to container port 8080, and names it api so later logs and stop commands can refer to it. Pass secrets and config at run time with -e DATABASE_URL=… or --env-file .env. Never bake credentials into the image: a value set in one layer stays in that layer and in the image history even if a later instruction removes it, and anyone who can pull the image can read it.
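
One way to catch baked-in credentials early is a simple lint over the Dockerfile text. The Python sketch below is a rough heuristic under stated assumptions: the list of suspicious name fragments is made up for illustration, it only inspects ENV and ARG lines, and it will produce both false positives and false negatives.

```python
import re

# Hypothetical name fragments that suggest a credential.
SECRET_HINT = re.compile(r"PASSWORD|SECRET|TOKEN|API_KEY|DATABASE_URL", re.IGNORECASE)


def baked_in_secrets(dockerfile_text):
    """Return (line number, line) pairs for ENV/ARG lines that look like secrets."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        words = line.strip().split(None, 1)
        if len(words) != 2:
            continue
        instruction, rest = words
        if instruction.upper() in {"ENV", "ARG"} and SECRET_HINT.search(rest):
            findings.append((number, line.strip()))
    return findings
```

A finding means the value should move to -e or --env-file at run time instead of living in the image.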

Restart policies and health

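--restart unless-stopped brings a container back after a process crash or a host reboot (provided the Docker daemon itself starts at boot), which is useful on a single VM. In orchestrated environments, the platform (Docker Swarm, Kubernetes, ECS) owns restart logic instead. Add a HEALTHCHECK instruction so Docker can mark a container unhealthy when its probe command keeps failing. Docker Swarm replaces unhealthy tasks based on that status; Kubernetes ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes, and ECS uses the health check defined in the task definition, so configure the equivalent check there.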
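
The way probe results turn into a health status can be sketched as a small state machine. This Python model is a simplification that assumes Docker's documented behavior (a container becomes unhealthy after a number of consecutive failures equal to retries, default 3, and one success makes it healthy again); it ignores the interval, timeout, and start-period settings.

```python
def health_status(probe_results, retries=3):
    """Fold a sequence of probe outcomes (True = passed) into a status string."""
    status, failing_streak = "starting", 0
    for passed in probe_results:
        if passed:
            # Any successful probe resets the streak.
            status, failing_streak = "healthy", 0
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
    return status
```

Two failed probes after a success leave the container healthy; the third consecutive failure flips it to unhealthy.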

Logs and debugging

Container stdout/stderr goes to Docker's logging driver (default: json-file on disk). Use docker logs -f api locally; in production, ship logs to a central system as part of your observability stack. For interactive debugging, docker exec -it api sh opens a shell inside a running container, provided the image ships one (distroless images do not). That is convenient, but fix the Dockerfile rather than depending on manual hot-patching in prod.

Storage: volumes, bind mounts, and tmpfs

A container's writable layer is deleted along with the container. Anything that must outlive the container being removed or recreated (database files, uploaded media, a SQLite file) needs external storage.

  • Named volumes — managed by Docker, portable across hosts when using a shared volume driver. Best default for database data.
  • Bind mounts — map a host directory into the container. Ideal for local development hot-reload (mount source code into a Node dev container) but couple container state to a specific machine path.
  • tmpfs — in-memory filesystem; good for sensitive temp data that should never hit disk unencrypted.

In production, prefer object storage (S3, GCS) for user uploads and managed databases for relational data rather than running Postgres inside Docker on a single VM without backups. Docker makes running a database easy; making it durable is a separate discipline.

Networking between containers

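Default bridge networking gives each container a private IP on an internal bridge, but containers on the default bridge can reach each other only by IP address. Containers on the same user-defined network also resolve each other by name through Docker's embedded DNS, so a frontend container can call http://api:8080 without hard-coding IPs. Published ports (-p) are reachable through the host and bind to all host interfaces by default; unpublished ports stay internal to the Docker network.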

host networking removes isolation (container shares host network stack) — occasionally needed for performance or multicast, rarely for typical HTTP APIs. none disables networking entirely for batch jobs that should not phone home. In cloud deployments, a load balancer or reverse proxy terminates TLS and routes to container backends on private subnets.

Docker Compose for local multi-service stacks

Real applications rarely run alone. A web API needs Postgres, Redis, and maybe a message broker. Docker Compose defines multiple services, networks, and volumes in a compose.yaml file and starts them with one command: docker compose up.

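services:
  api:
    build: .
    ports: ["8080:8080"]
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    # The postgres image refuses to initialize without POSTGRES_PASSWORD.
    # These values match DATABASE_URL above; local development only.
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes: [pgdata:/var/lib/postgresql/data]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      retries: 5
volumes:
  pgdata: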
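
The depends_on entries form a dependency graph, and start order follows from a topological sort of that graph. The Python sketch below models only the ordering, using the standard library; the real Compose implementation additionally waits for the health check to pass when condition: service_healthy is set. The function name is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def start_order(depends_on):
    """Order services so each one starts after everything it depends on.

    `depends_on` maps a service name to the set of services it waits for.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Mirrors the compose file above: api waits for db.
order = start_order({"api": {"db"}, "db": set()})
print(order)  # ['db', 'api']
```

A circular dependency raises graphlib.CycleError, which is a reasonable stand-in for a dependency cycle that cannot be started in any order.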

Compose is optimized for developer ergonomics, not production HA. Use it to spin up consistent local and CI integration-test environments. For production at scale, graduate to Kubernetes, Amazon ECS, Google Cloud Run, or another orchestrator — but keep the same images you built and tested in Compose.

Registries and the image supply chain

Images live in a container registry: Docker Hub, GitHub Container Registry (ghcr.io), Amazon ECR, Google Artifact Registry. Your CI pipeline builds on every merge, tags the image with the git SHA (myapp:abc1234), pushes it to the registry, and the deployment pulls that exact tag. Treat tags as immutable, and enable tag immutability where the registry supports it, so environments cannot drift apart.

Security hygiene

  • Scan images for known CVEs in base layers (Trivy, Grype, built-in registry scanners). Patch base images regularly.
  • Pin base image digests in Dockerfiles for reproducible builds; update deliberately after review.
  • Minimize installed packages — fewer tools means fewer vulnerabilities and smaller images.
  • Do not run as root; drop capabilities; use read-only root filesystem where the app allows it.
  • Sign images (cosign, Notary) in high-assurance environments, and enforce signature verification at admission so clusters reject unsigned images.

Docker images follow the OCI (Open Container Initiative) spec — the same image format Kubernetes, containerd, and Podman consume. Learning Docker is not wasted if you later move to K8s; you are learning the packaging layer underneath.

Docker alone vs orchestration

A single VM running docker compose up -d behind nginx is a legitimate production architecture for small teams and low-traffic services. You get reproducible deploys, easy rollbacks (pull previous tag, restart), and minimal operational surface. Add complexity only when pain appears: multiple machines, zero-downtime deploys across zones, autoscaling on CPU, or dozens of interdependent services.

Docker Swarm (built into Docker Engine) offers basic orchestration but has lost mindshare to Kubernetes. Managed platforms like Cloud Run or Fly.io hide orchestration entirely — you push an image, they run it. The through-line is always the same: build a good image once, promote it through environments, observe it in production.

Common pitfalls

  • Mutable latest tags — production pulls surprise breaking changes. Tag with versions or git SHAs.
  • Giant images — slow deploys, slow CI, more CVEs. Use multi-stage builds and slim bases.
  • Stateful data in container layers — wiped on recreate. Use volumes or external stores.
  • Ignoring .dockerignore — copying node_modules or .git into the build context bloats images and busts cache.
  • Multiple processes in one container — running cron, nginx, and the app together is an anti-pattern. Split them into separate services or sidecars so each can scale and restart independently.

Key takeaways

  • Images are immutable templates; containers are instances of them, each with its own ephemeral writable layer.
  • Order Dockerfile instructions for layer cache hits; use multi-stage builds to ship small, secure runtime images.
  • Persist data with named volumes or external services — not container filesystems alone.
  • Docker Compose standardizes local and CI multi-service stacks; production may use the same images on K8s or managed runtimes.
  • Pin images by digest, scan them, run as non-root, and integrate builds into your CI/CD pipeline before reaching for orchestration.
