
Matplotlib fundamentals explained

Every Friday, Harbor Analytics emails leadership a one-page PDF: revenue trend, sign-up funnel, and error-rate sparkline. The data lives in pandas; the charts must survive print, Slack previews, and a board projector without looking like a default spreadsheet export. Matplotlib is the library that makes that possible. It is the rendering engine beneath Seaborn, many scientific papers, and countless batch reports. Unlike browser-first tools such as Plotly, Matplotlib targets static, pixel- and vector-perfect output. This guide covers the Figure and Axes object model, pyplot’s implicit state versus the object-oriented API, core plot types, global styling with rcParams, multi-panel layouts with subplots and GridSpec, color and legend discipline, export settings for publication, a Harbor Analytics weekly KPI report worked example, a Matplotlib vs Seaborn vs Plotly decision table, common pitfalls, and a production checklist. Use it alongside our Jupyter fundamentals guide when building notebook-to-PDF pipelines.

What Matplotlib is and where it fits

Matplotlib is a comprehensive 2D (and limited 3D) plotting library for Python. It draws lines, markers, patches, text, and images onto a canvas, then writes the result to PNG, PDF, SVG, or other formats through its output backends. Nearly every Python visualization stack touches it: Seaborn delegates rendering to Matplotlib axes; pandas DataFrame.plot is a thin wrapper; machine-learning tutorials use plt.imshow for confusion matrices.

Choose Matplotlib when you need fine-grained control over layout, annotations, and print output. Choose Seaborn when the task is statistical aggregation with sensible defaults. Choose Plotly when stakeholders must zoom, hover, and filter in the browser. In practice, teams mix them: Seaborn for exploration, Matplotlib for the final annotated figure, Plotly for the self-serve dashboard.

When Matplotlib is the right default

  • Publication and compliance PDFs — vector text, embedded fonts, exact margins.
  • Custom annotations — arrows, bracket labels, inset zooms, mixed chart types on one axes.
  • Non-standard geometry — polar plots, broken axes, shared colorbars across irregular grids.
  • Headless servers — cron jobs that write PNGs without a display (use the Agg backend).

Skip Matplotlib as the only tool when interactivity is the product, when you need linked brushing across dozens of charts, or when a high-level statistical API will ship the insight faster with no custom layout.

Figure, Axes, and Artist hierarchy

Matplotlib models a plot as a tree of objects. A Figure is the top-level container — think of it as the page. One or more Axes live on the figure; each axes is a coordinate system with x and y limits, ticks, labels, and a title. Everything drawn — lines, rectangles, text — is an Artist. Understanding this hierarchy prevents the most common beginner bug: calling pyplot functions in the wrong order and drawing on an unexpected axes.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot([1, 2, 3], [4, 1, 9], marker="o", label="Series A")
ax.set_xlabel("Week")
ax.set_ylabel("Revenue (USD thousands)")
ax.set_title("Harbor weekly revenue")
ax.legend(loc="upper left")
fig.savefig("revenue.png", dpi=150, bbox_inches="tight")

Here fig owns the canvas size; ax owns the data limits and decorations. Multiple axes on one figure share the figure but have independent transforms. Use fig.suptitle for a figure-wide title above subplot grids.
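The hierarchy can be inspected directly. The following is a minimal sketch, reusing the revenue example above, that walks from the figure down to a line and back up again; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
(line,) = ax.plot([1, 2, 3], [4, 1, 9], marker="o", label="Series A")
title = ax.set_title("Harbor weekly revenue")

# Top-down: the figure lists its axes, the axes lists the lines drawn on it.
figure_axes = fig.axes
line_is_registered = line in ax.lines

# Bottom-up: every artist keeps a reference to its parent axes and figure.
line_parent_axes = line.axes
title_parent_figure = title.figure

plt.close(fig)
```

Checking line.axes is a quick way to debug a chart that drew onto the wrong panel.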

pyplot state machine vs object-oriented API

matplotlib.pyplot maintains an implicit “current” figure and axes. Quick scripts work with plt.plot followed by plt.show(). That convenience breaks in loops, callbacks, and GUI embeds, where state leaks between calls. Production code should create axes explicitly with fig, ax = plt.subplots(), call plotting methods on ax, and pass ax= to library helpers that draw on your behalf. Seaborn's axes-level functions and pandas DataFrame.plot accept an ax argument for this reason.

Rule of thumb: notebooks may start with pyplot for speed; refactor to the OO API before the chart ships to a report generator or unit test.
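One way to apply that rule is to write plotting helpers that receive the axes as a parameter. The sketch below uses a hypothetical helper, draw_series, and shows that it draws on the axes it was given even when pyplot's current axes is a different panel.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt


def draw_series(ax, x, y, label):
    """Draw one labelled line on the given axes without touching pyplot state."""
    ax.plot(x, y, marker="o", label=label)
    ax.legend(loc="upper left")
    return ax


fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Make the right panel pyplot's "current" axes on purpose.
plt.sca(ax_right)
current = plt.gca()

# The helper ignores the current axes and draws where it is told to.
draw_series(ax_left, [1, 2, 3], [4, 1, 9], "Series A")

plt.close(fig)
```

A bare plt.plot call at the same point would have landed on ax_right, which is the kind of state leak the object-oriented style avoids.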

Core plot types and data expectations

Line and scatter

ax.plot(x, y) connects points with lines — ideal for time series and ordered categories. ax.scatter(x, y, s=sizes, c=colors) draws individual markers without connecting them. Use scatter when each point is an observation; use line when the x-axis is continuous and interpolation between points is meaningful. Control overplotting with alpha, smaller markers, or ax.hexbin for density.
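As an illustration of the overplotting controls, the sketch below draws the same synthetic point cloud twice: once as small translucent markers and once as a hexbin density. The data is randomly generated for the example and stands in for real observations.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, correlated point cloud (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.5 * x + rng.normal(scale=0.5, size=5000)

fig, (ax_pts, ax_hex) = plt.subplots(1, 2, figsize=(9, 4))

# Option 1: small, translucent markers so dense regions read darker.
pts = ax_pts.scatter(x, y, s=5, alpha=0.2)
ax_pts.set_title("scatter with alpha")

# Option 2: aggregate points into hexagonal bins and color by count.
hb = ax_hex.hexbin(x, y, gridsize=30, cmap="viridis")
fig.colorbar(hb, ax=ax_hex, label="Points per bin")
ax_hex.set_title("hexbin density")

plt.close(fig)
```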

Bars and histograms

ax.bar(categories, heights) draws vertical bars; ax.barh draws horizontal ones. For grouped bars, pass explicit numeric x positions and shift each series by a multiple of the bar width so the series sit side by side. ax.hist(data, bins=30) counts frequency per bin; prefer bins="fd" (the Freedman-Diaconis rule) or explicit bin edges when defaults hide multimodal distributions. For normalized comparisons use density=True.
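A short sketch of both ideas follows. The stage names, counts, and latency samples are made up for the example; the grouped bars use numeric positions offset by half a bar width, and the histogram uses bins="fd" with density=True.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

stages = ["Visit", "Sign-up", "Paid"]  # hypothetical funnel stages
last_week = [100, 38, 7]               # illustrative values
this_week = [100, 42, 9]

x = np.arange(len(stages))  # one numeric slot per category
width = 0.4                 # each bar takes 40% of a slot

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(9, 3.5))

# Shift each series half a bar left or right of the slot center.
ax_bar.bar(x - width / 2, last_week, width, label="Last week")
ax_bar.bar(x + width / 2, this_week, width, label="This week")
ax_bar.set_xticks(x)
ax_bar.set_xticklabels(stages)
ax_bar.legend()

# Bimodal synthetic sample; "fd" picks the bin width from the data spread.
rng = np.random.default_rng(1)
latency = np.concatenate([rng.normal(120, 10, 500), rng.normal(200, 15, 300)])
counts, edges, _ = ax_hist.hist(latency, bins="fd", density=True)

plt.close(fig)
```

With density=True the bar areas sum to one, so histograms built from samples of different sizes can be compared on the same scale.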

Images and heatmaps

ax.imshow(matrix, aspect="auto", cmap="viridis") displays 2D arrays. Add a colorbar with fig.colorbar(im, ax=ax, label="Correlation"). For labeled rows and columns, set the tick positions and tick labels on both axes manually (ax.set_xticks, ax.set_xticklabels, and the y-axis equivalents), or delegate to Seaborn’s heatmap when you want annotations in cells.
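The manual labeling route might look like the sketch below. The metric names and correlation values are invented for illustration; fixing vmin and vmax is an added choice so the color scale stays comparable between reports.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = ["revenue", "signups", "errors"]  # hypothetical metric names
corr = np.array([                          # illustrative correlation matrix
    [1.0, 0.8, -0.3],
    [0.8, 1.0, -0.1],
    [-0.3, -0.1, 1.0],
])

fig, ax = plt.subplots(figsize=(4, 3.5))
im = ax.imshow(corr, aspect="auto", cmap="viridis", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label="Correlation")

# One tick per row and column, labelled with the metric names.
positions = range(len(labels))
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(positions)
ax.set_yticklabels(labels)

plt.close(fig)
```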

Working with pandas

df.plot(ax=ax, kind="line") is convenient for quick views but hides styling details. For production figures, extract df["week"] and df["revenue"] and call ax.plot directly so legend labels and colors are explicit and stable when column order changes.
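A minimal sketch of the explicit approach is shown below. The DataFrame is built inline with made-up numbers, and the refunds column is a hypothetical second series added so the legend has more than one entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame; in a real report this would come from a CSV extract.
df = pd.DataFrame({
    "week": [1, 2, 3, 4],
    "revenue": [40.0, 42.5, 41.0, 47.5],
    "refunds": [2.0, 1.5, 2.5, 1.0],  # hypothetical column
})

fig, ax = plt.subplots(figsize=(8, 4))

# Each series names its column, color, and legend label explicitly,
# so reordering or adding DataFrame columns cannot change the chart.
ax.plot(df["week"], df["revenue"], color="#2d6a4f", label="Revenue")
ax.plot(df["week"], df["refunds"], color="#e63946", label="Refunds")
ax.set_xlabel("Week")
ax.legend(loc="upper left")

plt.close(fig)
```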

Styling: rcParams, colors, and legends

Global defaults live in matplotlib.rcParams. Set them once at process start:

plt.rcParams.update({
    "figure.figsize": (8, 4),
    "font.size": 11,
    "axes.titlesize": 13,
    "axes.labelsize": 11,
    "legend.frameon": False,
    "axes.spines.top": False,
    "axes.spines.right": False,
})

Use style sheets for brand consistency: plt.style.use("seaborn-v0_8-whitegrid") or a custom .mplstyle file checked into the repo. Cycle colors with ax.set_prop_cycle(color=[...]) so multi-series charts match design tokens.
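When global settings should apply to one report only, plt.rc_context scopes them to a with block. The sketch below combines that with a property cycle; the brand dictionary and palette are illustrative choices, not Harbor's actual design tokens.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

brand = {  # hypothetical brand defaults
    "font.size": 12,
    "axes.spines.top": False,
    "axes.spines.right": False,
}
palette = ["#2d6a4f", "#4361ee", "#e63946"]  # hypothetical design tokens

size_before = plt.rcParams["font.size"]

with plt.rc_context(brand):
    size_inside = plt.rcParams["font.size"]
    fig, ax = plt.subplots()
    # Lines drawn on this axes take their colors from the palette in order.
    ax.set_prop_cycle(color=palette)
    lines = [ax.plot([0, 1], [i, i + 1])[0] for i in range(3)]

# Leaving the block restores the previous defaults.
size_after = plt.rcParams["font.size"]
plt.close(fig)
```

Scoping matters in long-running processes that render several report types with different styling.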

Legends: call ax.legend() after all artists are drawn. Place with loc or bbox_to_anchor outside the plot area when series overlap the data region. For many series, consider direct labels on the last point instead of a crowded legend box.
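Both legend strategies are sketched below on one axes, purely for comparison; a real chart would pick one. The channel names and values are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]
series = {  # hypothetical acquisition channels with illustrative values
    "Organic": [3, 4, 6, 7],
    "Paid": [2, 3, 3, 5],
    "Referral": [1, 1, 2, 2],
}

fig, ax = plt.subplots(figsize=(7, 3.5))
for name, values in series.items():
    (line,) = ax.plot(weeks, values, label=name)
    # Direct label: text placed 5 points right of the last data point,
    # in the same color as the line it describes.
    ax.annotate(
        name,
        xy=(weeks[-1], values[-1]),
        xytext=(5, 0),
        textcoords="offset points",
        va="center",
        color=line.get_color(),
    )

# Alternative: a legend anchored just outside the right edge of the axes.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

plt.close(fig)
```

When the legend sits outside the axes, saving with bbox_inches="tight" keeps it from being cropped.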

Date and currency axes

Time series need matplotlib.dates: convert strings to datetime, then ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d")). For currency, use matplotlib.ticker.FuncFormatter or StrMethodFormatter("${x:,.0f}") so axis ticks read naturally in executive summaries.
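The two formatters can be combined on one chart, as in the sketch below. The dates and dollar amounts are illustrative, and fig.autofmt_xdate() is an optional extra that slants the date labels.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

# Six consecutive days with made-up revenue in USD.
days = [dt.date(2026, 6, 1) + dt.timedelta(days=i) for i in range(6)]
revenue = [12000, 13500, 12800, 15100, 14900, 16250]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, revenue, marker="o")

date_fmt = mdates.DateFormatter("%b %d")    # month abbreviation and day
usd_fmt = StrMethodFormatter("${x:,.0f}")   # thousands separator, no decimals
ax.xaxis.set_major_formatter(date_fmt)
ax.yaxis.set_major_formatter(usd_fmt)
fig.autofmt_xdate()  # rotate and right-align the date labels

plt.close(fig)
```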

Subplots, GridSpec, and shared axes

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True) returns a 2×2 numpy array of axes. Index with axes[0, 1] for the top-right panel. sharex and sharey align scales across panels — essential for small-multiple comparisons.
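A quick way to see sharex in action is to change the limits on one panel and read them back from the others, as in this sketch with placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)

# axes is a 2x2 array; .flat iterates over the four panels row by row.
for i, ax in enumerate(axes.flat):
    ax.plot(range(10), [v * (i + 1) for v in range(10)])

# Zooming one panel's x range moves every panel that shares the axis.
top_right = axes[0, 1]
top_right.set_xlim(2, 8)
limits = [ax.get_xlim() for ax in axes.flat]

plt.close(fig)
```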

GridSpec handles irregular layouts: a wide top panel spanning two columns above two equal bottom panels. Create with gs = fig.add_gridspec(2, 2, height_ratios=[2, 1]), then ax_top = fig.add_subplot(gs[0, :]) and ax_bl = fig.add_subplot(gs[1, 0]). This pattern appears in dashboard-style PDFs where one hero chart dominates and supporting metrics sit below.

Tight layout: fig.tight_layout() or fig.subplots_adjust(hspace=0.35) prevents label overlap. When saving, bbox_inches="tight" crops whitespace but can clip suptitles — adjust rect= in tight_layout or add top margin with subplots_adjust.
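The hero-plus-two layout and the suptitle margin can be combined as in the sketch below. The panel titles are placeholders, and the 0.95 upper bound in rect is an assumed value that leaves room for the figure title; tune it to the font size in use.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 7))
gs = fig.add_gridspec(2, 2, height_ratios=[2, 1])

ax_top = fig.add_subplot(gs[0, :])  # hero panel spans both columns
ax_bl = fig.add_subplot(gs[1, 0])   # bottom-left supporting panel
ax_br = fig.add_subplot(gs[1, 1])   # bottom-right supporting panel

for ax, name in [(ax_top, "Hero"), (ax_bl, "Left"), (ax_br, "Right")]:
    ax.set_title(name)

fig.suptitle("Weekly KPIs")
# Lay out the panels inside the lower 95% of the figure so the
# suptitle keeps the strip above it.
fig.tight_layout(rect=(0, 0, 1, 0.95))

top_box = ax_top.get_position()
bl_box = ax_bl.get_position()
plt.close(fig)
```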

Export, backends, and reproducibility

Set the backend before importing pyplot in headless environments: matplotlib.use("Agg"). Save vector PDFs for print (fig.savefig("report.pdf")) and PNGs for Slack (dpi=150 minimum; dpi=300 for slides on retina displays). Embed fonts with pdf.fonttype = 42 in rcParams so text remains editable in Illustrator.

Use rasterized=True on scatter layers with millions of points when embedding in vector PDFs — file size stays manageable. Set a random seed only when plots include stochastic elements; otherwise pin data and code versions. Log the git commit hash alongside generated figures in batch pipelines for audit trails.
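The export settings above can be exercised end to end with a sketch like the following. It writes to in-memory buffers rather than files so it leaves nothing behind; the point count and seed are arbitrary, and a real job would pass file paths to savefig instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # select the headless backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

# Synthetic cloud; seeded because the data itself is random.
rng = np.random.default_rng(42)
x, y = rng.normal(size=(2, 20000))

pdf_buf, png_buf = io.BytesIO(), io.BytesIO()

# pdf.fonttype 42 embeds TrueType fonts so text stays text in the PDF.
with plt.rc_context({"pdf.fonttype": 42}):
    fig, ax = plt.subplots(figsize=(6, 4))
    # rasterized=True stores the dense marker layer as a bitmap inside
    # the PDF while axes, ticks, and labels remain vector.
    ax.scatter(x, y, s=2, alpha=0.3, rasterized=True)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(pdf_buf, format="pdf", bbox_inches="tight")
    fig.savefig(png_buf, format="png", dpi=150, bbox_inches="tight")

plt.close(fig)
```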

Worked example: Harbor Analytics weekly KPI report

Harbor’s ops team generates a Friday PDF from three CSV extracts: daily revenue, funnel conversion counts, and API error rates. The script uses the object-oriented API and GridSpec:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.gridspec import GridSpec

plt.rcParams.update({"font.size": 10, "axes.spines.top": False, "axes.spines.right": False})

revenue = pd.read_csv("harbor_revenue_daily.csv", parse_dates=["date"])
funnel = pd.read_csv("harbor_funnel_weekly.csv")
errors = pd.read_csv("harbor_errors_daily.csv", parse_dates=["date"])

fig = plt.figure(figsize=(10, 7))
gs = GridSpec(2, 2, figure=fig, height_ratios=[2, 1], hspace=0.35, wspace=0.25)

ax_rev = fig.add_subplot(gs[0, :])
ax_rev.plot(revenue["date"], revenue["revenue_kusd"], color="#2d6a4f", lw=2)
ax_rev.set_title("Daily revenue (USD thousands)")
ax_rev.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))

ax_fun = fig.add_subplot(gs[1, 0])
ax_fun.bar(funnel["stage"], funnel["conversion_pct"], color="#4361ee")
ax_fun.set_ylabel("Conversion %")
ax_fun.set_title("Weekly funnel")

ax_err = fig.add_subplot(gs[1, 1])
ax_err.plot(errors["date"], errors["error_rate_pct"], color="#e63946", lw=1.5)
ax_err.axhline(0.5, color="gray", ls="--", lw=0.8, label="SLO 0.5%")
ax_err.legend(loc="upper right")
ax_err.set_title("API error rate")

fig.suptitle("Harbor Analytics — week ending 2026-06-06", fontsize=14, y=0.98)
fig.savefig("harbor_kpi_week.pdf", bbox_inches="tight")
plt.close(fig)  # release the figure once the PDF is written

The hero revenue panel spans the full width; the funnel bars and the error-rate sparkline share the bottom row. Spines are trimmed for a clean executive look. Because every panel is an explicit axes, ax_rev could be passed to a Seaborn call (ax=ax_rev) to overlay a confidence band without restructuring the layout.

Matplotlib vs Seaborn vs Plotly

Need | Matplotlib | Seaborn | Plotly
Pixel-perfect PDF / print | Native strength | Via Matplotlib | Needs Kaleido export
Custom layout and annotations | Full control | Limited; compose on axes | Moderate
Statistical defaults (CI, bins) | Manual | Built in | Partial in Express
Interactive exploration | Basic widgets | Static only | Native
Headless batch PNG/PDF | Excellent (Agg) | Excellent | Heavier deps
Learning curve for bespoke charts | Steeper | Lower for stats | Low for dashboards
3D and polar plots | Supported | Limited | Supported

Common pitfalls

  • Implicit pyplot state in loops — each iteration overwrites the “current” axes; use explicit fig, ax.
  • Wrong figure saved — plt.savefig saves the current figure; prefer fig.savefig.
  • Overlapping labels — rotate tick labels (plt.setp(ax.get_xticklabels(), rotation=45, ha="right")) or reduce tick density.
  • Dual y-axes abuse — twin axes distort scale relationships; normalize series or use two stacked panels instead.
  • plt.show() in CI — with a GUI backend it blocks or fails on headless runners; call savefig instead and close figures with plt.close(fig).
  • Enormous scatter in vector PDFs — rasterize or subsample; otherwise files balloon to hundreds of megabytes.
  • Color-only encoding — add markers or line styles for colorblind readers; test palettes with simulators.
  • Timezone-naive dates — parse with UTC awareness before plotting multi-region revenue series.
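Several of these pitfalls disappear with one loop shape: create a figure per chart, save through the figure object, and close it before the next iteration. The sketch below assumes a hypothetical dictionary of metrics with made-up values and writes PNG bytes to memory instead of disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless-safe backend, so no show() is needed
import matplotlib.pyplot as plt

metrics = {  # hypothetical metric names with illustrative values
    "revenue": [40, 42, 41, 47],
    "signups": [310, 295, 330, 360],
}

figures_before = set(plt.get_fignums())
outputs = {}

for name, values in metrics.items():
    fig, ax = plt.subplots(figsize=(6, 3))  # fresh figure per chart
    ax.plot(range(1, len(values) + 1), values, marker="o")
    ax.set_title(name)
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right")

    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=150)  # save this figure explicitly
    plt.close(fig)                           # free it before the next one
    outputs[name] = buf.getvalue()

# Any figure numbers still registered with pyplot would indicate a leak.
leaked = set(plt.get_fignums()) - figures_before
```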

Production checklist

  • Pin matplotlib version in lockfiles; test figure output on upgrade.
  • Use the OO API (fig, ax = plt.subplots()) in all shipped code paths.
  • Set matplotlib.use("Agg") before pyplot import in servers.
  • Centralize brand rcParams or a .mplstyle file in the repo.
  • Call fig.savefig(..., dpi=150, bbox_inches="tight") and plt.close(fig) to free memory in batch jobs.
  • Embed fonts in PDFs (pdf.fonttype = 42) for print vendors.
  • Assert input DataFrame schemas before plotting; fail fast on missing columns.
  • Compose Seaborn on explicit axes when you need both statistical defaults and custom layout.
  • Pair static Matplotlib PDFs with Plotly dashboards when drill-down is required.
  • Log output paths and data snapshot hashes for reproducibility audits.
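The schema check in the list above can be a few lines. The helper below, require_columns, is a hypothetical name and a minimal sketch: it only checks that columns are present, not their dtypes, and the column names are taken from the worked example.

```python
import pandas as pd


def require_columns(df, required, name="frame"):
    """Return df unchanged, or raise ValueError listing the missing columns."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing columns: {missing}")
    return df


# Illustrative frames: one complete, one with a misspelled column.
good = pd.DataFrame({
    "date": pd.to_datetime(["2026-06-01", "2026-06-02"]),
    "revenue_kusd": [41.0, 43.5],
})
bad = good.rename(columns={"revenue_kusd": "revenue"})

checked = require_columns(good, ["date", "revenue_kusd"], name="revenue")

try:
    require_columns(bad, ["date", "revenue_kusd"], name="revenue")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Running the check right after each read_csv call makes a renamed column fail the job before any figure is drawn.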

Key takeaways

  • Figure and Axes form the object tree; prefer explicit axes over pyplot state in production.
  • Core primitives — plot, scatter, bar, hist, imshow — cover most static reporting needs.
  • rcParams and style sheets enforce consistent typography and color across automated reports.
  • GridSpec builds irregular multi-panel layouts without manual coordinate math.
  • Seaborn and Plotly layer on top; Matplotlib remains the print-quality foundation.
