Matplotlib fundamentals explained
Every Friday Harbor Analytics emails leadership a one-page PDF: revenue trend,
sign-up funnel, and error-rate sparkline. The data lives in pandas; the charts
must survive print, Slack previews, and a board projector without looking like
a default spreadsheet export. Matplotlib is the library that makes that
possible. It is the rendering engine beneath Seaborn, many scientific papers,
and countless batch reports. Unlike browser-first tools such as Plotly,
Matplotlib targets static, pixel- and vector-perfect output. This guide covers
the Figure and Axes object model, pyplot’s implicit state versus the
object-oriented API, core plot types, global styling with rcParams,
multi-panel layouts with subplots and GridSpec, color and legend
discipline, export settings for publication, a Harbor Analytics weekly KPI
report worked example, a Matplotlib vs Seaborn vs Plotly decision table, common
pitfalls, and a production checklist. Use it alongside our Jupyter
fundamentals guide when building notebook-to-PDF pipelines.
What Matplotlib is and where it fits
Matplotlib is a comprehensive 2D (and limited 3D) plotting library for
Python. It draws lines, markers, patches, text, and images onto a canvas,
then writes the result to PNG, PDF, SVG, or other formats through its backends. Nearly every
Python visualization stack touches it: Seaborn delegates rendering to
Matplotlib axes; pandas DataFrame.plot is a thin wrapper;
machine-learning tutorials use plt.imshow for confusion matrices.
Choose Matplotlib when you need fine-grained control over layout, annotations, and print output. Choose Seaborn when the task is statistical aggregation with sensible defaults. Choose Plotly when stakeholders must zoom, hover, and filter in the browser. In practice, teams mix them: Seaborn for exploration, Matplotlib for the final annotated figure, Plotly for the self-serve dashboard.
When Matplotlib is the right default
- Publication and compliance PDFs — vector text, embedded fonts, exact margins.
- Custom annotations — arrows, bracket labels, inset zooms, mixed chart types on one axes.
- Non-standard geometry — polar plots, broken axes, shared colorbars across irregular grids.
- Headless servers — cron jobs that write PNGs without a display (use the
Agg backend).
Skip Matplotlib as the only tool when interactivity is the product, when you need linked brushing across dozens of charts, or when a high-level statistical API will ship the insight faster with no custom layout.
Figure, Axes, and Artist hierarchy
Matplotlib models a plot as a tree of objects. A Figure is the top-level container — think of it as the page. One or more Axes live on the figure; each axes is a coordinate system with x and y limits, ticks, labels, and a title. Everything drawn — lines, rectangles, text — is an Artist. Understanding this hierarchy prevents the most common beginner bug: calling pyplot functions in the wrong order and drawing on an unexpected axes.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot([1, 2, 3], [4, 1, 9], marker="o", label="Series A")
ax.set_xlabel("Week")
ax.set_ylabel("Revenue (USD thousands)")
ax.set_title("Harbor weekly revenue")
ax.legend(loc="upper left")
fig.savefig("revenue.png", dpi=150, bbox_inches="tight")
Here fig owns the canvas size; ax owns the data
limits and decorations. Multiple axes on one figure share the figure but have
independent transforms. Use fig.suptitle for a figure-wide title
above subplot grids.
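The hierarchy described above can be inspected directly. The following is a minimal sketch, assuming a headless Agg backend and made-up data; it only reads attributes that link a Figure, its Axes, and a Line2D artist to each other.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
(line,) = ax.plot([1, 2, 3], [4, 1, 9], label="Series A")  # plot returns a list of Line2D
ax.set_title("Hierarchy demo")

# Walk the tree in both directions: container -> child and child -> container.
print("Axes owned by the figure:", len(fig.axes))
print("Line is registered on the axes:", line in ax.lines)
print("Line points back to its axes:", line.axes is ax)
print("Axes points back to its figure:", ax.figure is fig)

plt.close(fig)  # release the figure once the inspection is done
```

Holding references such as `fig`, `ax`, and `line` is what makes later edits (restyling a line, retitling an axes) unambiguous.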
pyplot state machine vs object-oriented API
matplotlib.pyplot maintains implicit “current” figure
and axes. Quick scripts work with plt.plot followed by
plt.show(). That convenience breaks in loops, callbacks, and GUI
embeds where state leaks between calls. Production code should prefer explicit
fig, ax = plt.subplots() and pass ax= to every
plotting call. Libraries like Seaborn accept an ax argument for
this reason.
Rule of thumb: notebooks may start with pyplot for speed; refactor to the OO API before the chart ships to a report generator or unit test.
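One way to follow that rule is to write plotting helpers that receive an Axes and never consult pyplot's current state. The sketch below assumes a hypothetical helper name (`draw_series`) and made-up regional numbers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts and tests
import matplotlib.pyplot as plt


def draw_series(ax, x, y, label):
    """Draw one series on the Axes passed in; no implicit pyplot state is used."""
    ax.plot(x, y, marker="o", label=label)
    ax.set_xlabel("Week")
    ax.legend(loc="upper left")
    return ax


weeks = [1, 2, 3, 4]
series = {"North": [4, 5, 7, 6], "South": [3, 3, 4, 8]}  # illustrative values only

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, (region, values) in zip(axes, series.items()):
    draw_series(ax, weeks, values, label=region)  # each call targets a known Axes
    ax.set_title(region)

lines_per_axes = [len(ax.lines) for ax in axes]
plt.close(fig)
```

Because the target Axes is an argument, the same helper works inside loops, report generators, and unit tests without depending on call order.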
Core plot types and data expectations
Line and scatter
ax.plot(x, y) connects points with lines — ideal for time
series and ordered categories. ax.scatter(x, y, s=sizes, c=colors)
draws individual markers without connecting them. Use scatter when each point
is an observation; use line when the x-axis is continuous and interpolation
between points is meaningful. Control overplotting with alpha,
smaller markers, or ax.hexbin for density.
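The overplotting options mentioned above can be compared side by side. This is a sketch on synthetic data (a seeded NumPy generator), not a recommendation for specific marker sizes or grid sizes.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the synthetic cloud is repeatable
x = rng.normal(size=5000)
y = 0.6 * x + rng.normal(scale=0.8, size=5000)

fig, (ax_pts, ax_hex) = plt.subplots(1, 2, figsize=(9, 4))

# Option 1: small, translucent markers so dense regions look darker.
ax_pts.scatter(x, y, s=6, alpha=0.2, linewidths=0)
ax_pts.set_title("scatter with alpha")

# Option 2: aggregate points into hexagonal bins and color by count.
hb = ax_hex.hexbin(x, y, gridsize=30, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="Points per hexagon")
ax_hex.set_title("hexbin density")

total_binned = int(hb.get_array().sum())  # counts summed over the drawn hexagons
plt.close(fig)
```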
Bars and histograms
ax.bar(categories, heights) draws vertical bars; ax.barh draws
horizontal ones. For grouped bars, pass explicit numeric x positions and offset
each series by a multiple of the bar width so the series sit side by side.
ax.hist(data, bins=30) counts frequency per bin; prefer
bins="fd" or explicit bin edges when defaults hide
multimodal distributions. For normalized comparisons use
density=True.
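A sketch of both ideas, using made-up funnel percentages and synthetic latency values: grouped bars positioned with explicit offsets, and a histogram with `bins="fd"` and `density=True`.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

stages = ["Visit", "Sign-up", "Trial", "Paid"]
last_week = [100, 39, 15, 6]   # illustrative percentages
this_week = [100, 42, 18, 7]

x = np.arange(len(stages))  # one integer slot per category
width = 0.38                # two bars per slot, leaving a small gap

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
ax_bar.bar(x - width / 2, last_week, width, label="Last week")
ax_bar.bar(x + width / 2, this_week, width, label="This week")
ax_bar.set_xticks(x)
ax_bar.set_xticklabels(stages)
ax_bar.legend()

# A bimodal sample: the Freedman-Diaconis rule picks the bin width from the data.
rng = np.random.default_rng(1)
latency = np.concatenate([rng.normal(120, 10, 800), rng.normal(200, 15, 200)])
counts, edges, _ = ax_hist.hist(latency, bins="fd", density=True)

# With density=True the bar areas sum to 1, which makes samples comparable.
area = float(np.sum(counts * np.diff(edges)))
plt.close(fig)
```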
Images and heatmaps
ax.imshow(matrix, aspect="auto", cmap="viridis")
displays 2D arrays. Add a colorbar with
fig.colorbar(im, ax=ax, label="Correlation").
For labeled rows and columns, set the tick positions and then xticklabels and
yticklabels manually, or delegate to Seaborn’s
heatmap when you want annotations in cells.
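A minimal labeled heatmap might look like the following. The correlation values and metric names are invented for illustration; the fixed `vmin`/`vmax` range is an assumption that suits correlations.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

labels = ["revenue", "signups", "errors"]
corr = np.array([
    [1.0, 0.8, -0.3],
    [0.8, 1.0, -0.1],
    [-0.3, -0.1, 1.0],
])  # made-up correlation matrix

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(corr, cmap="viridis", vmin=-1, vmax=1, aspect="auto")
fig.colorbar(im, ax=ax, label="Correlation")  # the colorbar gets its own Axes

# Pin one tick per row/column first, then attach the labels to those ticks.
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

tick_text = [t.get_text() for t in ax.get_xticklabels()]
plt.close(fig)
```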
Working with pandas
df.plot(ax=ax, kind="line") is convenient for quick
views but hides styling details. For production figures, extract
df["week"] and df["revenue"] and
call ax.plot directly so legend labels and colors are explicit and
stable when column order changes.
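As a sketch of that approach, the snippet below keys colors and legend labels by column name, so reordering the DataFrame's columns does not change the figure. The `STYLE` mapping and the numbers are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "signups": [30, 34, 41, 39],   # column order deliberately shuffled
    "week": [1, 2, 3, 4],
    "revenue": [12, 14, 13, 17],
})

# Hypothetical style table: column name -> (color, legend label).
STYLE = {
    "revenue": ("#2d6a4f", "Revenue"),
    "signups": ("#4361ee", "Sign-ups"),
}

fig, ax = plt.subplots(figsize=(7, 3))
for column, (color, label) in STYLE.items():
    ax.plot(df["week"], df[column], color=color, label=label)
ax.set_xlabel("Week")
legend = ax.legend(loc="upper left")

legend_labels = [t.get_text() for t in legend.get_texts()]
plt.close(fig)
```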
Styling: rcParams, colors, and legends
Global defaults live in matplotlib.rcParams. Set them once at
process start:
plt.rcParams.update({
"figure.figsize": (8, 4),
"font.size": 11,
"axes.titlesize": 13,
"axes.labelsize": 11,
"legend.frameon": False,
"axes.spines.top": False,
"axes.spines.right": False,
})
Use style sheets for brand consistency:
plt.style.use("seaborn-v0_8-whitegrid") or a custom
.mplstyle file checked into the repo. Cycle colors with
ax.set_prop_cycle(color=[...]) so multi-series charts match
design tokens.
Legends: call ax.legend() after all artists are drawn. Place with
loc or bbox_to_anchor outside the plot area when
series overlap the data region. For many series, consider direct labels on the
last point instead of a crowded legend box.
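The styling pieces above can be combined in one place. This sketch assumes a hypothetical brand palette and uses `plt.rc_context` so the overrides apply only inside the `with` block rather than to the whole process.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

BRAND_COLORS = ["#2d6a4f", "#4361ee", "#e63946"]  # assumed design tokens
BRAND_RC = {
    "font.size": 11,
    "legend.frameon": False,
    "axes.spines.top": False,
    "axes.spines.right": False,
}

with plt.rc_context(BRAND_RC):  # rcParams are restored when the block exits
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.set_prop_cycle(color=BRAND_COLORS)
    for i, name in enumerate(["Plan A", "Plan B", "Plan C"]):
        ax.plot([1, 2, 3], [i + 1, i + 2, i + 4], label=name)
    # Anchor the legend's upper-left corner just outside the right edge of the axes.
    ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

used_colors = [to_hex(line.get_color()) for line in ax.lines]
top_spine_visible = ax.spines["top"].get_visible()
plt.close(fig)
```

When a legend sits outside the axes, saving with `bbox_inches="tight"` keeps it inside the exported image.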
Date and currency axes
Time series need matplotlib.dates: convert strings to datetime,
then ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d")).
For currency, use matplotlib.ticker.FuncFormatter or
StrMethodFormatter("${x:,.0f}") so axis ticks read
naturally in executive summaries.
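A short sketch of both formatters on invented daily revenue figures. The one-tick-per-day locator is an assumption that fits a four-day sample; longer ranges usually need a sparser locator.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

dates = pd.to_datetime(["2026-06-01", "2026-06-02", "2026-06-03", "2026-06-04"])
revenue_usd = [12500, 13100, 12800, 14250]  # made-up values

currency = StrMethodFormatter("${x:,.0f}")  # thousands separator, no decimals

fig, ax = plt.subplots(figsize=(7, 3))
ax.plot(dates, revenue_usd)
ax.xaxis.set_major_locator(mdates.DayLocator())            # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax.yaxis.set_major_formatter(currency)

fig.canvas.draw()  # tick label text is only populated after a draw
y_labels = [t.get_text() for t in ax.get_yticklabels()]
formatted_sample = currency(12500)
plt.close(fig)
```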
Subplots, GridSpec, and shared axes
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)
returns the figure plus a 2×2 NumPy array of axes. Index with
axes[0, 1] for the top-right panel. sharex and
sharey align scales across panels — essential for
small-multiple comparisons.
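The effect of shared axes is easy to check in a sketch: changing the x-limits on one panel propagates to the others. Region names and values below are invented.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]
regions = {
    "North": [4, 5, 7, 6],
    "South": [3, 3, 4, 8],
    "East": [2, 4, 4, 5],
    "West": [6, 5, 7, 9],
}  # illustrative data

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
for ax, (name, values) in zip(axes.flat, regions.items()):
    ax.plot(weeks, values, marker="o")
    ax.set_title(name)

# Setting limits on the top-right panel updates every panel that shares x.
axes[0, 1].set_xlim(0, 5)
limits = {tuple(ax.get_xlim()) for ax in axes.flat}
plt.close(fig)
```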
GridSpec handles irregular layouts: a wide top panel spanning
two columns above two equal bottom panels. Create with
gs = fig.add_gridspec(2, 2, height_ratios=[2, 1]), then
ax_top = fig.add_subplot(gs[0, :]) and
ax_bl = fig.add_subplot(gs[1, 0]). This pattern appears in
dashboard-style PDFs where one hero chart dominates and supporting metrics sit
below.
Tight layout: fig.tight_layout() or
fig.subplots_adjust(hspace=0.35) prevents label overlap.
When saving, bbox_inches="tight" crops whitespace but
can clip suptitles — adjust rect= in
tight_layout or add top margin with subplots_adjust.
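The irregular layout and the top-margin adjustment can be sketched together. The `top=0.90` value is an assumption chosen to leave room for a one-line suptitle; the panels are left empty on purpose.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 7))
gs = fig.add_gridspec(2, 2, height_ratios=[2, 1])

ax_top = fig.add_subplot(gs[0, :])  # hero panel spanning both columns
ax_bl = fig.add_subplot(gs[1, 0])   # bottom-left supporting panel
ax_br = fig.add_subplot(gs[1, 1])   # bottom-right supporting panel

fig.suptitle("Layout sketch")
# Reserve the top 10% of the figure for the suptitle and add vertical spacing.
fig.subplots_adjust(top=0.90, hspace=0.35)

top_box = ax_top.get_position()  # bounding box in figure-fraction coordinates
bl_box = ax_bl.get_position()
plt.close(fig)
```

Positions are reported as fractions of the figure, so `top_box.y1` shows where the hero panel ends relative to the suptitle area.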
Export, backends, and reproducibility
Set the backend before importing pyplot in headless environments:
matplotlib.use("Agg"). Save vector PDFs for print
(fig.savefig("report.pdf")) and PNGs for Slack
(dpi=150 minimum; dpi=300 for slides on retina
displays). Embed fonts with pdf.fonttype = 42 in rcParams so
text remains editable in Illustrator.
Use rasterized=True on scatter layers with millions of points when
embedding in vector PDFs — file size stays manageable. Set a random
seed only when plots include stochastic elements; otherwise pin data and code
versions. Log the git commit hash alongside generated figures in batch
pipelines for audit trails.
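A sketch of those export settings, writing to in-memory buffers so nothing touches the filesystem. The point count and marker size are arbitrary; in a real job the buffers would be file paths.

```python
import io

import matplotlib
matplotlib.use("Agg")  # select the headless backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # seeded because the data here is random
x, y = rng.normal(size=(2, 20000))

pdf_buf, png_buf = io.BytesIO(), io.BytesIO()

with plt.rc_context({"pdf.fonttype": 42}):  # embed TrueType fonts in the PDF
    fig, ax = plt.subplots(figsize=(6, 4))
    # rasterized=True stores the point cloud as an image inside the vector file,
    # while the title and tick labels stay as vector text.
    ax.scatter(x, y, s=2, alpha=0.3, rasterized=True)
    ax.set_title("Rasterized points, vector text")

    fig.savefig(pdf_buf, format="pdf", bbox_inches="tight")
    fig.savefig(png_buf, format="png", dpi=150, bbox_inches="tight")
    plt.close(fig)
```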
Worked example: Harbor Analytics weekly KPI report
Harbor’s ops team generates a Friday PDF from three CSV extracts: daily revenue, funnel conversion counts, and API error rates. The script uses the object-oriented API and GridSpec:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.gridspec import GridSpec
plt.rcParams.update({"font.size": 10, "axes.spines.top": False, "axes.spines.right": False})
revenue = pd.read_csv("harbor_revenue_daily.csv", parse_dates=["date"])
funnel = pd.read_csv("harbor_funnel_weekly.csv")
errors = pd.read_csv("harbor_errors_daily.csv", parse_dates=["date"])
fig = plt.figure(figsize=(10, 7))
gs = GridSpec(2, 2, figure=fig, height_ratios=[2, 1], hspace=0.35, wspace=0.25)
ax_rev = fig.add_subplot(gs[0, :])
ax_rev.plot(revenue["date"], revenue["revenue_kusd"], color="#2d6a4f", lw=2)
ax_rev.set_title("Daily revenue (USD thousands)")
ax_rev.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax_fun = fig.add_subplot(gs[1, 0])
ax_fun.bar(funnel["stage"], funnel["conversion_pct"], color="#4361ee")
ax_fun.set_ylabel("Conversion %")
ax_fun.set_title("Weekly funnel")
ax_err = fig.add_subplot(gs[1, 1])
ax_err.plot(errors["date"], errors["error_rate_pct"], color="#e63946", lw=1.5)
ax_err.axhline(0.5, color="gray", ls="--", lw=0.8, label="SLO 0.5%")
ax_err.legend(loc="upper right")
ax_err.set_title("API error rate")
fig.suptitle("Harbor Analytics — week ending 2026-06-06", fontsize=14, y=0.98)
fig.savefig("harbor_kpi_week.pdf", bbox_inches="tight")
The hero revenue panel spans the full width; the funnel bar chart and the
error-rate sparkline share the bottom row. Spines are trimmed for a clean
executive look. Because every panel is an explicit Axes, ax_rev could be passed
to a Seaborn axes-level function through its ax argument to add a confidence
band without restructuring the layout.
Matplotlib vs Seaborn vs Plotly
| Need | Matplotlib | Seaborn | Plotly |
|---|---|---|---|
| Pixel-perfect PDF / print | Native strength | Via Matplotlib | Needs Kaleido export |
| Custom layout and annotations | Full control | Limited; compose on axes | Moderate |
| Statistical defaults (CI, bins) | Manual | Built in | Partial in Express |
| Interactive exploration | Basic widgets | Static only | Native |
| Headless batch PNG/PDF | Excellent (Agg) | Excellent | Heavier deps |
| Learning curve for bespoke charts | Steeper | Lower for stats | Low for dashboards |
| 3D and polar plots | Supported | Limited | Supported |
Common pitfalls
- Implicit pyplot state in loops: each iteration overwrites the “current” axes; use explicit fig, ax.
- Wrong figure saved: plt.savefig saves the current figure; prefer fig.savefig.
- Overlapping labels: rotate tick labels (plt.setp(ax.get_xticklabels(), rotation=45, ha="right")) or reduce tick density.
- Dual y-axes abuse: twin axes distort scale relationships; normalize series or use two stacked panels instead.
- Non-interactive show() in CI: blocks headless runners; savefig and close figures with plt.close(fig).
- Enormous scatter in vector PDFs: rasterize or subsample; otherwise files balloon to hundreds of megabytes.
- Color-only encoding: add markers or line styles for colorblind readers; test palettes with simulators.
- Timezone-naive dates: parse with UTC awareness before plotting multi-region revenue series.
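Several of these pitfalls (implicit state in loops, saving the wrong figure, leaking figures in batch jobs) are avoided by one pattern: create, save, and close each figure explicitly. A sketch with a hypothetical `render_report` helper and a temporary output directory:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def render_report(name, values, out_dir):
    """Render one chart to out_dir and always close its figure afterwards."""
    fig, ax = plt.subplots(figsize=(6, 3))
    try:
        ax.plot(values)
        ax.set_title(name)
        path = out_dir / f"{name}.png"
        fig.savefig(path, dpi=150, bbox_inches="tight")  # saves this figure, not "current"
    finally:
        plt.close(fig)  # frees memory even if plotting or saving raised
    return path


metrics = {"revenue": [12, 14, 13, 17], "errors": [0.4, 0.6, 0.3, 0.2]}  # made-up

with tempfile.TemporaryDirectory() as tmp:
    paths = [render_report(name, values, Path(tmp)) for name, values in metrics.items()]
    all_written = all(p.exists() for p in paths)

open_figures = plt.get_fignums()  # figure numbers pyplot still tracks
```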
Production checklist
- Pin matplotlib version in lockfiles; test figure output on upgrade.
- Use the OO API (fig, ax = plt.subplots()) in all shipped code paths.
- Set matplotlib.use("Agg") before pyplot import in servers.
- Centralize brand rcParams or a .mplstyle file in the repo.
- Call fig.savefig(..., dpi=150, bbox_inches="tight") and plt.close(fig) to free memory in batch jobs.
- Embed fonts in PDFs (pdf.fonttype = 42) for print vendors.
- Assert input DataFrame schemas before plotting; fail fast on missing columns.
- Compose Seaborn on explicit axes when you need both statistical defaults and custom layout.
- Pair static Matplotlib PDFs with Plotly dashboards when drill-down is required.
- Log output paths and data snapshot hashes for reproducibility audits.
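The schema check in the list above can be as small as the following sketch. `require_columns` is a hypothetical helper, and the column names are borrowed from the worked example's revenue extract.

```python
import pandas as pd


def require_columns(df, required, name="DataFrame"):
    """Raise ValueError listing any required columns that df lacks."""
    missing = sorted(set(required) - set(df.columns))
    if missing:
        raise ValueError(f"{name} is missing columns: {missing}")
    return df


good = pd.DataFrame({"date": ["2026-06-01"], "revenue_kusd": [12.5]})
require_columns(good, ["date", "revenue_kusd"], name="revenue")  # passes silently

bad = pd.DataFrame({"date": ["2026-06-01"]})
try:
    require_columns(bad, ["date", "revenue_kusd"], name="revenue")
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # plotting code would stop here instead of drawing
```

Running the check before any figure is created keeps a malformed extract from producing a half-drawn report.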
Key takeaways
- Figure and Axes form the object tree; prefer explicit axes over pyplot state in production.
- Core primitives — plot, scatter, bar, hist, imshow — cover most static reporting needs.
- rcParams and style sheets enforce consistent typography and color across automated reports.
- GridSpec builds irregular multi-panel layouts without manual coordinate math.
- Seaborn and Plotly layer on top; Matplotlib remains the print-quality foundation.
Related reading
- Seaborn fundamentals explained — statistical plots built on Matplotlib axes
- pandas fundamentals explained — tidy tables that feed your charts
- Plotly fundamentals explained — interactive charts when static PNGs are not enough
- Jupyter fundamentals explained — notebooks where Matplotlib figures are authored