Seaborn fundamentals explained
A product manager asks whether mobile users retain better than desktop users
after a pricing change. You have a tidy pandas DataFrame with columns
signup_week, platform, weeks_since_signup, and still_active. With raw
Matplotlib you would wire axes, compute group means, pick colors, and align
legends by hand. Seaborn expresses the same insight in a few
lines: pass column names, choose a plot family, and let the library handle
aggregation, error bars, and aesthetic defaults. Built on top of Matplotlib,
Seaborn is a common default for exploratory and presentation-quality
statistical charts in Python notebooks and batch reports. This guide
covers Seaborn’s data model, the relational, distribution, and categorical plot
families, themes and color palettes, faceting with FacetGrid and
figure-level functions, regression and matrix plots, Matplotlib integration,
a Harbor Analytics cohort retention worked example, a Seaborn vs Matplotlib vs
Plotly decision table, common pitfalls, and a production checklist. Pair it with
our Plotly fundamentals guide and scikit-learn fundamentals guide
for interactive dashboards and model diagnostics.
What Seaborn is and where it fits
Seaborn is a high-level visualization library for statistical graphics. It
accepts pandas DataFrames (or objects that convert to them) and maps variables
to visual channels — position, hue, size, style — through a
consistent API. Under the hood every figure is still Matplotlib: you can call
plt.savefig, tweak axes with ax.set_ylabel, and
embed plots in PDF reports exactly as you would with raw Matplotlib.
Seaborn shines when the question is statistical: “How does metric Y vary with X, broken out by category Z?” It computes aggregations (means, medians, counts), draws confidence intervals, and chooses sensible bin widths for histograms. When the question is bespoke illustration — a custom annotated diagram, a non-standard layout, or pixel-perfect brand compliance — drop to Matplotlib or compose Seaborn axes inside a manually built Figure. When the question is interactive exploration in a browser, reach for Plotly or a dashboard framework like Streamlit instead.
When Seaborn is the right default
- EDA in Jupyter — quick views of distributions, correlations, and category breakdowns.
- ML feature review — pair plots, KDE overlays, and boxplots before training.
- Static reports — PNG/PDF slides and email attachments with consistent styling.
- Faceted comparisons — same chart repeated across segments without copy-paste loops.
Skip Seaborn for real-time dashboards, heavy interactivity (zoom, linked brushing), or charts Seaborn does not model (Sankey, geographic choropleths, custom 3D scenes).
Data model: long-form tables and semantic mapping
Seaborn functions take a data argument plus keyword parameters
that name columns: x, y, hue,
size, style, col, row.
This is tidy data: one row per observation, one column per
variable. Wide pivot tables work for heatmaps but most plot functions expect
long form. If your data is wide, use pd.melt or
DataFrame.melt first.
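As a minimal sketch of that reshaping step, the snippet below melts a small wide table into long form. The column names and retention values are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical wide table: one row per week, one column per platform.
# The numbers are illustrative only.
wide = pd.DataFrame(
    {
        "week": [1, 2, 3],
        "mobile": [0.82, 0.74, 0.69],
        "desktop": [0.79, 0.70, 0.61],
    }
)

# Long form: one row per (week, platform) observation.
long = wide.melt(id_vars="week", var_name="platform", value_name="retention_rate")

print(long.columns.tolist())
print(long.shape)
```

The resulting frame can be passed as data=long with x="week", y="retention_rate", and hue="platform".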
The hue parameter splits a single plot by category with distinct
colors. style varies point markers or line dashes; size
maps a numeric column to point areas. Combined with x and y, they encode up to
five variables on one Axes, which is powerful but easy to overdraw; cap
category cardinality or facet instead.
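The semantic mapping described above might look like the following sketch. The DataFrame is synthetic and its column names (sessions, revenue, plan, seats) are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame(
    {
        "sessions": rng.poisson(8, n),
        "revenue": rng.gamma(2.0, 15.0, n),
        "platform": rng.choice(["mobile", "desktop"], n),
        "plan": rng.choice(["free", "pro"], n),
        "seats": rng.integers(1, 20, n),
    }
)

# x and y give position; hue, style, and size add three more variables.
ax = sns.scatterplot(
    data=df,
    x="sessions",
    y="revenue",
    hue="platform",   # color per category
    style="plan",     # marker shape per category
    size="seats",     # marker area from a numeric column
    sizes=(20, 200),  # smallest and largest marker area
)
```

With only two levels each for hue and style the legend stays readable; a column with dozens of levels is usually better handled with col= on a figure-level function.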
Axes-level vs figure-level functions
Axes-level functions (scatterplot,
lineplot, histplot, boxplot, etc.)
draw onto an existing Matplotlib axes or create one implicitly. Use them when
composing multi-panel figures with
GridSpec
or when you need fine control per subplot.
Figure-level functions (relplot, displot,
catplot, lmplot) wrap axes-level plotters in a
FacetGrid and return a grid object. They handle faceting via
col and row and manage figure size automatically.
Prefer them for consistent small-multiple layouts.
Plot families you will use daily
Relational plots
sns.scatterplot shows individual points; to add a regression
trend line, draw sns.regplot on the same axes.
sns.lineplot aggregates repeated x values:
it computes the mean (or another estimator, such as the median) at each unique
x value and draws a confidence band around it. Use lineplot
for time series and ordered categories; use scatterplot when every point
matters. sns.relplot is the figure-level wrapper supporting
faceting.
Distribution plots
Modern Seaborn unified histograms and KDEs under
sns.histplot and sns.kdeplot.
histplot supports binwidth, stat normalization
(count, density, probability), and
stacked or side-by-side hue groups.
sns.displot combines hist and kde with faceting.
sns.ecdfplot draws empirical cumulative distributions —
useful for latency percentiles without choosing bin edges.
Categorical plots
sns.barplot shows point estimates (mean by default) with
confidence intervals; sns.countplot counts observations per
category. sns.boxplot and sns.violinplot summarize
spread — violins add KDE shape but obscure exact quartiles unless paired
with strip or swarm overlays (sns.stripplot,
sns.swarmplot). sns.catplot facets any categorical
estimator. Order categories explicitly with the order parameter
so bars appear in business-meaningful sequence, not alphabetical.
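The sketch below shows explicit category ordering with a strip overlay on top of a bar chart. The plan names and spend values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic spend data; plan names are hypothetical.
rng = np.random.default_rng(7)
df = pd.DataFrame(
    {
        "plan": rng.choice(["starter", "pro", "free"], size=150),
        "monthly_spend": rng.gamma(2.0, 20.0, size=150),
    }
)

# Business order, not alphabetical order.
plan_order = ["free", "starter", "pro"]

# Bars show the mean per plan with a confidence interval.
ax = sns.barplot(data=df, x="plan", y="monthly_spend", order=plan_order)

# Overlay the raw observations so the spread behind each mean stays visible.
sns.stripplot(
    data=df, x="plan", y="monthly_spend", order=plan_order,
    color="black", alpha=0.35, size=3, ax=ax,
)

ax.figure.canvas.draw()  # make sure tick labels are populated before reading them
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

Passing the same order list to both calls keeps the overlay aligned with the bars.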
Matrix and regression plots
sns.heatmap visualizes correlation matrices or pivot tables;
pass annot=True for cell labels, and a diverging cmap together with
center=0 so the color scale is centered at zero. sns.clustermap hierarchically
clusters rows and columns, which helps with gene-expression style data or
feature correlation discovery. sns.regplot and
sns.lmplot fit linear models with confidence bands; set
logistic=True for binary y (this option requires statsmodels).
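A minimal correlation heatmap sketch, assuming four synthetic numeric features whose names are invented for the example. Fixing vmin and vmax at -1 and 1 is a choice made here so the color scale covers the full correlation range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic features; three share a common signal, one is independent noise.
rng = np.random.default_rng(3)
base = rng.normal(size=200)
df = pd.DataFrame(
    {
        "sessions": base + rng.normal(scale=0.5, size=200),
        "revenue": base + rng.normal(scale=1.0, size=200),
        "tickets": -base + rng.normal(scale=1.0, size=200),
        "tenure": rng.normal(size=200),
    }
)

corr = df.corr()

# Diverging palette centered at zero, with the value printed in each cell.
ax = sns.heatmap(
    corr, annot=True, fmt=".2f", cmap="vlag",
    center=0, vmin=-1, vmax=1, square=True,
)
```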
Themes, palettes, and Matplotlib integration
Call sns.set_theme() once per notebook to apply Seaborn’s
default aesthetics: grid lines, font sizes, and desaturated colors. Style
options are darkgrid (the default), whitegrid, dark, white, and
ticks. Scale fonts and line widths for the output medium with
sns.set_context("paper") for publications or
sns.set_context("talk") for slides.
Palettes are first-class: sns.color_palette("husl", 5),
sns.color_palette("vlag", as_cmap=True) for heatmaps,
and sns.husl_palette for perceptually distinct categorical
colors. For colorblind-safe defaults use "colorblind".
Pass palette= to most plot functions, or call
sns.set_palette to set it globally.
To combine with Matplotlib: create fig, ax = plt.subplots(), pass
ax=ax to axes-level functions, then call
ax.set_title or add Matplotlib annotations. Retrieve the figure
from a FacetGrid via g.figure. Reset styles in tests with
sns.reset_defaults() to avoid polluting other plots.
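Put together, the theme, palette, and Matplotlib steps above might be sketched as follows. The data is synthetic, and the titles and labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Apply style, context, and a colorblind-safe palette in one call.
sns.set_theme(style="whitegrid", context="talk", palette="colorblind")
colors = sns.color_palette("colorblind", 3)

# Synthetic session lengths.
rng = np.random.default_rng(5)
df = pd.DataFrame(
    {
        "platform": np.repeat(["mobile", "desktop"], 80),
        "session_minutes": rng.gamma(2.0, 4.0, 160),
    }
)

# Create the Axes with Matplotlib, draw with Seaborn, then customize.
fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=df, x="platform", y="session_minutes", ax=ax)
ax.set_title("Session length by platform")
ax.set_ylabel("Minutes per session")
ax.axhline(df["session_minutes"].median(), linestyle="--", color=colors[2])

# Restore Matplotlib defaults so later figures are unaffected.
sns.reset_defaults()
```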
Faceting without boilerplate
sns.FacetGrid(data, col="region", row="product")
builds a grid of axes that share x and y limits by default. Call
g.map_dataframe(sns.scatterplot, x="revenue", y="churn_rate")
to pass column names as keywords, or use the older positional form
g.map(sns.scatterplot, "revenue", "churn_rate"). Figure-level functions encode the
same pattern: sns.relplot(data=df, x="week", y="active_users", col="platform", kind="line").
Use col_wrap=3 to wrap many column facets into rows of three; col_wrap cannot be combined with row.
Set sharex=False when scales differ legitimately (revenue in USD
vs EUR subsidiaries); on relplot and displot, pass it through facet_kws={"sharex": False}.
Use height and aspect on
figure-level functions instead of guessing figsize.
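A small FacetGrid sketch with wrapped columns and independent x-axes; the region names and values are synthetic assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: 30 rows for each of five hypothetical regions.
rng = np.random.default_rng(11)
regions = ["north", "south", "east", "west", "central"]
df = pd.DataFrame(
    {
        "region": np.repeat(regions, 30),
        "revenue": rng.gamma(2.0, 50.0, size=150),
        "churn_rate": rng.uniform(0.01, 0.20, size=150),
    }
)

# Five facets wrapped into rows of three, each with its own x-axis range.
g = sns.FacetGrid(df, col="region", col_wrap=3, height=2.5, sharex=False)
g.map_dataframe(sns.scatterplot, x="revenue", y="churn_rate", alpha=0.6)
g.set_axis_labels("Revenue", "Churn rate")
g.set_titles("{col_name}")

print(len(g.axes.flat))
```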
Worked example: Harbor Analytics cohort retention
Harbor Analytics tracks weekly retention for users who signed up in January vs February, split by mobile and desktop. The analytics engineer loads a CSV into pandas, filters to the first eight weeks, and builds a faceted line chart:
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Apply the theme once, before any figure is created.
    sns.set_theme(style="whitegrid", context="notebook")

    df = pd.read_csv("harbor_cohort_retention.csv")

    # One panel per platform, one line per signup cohort.
    g = sns.relplot(
        data=df,
        x="weeks_since_signup",
        y="retention_rate",
        hue="signup_cohort",
        col="platform",
        kind="line",
        marker="o",
        height=4,
        aspect=1.1,
    )
    g.set_axis_labels("Weeks since signup", "Active user share")
    g.set_titles("{col_name} users")
    g.figure.suptitle("Harbor cohort retention by platform", y=1.02)

    # bbox_inches="tight" keeps the raised suptitle inside the saved image.
    plt.savefig("harbor_retention_q1.png", dpi=150, bbox_inches="tight")
relplot aggregates duplicate week/cohort/platform rows and shades
95% confidence intervals automatically. For a distribution view of session
lengths by platform, a companion sns.histplot(data=df, x="session_minutes", hue="platform", multiple="stack", stat="density")
reveals whether mobile skews toward short sessions. The PM gets two PNGs in
Slack without a dashboard server.
Seaborn vs Matplotlib vs Plotly
| Need | Seaborn | Matplotlib | Plotly |
|---|---|---|---|
| Statistical defaults (CI, bins) | Built in | Manual | Partial via Express |
| Custom illustration | Drop to Matplotlib | Full control | Moderate |
| Interactive HTML | No (static) | Limited widgets | Native |
| Faceted small multiples | relplot / catplot | Manual subplots | facet_col in Express |
| PDF / print pipeline | Excellent | Excellent | Requires Kaleido/static export |
| pandas integration | Column names native | Via df.plot | Via Express |
| Learning curve for EDA | Low | Medium | Low for Express |
Common pitfalls
- Wide data passed directly: most plot functions expect long form, so melt first; reserve wide pivot tables for heatmaps.
- Too many hue categories: legends overflow; filter to the top N or facet by col.
- Alphabetical bar order: pass order= for meaningful ranking.
- Dual y-axes confusion: Seaborn does not encourage twin axes; normalize or facet instead.
- Deprecated APIs: avoid legacy distplot; use histplot / kdeplot.
- Global style leakage: set_theme affects all subsequent figures in the kernel; reset in tests.
- Overplotting dense scatter: use alpha=0.3, rasterized=True for PDF, or hexbin via Matplotlib.
- Trusting defaults for skewed finance data: prefer log scales or percentiles; boxplots hide multimodal tails.
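For the overplotting pitfall, a hedged sketch: 20,000 synthetic points drawn with transparency and rasterized before a PDF export. The point count, marker size, and in-memory buffer are choices made for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic dense cloud of points.
rng = np.random.default_rng(9)
df = pd.DataFrame({"x": rng.normal(size=20_000), "y": rng.normal(size=20_000)})

fig, ax = plt.subplots(figsize=(5, 4))

# alpha exposes density; rasterized=True stores the points as a bitmap inside
# the PDF while axes, ticks, and text remain vector.
sns.scatterplot(
    data=df, x="x", y="y",
    alpha=0.3, s=8, linewidth=0, rasterized=True, ax=ax,
)

# Write to an in-memory buffer here; use a file path in a real pipeline.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf", dpi=150, bbox_inches="tight")
print(len(buffer.getvalue()) > 0)
```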
Production checklist
- Pin seaborn, matplotlib, and pandas versions in project lockfiles.
- Call sns.set_theme once at notebook or script entry; document palette choices for brand consistency.
- Validate tidy schema before plotting; assert required columns exist.
- Set explicit order= on categorical axes for reproducible slides.
- Export with bbox_inches="tight" and dpi=150 minimum for retina displays.
- Use rasterized=True on heavy scatter layers embedded in vector PDFs.
- Compose with Matplotlib when Seaborn lacks a primitive; share one Figure object.
- Pair static Seaborn exports with Plotly dashboards when stakeholders need drill-down.
- Unit-test data transforms; snapshot-test PNGs only when stable across OS font stacks.
- Log figure paths and filter hashes in batch pipelines for auditability.
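The schema validation item could be sketched as a small guard that runs before any plotting call. The helper name validate_tidy_schema is hypothetical, and the required column list is assumed from the worked example above.

```python
import pandas as pd

# Columns the retention chart needs (assumed from the worked example).
REQUIRED_COLUMNS = ("weeks_since_signup", "retention_rate", "signup_cohort", "platform")


def validate_tidy_schema(df, required=REQUIRED_COLUMNS):
    """Return df unchanged, or raise ValueError if a required column is
    absent or contains only missing values."""
    missing = [name for name in required if name not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    empty = [name for name in required if df[name].isna().all()]
    if empty:
        raise ValueError(f"columns with no data: {empty}")
    return df


# Illustrative rows only.
good = pd.DataFrame(
    {
        "weeks_since_signup": [0, 1],
        "retention_rate": [1.0, 0.8],
        "signup_cohort": ["January", "January"],
        "platform": ["mobile", "mobile"],
    }
)
validate_tidy_schema(good)  # passes silently

try:
    validate_tidy_schema(good.drop(columns="platform"))
except ValueError as err:
    print(err)
```

Failing before the plot call gives a clearer error in batch jobs than a KeyError raised from inside the plotting library.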
Key takeaways
- Seaborn maps tidy DataFrame columns to statistical charts on Matplotlib axes.
- Plot families — relational, distribution, categorical, matrix — cover most EDA questions.
- Figure-level functions (relplot, catplot, displot) handle faceting cleanly.
- Themes and palettes give consistent aesthetics; override per-axes when composing complex figures.
- Plotly wins interactivity; Matplotlib wins bespoke control; Seaborn wins fast statistical defaults.
Related reading
- Matplotlib fundamentals explained — Figure/Axes model and publication export
- pandas fundamentals explained — tidy tables and groupby for plotting
- Plotly fundamentals explained — interactive charts and dashboards
- Jupyter fundamentals explained — notebooks where Seaborn charts are born