Seaborn fundamentals explained

A product manager asks whether mobile users retain better than desktop after a pricing change. You have a tidy pandas DataFrame with columns signup_week, platform, weeks_since_signup, and still_active. With raw Matplotlib you would wire axes, compute group means, pick colors, and align legends by hand. Seaborn expresses the same insight in a few lines: pass column names, choose a plot family, and let the library handle aggregation, error bars, and aesthetic defaults. Built on top of Matplotlib, Seaborn is the standard choice for exploratory and presentation-quality statistical charts in Python notebooks and batch reports.

This guide covers Seaborn’s data model; the relational, distribution, and categorical plot families; themes and color palettes; faceting with FacetGrid and figure-level functions; regression and matrix plots; Matplotlib integration; a Harbor Analytics cohort retention worked example; a Seaborn vs Matplotlib vs Plotly decision table; common pitfalls; and a production checklist. Pair it with our Plotly fundamentals guide and scikit-learn fundamentals guide for interactive dashboards and model diagnostics.

What Seaborn is and where it fits

Seaborn is a high-level visualization library for statistical graphics. It accepts pandas DataFrames (or objects that convert to them) and maps variables to visual channels — position, hue, size, style — through a consistent API. Under the hood every figure is still Matplotlib: you can call plt.savefig, tweak axes with ax.set_ylabel, and embed plots in PDF reports exactly as you would with raw Matplotlib.

Seaborn shines when the question is statistical: “How does metric Y vary with X, broken out by category Z?” It computes aggregations (means, medians, counts), draws confidence intervals, and chooses sensible bin widths for histograms. When the question is bespoke illustration — a custom annotated diagram, a non-standard layout, or pixel-perfect brand compliance — drop to Matplotlib or compose Seaborn axes inside a manually built Figure. When the question is interactive exploration in a browser, reach for Plotly or a dashboard framework like Streamlit instead.

When Seaborn is the right default

  • EDA in Jupyter — quick views of distributions, correlations, and category breakdowns.
  • ML feature review — pair plots, KDE overlays, and boxplots before training.
  • Static reports — PNG/PDF slides and email attachments with consistent styling.
  • Faceted comparisons — same chart repeated across segments without copy-paste loops.

Skip Seaborn for real-time dashboards, heavy interactivity (zoom, linked brushing), or charts Seaborn does not model (Sankey, geographic choropleths, custom 3D scenes).

Data model: long-form tables and semantic mapping

Seaborn functions take a data argument plus keyword parameters that name columns: x, y, hue, size, style, col, row. This is tidy data: one row per observation, one column per variable. Wide pivot tables work for heatmaps but most plot functions expect long form. If your data is wide, use pd.melt or DataFrame.melt first.

The hue parameter splits a single plot by category with distinct colors. style varies line markers or dashes; size maps a numeric column to point areas. Combining them encodes four dimensions on one axes — powerful but easy to overdraw; cap category cardinality or facet instead.
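A minimal sketch of the four-channel encoding described above, using synthetic data. All column names and the sizes range are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic accounts table; every column here is made up for the example.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "revenue": rng.gamma(2.0, 50.0, n),
    "churn_rate": rng.uniform(0.01, 0.20, n),
    "platform": rng.choice(["mobile", "desktop"], n),
    "plan": rng.choice(["free", "pro"], n),
    "seats": rng.integers(1, 50, n),
})

# Position (x, y) plus hue, style, and size: four variables beyond the axes' own two.
ax = sns.scatterplot(
    data=df,
    x="revenue",
    y="churn_rate",
    hue="platform",    # color per category
    style="plan",      # marker shape per category
    size="seats",      # marker area scaled by a numeric column
    sizes=(20, 200),   # smallest and largest marker area
    alpha=0.7,
)
```

With only two levels each for hue and style the legend stays readable; with many more levels, faceting is usually the safer choice.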

Axes-level vs figure-level functions

Axes-level functions (scatterplot, lineplot, histplot, boxplot, etc.) draw onto an existing Matplotlib axes or create one implicitly. Use them when composing multi-panel figures with GridSpec or when you need fine control per subplot.

Figure-level functions (relplot, displot, catplot, lmplot) wrap axes-level plotters in a FacetGrid and return a grid object. They handle faceting via col and row and manage figure size automatically. Prefer them for consistent small-multiple layouts.

Plot families you will use daily

Relational plots

sns.scatterplot shows individual points; to add a regression trend line, draw sns.regplot on the same axes. sns.lineplot aggregates repeated observations at each x value: it computes the mean by default (or another estimator such as the median) and draws a confidence band around it. Use lineplot for time series and ordered categories; use scatterplot when every point matters. sns.relplot is the figure-level wrapper supporting faceting.

Distribution plots

Since version 0.11, Seaborn handles histograms and KDEs through sns.histplot and sns.kdeplot. histplot supports binwidth, stat normalization (count, density, probability), and stacked or side-by-side hue groups via the multiple parameter. sns.displot is the figure-level wrapper: choose kind="hist", "kde", or "ecdf" and facet with col and row. sns.ecdfplot draws empirical cumulative distributions, which is useful for latency percentiles because no bin edges need to be chosen.

Categorical plots

sns.barplot shows point estimates (mean by default) with confidence intervals; sns.countplot counts observations per category. sns.boxplot and sns.violinplot summarize spread. Violins add KDE shape but make exact quartiles harder to read; set inner="quart" to draw them, or overlay the raw observations with sns.stripplot or sns.swarmplot. sns.catplot facets any categorical plot. Order categories explicitly with the order parameter so bars appear in business-meaningful sequence, not alphabetical.
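The order parameter can be sketched as follows. The plan tiers and spend figures are invented for the example; the point is only that the bars follow the list you pass.

```python
import pandas as pd
import seaborn as sns

# Made-up spend data; rows arrive in no particular order.
df = pd.DataFrame({
    "plan": ["pro", "free", "enterprise", "starter"] * 3,
    "monthly_spend": [49, 0, 400, 15, 55, 0, 380, 12, 45, 0, 420, 18],
})

# Business order (cheapest tier first) instead of order of appearance.
plan_order = ["free", "starter", "pro", "enterprise"]

# barplot draws the mean per plan with an error bar, in the given order.
ax = sns.barplot(data=df, x="plan", y="monthly_spend", order=plan_order)
```

The same order list can be reused across barplot, boxplot, and catplot calls so every slide shows tiers in the same sequence.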

Matrix and regression plots

sns.heatmap visualizes correlation matrices or pivot tables; pass annot=True for cell labels, and pair a diverging cmap with center=0 so the color scale is centered at zero. sns.clustermap hierarchically clusters rows and columns, which helps with gene-expression style data or feature correlation discovery. sns.regplot and sns.lmplot fit linear models with confidence bands; set logistic=True for binary y (this option requires statsmodels).
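A minimal correlation heatmap might look like the sketch below. The three metric columns are synthetic, and the choice of the "vlag" palette and the fixed -1 to 1 range are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic metrics: spend loosely tracks sessions; tickets are independent.
rng = np.random.default_rng(3)
sessions = rng.normal(20, 5, 200)
df = pd.DataFrame({
    "sessions": sessions,
    "spend": sessions * 2 + rng.normal(0, 8, 200),
    "support_tickets": rng.poisson(2, 200),
})

# Wide, square input: heatmap is the plot that wants a matrix, not long form.
corr = df.corr()

ax = sns.heatmap(
    corr,
    annot=True, fmt=".2f",   # print each coefficient in its cell
    cmap="vlag", center=0,   # diverging palette with the midpoint at zero
    vmin=-1, vmax=1,         # fix the range so charts are comparable
)
```

Fixing vmin and vmax keeps colors comparable between heatmaps built from different samples.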

Themes, palettes, and Matplotlib integration

Call sns.set_theme() once per notebook to apply Seaborn’s default aesthetics: grid lines, font sizes, and desaturated colors. Select a style with the style argument: darkgrid (the default), whitegrid, dark, white, or ticks. Scale fonts and line widths with sns.set_context("paper") for publications or sns.set_context("talk") for slides.

Palettes are first-class: sns.color_palette("husl", 5) or sns.husl_palette(5) for evenly spaced categorical colors, and sns.color_palette("vlag", as_cmap=True) for heatmaps. For colorblind-safe defaults use "colorblind". Pass palette= to any plot function, or call sns.set_palette to set it globally.

To combine with Matplotlib: create fig, ax = plt.subplots(), pass ax=ax to axes-level functions, then call ax.set_title or add Matplotlib annotations. Retrieve the figure from a FacetGrid via g.figure. Reset styles in tests with sns.reset_defaults() to avoid polluting other plots.
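The pattern can be sketched as below: set a theme, draw onto an axes you created, finish with ordinary Matplotlib calls, then reset. The data, labels, and the median reference line are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic session data (made up for the example).
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "platform": ["mobile"] * 60 + ["desktop"] * 60,
    "session_minutes": rng.exponential(8.0, 120),
})

# Theme, font scale, and palette in one call.
sns.set_theme(style="whitegrid", context="talk", palette="colorblind")

# Seaborn draws on the axes you hand it; Matplotlib handles the rest.
fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=df, x="platform", y="session_minutes", ax=ax)
ax.set_title("Session length by platform")
ax.set_ylabel("Minutes per session")
ax.axhline(df["session_minutes"].median(), linestyle="--", color="gray")

# Restore Matplotlib's defaults so later figures are unaffected.
sns.reset_defaults()
```

The reset applies only to figures created afterwards; the figure above keeps the styling it was drawn with.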

Faceting without boilerplate

sns.FacetGrid(data, col="region", row="product") builds a grid of axes that share x and y limits by default. Call g.map_dataframe(sns.scatterplot, x="revenue", y="churn_rate") to pass column names as keywords, or use the older g.map, which takes column names positionally. Figure-level functions encode the same pattern: sns.relplot(data=df, x="week", y="active_users", col="platform", kind="line").

Use col_wrap=3 to wrap many column facets onto multiple rows. Set sharex=False on FacetGrid (or pass facet_kws={"sharex": False} to relplot and displot) when scales differ legitimately (revenue in USD vs EUR subsidiaries). Use height and aspect on figure-level functions instead of guessing figsize.
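A small sketch of a wrapped grid built directly with FacetGrid. The five region labels and the revenue and churn values are synthetic, chosen only to produce more facets than fit on one row.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic accounts across five hypothetical regions, 30 rows each.
rng = np.random.default_rng(5)
regions = ["AMER", "EMEA", "APAC", "LATAM", "ANZ"]
df = pd.DataFrame({
    "region": np.repeat(regions, 30),
    "revenue": rng.gamma(2.0, 50.0, 150),
    "churn_rate": rng.uniform(0.01, 0.20, 150),
})

# Five facets wrapped three per row; x limits are independent per panel.
g = sns.FacetGrid(df, col="region", col_wrap=3, height=2.5, sharex=False)

# map_dataframe passes each facet's subset as data= and accepts keyword column names.
g.map_dataframe(sns.scatterplot, x="revenue", y="churn_rate")
g.set_axis_labels("Revenue", "Churn rate")
```

With col_wrap set, g.axes is a flat array of axes rather than a two-dimensional grid, which matters if you loop over panels afterwards.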

Worked example: Harbor Analytics cohort retention

Harbor Analytics tracks weekly retention for users who signed up in January vs February, split by mobile and desktop. The analytics engineer loads a CSV into pandas, filters to the first eight weeks, and builds a faceted line chart:

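import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid", context="notebook")

# Assumes the CSV holds one row per observation with the columns used below.
df = pd.read_csv("harbor_cohort_retention.csv")

# Keep the first eight weeks (assumes weeks are numbered from 1).
df = df[df["weeks_since_signup"] <= 8]

# One panel per platform, one line per signup cohort.
g = sns.relplot(
    data=df,
    x="weeks_since_signup",
    y="retention_rate",
    hue="signup_cohort",
    col="platform",
    kind="line",
    marker="o",
    height=4,
    aspect=1.1,
)
g.set_axis_labels("Weeks since signup", "Active user share")
g.set_titles("{col_name} users")
g.figure.suptitle("Harbor cohort retention by platform", y=1.02)

# Saves the current figure, which is the grid relplot just created.
plt.savefig("harbor_retention_q1.png", dpi=150, bbox_inches="tight")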

relplot aggregates duplicate week/cohort/platform rows and shades 95% confidence intervals automatically. For a distribution view of session lengths by platform, a companion sns.histplot(data=df, x="session_minutes", hue="platform", multiple="stack", stat="density") reveals whether mobile skews toward short sessions. The PM gets two PNGs in Slack without a dashboard server.

Seaborn vs Matplotlib vs Plotly

Need                            | Seaborn              | Matplotlib      | Plotly
--------------------------------|----------------------|-----------------|-------------------------------
Statistical defaults (CI, bins) | Built in             | Manual          | Partial via Express
Custom illustration             | Drop to Matplotlib   | Full control    | Moderate
Interactive HTML                | No (static)          | Limited widgets | Native
Faceted small multiples         | relplot / catplot    | Manual subplots | facet_col in Express
PDF / print pipeline            | Excellent            | Excellent       | Requires Kaleido/static export
pandas integration              | Column names native  | Via df.plot     | Via Express
Learning curve for EDA          | Low                  | Medium          | Low for Express

Common pitfalls

  • Wide data passed directly — most plot functions expect long form, so melt first; heatmaps are the exception and take a wide pivot.
  • Too many hue categories — legends overflow; filter top-N or facet by col.
  • Alphabetical bar order — pass order= for meaningful ranking.
  • Dual y-axes confusion — Seaborn does not encourage twin axes; normalize or facet instead.
  • Deprecated APIs — avoid legacy distplot; use histplot/kdeplot.
  • Global style leakage — set_theme affects all subsequent figures in the kernel; reset in tests.
  • Overplotting dense scatter — use alpha=0.3, rasterized=True for PDF, or hexbin via Matplotlib.
  • Trusting defaults for skewed finance data — prefer log scales or percentiles; boxplots hide multimodality and compress long tails into a row of outlier points.
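For the "too many hue categories" pitfall, one possible approach is to collapse rare categories before plotting. The helper name collapse_rare, the "Other" label, and the country counts below are all hypothetical.

```python
import pandas as pd
import seaborn as sns

# Made-up signup counts per country.
counts = {"US": 10, "DE": 7, "FR": 5, "JP": 3, "BR": 2, "IN": 1}
df = pd.DataFrame({"country": [c for c, k in counts.items() for _ in range(k)]})


def collapse_rare(series, n, other="Other"):
    """Keep the n most frequent values and relabel everything else."""
    top = series.value_counts().nlargest(n).index
    return series.where(series.isin(top), other)


# Six categories become three named ones plus "Other".
df["country_top"] = collapse_rare(df["country"], n=3)

# Sort bars by frequency instead of relying on the default order.
order = list(df["country_top"].value_counts().index)
ax = sns.countplot(data=df, x="country_top", order=order)
```

The same collapsed column works for hue=, which keeps legends short without silently dropping rows.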

Production checklist

  • Pin seaborn, matplotlib, and pandas versions in project lockfiles.
  • Call sns.set_theme once at notebook or script entry; document palette choices for brand consistency.
  • Validate tidy schema before plotting; assert required columns exist.
  • Set explicit order= on categorical axes for reproducible slides.
  • Export with bbox_inches="tight" and dpi=150 minimum for retina displays.
  • Use rasterized=True on heavy scatter layers embedded in vector PDFs.
  • Compose with Matplotlib when Seaborn lacks a primitive; share one Figure object.
  • Pair static Seaborn exports with Plotly dashboards when stakeholders need drill-down.
  • Unit-test data transforms; snapshot-test PNGs only when stable across OS font stacks.
  • Log figure paths and filter hashes in batch pipelines for auditability.

Key takeaways

  • Seaborn maps tidy DataFrame columns to statistical charts on Matplotlib axes.
  • Plot families — relational, distribution, categorical, matrix — cover most EDA questions.
  • Figure-level functions (relplot, catplot, displot) handle faceting cleanly.
  • Themes and palettes give consistent aesthetics; override per-axes when composing complex figures.
  • Plotly wins interactivity; Matplotlib wins bespoke control; Seaborn wins fast statistical defaults.
