Guide

Streamlit fundamentals explained

A revenue analyst exports CSVs from three systems every Monday, pastes them into Jupyter, and emails screenshots to leadership. The workflow is correct but brittle — filters do not persist, charts are static, and every new question means another notebook cell. Streamlit turns that same Python logic into a browser dashboard: sliders pick date ranges, dropdowns slice regions, and st.line_chart or Plotly figures update on every interaction. You write a top-to-bottom script; Streamlit reruns it when widgets change and syncs state to the browser. No React, no REST boilerplate, no separate frontend repo. This guide covers the rerun execution model, widgets and layout primitives, session state and forms, caching decorators, multipage apps, deployment options, a Harbor Analytics revenue dashboard worked example, a tooling decision table versus Gradio and Dash, common pitfalls, and a production checklist — assuming baseline pandas fluency.

What Streamlit is (and is not)

Streamlit is an open-source Python framework for data applications: dashboards, internal tools, model monitoring panels, and lightweight CRUD explorers. You import streamlit as st, call functions like st.write() and st.dataframe(), and run streamlit run app.py. The server watches the file, reruns the script from line one on each user interaction, and sends the updated elements to the browser over a WebSocket.

It is not a general-purpose web framework like Django or FastAPI, nor an ML demo shell optimized for single inference calls. Streamlit excels when the UI is mostly read-heavy analytics with filters, KPI cards, and charts. When you need custom component trees, sub-100ms API latency at high QPS, or complex multi-user write paths, graduate to React plus a backend API.

Core concepts

Rerun — every widget change re-executes the entire script; widget values survive via Streamlit’s widget state machinery.
Widget — st.slider, st.selectbox, st.text_input return the current value on each run.
Session state — st.session_state is a dict-like store for values that are not tied to a single widget (counters, wizard steps, fetched IDs).
Cache — @st.cache_data memoizes pure data transforms; @st.cache_resource shares expensive singletons (DB pools, models).
Fragment — @st.fragment reruns only a section of the app, reducing cost on large dashboards (stable since Streamlit 1.37; available as st.experimental_fragment from 1.33).
Multipage — pages/ directory or st.navigation groups related views without one giant script.

Installation and the rerun model

Install with pip: pip install streamlit pandas plotly. A minimal revenue trend viewer:

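import streamlit as st
import pandas as pd

st.set_page_config(page_title="Revenue", layout="wide")
st.title("Weekly revenue")

@st.cache_data
def load_data():
    # Cached: the Parquet file is read once, not on every rerun.
    return pd.read_parquet("facts/revenue.parquet")

df = load_data()  # assumes df["date"] has a datetime64 dtype
regions = st.sidebar.multiselect("Region", df["region"].unique(), default=["NA"])
date_range = st.sidebar.date_input("Range", [df["date"].min(), df["date"].max()])

# date_input returns a one-element tuple until the user has picked the end date.
if len(date_range) != 2:
    st.stop()
# Convert datetime.date values to Timestamps so they compare with the column.
start, end = (pd.Timestamp(d) for d in date_range)

mask = df["region"].isin(regions) & df["date"].between(start, end)
filtered = df.loc[mask]

st.metric("Total revenue", f"${filtered['revenue'].sum():,.0f}")
st.line_chart(filtered.groupby("week")["revenue"].sum())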

Run streamlit run app.py; the app opens at http://localhost:8501. When the user toggles a region, Streamlit reruns the whole script, including the lines above the widget. That is why @st.cache_data on load_data matters: without it, every widget interaction re-reads the Parquet file from disk.

Execution order pitfalls

Because the script reruns top to bottom, any uncached expensive call runs again on every interaction, wherever it sits in the file. Pattern: render sidebar filters first, compute filtered datasets second, render charts last. Use st.stop() to bail early when auth fails or required uploads are missing.
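The filters, compute, render ordering with an early st.stop() might look like the sketch below. The column names in REQUIRED_COLUMNS and the CSV upload are assumptions made for illustration, not part of the guide's dataset; the validation helper is kept free of Streamlit so it can be tested on its own.

```python
import csv
import io

REQUIRED_COLUMNS = {"date", "region", "revenue"}  # hypothetical schema


def missing_columns(header):
    """Return the required column names that are absent from a CSV header."""
    return sorted(REQUIRED_COLUMNS - set(header))


def render():
    import streamlit as st  # lazy import: missing_columns stays testable without Streamlit

    # 1. Widgets first.
    uploaded = st.sidebar.file_uploader("Revenue CSV", type="csv")
    if uploaded is None:
        st.info("Upload a revenue export to begin.")
        st.stop()  # ends this run; nothing below executes

    # 2. Validate before any expensive work.
    reader = csv.DictReader(io.StringIO(uploaded.getvalue().decode("utf-8")))
    missing = missing_columns(reader.fieldnames or [])
    if missing:
        st.error(f"Missing columns: {', '.join(missing)}")
        st.stop()

    # 3. Compute and render last.
    total = sum(float(row["revenue"]) for row in reader)
    st.metric("Total revenue", f"${total:,.0f}")
```

Calling render() at the end of app.py and launching with streamlit run app.py draws the UI; each st.stop() ends only the current run, so the next upload or widget change starts the script again from the top.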

Widgets, session state, and forms

Widgets return Python values on each rerun. Assign them to variables and branch logic normally:

week_over_week_delta = -18.2  # placeholder; compute this from your data
threshold = st.slider("Alert if drop exceeds %", 5, 50, 15)
if week_over_week_delta < -threshold:
    st.error(f"Revenue down {abs(week_over_week_delta):.1f}%")

st.session_state holds arbitrary keys across reruns within a single browser session; it is not shared between users or tabs. Use it for multi-step wizards, storing API tokens after login, or accumulating rows in an editable table:

if "rows" not in st.session_state:
    st.session_state.rows = []

new_row = st.text_input("Add SKU note")
if st.button("Append") and new_row:
    st.session_state.rows.append(new_row)

Forms and batch submission

Without forms, every committed widget change triggers a rerun; for st.text_input that means each time the user presses Enter or leaves the field. Wrap related inputs in st.form("filters") so they are sent together when the user clicks the st.form_submit_button, which is essential for text-heavy filter panels and database query builders. Forms cannot contain other forms, and the submit button must sit inside the form context.
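A minimal sketch of a batched filter form follows. The revenue table, its columns, and the build_query helper are hypothetical names introduced for illustration; the point is that neither input causes a rerun until the submit button is pressed.

```python
def build_query(region, min_revenue):
    """Build a parameterized query; table and column names are hypothetical."""
    sql = "SELECT account_id, revenue FROM revenue WHERE revenue >= ?"
    params = [min_revenue]
    if region.strip():
        sql += " AND region = ?"
        params.append(region.strip())
    return sql, tuple(params)


def render():
    import streamlit as st  # lazy import: build_query stays testable without Streamlit

    with st.form("filters"):
        # Edits to these widgets are held in the browser until submit.
        region = st.text_input("Region (blank for all)")
        min_revenue = st.number_input("Minimum revenue", min_value=0, value=1000, step=100)
        submitted = st.form_submit_button("Apply")

    if submitted:
        sql, params = build_query(region, int(min_revenue))
        st.code(sql, language="sql")
        st.write("Parameters:", params)
```

Calling render() at the end of app.py shows the form. Passing values as query parameters instead of formatting them into the SQL string keeps free-text input out of the statement itself.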

Callbacks

on_change (with args and kwargs) registers a function that runs at the start of the rerun, before the script body executes; buttons take on_click instead. Use callbacks sparingly to sync external stores, and prefer declarative reads from widget return values when possible because they are easier to reason about.
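One case where a callback is a reasonable fit is resetting dependent state, such as returning to the first page of results when a filter changes. The sketch below assumes placeholder account IDs and a hypothetical page_slice helper.

```python
def page_slice(rows, page, page_size):
    """Return the rows for a zero-based page number."""
    start = page * page_size
    return rows[start:start + page_size]


def render():
    import streamlit as st  # lazy import: page_slice stays testable without Streamlit

    # Placeholder data: 25 account IDs per region.
    accounts = [f"{region}-{i:03d}" for region in ("NA", "EU") for i in range(25)]

    if "page" not in st.session_state:
        st.session_state.page = 0

    def reset_page():
        # Runs before the script body on the rerun caused by the selectbox.
        st.session_state.page = 0

    def change_page(step):
        st.session_state.page += step

    region = st.selectbox("Region", ["NA", "EU"], on_change=reset_page)
    matching = [a for a in accounts if a.startswith(region)]
    page_size = 10
    last_page = (len(matching) - 1) // page_size

    prev_col, next_col = st.columns(2)
    prev_col.button("Previous", on_click=change_page, args=(-1,),
                    disabled=st.session_state.page <= 0)
    next_col.button("Next", on_click=change_page, args=(1,),
                    disabled=st.session_state.page >= last_page)

    st.write(page_slice(matching, st.session_state.page, page_size))
```

Because the callbacks update st.session_state before the script body runs, the disabled flags and the displayed slice already reflect the new page on the same rerun.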

Caching, data loaders, and performance

Streamlit provides two decorators with different semantics:

@st.cache_data — for functions that return serializable (picklable) data such as DataFrames, dicts, and strings. Streamlit hashes the arguments and the function's source, stores the return value, and hands each caller a copy. Use ttl=3600 for warehouse queries whose results may go stale.
@st.cache_resource — for non-serializable resources: SQLAlchemy engines, gRPC clients, loaded torch models. One instance is shared across reruns and users on the same server process.

Call st.cache_data.clear() from a debug button after upstream ETL refreshes. For large DataFrames, prefer Parquet over CSV and filter before charting — sending a million rows to st.dataframe bloats browser memory.
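The two decorators can be combined as in the sketch below, which uses the standard-library sqlite3 module as a stand-in for a warehouse. The revenue.db file and its revenue table are assumptions. The leading underscore on _conn relies on Streamlit's convention of skipping the hash for parameters whose names start with an underscore.

```python
import sqlite3


def open_connection(db_path):
    """Open a connection; check_same_thread=False because sessions run in different threads."""
    return sqlite3.connect(db_path, check_same_thread=False)


def region_total(_conn, region):
    """Sum revenue for one region. `_conn` is excluded from the cache key."""
    row = _conn.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM revenue WHERE region = ?", (region,)
    ).fetchone()
    return float(row[0])


def render():
    import streamlit as st  # lazy import: the helpers stay testable without Streamlit

    # Call form of the decorators; equivalent to @st.cache_resource / @st.cache_data(ttl=900).
    get_connection = st.cache_resource(open_connection)
    cached_total = st.cache_data(ttl=900)(region_total)

    conn = get_connection("revenue.db")  # hypothetical SQLite file with a `revenue` table
    region = st.selectbox("Region", ["NA", "EU", "APAC"])
    st.metric("Revenue", f"${cached_total(conn, region):,.0f}")

    if st.button("Clear cached data"):
        st.cache_data.clear()
```

The connection is created once per server process and shared, while each region's total is cached for 15 minutes. A real deployment would likely use a pooled engine or serialize access to the shared connection.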

Fragments for partial reruns

On dashboards with twenty charts, a single date-picker rerun recomputes everything. Annotate independent sections with @st.fragment so only that block reruns when its widgets change. Fragments are the closest Streamlit gets to component-level reactivity without leaving Python.
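A small sketch of a fragment, assuming a Streamlit release that provides st.fragment. The cohort counts and the retention_rate helper are placeholders for illustration.

```python
def retention_rate(active_by_month, month):
    """Share of the signup cohort still active in a given month (month 0 = signup)."""
    base = active_by_month[0]
    return active_by_month[month] / base if base else 0.0


def render():
    import streamlit as st  # lazy import: retention_rate stays testable without Streamlit

    cohort = [200, 150, 120, 90, 80, 75]  # placeholder: active accounts per month

    st.title("Cohorts")
    st.write("Widgets outside the fragment rerun the whole script.")

    @st.fragment
    def retention_panel():
        # Moving this slider reruns only retention_panel, not the rest of the page.
        month = st.slider("Months since signup", 0, len(cohort) - 1, 3)
        st.metric(f"Retention at month {month}", f"{retention_rate(cohort, month):.0%}")

    retention_panel()
```

Anything the fragment needs from the rest of the page (here, cohort) is captured when the full script last ran, so expensive inputs should still come from cached functions.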

Layouts, charts, and multipage apps

Layout primitives mirror common dashboard patterns:

st.columns([2, 1]) — side-by-side KPI and detail panels.
st.tabs(["Overview", "Cohorts"]) — group related views without separate URLs.
st.expander("SQL query") — hide advanced filters.
st.container(border=True) — visual grouping inside a bordered card.

Native st.line_chart and st.bar_chart wrap Altair for quick plots. For interactive zoom, hover, and faceting, pass Plotly figures to st.plotly_chart(fig, use_container_width=True). Display tables with st.dataframe (sortable, scrollable) or st.data_editor when analysts need inline edits that sync back to session state.
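The layout primitives compose as ordinary Python objects. The sketch below puts KPI cards in columns inside a tab, with raw values tucked into an expander; the KPI figures and the pct_change helper are illustrative assumptions.

```python
def pct_change(current, prior):
    """Percent change versus the prior period; None when the prior value is zero."""
    if prior == 0:
        return None
    return (current - prior) / prior * 100


def render():
    import streamlit as st  # lazy import: pct_change stays testable without Streamlit

    # Placeholder KPI values: (label, current period, prior period).
    kpis = [("MRR", 125_000, 118_000), ("Net new", 9_500, 11_000)]

    overview_tab, detail_tab = st.tabs(["Overview", "Detail"])
    with overview_tab:
        # One column per KPI card.
        for column, (label, current, prior) in zip(st.columns(len(kpis)), kpis):
            change = pct_change(current, prior)
            delta = None if change is None else f"{change:+.1f}%"
            column.metric(label, f"${current:,.0f}", delta)
    with detail_tab:
        with st.expander("Raw values"):
            st.dataframe([{"kpi": k, "current": c, "prior": p} for k, c, p in kpis])
```

st.metric colors the delta by its sign, so a formatted string with an explicit plus or minus is enough to get the arrow direction.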

Multipage structure

Create a pages/ folder beside app.py; Streamlit auto-discovers files and builds sidebar navigation. Prefix filenames with numbers (1_Overview.py, 2_Cohorts.py) to control order. Each page is its own script with independent reruns but shared st.session_state within the session. For programmatic menus, use st.navigation and st.Page (Streamlit 1.36+) when you need dynamic page lists based on user role.
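A role-based menu with st.navigation could be sketched as follows. The role names, the page script paths, and the idea that an earlier login step stored "role" in session state are all assumptions; st.Page expects the referenced scripts to exist on disk.

```python
def pages_for_role(role):
    """Return (script path, title) pairs visible to a role. Paths and roles are hypothetical."""
    pages = [("pages/overview.py", "Overview"), ("pages/cohorts.py", "Cohorts")]
    if role == "finance":
        pages.append(("pages/detail.py", "Account detail"))
    return pages


def render():
    import streamlit as st  # lazy import: pages_for_role stays testable without Streamlit

    # Assumes an earlier login step stored the role in session state.
    role = st.session_state.get("role", "viewer")
    pages = [st.Page(path, title=title) for path, title in pages_for_role(role)]
    current = st.navigation(pages)
    current.run()  # execute the selected page's script
```

With this pattern app.py acts as a router: it calls render() and little else, and each page script contains its own widgets and charts.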

Deployment: Community Cloud, Docker, and secrets

Streamlit Community Cloud connects to a public GitHub repo, installs requirements.txt, and hosts the app at https://<app>.streamlit.app. Configure secrets in the dashboard as TOML — database URLs, API keys — read via st.secrets["db"]["url"]. Free tier suits prototypes; paid teams add private apps and SSO.
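Reading the secret with an environment-variable fallback keeps the same code usable on Community Cloud and in a container. This is a sketch: the DATABASE_URL variable name and the resolve_db_url helper are assumptions, and the broad except exists only because the exception raised when no secrets file is present differs between Streamlit versions.

```python
import os

# Example .streamlit/secrets.toml (kept out of git):
# [db]
# url = "postgresql://user:password@host/dbname"


def resolve_db_url(secrets, env):
    """Prefer [db].url from secrets, then fall back to DATABASE_URL in the environment."""
    db_section = secrets.get("db") or {}
    return db_section.get("url") or env.get("DATABASE_URL")


def render():
    import streamlit as st  # lazy import: resolve_db_url stays testable without Streamlit

    try:
        secrets = dict(st.secrets)
    except Exception:  # no secrets.toml present; exact exception type varies by version
        secrets = {}

    url = resolve_db_url(secrets, os.environ)
    if url is None:
        st.error("No database URL configured.")
        st.stop()
    st.success("Database URL loaded.")  # never echo the secret itself
```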

For VPC or on-prem hosting, containerize:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

Run behind nginx or a cloud load balancer with TLS. Set --server.enableCORS=false only when the reverse proxy handles CORS. Use st.login and st.logout with an OpenID Connect provider (Streamlit 1.42+), or place the app behind corporate SSO via OAuth2 proxy for internal dashboards with PII.

Worked example: Harbor Analytics revenue dashboard

Harbor Analytics replaced a Monday-morning notebook with a multipage Streamlit app for the finance team:

Overview page — sidebar date range and currency toggle; four st.metric cards (MRR, churn, expansion, net new) with delta arrows versus prior period.
Data layer — @st.cache_data(ttl=900) pulls from Snowflake via SQLAlchemy; query parameterized by filters passed as function arguments.
Cohorts page — Plotly heatmap of retention by signup month; @st.fragment isolates cohort window slider reruns.
Drill-down — st.dataframe with column config for currency formatting; clicking a row stores account_id in session state for the Detail page.
Exports — st.download_button generates filtered CSV without a separate export job.
Auth — Community Cloud secrets hold Snowflake creds; app deployed on a private team URL.

Finance cut ad-hoc Slack data requests by roughly half in the first month because filters were self-serve. When external partners needed API access, the SQL and pandas transforms moved unchanged into a FastAPI service — Streamlit proved the metrics definitions.
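The Exports step in the list above can be approximated with the standard-library csv module and st.download_button. The rows and column names are placeholders, not Harbor Analytics data, and rows_to_csv is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize dict rows to UTF-8 CSV bytes with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def render():
    import streamlit as st  # lazy import: rows_to_csv stays testable without Streamlit

    rows = [  # placeholder data
        {"account_id": "A-001", "region": "NA", "revenue": 1200},
        {"account_id": "A-002", "region": "EU", "revenue": 800},
    ]
    region = st.selectbox("Region", ["NA", "EU"])
    filtered = [row for row in rows if row["region"] == region]
    st.dataframe(filtered)

    # The file is built from whatever the current filters selected.
    st.download_button(
        "Download CSV",
        data=rows_to_csv(filtered, ["account_id", "region", "revenue"]),
        file_name=f"revenue_{region}.csv",
        mime="text/csv",
    )
```

Because the bytes are produced during the rerun, the download always matches the filters on screen; for very large exports, caching the serialization step is worth considering.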

Tooling decision table

Goal	Favor	Avoid
Internal KPI dashboard with filters	Streamlit multipage app, cache_data on warehouse queries	Hand-built React before metrics stabilize
Shareable ML inference demo	Gradio ChatInterface or Interface	Streamlit when the UI is one button and one model output
Pixel-perfect executive PDF reports	Jupyter + nbconvert or dedicated BI (Looker, Metabase)	Streamlit when layout must match brand guidelines exactly
High-traffic customer-facing product	React/Vue + FastAPI, dedicated auth	Streamlit Community Cloud free tier for production SLAs
Complex Dash-style callbacks	Plotly Dash if callback graph is the core UX	Streamlit when rerun model fights intricate cross-filter wiring
Notebook exploration to dashboard	Streamlit — paste cells, add widgets, deploy	Shipping raw Jupyter notebooks to non-technical stakeholders

Common pitfalls

Uncached heavy loads — re-querying Snowflake on every slider tick burns credits; always cache with explicit TTL.
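Mutating cached objects in place — st.cache_data hands each caller a fresh copy, so in-place edits there do not leak, but objects returned by st.cache_resource are shared across sessions; treat them as read-only or guard mutation with a lock.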
Using session state for widget values — duplicates widget state and causes sync bugs; read from widget returns unless you need non-widget data.
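Forms with dynamic widgets — widgets inside a form do not trigger reruns until submit, so a form widget cannot show, hide, or repopulate another widget in the same form; move the controlling widget outside the form.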
Secrets in git — st.secrets TOML files committed to public repos leak credentials; use platform secret stores.
Giant unfiltered tables — million-row st.dataframe freezes browsers; paginate or aggregate server-side.
Assuming multi-user isolation in cache_resource — shared DB pools are fine; per-user data in a global variable is not.
No set_page_config first — call it before any other Streamlit command; Streamlit releases that enforce this ordering raise StreamlitAPIException when it comes later.

Practitioner checklist

Call st.set_page_config(layout="wide") as the first Streamlit line.
Wrap all data loads and transforms in @st.cache_data with sensible TTL.
Put filters in the sidebar; keep the main area for charts and tables.
Use st.form when a group of inputs should be applied together instead of rerunning after each change.
Prefer Plotly for interactive charts; reserve native charts for quick prototypes.
Split large apps into multipage pages/ or st.navigation.
Apply @st.fragment to expensive sections on mature dashboards.
Store credentials in st.secrets or environment variables, never in source.
Add st.download_button for CSV exports stakeholders expect.
Containerize with pinned streamlit==1.x for reproducible deploys.
Plan graduation to FastAPI + SPA when SLAs, auth, or QPS outgrow rerun semantics.

Key takeaways

Streamlit turns Python scripts into interactive dashboards by rerunning the script on widget changes.
cache_data and cache_resource separate data memoization from shared connections and models.
Session state and forms manage multi-step flows without JavaScript.
Multipage apps scale structure as dashboards grow beyond one file.
Streamlit is ideal for internal analytics and metric validation — production customer products usually need a dedicated API and frontend.