Streamlit fundamentals explained
A revenue analyst exports CSVs from three systems every Monday, pastes them into Jupyter, and emails screenshots to leadership. The workflow is correct but brittle: filters do not persist, charts are static, and every new question means another notebook cell. Streamlit turns that same Python logic into a browser dashboard: sliders pick date ranges, dropdowns slice regions, and st.line_chart or Plotly figures update on every interaction. You write a top-to-bottom script; Streamlit reruns it when widgets change and syncs state to the browser. No React, no REST boilerplate, no separate frontend repo. This guide covers the rerun execution model, widgets and layout primitives, session state and forms, caching decorators, multipage apps, deployment options, a Harbor Analytics revenue dashboard worked example, a tooling decision table versus Gradio and Dash, common pitfalls, and a production checklist. It assumes baseline pandas fluency.
What Streamlit is (and is not)
Streamlit is an open-source Python framework for data applications:
dashboards, internal tools, model monitoring panels, and lightweight CRUD explorers.
You import streamlit as st, call functions like st.write() and st.dataframe(), and run streamlit run app.py. The server watches the file, reruns the script from line one on each user interaction, and sends the updated elements to the browser over a WebSocket.
It is not a general-purpose web framework like Django or FastAPI, nor an ML demo shell optimized for single inference calls. Streamlit excels when the UI is mostly read-heavy analytics with filters, KPI cards, and charts. When you need custom component trees, sub-100ms API latency at high QPS, or complex multi-user write paths, graduate to React plus a backend API.
Core concepts
- Rerun: every widget change re-executes the entire script; widget values survive via Streamlit’s widget state machinery.
- Widget: st.slider, st.selectbox, and st.text_input return the current value on each run.
- Session state: st.session_state is a dict-like store for values that are not tied to a single widget (counters, wizard steps, fetched IDs).
- Cache: @st.cache_data memoizes pure data transforms; @st.cache_resource shares expensive singletons (DB pools, models).
- Fragment: @st.fragment reruns only a section of the app, reducing cost on large dashboards (stable as st.fragment since Streamlit 1.37; it first shipped as st.experimental_fragment in 1.33).
- Multipage: a pages/ directory or st.navigation groups related views without one giant script.
Installation and the rerun model
Install with pip: pip install streamlit pandas plotly pyarrow (pandas needs pyarrow or fastparquet to read Parquet files). A minimal
revenue trend viewer:
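import streamlit as st
import pandas as pd

st.set_page_config(page_title="Revenue", layout="wide")
st.title("Weekly revenue")

@st.cache_data
def load_data():
    # Cached, so the Parquet file is read once instead of on every rerun.
    return pd.read_parquet("facts/revenue.parquet")

df = load_data()
regions = st.sidebar.multiselect("Region", df["region"].unique(), default=["NA"])
picked = st.sidebar.date_input("Range", [df["date"].min(), df["date"].max()])
if len(picked) != 2:
    st.stop()  # the user has chosen a start date but not yet an end date
# date_input returns datetime.date objects; convert them before comparing
# against a datetime64 column.
start, end = (pd.Timestamp(d) for d in picked)
mask = df["region"].isin(regions) & df["date"].between(start, end)
filtered = df.loc[mask]
st.metric("Total revenue", f"${filtered['revenue'].sum():,.0f}")
st.line_chart(filtered.groupby("week")["revenue"].sum())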
Run streamlit run app.py; the app opens at
http://localhost:8501. When the user toggles a region, Streamlit
reruns the whole script, including the lines above the widget. That is why
@st.cache_data on load_data matters: without it, every
widget change re-reads the Parquet file from disk.
Execution order pitfalls
Because the script reruns top to bottom, do not place expensive work
before widgets unless cached. Pattern: render sidebar filters first,
compute filtered datasets second, render charts last. Use
st.stop() to bail early when auth fails or required uploads are
missing.
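The ordering above can be sketched as follows. This is a minimal illustration, not a prescribed layout: the REQUIRED_COLUMNS schema and the missing_columns helper are hypothetical, and Streamlit and pandas are imported inside render() only so the helper can be tested without them installed.

```python
REQUIRED_COLUMNS = ("date", "region", "revenue")  # hypothetical schema


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


def render():
    # Imported lazily so the helper above stays testable without Streamlit.
    import pandas as pd
    import streamlit as st

    st.title("Upload check")

    # 1. Cheap widgets first.
    uploaded = st.sidebar.file_uploader("Revenue CSV", type="csv")
    if uploaded is None:
        st.info("Upload a CSV to continue.")
        st.stop()  # nothing below this line runs on this rerun

    # 2. Load and validate second.
    df = pd.read_csv(uploaded)
    missing = missing_columns(df.columns)
    if missing:
        st.error(f"Missing columns: {', '.join(missing)}")
        st.stop()

    # 3. Charts last, only once the inputs are known to be valid.
    st.line_chart(df.groupby("date")["revenue"].sum())


# In app.py, finish with: render()
```

Each st.stop() call ends the current rerun cleanly, so the chart code never sees a missing or malformed upload.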
Widgets, session state, and forms
Widgets return Python values on each rerun. Assign them to variables and branch logic normally:
# week_over_week_delta is assumed to be computed earlier, as a percentage.
threshold = st.slider("Alert if drop exceeds %", 5, 50, 15)
if week_over_week_delta < -threshold:
    st.error(f"Revenue down {abs(week_over_week_delta):.1f}%")
st.session_state holds arbitrary keys across reruns within a single browser
session; it is not shared between users. Use it for multi-step wizards, storing API tokens after
login, or accumulating rows in an editable table:
if "rows" not in st.session_state:
    st.session_state.rows = []
new_row = st.text_input("Add SKU note")
if st.button("Append") and new_row:
    st.session_state.rows.append(new_row)
Forms and batch submission
Without forms, every committed widget change triggers a rerun: each slider move,
each selection, each Enter or loss of focus in st.text_input.
Wrap related inputs in st.form("filters") so they are sent together
when the user clicks the submit button, which is essential for text-heavy
filter panels and database query builders. Forms cannot contain other forms, and
the st.form_submit_button call must sit inside the form context.
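A minimal sketch of a batched filter form, assuming a hypothetical build_query_filters helper that normalizes the raw inputs. The widget labels and the comma-separated SKU convention are illustrative only.

```python
def build_query_filters(sku_text, min_revenue):
    """Normalize raw form values into a filter dict (hypothetical helper)."""
    skus = [s.strip().upper() for s in sku_text.split(",") if s.strip()]
    return {"skus": skus, "min_revenue": float(min_revenue)}


def render():
    # Imported lazily so the helper above stays testable without Streamlit.
    import streamlit as st

    # Widgets inside the form do not trigger a rerun on their own.
    with st.form("filters"):
        sku_text = st.text_input("SKUs (comma-separated)")
        min_revenue = st.number_input("Minimum revenue", min_value=0.0, value=0.0)
        submitted = st.form_submit_button("Apply")

    # One rerun happens on submit, carrying every field at once.
    if submitted:
        st.write(build_query_filters(sku_text, min_revenue))


# In app.py, finish with: render()
```

Keeping the normalization in a plain function means the query-building logic can be unit-tested without starting the app.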
Callbacks
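The on_change parameter (on_click for buttons), together with args and kwargs,
registers a function that runs at the start of the next rerun, before the script
body executes. Use callbacks sparingly to sync external stores; prefer declarative
reads from widget return values when possible, because they are easier to reason about.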
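One case where a callback is hard to avoid is keeping two linked inputs consistent. The sketch below assumes a hypothetical fixed exchange rate and two number inputs keyed "usd" and "eur"; the callbacks receive the state mapping through args, so they can be exercised with a plain dict.

```python
USD_PER_EUR = 1.10  # hypothetical fixed rate, for illustration only


def sync_from_usd(state):
    """Recompute the EUR field after the USD field changed."""
    state["eur"] = round(state["usd"] / USD_PER_EUR, 2)


def sync_from_eur(state):
    """Recompute the USD field after the EUR field changed."""
    state["usd"] = round(state["eur"] * USD_PER_EUR, 2)


def render():
    # Imported lazily so the callbacks above stay testable without Streamlit.
    import streamlit as st

    # Seed the widget keys once per session.
    if "usd" not in st.session_state:
        st.session_state["usd"] = 110.0
    if "eur" not in st.session_state:
        st.session_state["eur"] = 100.0

    # Each callback runs before the script body on the next rerun, so both
    # inputs already show consistent values when they are drawn again.
    st.number_input("USD", key="usd", on_change=sync_from_usd, args=(st.session_state,))
    st.number_input("EUR", key="eur", on_change=sync_from_eur, args=(st.session_state,))


# In app.py, finish with: render()
```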
Caching, data loaders, and performance
Streamlit provides two decorators with different semantics:
- @st.cache_data: for functions that return serializable data (DataFrames, dicts, strings). Streamlit hashes the arguments and stores a copy of the return value. Use ttl=3600 for warehouse queries that may go stale.
- @st.cache_resource: for non-serializable resources such as SQLAlchemy engines, gRPC clients, and loaded torch models. One instance is shared across reruns and users on the same server process.
Call st.cache_data.clear() from a debug button after upstream ETL
refreshes. For large DataFrames, prefer Parquet over CSV and filter before
charting — sending a million rows to
st.dataframe bloats browser memory.
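A sketch of how the two decorators divide the work, using an in-memory SQLite database as a stand-in for a warehouse. The table, rows, and function names are hypothetical. The caches are applied inside render() here only to keep the plain functions importable without Streamlit; in a real app they would normally be decorated at module level.

```python
import sqlite3


def open_connection(path=":memory:"):
    """Open a SQLite connection seeded with hypothetical demo rows."""
    # check_same_thread=False because Streamlit may serve sessions from
    # different threads while sharing this one connection.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("CREATE TABLE IF NOT EXISTS revenue (region TEXT, week TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO revenue VALUES (?, ?, ?)",
        [("NA", "2024-W01", 120.0), ("NA", "2024-W02", 150.0), ("EU", "2024-W01", 90.0)],
    )
    return conn


def weekly_revenue(conn, region):
    """Return (week, total) rows for one region, ordered by week."""
    cursor = conn.execute(
        "SELECT week, SUM(amount) FROM revenue WHERE region = ? GROUP BY week ORDER BY week",
        (region,),
    )
    return cursor.fetchall()


def render():
    import streamlit as st

    # Resource cache: one shared connection per server process.
    get_connection = st.cache_resource(open_connection)

    # Data cache: results are keyed by `region` and expire after an hour.
    @st.cache_data(ttl=3600)
    def load_weekly(region):
        return weekly_revenue(get_connection(), region)

    region = st.sidebar.selectbox("Region", ["NA", "EU"])
    st.write(load_weekly(region))

    # Debug escape hatch after an upstream refresh.
    if st.sidebar.button("Clear data cache"):
        st.cache_data.clear()


# In app.py, finish with: render()
```

The connection is never copied or serialized, while the query results are, which is the distinction the two decorators encode.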
Fragments for partial reruns
On dashboards with twenty charts, a single date-picker rerun recomputes everything.
Annotate independent sections with @st.fragment so only that block
reruns when its widgets change. Fragments are the closest Streamlit gets to
component-level reactivity without leaving Python.
Layouts, charts, and multipage apps
Layout primitives mirror common dashboard patterns:
- st.columns([2, 1]): side-by-side KPI and detail panels.
- st.tabs(["Overview", "Cohorts"]): group related views without separate URLs.
- st.expander("SQL query"): hide advanced filters.
- st.container(border=True): visual grouping with a glass-card feel.
Native st.line_chart and st.bar_chart wrap Altair for
quick plots. For interactive zoom, hover, and faceting, pass Plotly figures to
st.plotly_chart(fig, use_container_width=True). Display tables with
st.dataframe (sortable, scrollable) or st.data_editor
when analysts need inline edits that sync back to session state.
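A small sketch combining these primitives for a KPI row. The figures and the format_delta helper are hypothetical; st.metric accepts a string delta and points the arrow down when it starts with a minus sign.

```python
def format_delta(current, prior):
    """Return the period-over-period change as a signed percent string."""
    if prior == 0:
        return None  # no prior value, so show the metric without a delta
    return f"{(current - prior) / prior * 100:+.1f}%"


def render():
    # Imported lazily so the helper above stays testable without Streamlit.
    import streamlit as st

    # Hypothetical (current, prior) values for two KPIs.
    kpis = {"MRR": (125_000, 118_000), "Expansion": (9_400, 10_100)}

    left, right = st.columns([2, 1])
    with left:
        for label, (current, prior) in kpis.items():
            st.metric(label, f"${current:,.0f}", format_delta(current, prior))
    with right:
        with st.expander("Definitions"):
            st.write("MRR: monthly recurring revenue for the selected period.")

    overview, cohorts = st.tabs(["Overview", "Cohorts"])
    with overview:
        st.write("Overview charts go here.")
    with cohorts:
        st.write("Cohort charts go here.")


# In app.py, finish with: render()
```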
Multipage structure
Create a pages/ folder beside app.py; Streamlit auto-discovers
files and builds sidebar navigation. Prefix filenames with numbers
(1_Overview.py, 2_Cohorts.py) to control order. Each
page is its own script with independent reruns but shared
st.session_state within the session. For programmatic menus, use
st.navigation and st.Page (Streamlit 1.36+) when you
need dynamic page lists based on user role.
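A role-based menu could be sketched like this. The PAGE_REGISTRY mapping, the views/ paths, and the role names are all hypothetical, and st.Page expects each script path to exist on disk.

```python
# Hypothetical registry: page title -> (script path, roles allowed to see it).
PAGE_REGISTRY = {
    "Overview": ("views/overview.py", {"analyst", "admin"}),
    "Cohorts": ("views/cohorts.py", {"analyst", "admin"}),
    "Admin": ("views/admin.py", {"admin"}),
}


def pages_for_role(role):
    """Return (title, path) pairs the given role may open, in registry order."""
    return [
        (title, path)
        for title, (path, roles) in PAGE_REGISTRY.items()
        if role in roles
    ]


def render(role):
    # Imported lazily so the helper above stays testable without Streamlit.
    import streamlit as st

    pages = [st.Page(path, title=title) for title, path in pages_for_role(role)]
    if not pages:
        st.error("No pages available for this role.")
        st.stop()

    # st.navigation builds the menu and returns the page the user selected.
    st.navigation(pages).run()


# In app.py, finish with: render(role), where role comes from your auth layer.
```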
Deployment: Community Cloud, Docker, and secrets
Streamlit Community Cloud connects to a public GitHub repo, installs
requirements.txt, and hosts the app at
https://<app>.streamlit.app. Configure secrets in the dashboard
as TOML — database URLs, API keys — read via
st.secrets["db"]["url"]. Free tier suits prototypes; paid teams add
private apps and SSO.
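Reading the secret can be wrapped in a small resolver so the same code runs locally and in a container. This is a sketch under assumptions: the DB_URL environment variable name and the resolve_db_url helper are hypothetical, and st.secrets itself may raise when no secrets file exists at all, so the fallback mainly covers a missing [db] entry.

```python
def resolve_db_url(secrets, environ):
    """Prefer [db].url from the secrets mapping, then the DB_URL variable."""
    section = secrets.get("db") or {}
    url = section.get("url") or environ.get("DB_URL")
    if not url:
        raise KeyError("No database URL found in secrets or environment")
    return url


def render():
    # Imported lazily so the helper above stays testable without Streamlit.
    import os

    import streamlit as st

    # The secrets TOML (local .streamlit/secrets.toml or the hosted
    # secrets editor) would contain something like:
    #   [db]
    #   url = "..."
    url = resolve_db_url(st.secrets, os.environ)

    # Confirm the lookup worked without echoing the secret itself.
    st.caption("Database URL loaded." if url else "Database URL missing.")


# In app.py, finish with: render()
```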
For VPC or on-prem hosting, containerize:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
Run behind nginx or a cloud load balancer with TLS. Set
--server.enableCORS=false only when the reverse proxy handles CORS.
Use st.login with an OpenID Connect provider (Streamlit 1.42+) or place the app behind
corporate SSO via an OAuth2 proxy for internal dashboards with PII.
Worked example: Harbor Analytics revenue dashboard
Harbor Analytics replaced a Monday-morning notebook with a multipage Streamlit app for the finance team:
- Overview page: sidebar date range and currency toggle; four st.metric cards (MRR, churn, expansion, net new) with delta arrows versus prior period.
- Data layer: @st.cache_data(ttl=900) pulls from Snowflake via SQLAlchemy; query parameterized by filters passed as function arguments.
- Cohorts page: Plotly heatmap of retention by signup month; @st.fragment isolates cohort window slider reruns.
- Drill-down: st.dataframe with column config for currency formatting; clicking a row stores account_id in session state for the Detail page.
- Exports: st.download_button generates filtered CSV without a separate export job.
- Auth: Community Cloud secrets hold Snowflake creds; app deployed on a private team URL.
Finance cut ad-hoc Slack data requests by roughly half in the first month because filters were self-serve. When external partners needed API access, the SQL and pandas transforms moved unchanged into a FastAPI service — Streamlit proved the metrics definitions.
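The Exports step of such an app could look like the sketch below. The rows and field names are hypothetical stand-ins for the filtered dataset, and the CSV is built with the standard library to keep the example self-contained.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to UTF-8 CSV bytes with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def render():
    # Imported lazily so the helper above stays testable without Streamlit.
    import streamlit as st

    # Hypothetical filtered rows; a real app would pass the filtered dataset.
    rows = [
        {"region": "NA", "revenue": 120},
        {"region": "EU", "revenue": 90},
    ]
    st.download_button(
        "Download CSV",
        data=rows_to_csv(rows, ["region", "revenue"]),
        file_name="revenue_filtered.csv",
        mime="text/csv",
    )


# In app.py, finish with: render()
```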
Tooling decision table
| Goal | Favor | Avoid |
|---|---|---|
| Internal KPI dashboard with filters | Streamlit multipage app, cache_data on warehouse queries | Hand-built React before metrics stabilize |
| Shareable ML inference demo | Gradio ChatInterface or Interface | Streamlit when the UI is one button and one model output |
| Pixel-perfect executive PDF reports | Jupyter + nbconvert or dedicated BI (Looker, Metabase) | Streamlit when layout must match brand guidelines exactly |
| High-traffic customer-facing product | React/Vue + FastAPI, dedicated auth | Streamlit Community Cloud free tier for production SLAs |
| Complex Dash-style callbacks | Plotly Dash if callback graph is the core UX | Streamlit when rerun model fights intricate cross-filter wiring |
| Notebook exploration to dashboard | Streamlit — paste cells, add widgets, deploy | Shipping raw Jupyter notebooks to non-technical stakeholders |
Common pitfalls
- Uncached heavy loads: re-querying Snowflake on every slider tick burns credits; always cache with an explicit TTL.
- Mutating cached objects in place: st.cache_data returns a fresh copy on each call, so in-place edits there are harmless but are not saved; objects from st.cache_resource are shared, so mutating them leaks changes to other users. Treat shared resources as read-only.
- Using session state for widget values: duplicates widget state and causes sync bugs; read from widget returns unless you need non-widget data.
- Forms with dynamic widgets: widgets inside a form do not trigger reruns until submit, so one in-form widget cannot show or hide another; put the controlling widget outside the form.
- Secrets in git: st.secrets TOML files committed to public repos leak credentials; use platform secret stores.
- Giant unfiltered tables: a million-row st.dataframe freezes browsers; paginate or aggregate server-side.
- Assuming multi-user isolation in cache_resource: shared DB pools are fine; per-user data in a global variable is not.
- No set_page_config first: it must be the first Streamlit command; calling it after widgets throws an error.
Practitioner checklist
- Call st.set_page_config(layout="wide") as the first Streamlit line.
- Wrap all data loads and transforms in @st.cache_data with a sensible TTL.
- Put filters in the sidebar; keep the main area for charts and tables.
- Use st.form when a group of inputs should be applied together rather than rerunning on each change.
- Prefer Plotly for interactive charts; reserve native charts for quick prototypes.
- Split large apps into multipage pages/ or st.navigation.
- Apply @st.fragment to expensive sections on mature dashboards.
- Store credentials in st.secrets or environment variables, never in source.
- Add st.download_button for CSV exports stakeholders expect.
- Containerize with pinned streamlit==1.x for reproducible deploys.
- Plan graduation to FastAPI + SPA when SLAs, auth, or QPS outgrow rerun semantics.
Key takeaways
- Streamlit turns Python scripts into interactive dashboards by rerunning the script on widget changes.
- cache_data and cache_resource separate data memoization from shared connections and models.
- Session state and forms manage multi-step flows without JavaScript.
- Multipage apps scale structure as dashboards grow beyond one file.
- Streamlit is ideal for internal analytics and metric validation — production customer products usually need a dedicated API and frontend.
Related reading
- Pandas fundamentals explained — DataFrame transforms that power Streamlit tables and charts
- Plotly fundamentals explained: interactive charts inside st.plotly_chart
- Gradio fundamentals explained: ML demo UIs when inference, not analytics, is the focus
- Jupyter fundamentals explained — exploratory analysis before dashboard hardening