CrewAI fundamentals explained
Many production LLM workflows are not one model call — they are small teams: a researcher gathers facts, a writer drafts copy, an editor enforces tone, and a manager decides who works next. CrewAI is a Python framework that models this pattern explicitly through agents (role, goal, backstory), tasks (description, expected output, assigned agent), and crews that orchestrate execution in sequence or under a hierarchical manager. Where LangGraph expresses control flow as typed state graphs with checkpoints and interrupts, CrewAI optimizes for human-readable role playbooks and YAML-driven crew definitions — a different mental model for the same multi-agent orchestration problem. This guide covers core primitives, process types, tools and memory, CrewAI Flow, observability, a Harbor Marketing content crew worked example, a framework decision table, common pitfalls, and a practitioner checklist.
What CrewAI is (and is not)
CrewAI is a role-based multi-agent orchestration framework. Each agent
receives a persona (role, goal, backstory) that shapes how the underlying LLM reasons.
Tasks are first-class objects with descriptions, expected output schemas, and optional
context from prior task results. A Crew wires agents to tasks and selects a
process: sequential handoffs or hierarchical delegation. Event-driven graphs are
handled one layer up, by CrewAI Flow.
It is not a retrieval framework, a model host, or a replacement for every agent pattern. Single-shot completions, rigid CRUD APIs, and teams already standardized on MCP tool servers with thin clients may need less framework. Reach for CrewAI when work naturally decomposes into specialist roles, when non-engineers need to edit agent personas in YAML, or when sequential pipelines with clear deliverables dominate over arbitrary graph cycles.
Core primitives
- Agent: an LLM-backed worker with role, goal, backstory, optional tools, and delegation settings.
- Task: a unit of work with description, expected_output, assigned agent, and optional context from upstream tasks.
- Crew: binds agents and tasks; runs kickoff() or kickoff_async() to produce a final result.
- Process: sequential (tasks run in order) or hierarchical (a manager agent delegates dynamically).
- Tool: Python callables or LangChain-compatible tools agents invoke during task execution.
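The primitives above fit together roughly as follows. This is a minimal sketch, assuming the crewai package is installed and a provider API key is set in the environment. The persona strings, task wording, and the build_crew helper are hypothetical. The import sits inside the function only so the persona dictionaries can be read and tested without the framework.

```python
# Hypothetical persona copy; none of these strings come from CrewAI itself.
RESEARCHER = {
    "role": "Senior Market Research Analyst",
    "goal": "Find three verifiable statistics about container freight rates",
    "backstory": "You cite primary sources; you never invent figures.",
}
WRITER = {
    "role": "Staff Writer",
    "goal": "Turn research notes into a short, plain-English brief",
    "backstory": "You write for busy operations managers and avoid hype.",
}


def build_crew():
    """Wire two agents and two tasks into a sequential crew."""
    # Imported here so the dictionaries above load without crewai installed.
    from crewai import Agent, Crew, Process, Task

    researcher = Agent(**RESEARCHER, allow_delegation=False, max_iter=5)
    writer = Agent(**WRITER, allow_delegation=False)

    research = Task(
        description="Collect recent statistics on container freight rates.",
        expected_output="A bullet list of three statistics, each with a source URL.",
        agent=researcher,
    )
    draft = Task(
        description="Write a brief based on the research notes.",
        expected_output="A 200-word brief in plain text with inline source URLs.",
        agent=writer,
        context=[research],  # hand the research output to the writer
    )
    return Crew(
        agents=[researcher, writer],
        tasks=[research, draft],
        process=Process.sequential,
    )
```

Calling build_crew().kickoff() would run both tasks in order and return the final result. That call reaches a live model, so it is left out of the sketch.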
Agents: roles, goals, and backstories
CrewAI’s distinguishing feature is persona injection. The
role names the job (“Senior Market Research Analyst”), the
goal states success criteria (“Find three verifiable statistics about
container freight rates”), and the backstory adds constraints and tone
(“You cite primary sources; you never invent figures”). These strings are
prepended to the system prompt on every agent turn, so prompt engineering investment
lives in configuration rather than scattered Python strings.
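To make persona injection concrete, the sketch below composes the three fields into a single system prompt. It is a plain-Python illustration: the Persona class, the render_system_prompt function, and the template wording are assumptions for this example, not CrewAI's exact internal prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    role: str
    goal: str
    backstory: str


def render_system_prompt(persona: Persona) -> str:
    """Compose persona fields into one system prompt (illustrative template)."""
    return (
        f"You are {persona.role}. {persona.backstory}\n"
        f"Your personal goal is: {persona.goal}"
    )


analyst = Persona(
    role="Senior Market Research Analyst",
    goal="Find three verifiable statistics about container freight rates",
    backstory="You cite primary sources; you never invent figures.",
)
prompt = render_system_prompt(analyst)
print(prompt)
```

Because the whole persona rides along on every turn, a long backstory is paid for repeatedly. That is one reason to keep backstories short and specific.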
Key agent flags:
- allow_delegation: whether the agent can assign subtasks to peers (common for manager agents in hierarchical crews).
- verbose: logs intermediate reasoning to stdout; useful in development, noisy in production.
- max_iter: caps tool-call loops per task; prevents runaway spend.
- llm: override the default model per agent (e.g. cheap model for research, capable model for writing).
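The idea behind max_iter can be sketched without the framework: call the agent step repeatedly, and stop at a fixed cap even if no final answer has arrived. The function and step names below are hypothetical, and the loop is a simplification of what an agent executor does, not CrewAI's implementation.

```python
from typing import Callable, Optional


def run_with_iteration_cap(
    step: Callable[[int], Optional[str]],
    max_iter: int,
) -> tuple[str, int]:
    """Call step(i) until it returns an answer or the cap is reached.

    step returns None to mean "another tool call is needed" and a string
    to mean "here is the final answer". Returns (answer, iterations_used).
    """
    for i in range(1, max_iter + 1):
        answer = step(i)
        if answer is not None:
            return answer, i
    # Cap reached: stop spending and return a best-effort marker.
    return "best effort: iteration cap reached", max_iter


def flaky_search(i: int) -> Optional[str]:
    # Hypothetical agent step that only succeeds on its third try.
    return "3 statistics found" if i >= 3 else None


def stuck_search(i: int) -> Optional[str]:
    # Hypothetical agent step that never converges.
    return None


print(run_with_iteration_cap(flaky_search, max_iter=5))
print(run_with_iteration_cap(stuck_search, max_iter=5))
```

The second call shows the point of the cap: a step that never converges costs five iterations, not an unbounded number.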
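Agents can share tools or have role-specific toolkits. Wrap external APIs (search, CRM, spreadsheets) as function-calling tools with clear docstrings: the LLM picks a tool from its name and description, read in light of the agent's role, so a vague docstring leads to wrong or skipped tool calls.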
Tasks: decomposition, context, and output contracts
A crew’s quality depends more on task design than on model choice.
Each task should have a single deliverable, an explicit expected_output
(format, length, sections), and — when order matters — a
context list referencing prior tasks so downstream agents receive upstream
artifacts without re-querying the LLM for summaries.
Example decomposition for a product launch brief:
- Research task — gather competitor pricing and feature gaps; output bullet list with URLs.
- Strategy task — context = research; output positioning statement and three messaging pillars.
- Draft task — context = strategy; output 400-word landing page copy.
- Review task — context = draft; output marked-up revision with compliance flags.
Use output_file on tasks to persist artifacts to disk. Set
async_execution only when tasks are truly independent — premature
parallelism loses context chains and complicates debugging.
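The context chain in the launch-brief decomposition can be sketched in plain Python. SketchTask and run_in_order are hypothetical stand-ins, and the lambda workers replace real agents. The point is only that each task receives the outputs it declares and nothing else.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchTask:
    name: str
    expected_output: str
    # Stand-in for the agent: takes upstream outputs, returns this task's output.
    worker: Callable[[dict[str, str]], str]
    context: list["SketchTask"] = field(default_factory=list)


def run_in_order(tasks: list[SketchTask]) -> dict[str, str]:
    """Run tasks in list order, passing each one only its declared context."""
    outputs: dict[str, str] = {}
    for task in tasks:
        upstream = {t.name: outputs[t.name] for t in task.context}
        outputs[task.name] = task.worker(upstream)
    return outputs


research = SketchTask(
    name="research",
    expected_output="Bullet list of competitor prices with URLs",
    worker=lambda ctx: "- Rival A: $49/mo (https://example.com/a)",
)
strategy = SketchTask(
    name="strategy",
    expected_output="Positioning statement and three messaging pillars",
    worker=lambda ctx: f"Positioning built on: {ctx['research']}",
    context=[research],
)
draft = SketchTask(
    name="draft",
    expected_output="400-word landing page copy",
    worker=lambda ctx: f"Copy derived from [{ctx['strategy']}]",
    context=[strategy],
)

results = run_in_order([research, strategy, draft])
print(results["draft"])
```

In this sketch a task listed before its context raises a KeyError, which is the plain-Python version of the ordering problem that premature parallelism creates.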
Crews and process types
Sequential process
The default Process.sequential runs tasks in list order. Each agent completes
its task, and by default its output is handed to the next task; list earlier tasks in
context when a step needs results from further upstream. This mirrors
assembly-line workflows: research, then write, then edit. Simple, predictable, and easy
to trace in logs.
Hierarchical process
Process.hierarchical introduces a manager agent that reviews
the crew's tasks, delegates them to specialists, and synthesizes results. CrewAI builds
the manager from the crew's manager_llm setting, or you supply your own manager_agent;
one of the two is required. The manager typically has allow_delegation=True; workers do
not. Use this when task order is not known upfront and the manager must decide which
specialist to invoke based on intermediate findings. Cost is higher (extra manager LLM
calls) but flexibility increases.
CrewAI Flow
For event-driven or branching logic beyond linear sequences, CrewAI Flow
decorates Python methods with @start, @listen, and
@router to build lightweight state machines. Flows can embed crews as steps
— combining role-based teams with conditional routing (e.g. “if legal
review fails, route back to writer”). When you need durable checkpoints and
human-in-the-loop interrupts at every node, evaluate LangGraph instead; Flow covers
moderate complexity without full graph infrastructure.
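The "route back to writer" example has the following shape. This is a plain-Python analogue and does not use the crewai package: in a real Flow, write would be a start step, legal_review a router returning a label, and the branches would be listeners. All names, the draft strings, and the revision cap are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DraftState:
    draft: str = ""
    revisions: int = 0
    history: list[str] = field(default_factory=list)


def write(state: DraftState) -> None:
    # Stand-in for a writer crew: the first pass contains a risky claim,
    # later passes replace it with a safer one.
    state.revisions += 1
    state.draft = "guaranteed savings" if state.revisions == 1 else "typical savings"
    state.history.append("write")


def legal_review(state: DraftState) -> str:
    # Stand-in for a review crew; returns a route label like a router would.
    state.history.append("legal_review")
    return "failed" if "guaranteed" in state.draft else "passed"


def run_flow(max_revisions: int = 3) -> DraftState:
    """Loop writer -> review until review passes or the revision cap is hit."""
    state = DraftState()
    while state.revisions < max_revisions:
        write(state)
        if legal_review(state) == "passed":
            state.history.append("publish")
            return state
    # Cap reached without a pass: hand the draft to a person.
    state.history.append("escalate_to_human")
    return state


final = run_flow()
print(final.history)
```

The revision cap matters as much as the route itself: without it, a draft that never passes review would loop indefinitely.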
Tools, memory, and configuration
Tools
Define tools with the @tool decorator or import LangChain tool classes.
Scope tools narrowly per agent: the researcher gets search and scrape tools; the editor
gets a style-guide lookup, not live web access. Validate inputs inside the tool
implementation; sooner or later an agent will pass malformed arguments.
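A narrowly scoped tool with input validation might look like the sketch below. The style rules, the function name, and the limits are hypothetical. The function is plain Python, so it can be unit-tested on its own before being registered as a tool.

```python
# Hypothetical style-guide content for illustration only.
STYLE_RULES = [
    "Avoid hype words such as 'revolutionary'.",
    "Cite a source for every statistic.",
    "Keep the tone family-safe.",
]


def lookup_style_rule(term: str, max_results: int = 3) -> list[str]:
    """Return style-guide rules that mention a term.

    Args:
        term: Word or phrase to look up, 1 to 50 characters.
        max_results: How many rules to return, between 1 and 10.
    """
    # Validate before doing any work; agents can send wrong types or ranges.
    if not isinstance(term, str) or not 1 <= len(term.strip()) <= 50:
        raise ValueError("term must be a non-empty string of at most 50 characters")
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("max_results must be an integer")
    if not 1 <= max_results <= 10:
        raise ValueError("max_results must be between 1 and 10")

    needle = term.strip().lower()
    matches = [rule for rule in STYLE_RULES if needle in rule.lower()]
    return matches[:max_results]


print(lookup_style_rule("statistic"))
```

Raising a clear ValueError gives the agent an error message it can react to on the next iteration, rather than a stack trace from deep inside a backend call.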
Memory
CrewAI supports short-term memory (within a crew run), long-term memory (persisted across runs via SQLite or external stores), and entity memory (facts about people, companies, or products). Enable memory when crews run repeatedly against the same accounts; disable it for one-off batch jobs where stale facts cause hallucinated continuity.
YAML crews
Non-engineers can maintain agents.yaml and tasks.yaml with
role definitions and task templates, loaded via CrewBase subclasses. This
separates persona copy from execution code — valuable when marketing or ops teams
iterate prompts without redeploying Python.
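The separation of persona copy from code can be illustrated with a dictionary standing in for a parsed agents.yaml entry. CrewAI crews accept an inputs mapping at kickoff and interpolate it into placeholders like these; the interpolate function below only imitates that step. The key names and strings are hypothetical.

```python
# What a hypothetical agents.yaml entry might look like once parsed into a dict.
AGENTS_CONFIG = {
    "topic_scout": {
        "role": "{product_line} Topic Scout",
        "goal": "Propose five article angles about {product_line}",
        "backstory": "You read trade press daily and always keep source links.",
    },
}


def interpolate(config: dict[str, str], inputs: dict[str, str]) -> dict[str, str]:
    """Fill {placeholders} in every persona string from run-time inputs."""
    return {key: value.format(**inputs) for key, value in config.items()}


scout = interpolate(
    AGENTS_CONFIG["topic_scout"],
    {"product_line": "Harbor Logistics"},
)
print(scout["role"])
```

With this split, an ops teammate can change the wording of a role or goal in the config file while the Python that loads and runs the crew stays untouched.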
LLM configuration
Set a default LLM through environment variables, or pass a model explicitly per agent.
Recent releases take a model string or CrewAI's own LLM object; older releases accepted
LangChain chat objects (ChatOpenAI, ChatAnthropic, local Ollama wrappers), so check the
version you have pinned. Mix models per agent: a fast, cheap model for classification
tasks; a frontier model for final synthesis. Log token usage per agent in production;
role-heavy backstories inflate prompt size on every turn.
Worked example: Harbor Marketing content crew
Harbor Marketing publishes weekly explainers for Harbor Logistics and Harbor Industrial product lines. A four-agent sequential crew produces draft-ready articles:
- Topic Scout (researcher) — scans internal briefs and public trade press; outputs five candidate angles with source links and audience fit notes.
- Outline Architect (strategist) — context = scout output; picks one angle; outputs H2/H3 outline with key claims to verify.
- Draft Writer — context = outline; produces ~1,200-word draft following Harbor style (no hype, cite statistics, family-safe tone).
- Compliance Editor — context = draft; flags unsubstantiated claims, suggests cuts, outputs final markdown with a change log.
The crew runs Process.sequential with memory=False (each article
is independent). Research tools are limited to an approved domain allowlist. The
Compliance Editor has no web tools; it only critiques the draft. Total runtime
target: under four minutes. To protect that budget, max_iter on the research agents
stops runaway search loops from consuming the time and tokens the later stages need.
Human editors review the output before publishing; the crew accelerates first drafts,
not autonomous publishing.
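The domain allowlist for research tools could be enforced with a small check like the one below, called inside each search or scrape tool before any request is made. The domains are placeholders, since Harbor's real list is not given here, and is_allowed is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Placeholder allowlist; the real approved domains are not part of this guide.
ALLOWED_DOMAINS = frozenset({"example.com", "trade-press.example.org"})


def is_allowed(url: str, allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL is http(s) and its host is an allowed domain or subdomain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in {"http", "https"} or not host:
        return False
    # Match the domain itself or a dot-separated subdomain, never a bare suffix.
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://news.example.com/freight-rates"))
print(is_allowed("https://evil-example.com/freight-rates"))
```

Matching on "." plus the domain, not on a raw suffix, is the design choice that matters: it keeps look-alike hosts such as evil-example.com from slipping through.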
Framework decision table
| Scenario | Prefer | Why |
|---|---|---|
| Role-specialist pipeline with YAML personas | CrewAI sequential crew | Readable configs; minimal graph boilerplate |
| Dynamic delegation when task order is unknown | CrewAI hierarchical process | Built-in manager agent pattern |
| Cycles, interrupts, multi-day human approval | LangGraph | Checkpoint persistence and interrupt() primitives |
| Single retrieval + answer, no roles | LlamaIndex query engine | Retrieval-first; crews add overhead |
| Portable tools across Claude, IDEs, internal hosts | MCP servers | Standard wire protocol; framework-agnostic |
| One chain, three steps, no agents | LangChain LCEL or raw SDK | Less abstraction than a four-agent crew |
Common pitfalls
- Vague expected_output — “Write a good report” produces inconsistent formats; specify sections and length.
- Missing context chains — downstream agents re-research from scratch, duplicating cost and conflicting facts.
- Over-delegation — hierarchical crews where every agent can delegate create circular assignments and token explosions.
- Shared tool sprawl — giving every agent web search when only one role needs it increases hallucination and latency.
- Verbose in production — logging full chain-of-thought to stdout leaks sensitive context and fills disks.
- Ignoring max_iter — tool loops on malformed API responses burn budget until timeout.
- Memory without hygiene — long-term memory stores outdated facts that poison later runs.
- Crew for single-shot tasks — four agents to summarize one paragraph; use a direct completion instead.
Practitioner checklist
- Pin crewai and provider SDK versions; test upgrades on a golden crew run.
- Write task expected_output as if briefing a human contractor.
- Chain tasks with explicit context; never assume agents “remember” prior steps.
- Scope tools per role; validate inputs in tool implementations.
- Set max_iter and per-agent model tiers before production.
- Start sequential; add hierarchical or Flow routing only when metrics justify complexity.
- Log final outputs and tool calls; redact PII before persistence.
- Keep a human approval gate for customer-facing or compliance-sensitive deliverables.
- Benchmark crew cost against a single-agent baseline quarterly.
- Revisit LangGraph if you need durable interrupts, or MCP if you need portable tools, more than role personas.
Key takeaways
- CrewAI models multi-agent work as role-based agents completing explicit tasks in orchestrated crews.
- Task design and expected_output contracts matter more than stacking more agents.
- Sequential crews suit pipelines; hierarchical crews suit dynamic delegation.
- CrewAI Flow bridges simple crews and graph-style routing without full LangGraph infrastructure.
- Pair crews with MCP or LangGraph when tool portability or durable human-in-the-loop state becomes the bottleneck.
Related reading
- Multi-agent orchestration explained — topologies, handoffs, and cost controls across frameworks
- LangGraph fundamentals explained — stateful graphs, checkpoints, and human-in-the-loop interrupts
- LangChain fundamentals explained — LCEL chains and tool-calling agents under the hood
- LLM function calling explained — designing tools agents invoke safely