CrewAI fundamentals explained

Many production LLM workflows are not one model call — they are small teams: a researcher gathers facts, a writer drafts copy, an editor enforces tone, and a manager decides who works next. CrewAI is a Python framework that models this pattern explicitly through agents (role, goal, backstory), tasks (description, expected output, assigned agent), and crews that orchestrate execution in sequence or under a hierarchical manager. Where LangGraph expresses control flow as typed state graphs with checkpoints and interrupts, CrewAI optimizes for human-readable role playbooks and YAML-driven crew definitions — a different mental model for the same multi-agent orchestration problem. This guide covers core primitives, process types, tools and memory, CrewAI Flow, observability, a Harbor Marketing content crew worked example, a framework decision table, common pitfalls, and a practitioner checklist.

What CrewAI is (and is not)

CrewAI is a role-based multi-agent orchestration framework. Each agent receives a persona (role, goal, backstory) that shapes how the underlying LLM reasons. Tasks are first-class objects with a description, an expected_output contract, and optional context from prior task results. A Crew wires agents to tasks and selects a process: sequential handoffs or hierarchical delegation. CrewAI Flow sits alongside crews and adds event-driven routing between steps.

It is not a retrieval framework, a model host, or a replacement for every agent pattern. Single-shot completions, rigid CRUD APIs, and teams already standardized on MCP tool servers with thin clients may need less framework. Reach for CrewAI when work naturally decomposes into specialist roles, when non-engineers need to edit agent personas in YAML, or when sequential pipelines with clear deliverables dominate over arbitrary graph cycles.

Core primitives

  • Agent — an LLM-backed worker with role, goal, backstory, optional tools, and delegation settings.
  • Task — a unit of work with description, expected_output, assigned agent, and optional context from upstream tasks.
  • Crew — binds agents and tasks; runs kickoff() or kickoff_async() to produce a final result.
  • Process — sequential (tasks run in order) or hierarchical (a manager agent delegates dynamically).
  • Tool — a Python callable or LangChain-compatible tool that agents invoke during task execution.
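
The primitives above fit together roughly as follows. This is a minimal sketch, assuming a recent crewai release and an LLM API key configured in the environment; the build_crew helper, the role strings, and the {topic} input are illustrative and not part of this guide. The import sits inside the function so the module still loads where crewai is not installed.

```python
def build_crew():
    """Assemble a two-agent sequential crew (all names are illustrative)."""
    # Lazy import: crewai is only needed when the crew is actually built.
    from crewai import Agent, Crew, Process, Task

    researcher = Agent(
        role="Senior Market Research Analyst",
        goal="Find three verifiable statistics about {topic}",
        backstory="You cite primary sources; you never invent figures.",
        allow_delegation=False,
    )
    writer = Agent(
        role="Technical Writer",
        goal="Summarize research findings about {topic} for a general reader",
        backstory="You write plainly and keep every claim traceable to a source.",
        allow_delegation=False,
    )

    research = Task(
        description="Research {topic} and collect statistics with their sources.",
        expected_output="A bullet list of three statistics, each with a URL.",
        agent=researcher,
    )
    summary = Task(
        description="Write a short summary of the research findings.",
        expected_output="One paragraph of at most 120 words.",
        agent=writer,
        context=[research],  # the writer receives the research output
    )

    return Crew(
        agents=[researcher, writer],
        tasks=[research, summary],
        process=Process.sequential,
    )
```

Running it would look like build_crew().kickoff(inputs={"topic": "container freight rates"}), which makes real LLM calls and therefore incurs cost.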

Agents: roles, goals, and backstories

CrewAI’s distinguishing feature is persona injection. The role names the job (“Senior Market Research Analyst”), the goal states success criteria (“Find three verifiable statistics about container freight rates”), and the backstory adds constraints and tone (“You cite primary sources; you never invent figures”). These strings are prepended to the system prompt on every agent turn, so prompt engineering investment lives in configuration rather than scattered Python strings.
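
To make persona injection concrete, the sketch below joins the three fields into one preamble in plain Python. The template wording is an assumption for illustration and may differ from the string CrewAI actually sends; persona_prompt is a hypothetical helper.

```python
def persona_prompt(role: str, goal: str, backstory: str) -> str:
    """Join persona fields into a system-prompt preamble (illustrative wording)."""
    return f"You are {role}. {backstory}\nYour personal goal is: {goal}"


prompt = persona_prompt(
    role="Senior Market Research Analyst",
    goal="Find three verifiable statistics about container freight rates",
    backstory="You cite primary sources; you never invent figures.",
)
print(prompt)
```

Because the same preamble accompanies every turn, edits to these three strings change agent behavior without touching task code.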

Key agent flags:

  • allow_delegation — whether the agent can assign subtasks to peers (common for manager agents in hierarchical crews).
  • verbose — logs intermediate reasoning to stdout; useful in development, noisy in production.
  • max_iter — caps tool-call loops per task; prevents runaway spend.
  • llm — override the default model per agent (e.g. cheap model for research, capable model for writing).

Agents can share tools or have role-specific toolkits. Wrap external APIs (search, CRM, spreadsheets) as function-calling tools with clear docstrings — the LLM chooses tools based on role context.
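
The effect of an iteration cap such as max_iter can be modeled in a few lines. This is a plain-Python sketch of the idea, not CrewAI's internal loop; run_with_cap and both stand-in search functions are hypothetical.

```python
from typing import Callable, Optional, Tuple


def run_with_cap(step: Callable[[int], Optional[str]], max_iter: int) -> Tuple[str, int]:
    """Call step(i) until it returns an answer or the iteration cap is reached."""
    for i in range(1, max_iter + 1):
        answer = step(i)
        if answer is not None:
            return answer, i
    # Cap reached: stop spending and return whatever fallback is acceptable.
    return "best effort: iteration cap reached", max_iter


def failing_search(i: int) -> Optional[str]:
    # Stand-in for a tool call that never succeeds (for example, a malformed API response).
    return None


def eventual_search(i: int) -> Optional[str]:
    # Stand-in for a tool call that succeeds on the third attempt.
    return "3 statistics found" if i >= 3 else None


print(run_with_cap(failing_search, max_iter=5))
print(run_with_cap(eventual_search, max_iter=5))
```

Without the cap, the failing case would loop until an external timeout; with it, the worst-case number of calls is known in advance.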

Tasks: decomposition, context, and output contracts

A crew’s quality depends more on task design than on model choice. Each task should have a single deliverable, an explicit expected_output (format, length, sections), and — when order matters — a context list referencing prior tasks so downstream agents receive upstream artifacts without re-querying the LLM for summaries.

Example decomposition for a product launch brief:

  1. Research task — gather competitor pricing and feature gaps; output bullet list with URLs.
  2. Strategy task — context = research; output positioning statement and three messaging pillars.
  3. Draft task — context = strategy; output 400-word landing page copy.
  4. Review task — context = draft; output marked-up revision with compliance flags.

Use output_file on tasks to persist artifacts to disk. Set async_execution only when tasks are truly independent — premature parallelism loses context chains and complicates debugging.
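
An explicit expected_output is easier to enforce when it can be checked mechanically. The sketch below is one possible post-run check in plain Python; check_contract and the sample draft are hypothetical and not part of CrewAI.

```python
import re
from typing import List


def check_contract(text: str, required_sections: List[str], max_words: int) -> List[str]:
    """Return a list of contract violations; an empty list means the output passes."""
    problems = []
    lowered = text.lower()
    for section in required_sections:
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    words = len(re.findall(r"\S+", text))  # whitespace-delimited word count
    if words > max_words:
        problems.append(f"too long: {words} words > {max_words}")
    return problems


draft = "Positioning\nFaster quotes.\nMessaging pillars\nSpeed, clarity, trust."
print(check_contract(draft, ["Positioning", "Messaging pillars"], max_words=50))
print(check_contract(draft, ["Positioning", "Compliance flags"], max_words=5))
```

A check like this can gate the handoff to the next task or trigger a retry before a malformed artifact travels downstream.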

Crews and process types

Sequential process

The default Process.sequential runs tasks in list order. Each agent completes its task; output flows to tasks that declare it in context. This mirrors assembly-line workflows: research, then write, then edit. Simple, predictable, and easy to trace in logs.
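
The handoff behavior described above can be modeled without any LLM. This is a plain-Python sketch of list-order execution with declared context, not CrewAI's executor; Step and run_sequential are hypothetical names, and the step outputs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    run: Callable[[List[str]], str]  # receives upstream outputs, returns this step's output
    context: List[str] = field(default_factory=list)  # names of upstream steps


def run_sequential(steps: List[Step]) -> Dict[str, str]:
    """Run steps in list order, passing each one only the outputs it declared."""
    outputs: Dict[str, str] = {}
    for step in steps:
        # Raises KeyError if a step declares context that has not run yet.
        upstream = [outputs[name] for name in step.context]
        outputs[step.name] = step.run(upstream)
    return outputs


steps = [
    Step("research", lambda ctx: "placeholder finding"),
    Step("write", lambda ctx: f"Draft citing: {ctx[0]}", context=["research"]),
    Step("edit", lambda ctx: ctx[0].replace("Draft", "Final"), context=["write"]),
]
print(run_sequential(steps)["edit"])
```

The editor step sees only the writer's output, which is why a missing context entry leaves a downstream step with nothing to work from.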

Hierarchical process

Process.hierarchical introduces a manager agent that reads the crew goal, delegates tasks to specialists, and synthesizes results. The crew must be given a manager, either through manager_llm or a custom manager_agent. The manager typically has allow_delegation=True; workers do not. Use this when task order is not known upfront: the manager decides which specialist to invoke based on intermediate findings. Cost is higher (extra manager LLM calls), but so is flexibility.
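
The extra cost of a manager can be seen in a toy model. In the plain-Python sketch below, manager_pick stands in for the manager LLM and the specialists are simple functions; none of this is CrewAI's API, and every name is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def manager_pick(findings: List[str]) -> Optional[str]:
    """Stand-in for the manager: choose the next specialist, or None to stop."""
    if not findings:
        return "researcher"
    if "needs legal check" in findings[-1]:
        return "legal"
    if not any(f.startswith("draft") for f in findings):
        return "writer"
    return None


SPECIALISTS: Dict[str, Callable[[List[str]], str]] = {
    "researcher": lambda f: "claim found, needs legal check",
    "legal": lambda f: "claim cleared",
    "writer": lambda f: "draft: " + f[-1],
}


def run_hierarchical(max_rounds: int = 10) -> Tuple[List[str], int]:
    """Let the manager delegate until it decides the work is done."""
    findings: List[str] = []
    manager_calls = 0
    for _ in range(max_rounds):
        manager_calls += 1  # every routing decision is an extra call
        choice = manager_pick(findings)
        if choice is None:
            break
        findings.append(SPECIALISTS[choice](findings))
    return findings, manager_calls


findings, manager_calls = run_hierarchical()
print(findings, manager_calls)
```

In this run, three specialist calls require four manager decisions, which is the overhead a fixed sequential order avoids.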

CrewAI Flow

For event-driven or branching logic beyond linear sequences, CrewAI Flow decorates Python methods with @start, @listen, and @router to build lightweight state machines. Flows can embed crews as steps — combining role-based teams with conditional routing (e.g. “if legal review fails, route back to writer”). When you need durable checkpoints and human-in-the-loop interrupts at every node, evaluate LangGraph instead; Flow covers moderate complexity without full graph infrastructure.
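
The "route back to writer" pattern amounts to a small state machine. The sketch below models it in plain Python with a retry cap; it does not use the crewai.flow API, where these steps would instead be methods carrying the decorators named above. All function names and labels are hypothetical.

```python
from typing import Callable, Dict


def write(state: Dict) -> str:
    state["attempts"] = state.get("attempts", 0) + 1
    # Stand-in for a writer crew: the second attempt drops the risky claim.
    risky = state["attempts"] == 1
    state["draft"] = "copy with guaranteed results" if risky else "safe copy"
    return "review"


def review(state: Dict) -> str:
    # Stand-in for legal review: reject drafts that promise outcomes.
    if "guaranteed" in state["draft"]:
        return "write" if state["attempts"] < 3 else "escalate"
    return "publish"


def run_flow(start: str = "write") -> Dict:
    """Follow returned labels until one has no handler; that label is the outcome."""
    handlers: Dict[str, Callable[[Dict], str]] = {"write": write, "review": review}
    state: Dict = {}
    step = start
    while step in handlers:
        step = handlers[step](state)
    state["outcome"] = step
    return state


print(run_flow())
```

The attempt counter is what keeps a rejected draft from bouncing between writer and reviewer indefinitely.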

Tools, memory, and configuration

Tools

Define tools with the @tool decorator or import LangChain tool classes. Scope tools narrowly per agent: the researcher gets search and scrape tools; the editor gets a style-guide lookup, not live web access. Validate tool inputs server-side; agents will pass malformed arguments.
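
Input validation inside a tool might look like the sketch below. It is plain Python with the network call omitted; in a crew, fetch_page would be registered as a tool. The allowlist domains and both function names are hypothetical.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com", "example.org"}  # hypothetical allowlist


def validate_url(url: str) -> str:
    """Return the cleaned URL, or raise ValueError with a readable message."""
    if not isinstance(url, str) or not url.strip():
        raise ValueError("url must be a non-empty string")
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        raise ValueError("only https URLs are allowed")
    host = (parsed.hostname or "").lower()
    # Accept the listed domain itself or any subdomain of it.
    if not any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS):
        raise ValueError(f"domain not on allowlist: {host or 'missing'}")
    return parsed.geturl()


def fetch_page(url: str) -> str:
    """Fetch the text of an approved page. Input: one https URL on the allowlist."""
    try:
        safe_url = validate_url(url)
    except ValueError as err:
        # Return the error as text so the agent can correct its next call.
        return f"Tool input rejected: {err}"
    return f"(would fetch {safe_url})"  # network call omitted in this sketch


print(fetch_page("https://news.example.com/a"))
print(fetch_page("http://example.com"))
```

Returning a short error string instead of raising gives the agent something to act on, while the iteration cap bounds how many times it can retry.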

Memory

CrewAI supports short-term memory (within a crew run), long-term memory (persisted across runs via SQLite or external stores), and entity memory (facts about people, companies, or products). Enable memory when crews run repeatedly against the same accounts; disable it for one-off batch jobs where stale facts cause hallucinated continuity.
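
One way to limit stale facts is to expire them by age. The sketch below is a hypothetical expiry layer in plain Python, not CrewAI's memory API; ExpiringFacts and the sample key are illustrative, and timestamps are passed in explicitly to keep the behavior deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fact:
    value: str
    stored_at: float  # seconds on any monotonic or epoch clock


class ExpiringFacts:
    """Key-value facts that are dropped once they exceed a maximum age."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self._facts: Dict[str, Fact] = {}

    def remember(self, key: str, value: str, now: float) -> None:
        self._facts[key] = Fact(value, now)

    def recall(self, key: str, now: float) -> Optional[str]:
        fact = self._facts.get(key)
        if fact is None or now - fact.stored_at > self.max_age:
            self._facts.pop(key, None)  # forget expired entries for good
            return None
        return fact.value


memory = ExpiringFacts(max_age_seconds=100)
memory.remember("acme.contact", "Dana", now=0)
print(memory.recall("acme.contact", now=50))
print(memory.recall("acme.contact", now=101))
```

A missing fact forces the crew to look the value up again, which is usually cheaper than acting on an outdated one.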

YAML crews

Non-engineers can maintain agents.yaml and tasks.yaml with role definitions and task templates, which a crew class decorated with @CrewBase loads. This separates persona copy from execution code, which is valuable when marketing or ops teams iterate on prompts without changing Python code.
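
The shape of those two files can be shown as Python dictionaries. This is a sketch: the keys mirror the agent and task fields named in this guide, while the {topic} placeholder, the interpolate helper (a stand-in for run-time input substitution), and the unknown_agents check are assumptions for illustration.

```python
from typing import Dict, List

# What agents.yaml might hold, one entry per agent.
AGENTS = {
    "researcher": {
        "role": "{topic} Research Analyst",
        "goal": "Find verifiable statistics about {topic}",
        "backstory": "You cite primary sources; you never invent figures.",
    },
}

# What tasks.yaml might hold; "agent" refers to a key in AGENTS.
TASKS = {
    "research_task": {
        "description": "Collect three statistics about {topic}.",
        "expected_output": "A bullet list of three statistics, each with a source URL.",
        "agent": "researcher",
    },
}


def interpolate(config: Dict[str, Dict[str, str]], inputs: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Fill {placeholders} in every field with run-time inputs."""
    return {
        name: {key: value.format(**inputs) for key, value in fields.items()}
        for name, fields in config.items()
    }


def unknown_agents(tasks: Dict[str, Dict[str, str]], agents: Dict[str, Dict[str, str]]) -> List[str]:
    """List tasks that point at an agent key that does not exist."""
    return [name for name, task in tasks.items() if task["agent"] not in agents]


print(interpolate(AGENTS, {"topic": "Freight"})["researcher"]["role"])
print(unknown_agents(TASKS, AGENTS))
```

A consistency check like unknown_agents is worth running in CI when people outside engineering edit the configuration.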

LLM configuration

Set a default LLM through environment variables or pass provider-specific objects (ChatOpenAI, ChatAnthropic, local Ollama wrappers). Mix models per agent: a fast, cheap model for classification tasks; a frontier model for final synthesis. Log token usage per agent in production; role-heavy backstories inflate prompt size on every turn.
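
The prompt-size effect of a long backstory is simple arithmetic. The sketch below uses a rough four-characters-per-token heuristic for English text, which is only an approximation; real counts depend on the model's tokenizer, and both helper names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token for English text."""
    return max(1, len(text) // 4)


def persona_overhead(backstory: str, turns: int) -> int:
    """Tokens spent re-sending the backstory across a number of turns."""
    return approx_tokens(backstory) * turns


short_backstory = "You cite primary sources."
long_backstory = short_backstory + " " + "You follow the house style guide in every sentence. " * 20

print(persona_overhead(short_backstory, turns=10))
print(persona_overhead(long_backstory, turns=10))
```

Multiplying by turns per task and tasks per run gives a quick upper bound to compare against the token usage actually logged.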

Worked example: Harbor Marketing content crew

Harbor Marketing publishes weekly explainers for Harbor Logistics and Harbor Industrial product lines. A four-agent sequential crew produces draft-ready articles:

  1. Topic Scout (researcher) — scans internal briefs and public trade press; outputs five candidate angles with source links and audience fit notes.
  2. Outline Architect (strategist) — context = scout output; picks one angle; outputs H2/H3 outline with key claims to verify.
  3. Draft Writer — context = outline; produces ~1,200-word draft following Harbor style (no hype, cite statistics, family-safe tone).
  4. Compliance Editor — context = draft; flags unsubstantiated claims, suggests cuts, outputs final markdown with a change log.

The crew runs Process.sequential with memory=False (each article is independent). Research tools are limited to an approved domain allowlist. The Compliance Editor has no web tools; it only critiques the draft. Total runtime target: under four minutes. To protect that budget, max_iter on the research agents caps tool-call loops so a stalled search cannot run indefinitely. Human editors review the output before publishing; the crew accelerates first drafts and does not publish autonomously.
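
The Harbor crew could be wired up roughly as follows. This is a sketch, assuming a recent crewai release; the goal strings, the shared backstory, the max_iter default of 5, and the helper name are illustrative, and the allowlisted research tools are omitted. The import is deferred so the step table can be inspected without crewai installed.

```python
HARBOR_STEPS = [
    {"key": "scout", "role": "Topic Scout",
     "goal": "Scan internal briefs and trade press for candidate angles",
     "expected_output": "Five candidate angles with source links and audience fit notes",
     "context": []},
    {"key": "outline", "role": "Outline Architect",
     "goal": "Pick one angle and outline the article",
     "expected_output": "H2/H3 outline with key claims to verify",
     "context": ["scout"]},
    {"key": "draft", "role": "Draft Writer",
     "goal": "Write the article from the outline in Harbor style",
     "expected_output": "Draft of about 1,200 words",
     "context": ["outline"]},
    {"key": "compliance", "role": "Compliance Editor",
     "goal": "Flag unsubstantiated claims and suggest cuts",
     "expected_output": "Final markdown with a change log",
     "context": ["draft"]},
]


def build_harbor_crew(max_iter: int = 5):
    """Build the four-agent sequential crew from HARBOR_STEPS."""
    # Lazy import: crewai is only needed when the crew is actually built.
    from crewai import Agent, Crew, Process, Task

    agents, tasks = {}, {}
    for step in HARBOR_STEPS:
        agents[step["key"]] = Agent(
            role=step["role"],
            goal=step["goal"],
            backstory="No hype, cite statistics, family-safe tone.",
            allow_delegation=False,
            max_iter=max_iter,  # caps tool-call loops per task
        )
        tasks[step["key"]] = Task(
            description=step["goal"],
            expected_output=step["expected_output"],
            agent=agents[step["key"]],
            # Upstream tasks whose output this task receives; None when there are none.
            context=[tasks[k] for k in step["context"]] or None,
        )
    return Crew(
        agents=list(agents.values()),
        tasks=list(tasks.values()),
        process=Process.sequential,
        memory=False,  # each article is independent
    )
```

Keeping the step table as data makes the context chain easy to review and to test without spending tokens.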

Framework decision table

Scenario | Prefer | Why
Role-specialist pipeline with YAML personas | CrewAI sequential crew | Readable configs; minimal graph boilerplate
Dynamic delegation when task order is unknown | CrewAI hierarchical process | Built-in manager agent pattern
Cycles, interrupts, multi-day human approval | LangGraph | Checkpoint persistence and interrupt() primitives
Single retrieval + answer, no roles | LlamaIndex query engine | Retrieval-first; crews add overhead
Portable tools across Claude, IDEs, internal hosts | MCP servers | Standard wire protocol; framework-agnostic
One chain, three steps, no agents | LangChain LCEL or raw SDK | Less abstraction than a four-agent crew

Common pitfalls

  • Vague expected_output — “Write a good report” produces inconsistent formats; specify sections and length.
  • Missing context chains — downstream agents re-research from scratch, duplicating cost and conflicting facts.
  • Over-delegation — hierarchical crews where every agent can delegate create circular assignments and token explosions.
  • Shared tool sprawl — giving every agent web search when only one role needs it increases hallucination and latency.
  • Verbose in production — logging full chain-of-thought to stdout leaks sensitive context and fills disks.
  • Ignoring max_iter — tool loops on malformed API responses burn budget until timeout.
  • Memory without hygiene — long-term memory stores outdated facts that poison later runs.
  • Crew for single-shot tasks — four agents to summarize one paragraph; use a direct completion instead.

Practitioner checklist

  • Pin crewai and provider SDK versions; test upgrades on a golden crew run.
  • Write task expected_output as if briefing a human contractor.
  • Chain tasks with explicit context; never assume agents “remember” prior steps.
  • Scope tools per role; validate inputs in tool implementations.
  • Set max_iter and per-agent model tiers before production.
  • Start sequential; add hierarchical or Flow routing only when metrics justify complexity.
  • Log final outputs and tool calls; redact PII before persistence.
  • Keep a human approval gate for customer-facing or compliance-sensitive deliverables.
  • Benchmark crew cost against a single-agent baseline quarterly.
  • Revisit LangGraph if you need durable interrupts, or MCP if you need portable tools, more than role personas.

Key takeaways

  • CrewAI models multi-agent work as role-based agents completing explicit tasks in orchestrated crews.
  • Task design and expected_output contracts matter more than stacking more agents.
  • Sequential crews suit pipelines; hierarchical crews suit dynamic delegation.
  • CrewAI Flow bridges simple crews and graph-style routing without full LangGraph infrastructure.
  • Pair crews with MCP or LangGraph when tool portability or durable human-in-the-loop state becomes the bottleneck.
