Why CrewAI's hierarchical process fails and how to fix it

Orchestrating multiple agents

Building teams of agents is one of the most promising applications of large language models, and CrewAI has become a popular tool for the job. However, its key feature, the hierarchical manager-worker process, does not work the way the documentation describes. In real scenarios the manager does not properly coordinate the agents; instead, CrewAI executes the tasks in order, which leads to reasoning errors, unnecessary tool calls, and high latency. The problem is discussed on online forums, but no clear solution has been offered.

This article shows why the hierarchical process in CrewAI fails, presents evidence from Langfuse traces, and describes a reproducible way to make the manager-worker pattern reliable with custom prompts.

Orchestrating multiple agents

First, let's look at what orchestration means in the context of agents. In short, it is the management and coordination of interrelated tasks within a process. But haven't workflow tools such as RPA always done this? What changed with the arrival of large language models?

The answer is that models understand meaning and intent from natural language, much like people on a team. Earlier tools were rigid and rule-based, whereas model-based agents can parse a user request, plan the steps through reasoning, choose the right tools, prepare correctly formatted inputs for them, and assemble intermediate results into an accurate answer. Orchestration frameworks support the models with prompts for planning, tool calling, and response generation.

Among these frameworks, CrewAI relies most heavily on the model's language understanding, because tasks, agents, and crews are defined in natural language. Unlike the more deterministic LangGraph (and even there model outputs are not fully predictable), CrewAI hides routing, error handling, and other complexities behind convenient constructs with configurable parameters. This makes it a good fit for prototypes in product teams and even for non-programmers.

The catch is that the manager-worker pattern does not work as intended.

To demonstrate this, we take an example scenario and evaluate the result against these criteria:

Orchestration quality
Final answer quality
Explainability of the process
Latency and usage cost

Example scenario

Imagine a team of customer support agents resolving technical or billing tickets. When a ticket arrives, a triage agent classifies it and hands it to the technical or billing specialist. A support manager coordinates the team, assigns tasks, and checks the quality of the answers.

Together they handle queries such as:

Why is my laptop overheating?
Why was I charged twice last month?
My laptop is overheating, and I was charged twice last month?
The invoice amount is incorrect after a system glitch?

The first query is purely technical, so the manager should involve only the technical specialist. The second is billing only. The third and fourth need input from both.
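The expected routing can be written down as a small lookup, which is useful later as a reference when reading the traces. This is only a sketch of the intended behavior: the dictionary and the function `expected_specialists` are illustrative names, not part of CrewAI, and the role strings are the ones used for the agents defined further below.

```python
# Expected routing for the example scenario (illustrative, not CrewAI API).
EXPECTED_SPECIALISTS = {
    "Technical": ["Technical Support Specialist"],
    "Billing": ["Billing Support Specialist"],
    "Both": ["Technical Support Specialist", "Billing Support Specialist"],
}


def expected_specialists(category: str) -> list[str]:
    """Return the specialists a well-behaved manager should call for a triage label."""
    if category not in EXPECTED_SPECIALISTS:
        raise ValueError(f"Unknown triage category: {category!r}")
    # Return a copy so callers cannot mutate the reference table.
    return list(EXPECTED_SPECIALISTS[category])
```

A manager that calls any specialist outside this list for a given label is doing unnecessary work, and one that skips a listed specialist is giving an incomplete answer.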

Let's build this team in CrewAI and see how it performs.

Hierarchical process

The CrewAI documentation says that the hierarchical approach creates a clear structure for task management: a manager agent coordinates the process, assigns tasks, and validates the results for efficient execution. The manager can be created automatically or defined manually. The second option gives more control over the instructions. The example tries both.

CrewAI code

Here is the code for the scenario. It uses the gpt-4o model and Langfuse for observability.

from crewai import Agent, Crew, Process, Task, LLM
from dotenv import load_dotenv
import os
from observe import *  # Langfuse trace

load_dotenv()
verbose = False
max_iter = 4
API_VERSION = os.getenv('API_VERSION')

# Create your LLM
llm_a = LLM(
    model="gpt-4o",
    api_version=API_VERSION,
    temperature = 0.2,
    max_tokens = 8000,
)

# Define the manager agent
manager = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=(
        """
        You do not try to find an answer to the user ticket {ticket} yourself. You delegate tasks to coworkers based on the following logic: Note the category of the ticket first by using the triage agent. If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists. ELSE If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step. Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions. ELSE If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist. Finally, compile and present the final response to the customer based on the outputs from the assigned agents. 
        """
    ),
    llm = llm_a,
    allow_delegation=True,
    verbose=verbose,
)

# Define the triage agent
triage_agent = Agent(
    role="Query Triage Specialist",
    goal="Categorize the user query into technical or billing related issues. If a query requires both aspects, reply with 'Both'.",
    backstory=(
        "You are a seasoned expert in analysing intent of user query. You answer precisely with one word: 'Technical', 'Billing' or 'Both'."
    ),
    llm = llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the technical support agent
technical_support_agent = Agent(
    role="Technical Support Specialist",
    goal="Resolve technical issues reported by customers promptly and effectively",
    backstory=(
        "You are a highly skilled technical support specialist with a strong background in troubleshooting software and hardware issues. "
        "Your primary responsibility is to assist customers in resolving technical problems, ensuring their satisfaction and the smooth operation of their products."
    ),
    llm = llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the billing support agent
billing_support_agent = Agent(
    role="Billing Support Specialist",
    goal="Address customer inquiries related to billing, payments, and account management",
    backstory=(
        "You are an experienced billing support specialist with expertise in handling customer billing inquiries. "
        "Your main objective is to provide clear and accurate information regarding billing processes, resolve payment issues, and assist with account management to ensure customer satisfaction."
    ),
    llm = llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define tasks
categorize_tickets = Task(
    description="Categorize the incoming customer support ticket: '{ticket}' based on its content to determine if it is technical or billing-related. If a query requires both aspects, reply with 'Both'.",
    expected_output="A categorized ticket labeled as 'Technical' or 'Billing' or 'Both'. Do not be verbose, just reply with one word.",
    agent=triage_agent,
)

resolve_technical_issues = Task(
    description="Resolve technical issues described in the ticket: '{ticket}'",
    expected_output="Detailed solutions provided to each technical issue.",
    agent=technical_support_agent,
)

resolve_billing_issues = Task(
    description="Resolve billing issues described in the ticket: '{ticket}'",
    expected_output="Comprehensive responses to each billing-related inquiry.",
    agent=billing_support_agent,
)

# Instantiate your crew with a custom manager and hierarchical process
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm = llm_a,  # Uncomment for auto-created manager
    manager_agent=manager,  # Comment for auto-created manager
    process=Process.hierarchical,
    verbose=verbose,
)
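
The listing defines the crew but does not show how a ticket is submitted. A minimal sketch, assuming the usual CrewAI entry point `kickoff(inputs=...)`, which fills the `{ticket}` placeholder: the helper `run_tickets` is a hypothetical name introduced here, and converting the result with `str()` is assumed to yield the final answer text.

```python
from typing import Any, Iterable


def run_tickets(crew: Any, tickets: Iterable[str]) -> dict[str, str]:
    """Run the crew once per ticket and collect the final answers as text."""
    answers: dict[str, str] = {}
    for ticket in tickets:
        # `inputs` supplies the value for the {ticket} placeholder.
        result = crew.kickoff(inputs={"ticket": ticket})
        answers[ticket] = str(result)
    return answers


# Usage (needs the crew defined above and valid API credentials):
# answers = run_tickets(crew_q, ["Why is my laptop overheating?"])
```

Running each ticket as a separate kickoff keeps one trace per query, which makes the Langfuse traces easier to compare.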

The code mirrors a team of human agents: a manager, a triage agent, and technical and billing specialists. CrewAI objects such as Agent, Task, and Crew are self-explanatory and easy to visualize. There is notably little Python; most of the logic (reasoning, planning, behavior) is written in natural language and depends on the model's ability to extract meaning and intent and then plan toward the goal.

CrewAI code is easy to develop. It is a low-code approach for building flows quickly, in which the framework, not the developer, does most of the orchestration work.

How well does it work?

Because we are testing the hierarchical process, the process parameter is set to Process.hierarchical in the Crew definition. We will test two CrewAI options and measure how each performs:

The manager auto-created by CrewAI
A custom manager

1. Auto-created manager

Input query: Why is my laptop overheating?

Here is the Langfuse trace:

Figure: Langfuse trace for the query “Why is my laptop overheating?”

Key observations:

First, the output: “Based on the provided context, it seems there is a misalignment between the nature of the issue (laptop overheating) and its categorization as a billing concern. To clarify the connection, it would be important to determine if the customer is requesting a refund for the laptop due to the overheating issue, disputing a charge related to the purchase or repair of the laptop, or seeking compensation for repair costs incurred due to the overheating…” For a clearly technical query this is a poor answer.
Why does this happen? The left panel of the trace shows triage first, then technical support, and then, unexpectedly, billing support. The trace graph shows the same sequence:

On closer inspection, triage correctly labeled the ticket as “Technical”, and the technical agent gave a good answer:

But instead of stopping and returning that answer, the Crew manager went on to the billing specialist and tried to find a nonexistent billing problem in a purely technical query.

As a result, the billing agent's answer overwrote the technical one, and the Crew manager failed to check the final answer against the user's query.

Why did this happen? The Crew definition lists the tasks categorize_tickets, resolve_technical_issues, resolve_billing_issues, and although the process is hierarchical, the manager does not orchestrate; it simply executes all tasks in sequence.

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    manager_llm = llm_a,
    process=Process.hierarchical,
    verbose=verbose,
)

If you submit a billing query, the answer will look correct, but only because resolve_billing_issues is the last task.
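
The difference between the observed and the intended behavior can be illustrated with a toy model. The sketch below does not use CrewAI at all: plain functions stand in for the agents, and all names are made up for illustration. It only shows why running every task in order makes the last task's output the final answer.

```python
# Toy model only: plain functions stand in for agents, no CrewAI involved.
def triage(ticket: str) -> str:
    return "Technical"  # pretend triage labelled the ticket correctly


def technical(ticket: str) -> str:
    return "technical answer"


def billing(ticket: str) -> str:
    return "billing answer"


TASKS = [triage, technical, billing]


def run_all_in_order(ticket: str) -> str:
    """Observed behavior: every task runs, the last output becomes the answer."""
    output = ""
    for task in TASKS:
        output = task(ticket)  # each result overwrites the previous one
    return output


def run_with_routing(ticket: str) -> str:
    """Intended behavior: the triage label decides which specialists run."""
    category = triage(ticket)
    parts = []
    if category in ("Technical", "Both"):
        parts.append(technical(ticket))
    if category in ("Billing", "Both"):
        parts.append(billing(ticket))
    return "\n".join(parts)


print(run_all_in_order("Why is my laptop overheating?"))  # billing answer
print(run_with_routing("Why is my laptop overheating?"))  # technical answer
```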

What about a query that needs both, such as “My laptop is overheating, and I was charged twice last month”? Triage correctly classifies it as “Both” and each agent answers its part correctly, but the manager does not merge them into a coherent answer. The final answer covers only the billing part, because that is the last task.

Latency and usage: the trace screenshot shows that the run took almost 38 seconds and 15,759 tokens. The final output is about 200 tokens. The rest went to reasoning, agent calls, and intermediate answers, all for an unsatisfactory result. Performance: poor.
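
To put these numbers in proportion, here is the arithmetic as a short snippet. It uses only the two figures quoted above; the final-answer size is the approximate value of 200 tokens.

```python
total_tokens = 15759   # total reported in the trace
answer_tokens = 200    # approximate size of the final answer

overhead_tokens = total_tokens - answer_tokens
answer_share = answer_tokens / total_tokens

print(f"overhead tokens: {overhead_tokens}")
print(f"share of tokens in the final answer: {answer_share:.1%}")
```

Roughly 1.3% of the tokens ended up in the answer the user sees; everything else is orchestration overhead.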

Evaluation of the approach

Orchestration quality: Poor
Final output quality: Poor
Explainability: Poor
Latency and usage: Poor

Perhaps this result comes from CrewAI's built-in manager running without custom instructions. In the next approach we replace it with a custom manager that has detailed instructions for technical, billing, and mixed tickets.

2. Custom manager agent

The customer support manager is defined with very specific instructions. Getting them right takes experimentation, and the generic prompt from the CrewAI documentation produces the same errors as the built-in manager.

manager = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=(
        """
        You do not try to find an answer to the user ticket {ticket} yourself. You delegate tasks to coworkers based on the following logic: Note the category of the ticket first by using the triage agent. If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists. ELSE If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step. Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions. ELSE If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist. Finally, compile and present the final response to the customer based on the outputs from the assigned agents. 
        """
    ),
    llm = llm_a,
    allow_delegation=True,  # the manager must be allowed to delegate to coworkers
    verbose=verbose,
)

In the Crew definition we use the custom manager instead of the built-in one:

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm = llm_a,
    manager_agent=manager,
    process=Process.hierarchical,
    verbose=verbose,
)

Repeating the tests

Input query: Why is my laptop overheating?

Trace:

Figure: trace graph for the laptop overheating query

The main result: for the technical query the flow did not go to the billing agent. The manager followed its instructions, classified the ticket as technical, and stopped after the specialist's answer. The output preview shows a good answer. Latency was 24 seconds and usage about 10k tokens.
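
For the same laptop query, the two managers can be compared directly. The snippet below only does arithmetic on figures quoted in this article (38 seconds and 15,759 tokens with the auto-created manager, 24 seconds and about 10k tokens with the custom one); the token count for the custom manager is rounded, so the result is approximate. The helper `reduction` is an illustrative name.

```python
# Figures quoted in the article; "10k" is a rounded token count.
auto_manager = {"latency_s": 38, "tokens": 15759}
custom_manager = {"latency_s": 24, "tokens": 10000}


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


latency_cut = reduction(auto_manager["latency_s"], custom_manager["latency_s"])
token_cut = reduction(auto_manager["tokens"], custom_manager["tokens"])

print(f"latency: {latency_cut:.0%} lower, tokens: {token_cut:.0%} lower")
```

Both come out at roughly 37% lower for this query, on top of the answer now being correct.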

Input query: Why was I charged twice last month?

Trace:

Figure: response to the billing query “Why was I charged twice last month?”

Figure: trace graph for the double-charge query

The manager skipped the technical specialist, even though that task comes earlier in the Crew. The billing specialist's answer is of good quality. Latency was 16 seconds and usage only 7,700 tokens.

Input query: My laptop is overheating, and I was charged twice last month?

The trace shows that the manager involved both specialists and returned a combined answer.

The preview does not show the full response, which combines the contributions of both agents. Latency was 38 seconds and usage about 20k tokens, which is in line with the complexity of the orchestration and the level of detail in the answer. The full response:

Dear Customer,

Thank you for reaching out to us regarding the issues you are experiencing. We sincerely apologize for any inconvenience caused. Below are the detailed solutions to address your concerns:

**1. Laptop Overheating Issue:**
- **Check for Proper Ventilation**: Ensure your laptop is placed on a hard, flat surface to allow proper airflow. Avoid using it on soft surfaces like beds or couches that can block the vents. Consider using a laptop cooling pad or stand with built-in fans to improve airflow.
- **Clean the Laptop's Vents and Fans**: Dust and debris can accumulate in the vents and fans, restricting airflow. Power off the laptop, unplug it, and use a can of compressed air to gently blow out dust from the vents. If you are comfortable, you can clean the internal fans and components more thoroughly, or take the laptop to a professional technician for internal cleaning.
- **Monitor Running Applications and Processes**: Open the Task Manager (Windows: Ctrl + Shift + Esc, macOS: Activity Monitor) and check for processes consuming high CPU or GPU usage. Close unnecessary applications or processes to reduce the load on the system.
- **Update Drivers and Software**: Update your operating system, drivers (especially graphics drivers), and any other critical software to the latest versions.
- **Check for Malware or Viruses**: Run a full system scan using a reputable antivirus program to detect and remove any malware.
- **Adjust Power Settings**: Adjust your power settings to "Balanced" or "Power Saver" mode (Windows: Control Panel > Power Options, macOS: System Preferences > Energy Saver).
- **Inspect the Laptop's Hardware**: If the laptop is still overheating, there may be an issue with the hardware, such as a failing fan or thermal paste that needs replacement. Consult a professional technician to inspect and replace the thermal paste or faulty hardware components if necessary.
- **Environmental Factors**: Operate the laptop in a cool, well-ventilated environment. Avoid using the laptop in direct sunlight or near heat sources.
- **Consider Upgrading Components**: If the laptop is older, consider upgrading components such as RAM or switching to an SSD to reduce the strain on the system and help with heat management.
- **Monitor Temperature Levels**: Install a temperature monitoring tool (e.g., HWMonitor, Core Temp, or Macs Fan Control) to keep track of the CPU and GPU temperatures. This can help identify if the laptop is consistently running at high temperatures and provide insights into when the overheating occurs.

If the problem persists after trying all the above solutions, please contact the laptop manufacturer’s support team or a certified repair technician for further assistance.

**2. Duplicate Billing Issue:**
- **Verification of Account Details**: I reviewed your account information to ensure accuracy and confirmed that the account is active and matches the reported issue.
- **Billing Records Check**: Upon examining the billing records for last month, I identified two identical charges processed on the same day. This confirms the duplicate billing issue.
- **Payment Method Details**: I verified the payment method used for the transactions and confirmed that both charges were processed using the same payment method.
- **Resolution**:
  - Since the duplicate charge was confirmed, I have initiated a refund for the second transaction. The refund will be processed back to the original payment method within 5-7 business days.
  - Additionally, I have applied a credit to your account as a gesture of goodwill for the inconvenience caused.
- **Customer Communication**: You will receive an email or phone call from us to inform you of the resolution, including the refund timeline and the applied credit.

We sincerely apologize for the inconvenience caused by these issues. Please let us know if you need further assistance or have any additional questions. We are here to help

And finally, a more complex query:

Input query: The invoice amount is incorrect after a system glitch?

Trace: the query was classified as ‘Both’, both agents were involved, and a combined answer was returned. Latency was 24 seconds and usage about 12k tokens.

Figure: response to the query about the system glitch and the incorrect invoice, answered by multiple agents

Full response:

Thank you for bringing this issue to our attention. Based on the technical investigation, here are the steps we are taking to resolve the system glitch that caused the incorrect invoice amount:

1. We have verified the details of the incorrect invoice amount and identified the issue in the system.
2. Our technical team has analyzed the system logs and reviewed the invoice calculation logic to pinpoint the root cause of the error.
3. The necessary fixes have been implemented to address the glitch, and the system has been thoroughly tested to ensure the issue is resolved.
4. A corrected invoice has been generated, and we have verified its accuracy.

Additionally, here’s the complete resolution for the billing issue related to the incorrect invoice amount caused by the system glitch:

1. **Corrected Invoice Verification**: The technical team has resolved the system glitch and generated a corrected invoice. The corrected invoice has been personally verified to ensure its accuracy, including cross-checking the amounts, dates, and any applicable charges or credits.
2. **Customer Communication**: We will promptly notify you about the resolution of the issue. This communication includes:
   - An apology for the inconvenience caused by the incorrect invoice.
   - Confirmation that the system glitch has been resolved.
   - Assurance that the corrected invoice has been thoroughly reviewed for accuracy.
   - A copy of the corrected invoice for your records.
3. **Additional Steps Taken**: To prevent similar issues in the future, the technical team has implemented measures to ensure system stability and accuracy in invoice generation.
4. **Account Adjustment (if applicable)**: If the incorrect invoice resulted in any overpayment or underpayment, the necessary adjustments will be made to your account. This includes issuing a refund for any overpayment or providing clear instructions for settling any outstanding balance.
5. **Follow-Up**: We are here to assist you with any further questions or concerns regarding your account or billing. Please do not hesitate to reach out to us, and we will be happy to help. For your convenience, we have provided direct contact information for further communication.

We sincerely apologize for any inconvenience this may have caused and assure you that we are taking steps to prevent similar issues in the future. Thank you for your understanding and patience.

Evaluation of the approach

Orchestration quality: Good
Final output quality: Good
Explainability: Good (it is clear why each step was taken)
Latency and usage: Fair (in line with the complexity of the output)

Findings

In short, the hierarchical manager-worker pattern in CrewAI does not work as documented. The core orchestration logic is weak: instead of delegating tasks selectively, the framework runs all of them in sequence, which invokes unneeded agents, overwrites outputs, and inflates latency and token usage. The failure lies in the internal routing: hierarchical mode provides neither conditional branching nor true delegation, so the final answer depends on whichever task runs last. The fix is a custom manager with explicit step-by-step instructions. It uses the triage result, calls only the agents that are needed, merges their outputs, and stops at the right point, which restores correct routing, improves quality, and reduces token cost.

Conclusion

CrewAI puts the model at the center of orchestration, relying on it to do most of the work through user prompts and the framework's built-in templates. Unlike LangGraph and AutoGen, this approach trades determinism for developer convenience. Sometimes that leads to unexpected behavior in key features such as the manager-worker pattern, which matters for many use cases. This article shows a path to the desired orchestration through precise prompts.
