Specification Issues (FC1)
Accounting for 41.77% of failures, these issues stem from flawed system design, ambiguous prompts, and inadequate state management. They lead agents to disobey task specifications, repeat steps, or lose conversation history.
Inter-Agent Misalignment (FC2)
Responsible for 36.94% of failures, this category covers breakdowns in communication and coordination between agents. Problems range from task derailment and information withholding to a critical mismatch between an agent's reasoning and the actions it actually takes.
Task Verification (FC3)
Making up 21.30% of failures, these issues result from insufficient quality control. They include premature termination, incomplete verification, and the approval of faulty outputs.
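To make the grouping concrete, the three categories can be treated as a lookup from individual failure modes to FC1, FC2, or FC3, with each category's share computed from a list of labelled failures. This is a minimal sketch, not the study's annotation procedure: the snake_case mode labels and the sample list are hypothetical, chosen only to mirror the failure modes named above.

```python
from collections import Counter

# Failure modes named in the text, grouped by category.
# The snake_case labels are hypothetical identifiers chosen for this sketch.
TAXONOMY = {
    "FC1": {"disobey_task_spec", "step_repetition", "loss_of_history"},
    "FC2": {"task_derailment", "information_withholding", "reasoning_action_mismatch"},
    "FC3": {"premature_termination", "incomplete_verification", "faulty_output_approved"},
}

# Reverse index: failure mode -> category.
MODE_TO_CATEGORY = {mode: cat for cat, modes in TAXONOMY.items() for mode in modes}


def category_shares(labels):
    """Return each category's percentage of the labelled failures.

    Raises KeyError for a label that is not in TAXONOMY.
    """
    counts = Counter(MODE_TO_CATEGORY[label] for label in labels)
    total = sum(counts.values())
    if total == 0:
        # No failures recorded: report 0% for every category.
        return {cat: 0.0 for cat in TAXONOMY}
    # Counter returns 0 for categories with no matching labels.
    return {cat: 100.0 * counts[cat] / total for cat in TAXONOMY}


if __name__ == "__main__":
    # Hypothetical annotations for four failed runs.
    annotated = ["step_repetition", "task_derailment", "loss_of_history", "premature_termination"]
    print(category_shares(annotated))  # {'FC1': 50.0, 'FC2': 25.0, 'FC3': 25.0}
```

Keeping the taxonomy as plain sets makes it easy to check that no failure mode is assigned to two categories; the percentages reported above come from the original study's data, not from this code.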