Jockey uses a multi-agent system to process complex video tasks efficiently. The system has three main components:
- Supervisor
- Planner
- Workers
The diagram below illustrates how these components interact.
The Supervisor coordinates the overall workflow by:
- Receiving user input
- Routing tasks between nodes
- Managing error recovery
- Ensuring adherence to the current plan
- Initiating replanning when necessary
For complex requests, the Supervisor engages the Planner. For simpler tasks, it routes the work directly to the appropriate worker.
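The routing rule above can be sketched as two small decision functions: one for an incoming request and one for what happens after a worker reports back. This is a minimal, illustrative sketch and not Jockey's actual implementation. The `Route` enum, the `" then "` complexity heuristic, and the function names are all hypothetical; a real Supervisor would more likely ask an LLM to judge complexity.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical destinations the Supervisor can hand work to."""
    PLANNER = "planner"
    WORKER = "worker"
    FINISH = "finish"


def estimate_steps(request: str) -> int:
    """Hypothetical complexity heuristic: count clauses joined by ' then '.

    A real system would more likely ask an LLM to classify the request.
    """
    clauses = [c for c in request.lower().split(" then ") if c.strip()]
    return len(clauses)


def route_request(request: str) -> Route:
    """Multi-step requests go to the Planner; single-step ones go straight to a worker."""
    return Route.PLANNER if estimate_steps(request) > 1 else Route.WORKER


def route_after_step(steps_done: int, plan_length: int, step_failed: bool) -> Route:
    """Decide what happens after a worker reports back on a planned step."""
    if step_failed:
        return Route.PLANNER  # error recovery: ask the Planner to replan
    if steps_done < plan_length:
        return Route.WORKER   # keep following the current plan
    return Route.FINISH       # every planned step has completed
```

For example, `route_request("Find clips of goals")` returns `Route.WORKER`, while `route_request("Find clips of goals then edit them into a montage")` returns `Route.PLANNER`. Keeping the two decisions separate mirrors the Supervisor's two roles: initial triage and keeping execution on plan.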
The Planner creates detailed, step-by-step plans for complex user requests, breaking each task into manageable steps for the worker nodes to execute. This is essential for multi-step video processing workflows, which need an overall strategy before any individual step runs.
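One way to picture a plan is as an ordered list of steps, each naming the worker that should carry it out. The sketch below assumes a hypothetical `PlanStep` structure, hypothetical worker names such as `video-search` and `video-editing`, and a naive verb-to-worker lookup standing in for the LLM reasoning a real Planner would use; it illustrates the shape of a plan, not how Jockey builds one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step of a plan: its position, the worker to use, and the goal."""
    index: int
    worker: str  # hypothetical worker name
    goal: str


# Hypothetical mapping from a clause's leading verb to a worker name.
# A real Planner would reason about the request with an LLM instead.
WORKER_FOR_VERB = {
    "find": "video-search",
    "search": "video-search",
    "edit": "video-editing",
    "combine": "video-editing",
    "describe": "video-text-generation",
}


def make_plan(request: str) -> list[PlanStep]:
    """Split a request on ' then ' into ordered steps and pick a worker for each."""
    clauses = [c.strip() for c in request.split(" then ") if c.strip()]
    plan = []
    for i, clause in enumerate(clauses, start=1):
        verb = clause.split()[0].lower()
        worker = WORKER_FOR_VERB.get(verb, "unassigned")
        plan.append(PlanStep(index=i, worker=worker, goal=clause))
    return plan
```

Calling `make_plan("Find clips of goals then edit them into a montage")` yields two steps: a `video-search` step followed by a `video-editing` step. Making the order explicit is what lets the Supervisor check progress against the plan and replan from a specific step when one fails.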
The worker nodes consist of two components:
- Instructor: Generates precise, complete task instructions for individual workers based on the Planner's strategy.
- Actual Workers: Agents that receive the Instructor's instructions and execute them using the tools available to them.
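The split between the Instructor and the workers can be sketched as follows: the Instructor turns a plan step into a self-contained instruction, and a worker executes it with one of its own tools. Everything here is hypothetical and for illustration only: the `Instruction` and `Worker` classes, the `index_id` context field, and the `fake_search` tool stand in for whatever Jockey and its video APIs actually use.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Instruction:
    """A self-contained task the Instructor hands to one worker."""
    worker: str
    tool: str
    arguments: dict


def instruct(worker: str, goal: str, context: dict) -> Instruction:
    """Hypothetical Instructor: turn a plan step into a complete instruction.

    It fills in everything the worker needs (here, a hypothetical index_id)
    so the worker never has to consult the original user request.
    """
    tool = "search" if worker == "video-search" else "edit"
    return Instruction(worker=worker, tool=tool,
                       arguments={"query": goal, "index_id": context["index_id"]})


class Worker:
    """Hypothetical worker agent that executes instructions with its own tools."""

    def __init__(self, name: str, tools: dict[str, Callable[..., str]]):
        self.name = name
        self.tools = tools

    def execute(self, instruction: Instruction) -> str:
        if instruction.tool not in self.tools:
            # Raise so the Supervisor can handle the failure and replan.
            raise ValueError(f"{self.name} has no tool named {instruction.tool!r}")
        return self.tools[instruction.tool](**instruction.arguments)


def fake_search(query: str, index_id: str) -> str:
    """Stand-in tool; a real worker would call a video search API here."""
    return f"clips matching {query!r} in {index_id}"
```

A usage example: `Worker("video-search", {"search": fake_search}).execute(instruct("video-search", "goals scored", {"index_id": "demo-index"}))` returns `"clips matching 'goals scored' in demo-index"`. Because the instruction carries all of its inputs, each worker stays narrow: it only has to map an instruction onto a tool call.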