
Agent Call

Agent calls are the defining concept of LLLM. Instead of exposing raw LLM completions, every agent implements a deterministic state machine that must transition from an initial dialog to a well-defined output state (or raise an error). This contract keeps downstream systems simple—no consumer needs to guess whether the model "felt done".

(Figure: agent call state machine)

LLM Call vs. Agent Call

|                | LLM Call                                   | Agent Call                                                                                          |
|----------------|--------------------------------------------|-----------------------------------------------------------------------------------------------------|
| Input          | Flat list of chat messages.                | Dialog seeded with a top-level Prompt, plus optional prompt arguments.                               |
| Output         | Raw model string plus metadata.            | Parsed response, updated dialog, and function-call trace.                                            |
| Responsibility | Caller decides whether to retry, parse, or continue. | Agent handles retries, parsing, exception recovery, and interrupts until it reaches the desired state. |
| Determinism    | Best-effort.                               | Guaranteed next state or explicit exception.                                                          |

The Agent dataclass in lllm/llm.py exposes call(dialog, extra, args, parser_args) to run the loop. AgentBase (and AsyncAgentBase) wrap this call with logging (StreamWrapper) and a user-friendly .call(task) signature for system builders. Under the hood each agent delegates to a provider implementation (lllm.providers.BaseProvider) so the same loop can target OpenAI’s Chat Completions API or the Responses API by toggling the api_type field on the agent configuration.
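
For orientation, here is a minimal sketch of the two entry points. The import paths, constructor arguments, and helper names are assumptions for illustration, not the verified public API:

# Illustrative sketch only; module paths and argument names are assumptions.
from lllm.llm import Agent

agent = Agent(model_name="gpt-4o-mini", api_type="completion")

# Low-level loop: manage the dialog yourself, then run the state machine.
dialog = agent.init_dialog()              # seed with the system prompt
# ... send the task-specific prompt into the dialog here ...
parsed, dialog = agent.call(dialog)       # runs until completion or raises

# High-level wrapper: an AgentBase subclass adds logging and a .call(task) API.
# researcher = ResearcherAgent()          # hypothetical AgentBase subclass
# answer = researcher.call("Summarize today's experiment logs")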

State Machine Lifecycle

  1. Seed dialog – Agent.init_dialog places the system prompt (usually pulled from PROMPT_REGISTRY via Prompts('agent_root')).
  2. Send invocation prompt – Systems typically call send_message with a task-specific prompt (e.g., prompts('task_query')). The Dialog.top_prompt pointer remembers which prompt introduced the latest user turn.
  3. LLM call – The provider decides whether to hit chat.completions or the Responses API depending on api_type (completion vs. response) and prompt needs, attaches registered functions/MCP servers, and sends the dialog history.
  4. Response handling – one of three things happens (see the loop sketch after this list):
     • If the model returned tool calls, LLLM invokes each linked Function, appends a tool message generated by Prompt.interrupt_handler, and loops again.
     • If the parser raised a ParseError, Prompt.exception_handler is invoked with the error message and the dialog is retried (up to max_exception_retry).
     • Network or runtime issues trigger exponential backoff and optional "LLM recall" retries, and structured error reports are persisted.
  5. Completion – As soon as the model returns a valid assistant message without function calls, the loop terminates and the parsed payload plus the final dialog are returned.
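
For readers who prefer code, the loop can be pictured as the following toy reconstruction. It mirrors the control flow described above but is not LLLM's actual implementation; the callbacks (send, parse, and the two handlers) are stand-ins you would supply:

# Toy reconstruction of the agent-call control flow; not LLLM's implementation.
class ParseError(Exception):
    """Stand-in for the error raised by a Prompt's parser."""

def run_agent_loop(send, parse, interrupt_handler, exception_handler,
                   max_exception_retry=3):
    dialog, retries = [], 0
    while True:
        message = send(dialog)                       # step 3: LLM call
        if message.get("tool_calls"):                # step 4: tool calls -> interrupt
            results = [fn() for fn in message["tool_calls"]]
            dialog.append(interrupt_handler(results))
            continue
        try:
            payload = parse(message["content"])      # step 4: parse / validate
        except ParseError as err:
            retries += 1
            if retries > max_exception_retry:
                raise                                # explicit failure state
            dialog.append(exception_handler(str(err)))
            continue
        return payload, dialog                       # step 5: completion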

Interrupt vs. Exception Handling

Each Prompt can specify inline handlers:

  • Exception handler (Prompt.exception_handler) receives {error_message} whenever parsing or validation fails. These dialog branches are pruned after handling.
  • Interrupt handler (Prompt.interrupt_handler) receives {call_results} after function execution. These branches remain in the dialog for transparency, and a final "stop calling functions" prompt ensures the agent produces a natural-language summary.

Handlers inherit the prompt’s parser, XML/MD tags, and allowed functions, so a single definition covers the entire agent loop.
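
As a rough illustration (the Prompt constructor fields shown here are assumptions inferred from this page, not verified names), a prompt with both handlers might look like:

# Field names are illustrative; check the Prompt definition for the real ones.
review_prompt = Prompt(
    text="Review the draft below and answer inside <verdict> tags:\n{draft}",
    parser=parse_verdict,                 # hypothetical parser that raises ParseError
    exception_handler=(
        "Your last reply could not be parsed: {error_message}. "
        "Answer again using only the <verdict> tags."
    ),
    interrupt_handler=(
        "Function results:\n{call_results}\n"
        "Use them to finish the review and stop calling functions."
    ),
)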

Function Calls, MCP, and Tools

Prompt instances can bundle:

  • Function objects (structured JSON schemas) that wrap Python callables.
  • MCP server descriptors for OpenAI’s Model Context Protocol tools.
  • Optional allow_web_search and computer_use_config hints that enable OpenAI-native web search or computer-use toolchains when supported by the model card (const.py).

During an agent call, every function call is tracked as a FunctionCall object with arguments, result, result_str, and error_message. The loop prevents duplicate calls by checking FunctionCall.is_repeated.
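
A hedged sketch of wiring a callable in and reading the trace afterwards; the Function constructor and the exact attribute access are assumptions based on the description above:

# Sketch only; the Function constructor and trace access are assumptions.
def get_weather(city: str) -> str:
    """Plain Python callable the model is allowed to invoke."""
    return f"Sunny in {city}"

weather_tool = Function(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={"city": {"type": "string"}},   # JSON-schema style arguments
    callable=get_weather,
)

# After the call, inspect the FunctionCall trace:
# for fc in trace:
#     print(fc.arguments, fc.result_str, fc.error_message, fc.is_repeated)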

Selecting the LLM API

Each agent entry in agent_configs accepts api_type:

[agent_configs.researcher]
model_name = "gpt-4o-mini"
system_prompt_path = "research/system"
api_type = "response"  # or "completion"
  • completion (default) uses Chat Completions. If Prompt.format is set, the provider automatically switches to beta.chat.completions.parse so you can keep using Pydantic response formats.
  • response opts into the OpenAI Responses API. When the target model advertises web_search or computer_use features, Prompt.allow_web_search and Prompt.computer_use_config materialize as native OpenAI tools. Responses API tool outputs are surfaced back through Prompt.interrupt_handler as Roles.USER messages so the assistant can summarize the tool transcript.
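
For the Pydantic case mentioned in the first bullet, a sketch might look like this; only the Pydantic models are standard Pydantic usage, while the Prompt(...) call and its format field are assumptions about LLLM's constructor:

# Sketch of a structured-output prompt; the Prompt(...) wiring is assumed.
from pydantic import BaseModel

class Finding(BaseModel):
    title: str
    relevance: float

class ResearchReport(BaseModel):
    findings: list[Finding]
    summary: str

report_prompt = Prompt(
    text="Summarize the sources below as a structured report:\n{sources}",
    format=ResearchReport,   # provider switches to beta.chat.completions.parse
)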

Classification Helpers

The same machinery powers lightweight classifiers (Agent.classify, Agent.binary_classify). These helpers:

  1. Re-seed the dialog with a classifier prompt.
  2. Ask the model to emit a single token drawn from the provided class list.
  3. Inspect logprobs to return calibrated probabilities.

Because classification still runs through the agent-call loop, you retain exception handling, retries, and structured logging.
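
Usage might look roughly like the following; the argument names and return shapes are illustrative assumptions, not the documented signatures:

# Illustrative only; argument names and return shapes are assumptions.
label, probs = agent.classify(
    "The deployment failed with a segfault in the parser.",
    classes=["bug_report", "feature_request", "question"],
)
print(label, probs)      # e.g. "bug_report", {"bug_report": 0.93, ...}

is_bug, p_bug = agent.binary_classify(
    "The deployment failed with a segfault in the parser.",
    question="Is this a bug report?",
)
print(is_bug, p_bug)     # e.g. True, 0.93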

Implementing Custom Agents

To ship a new agent:

  1. Subclass AgentBase, set agent_type and agent_group (e.g., ['researcher', 'editor']).
  2. Within __init__, pick prompts via Prompts('<root>') and designate which sub-agent handles each task.
  3. Implement call(self, task, **kwargs) to orchestrate dialogs (see template/example/system/agent/agent.py).

Agents register themselves automatically through AgentBase.__init_subclass__, so once your class is imported it becomes available to build_agent and the templates.
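
A skeleton following those three steps; the import paths, prompt names, and the run_loop helper are illustrative placeholders (see template/example/system/agent/agent.py for a complete, working example):

# Skeleton only; import paths, prompt names, and run_loop are placeholders.
from lllm.agent import AgentBase      # adjust to the actual module path
from lllm.prompt import Prompts       # adjust to the actual module path

class EditorAgent(AgentBase):
    agent_type = "editor"
    agent_group = ["researcher", "editor"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.prompts = Prompts("editor_root")      # step 2: pick the prompt subtree

    def call(self, task, **kwargs):
        # Step 3: orchestrate the dialog; the helper names below are hypothetical.
        dialog = self.init_dialog()
        dialog = self.send_message(dialog, self.prompts("edit_task"), args={"task": task})
        parsed, dialog = self.run_loop(dialog)     # hypothetical agent-call loop runner
        return parsed

# Importing the module is enough: AgentBase.__init_subclass__ registers EditorAgent,
# so build_agent and the templates can find it.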