Invokers
Invokers are the execution engines of the LLLM framework. They abstract the underlying LLM APIs, taking the state of a Dialog and executing the API call to generate the next response.
The default invoker is built on LiteLLM, giving you access to 100+ LLMs (OpenAI, Anthropic, Gemini, Vertex, local models via Ollama/vLLM, and more) through a single unified interface.
The BaseInvoker Interface
All invokers inherit from BaseInvoker and implement a single method: call.
class BaseInvoker(ABC):
@abstractmethod
def call(
self,
dialog: Dialog,
model: str,
model_args: Optional[Dict[str, Any]] = None,
parser_args: Optional[Dict[str, Any]] = None,
responder: str = 'assistant',
metadata: Optional[Dict[str, Any]] = None,
api_type: APITypes = APITypes.COMPLETION,
stream_handler: BaseStreamHandler = None,
) -> InvokeResult:
pass
The invoker returns an InvokeResult — a single object that bundles the clean Message (ready for dialog) with per-invocation diagnostics. This separation means the Message that goes into the dialog is free of debug data, while the agent loop has everything it needs for retry logic and analysis.
Arguments
| Argument | Description |
|---|---|
dialog |
Current Dialog. The invoker reads dialog.top_prompt for tools, parser, and output format. |
model |
Model identifier (e.g. 'gpt-4o', 'anthropic/claude-3-5-sonnet'). |
model_args |
Provider-specific arguments (temperature, max_tokens, api_base, etc.). |
parser_args |
Arguments passed to the prompt's output parser. |
responder |
Name of the agent generating the response (for logging and multi-agent routing). |
metadata |
Tracking metadata (e.g. frontend replay data). Attached to the Message but not sent to the LLM. |
api_type |
COMPLETION (Chat Completions) or RESPONSE (OpenAI Responses API). |
stream_handler |
Optional BaseStreamHandler for real-time streaming. |
Return: InvokeResult
@dataclass
class InvokeResult:
raw_response: Any = None # raw API response object
model_args: Dict[str, Any] = ... # actual args sent to the API
execution_errors: List[Exception] = [] # parse/validation errors
message: Optional[Message] = None # the clean conversational message
The agent loop uses invoke_result.has_errors to decide whether to retry, and invoke_result.message for the dialog:
invoke_result = invoker.call(dialog, model, ...)
if invoke_result.has_errors:
# retry with exception handler
raise AgentException(invoke_result.error_message)
dialog.append(invoke_result.message)
Streaming with BaseStreamHandler
LLLM uses a callback pattern for streaming to keep dialog state clean and synchronous. Subclass BaseStreamHandler:
class MyConsoleStreamer(BaseStreamHandler):
def handle_chunk(self, chunk_content: str, chunk_response: Any):
print(chunk_content, end="", flush=True)
# Pass it to the agent or invoker
invoke_result = invoker.call(dialog, model="gpt-4o", stream_handler=MyConsoleStreamer())
The invoker streams chunks to the handler in real-time while still returning the fully constructed InvokeResult at the end.
Using the LiteLLMInvoker
The LiteLLMInvoker is the default invoker. It standardizes input and output schemas across all providers.
1. Model Naming Convention
LiteLLM routes requests based on the model string prefix:
- OpenAI:
openai/gpt-4o(or simplygpt-4o) - Anthropic:
anthropic/claude-3-5-sonnet-20241022 - Google:
gemini/gemini-1.5-pro - Ollama (Local):
ollama/llama3 - Azure:
azure/my-gpt4-deployment
See the LiteLLM model list and LiteLLM providers for all supported models.
2. Setting API Keys
LiteLLM automatically detects standard environment variables:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
See LiteLLM Python SDK for environment variables per provider. Some providers (like Azure) require manual authentication setup — see LiteLLM Proxy Authentication.
3. Provider-Specific Arguments (model_args)
Pass provider-specific or standard arguments via model_args. LLLM automatically drops unsupported parameters safely — if you pass presence_penalty to Anthropic (which doesn't support it), it's quietly dropped rather than throwing an error.
model_args = {
"temperature": 0.2,
"max_tokens": 4096,
"api_base": "http://localhost:11434", # for local deployments
}
See LiteLLM Completion API and LiteLLM Response API for all available arguments.
4. Exception Handling
LiteLLM standardizes all provider errors into the OpenAI exception format. Common exceptions:
RateLimitError(HTTP 429)ContextWindowExceededError(HTTP 400)AuthenticationError(HTTP 401)APIConnectionError(HTTP 500)
The Agent class handles max_llm_recall retries and rate-limit backoffs automatically. Parse/validation errors are captured in InvokeResult.execution_errors and handled by the agent loop's exception handler.
5. Chat Completions vs. Responses API
The invoker supports two API modes via api_type:
APITypes.COMPLETION(default): Uses Chat Completions. IfPrompt.formatis set (Pydantic model or JSON schema), structured output is enabled automatically.APITypes.RESPONSE: Uses OpenAI's Responses API. When the prompt enablesallow_web_searchorcomputer_use_config, these materialize as native OpenAI tools. Tool outputs are surfaced back through the interrupt handler.
Both modes go through the same agent call loop — the difference is only in how the invoker talks to the API.