# Lesson 4 — Tools: Giving Agents Superpowers
Tools let an agent call Python functions during a conversation. The LLM decides when to invoke a tool, receives the result, and continues reasoning. LLLM handles the interrupt loop automatically.
## The @tool Decorator

The fastest way to define a tool is with the `@tool` decorator:
```python
from lllm.core.prompt import tool

@tool(
    description="Get the current weather for a city",
    prop_desc={
        "city": "The city name, e.g. 'Paris'",
        "unit": "Temperature unit: 'celsius' or 'fahrenheit'",
    },
)
def get_weather(city: str, unit: str = "celsius") -> str:
    # Replace with a real API call
    return f"Sunny, 22°C in {city}"
```
`@tool` inspects the function signature to build the JSON Schema that gets sent to the model. Type hints map to JSON types:
| Python | JSON |
|---|---|
| `str` | `"string"` |
| `int` | `"integer"` |
| `float` | `"number"` |
| `bool` | `"boolean"` |
| `list` | `"array"` |
| `dict` | `"object"` |
Parameters without defaults are marked as required automatically.
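For reference, here is roughly the schema `@tool` would generate for `get_weather` above, written as a Python dict (a sketch following the common OpenAI-style function schema; the exact wrapper format depends on the invoker):

```python
# Approximate schema built from get_weather's signature (sketch, not LLLM's exact output).
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "The city name, e.g. 'Paris'"},
            "unit": {"type": "string", "description": "Temperature unit: 'celsius' or 'fahrenheit'"},
        },
        "required": ["city"],  # 'unit' has a default, so it is not required
    },
}
```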
## Attaching Tools to a Prompt

```python
from lllm import Prompt

weather_prompt = Prompt(
    path="weather_bot/system",
    prompt="You are a weather assistant. Use your tools to answer questions.",
    function_list=[get_weather],  # list of Function objects
)
```
When the agent uses this prompt, the model sees the tool schema and can call `get_weather`.
## Full Example: Weather Bot

```python
from lllm import Agent
from lllm.core.prompt import tool, Prompt
from lllm.invokers import build_invoker

@tool(description="Get current weather for a city")
def get_weather(city: str, unit: str = "celsius") -> str:
    return f"Sunny, 22°C in {city}"

weather_prompt = Prompt(
    path="weather_bot/system",
    prompt="You are a weather assistant.",
    function_list=[get_weather],
)

invoker = build_invoker({"invoker": "litellm"})

agent = Agent(
    name="weather_bot",
    system_prompt=weather_prompt,
    model="gpt-4o",
    llm_invoker=invoker,
    max_interrupt_steps=3,  # allow up to 3 tool calls per response
)

agent.open("chat")
agent.receive("What is the weather in Paris and Tokyo?")
response = agent.respond()
print(response.content)
```
The agent loop:
1. The LLM sees the user question and decides to call `get_weather(city="Paris")`.
2. LLLM executes the function and feeds the result back.
3. The LLM calls `get_weather(city="Tokyo")`.
4. LLLM executes and feeds back.
5. The LLM writes the final answer.
6. `respond()` returns it.
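Under the hood, that loop is conceptually something like this (a simplified sketch, not LLLM's actual implementation; `llm`, `functions`, `tool_result_message`, and `user_message` are hypothetical stand-ins):

```python
# Simplified sketch of the tool-call interrupt loop (not LLLM's actual code).
def interrupt_loop(llm, messages, functions, max_interrupt_steps):
    for _ in range(max_interrupt_steps):
        reply = llm.complete(messages)
        if not reply.tool_calls:               # no tool call -> final answer
            return reply
        for call in reply.tool_calls:          # execute each requested tool
            result = functions[call.name](**call.arguments)
            messages.append(tool_result_message(call, result))
    # Cap reached: ask the model to stop calling tools and answer.
    messages.append(user_message("Stop calling tools and write your final answer."))
    return llm.complete(messages)
```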
## Controlling the Tool-Call Loop

```python
agent = Agent(
    ...
    max_interrupt_steps=5,   # maximum tool calls before forcing a final answer
    max_exception_retry=3,   # maximum retries when the model output fails validation
)
```
- `max_interrupt_steps=0` means unlimited (up to 100). Avoid this in production — always set an explicit cap.
- When the limit is reached, the agent appends a message asking the model to stop calling tools and write a final answer.
## Custom Result Formatting

By default the tool result is formatted as:

```
Return of calling function get_weather with arguments {'city': 'Paris', 'unit': 'celsius'}:
---
Sunny, 22°C in Paris
---
```
You can override this with a custom processor:

```python
def my_processor(result, function_call):
    return f"[{function_call.name}] → {result}"

@tool(description="Get weather", processor=my_processor)
def get_weather(city: str) -> str:
    return "Sunny"
```
## Building a Function Manually

For tools with complex schemas, or when you want to declare the schema separately from the implementation:

```python
from lllm import Prompt
from lllm.core.prompt import Function

search_fn = Function(
    name="web_search",
    description="Search the web for information",
    properties={
        "query": {"type": "string", "description": "Search query"},
        "max_results": {"type": "integer", "description": "Max results (default 5)"},
    },
    required=["query"],
)

# Attach the implementation separately
search_fn.link_function(lambda query, max_results=5: f"Top {max_results} results for '{query}'")

my_prompt = Prompt(
    path="research_bot/system",
    prompt="You are a research assistant.",
    function_list=[search_fn],
)
```
## MCP Servers

For tools exposed via the Model Context Protocol (MCP):

```python
from lllm.core.prompt import MCP, Prompt

mcp_server = MCP(
    server_label="my_tools",
    server_url="http://localhost:8080",
    require_approval="never",
    allowed_tools=["search", "fetch_url"],
)

my_prompt = Prompt(
    path="agent/system",
    prompt="You are an assistant with web access.",
    mcp_servers_list=[mcp_server],
)
```
Note: MCP support requires an MCP-compatible invoker. Check the Invokers reference for details.
## Repeated Tool Call Detection
If the agent calls the same tool with the same arguments twice in one turn, LLLM detects the repetition and injects a warning telling the model not to call it again. This prevents infinite loops caused by a confused model.
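The idea can be sketched in a few lines (illustrative only, not LLLM's actual code; `call.name` and `call.arguments` are hypothetical stand-ins for a parsed tool call):

```python
import json

# Sketch of duplicate tool-call detection: key on the tool name plus
# a canonical form of its arguments, reset at the start of each turn.
seen_calls: set[tuple[str, str]] = set()

def is_repeated(call) -> bool:
    key = (call.name, json.dumps(call.arguments, sort_keys=True))
    if key in seen_calls:
        return True  # same tool, same arguments: warn instead of re-executing
    seen_calls.add(key)
    return False
```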
## Error Handling in Tools

If your tool function raises an exception, LLLM catches it and feeds the error message back to the model as the tool result:

```python
@tool(description="Divide two numbers")
def divide(a: float, b: float) -> str:
    if b == 0:
        raise ValueError("Division by zero")
    return str(a / b)
```
The model sees "Error: Division by zero" as the result and can decide how to proceed (e.g., ask for different arguments).
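The effect is as if every tool invocation were wrapped like this (a sketch of the behaviour, not LLLM's actual code):

```python
# Sketch: exceptions become tool results instead of crashing the loop.
def safe_call(fn, **arguments):
    try:
        return fn(**arguments)
    except Exception as exc:
        return f"Error: {exc}"  # fed back to the model as the tool result
```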
## Proxy Tools — Structured Access to External APIs

For agents that need to call many API endpoints, the `@tool` approach scales poorly: dozens of endpoints mean dozens of tool definitions and a bloated context. LLLM's proxy system solves this with a two-tool pattern the agent uses lazily:

- `query_api_doc(proxy_name)` — retrieve full endpoint documentation on demand
- `run_python(code)` — execute Python in a persistent interpreter with `CALL_API` pre-injected

The agent learns the endpoint structure just-in-time, then calls whatever it needs through `CALL_API` in Python. LLLM injects both tools and the supporting prompt block automatically when you declare a proxy config — no extra wiring needed.
### Step 1: Define a proxy

A proxy wraps an API surface. Endpoints are declared with `@BaseProxy.endpoint` — this metadata drives the API directory, `query_api_doc` responses, and auto-testing.
```python
from lllm.proxies import BaseProxy, ProxyRegistrator

@ProxyRegistrator(
    path="weather",
    name="Weather API",
    description="Current weather and forecasts for any city.",
)
class WeatherProxy(BaseProxy):

    @BaseProxy.endpoint(
        category="current",
        endpoint="conditions",
        description="Current temperature and conditions for a city.",
        params={
            "city*": (str, "London"),  # * = required
            "units": (str, "celsius"),
        },
        response={"temperature": 18, "condition": "cloudy", "humidity": 72},
    )
    def conditions(self, params: dict) -> dict:
        city = params["city"]
        return {"temperature": 18, "condition": "cloudy", "humidity": 72}

    @BaseProxy.endpoint(
        category="forecast",
        endpoint="daily",
        description="5-day daily forecast for a city.",
        params={
            "city*": (str, "London"),
            "days": (int, 5),
        },
        response=[{"date": "2024-01-01", "high": 20, "low": 12}],
    )
    def daily(self, params: dict) -> list:
        city = params["city"]
        return [{"date": f"2024-01-0{i}", "high": 18 + i, "low": 10 + i} for i in range(1, 6)]
```
Key conventions:
- `path` in `@ProxyRegistrator` is what you put in `activate_proxies`
- Parameters with a `*` suffix are required; the second tuple element is the example value
- `response` is the example response shown in `query_api_doc` output
### Step 2: Configure the agent

Add a `proxy:` block to the agent config. With `exec_env: interpreter` (the default), LLLM builds a `ProxyManager`, an `AgentInterpreter`, and injects both tools automatically:
```yaml
# config.yaml
global:
  model_name: gpt-4o
  model_args:
    temperature: 0.1

agent_configs:
  - name: weather_analyst
    system_prompt: "You are a weather analyst. Use the Weather API to answer questions."
    proxy:
      activate_proxies: [weather]
      exec_env: interpreter  # default — can be omitted
      max_output_chars: 5000
      timeout: 60.0
```
Or in Python:

```python
config = {
    "global": {"model_name": "gpt-4o"},
    "agent_configs": [{
        "name": "weather_analyst",
        "system_prompt": "You are a weather analyst.",
        "proxy": {
            "activate_proxies": ["weather"],
            "exec_env": "interpreter",
            "max_output_chars": 5000,
        },
    }],
}
```
### Step 3: Run it

```python
from lllm import Tactic

class WeatherTactic(Tactic):
    name = "weather"
    agent_group = ["weather_analyst"]

    def call(self, task: str) -> str:
        agent = self.agents["weather_analyst"]
        agent.open("main")
        agent.receive(task)
        return agent.respond().content

tactic = WeatherTactic(config)
print(tactic("What is the current weather in Tokyo and Paris?"))
```
The agent will typically:
1. Call `query_api_doc("weather")` to see full endpoint specs
2. Call `run_python` with code that calls `CALL_API("weather/conditions", {"city": "Tokyo"})`
3. Call `run_python` again (reusing variables) to process both cities and format the answer
### How the interpreter works

`AgentInterpreter` uses a persistent namespace passed to every `exec()` call, so variables survive across calls like a live REPL:
```python
# Agent turn 1: fetch data
data_tokyo = CALL_API("weather/conditions", {"city": "Tokyo"})
data_paris = CALL_API("weather/conditions", {"city": "Paris"})
print(data_tokyo)

# Agent turn 2: data_tokyo is still in scope
comparison = f"Tokyo: {data_tokyo['temperature']}°C, Paris: {data_paris['temperature']}°C"
print(comparison)
```
- Stdout capture — only `print()` output is captured; the last expression is not auto-returned.
- Exception handling — uncaught exceptions are returned as formatted tracebacks so the agent can diagnose and retry without losing session state.
- Truncation — output longer than `max_output_chars` is cut off with a truncation indicator. The agent is told this in the injected prompt block.
- Timeout — each `run_python` call runs in a daemon thread; `TimeoutError` is raised if it exceeds `timeout` seconds.
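These behaviours boil down to a small pattern. Here is a minimal sketch of a persistent-namespace interpreter (the core idea only, not LLLM's actual `AgentInterpreter`; the timeout thread is omitted for brevity):

```python
import io
import traceback
from contextlib import redirect_stdout

# Minimal sketch of a persistent-namespace interpreter (not AgentInterpreter itself).
class MiniInterpreter:
    def __init__(self, max_output_chars: int = 5000):
        self.namespace: dict = {}          # survives across run() calls
        self.max_output_chars = max_output_chars

    def run(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with redirect_stdout(buffer):  # only print() output is captured
                exec(code, self.namespace)
        except Exception:
            return traceback.format_exc()  # errors come back as tracebacks
        output = buffer.getvalue()
        if len(output) > self.max_output_chars:
            output = output[: self.max_output_chars] + "... (truncated)"
        return output
```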
### Global vs per-agent proxy config
```yaml
global:
  model_name: gpt-4o
  proxy:
    activate_proxies: [fmp, fred]
    exec_env: interpreter

agent_configs:
  - name: fast_analyst
    # inherits global proxy config
  - name: notebook_analyst
    proxy:
      exec_env: jupyter  # override: use JupyterSession instead
  - name: crypto_analyst
    proxy:
      activate_proxies: [fmp]  # override: only fmp
      timeout: 30.0
```
Per-agent values deep-merge on top of global, field by field.
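For example, `crypto_analyst` above ends up with settings equivalent to this merge (illustrated with plain dicts; LLLM's merge is recursive, but one level suffices here):

```python
# Illustrative field-by-field merge for crypto_analyst (plain dicts, not LLLM internals).
global_proxy = {"activate_proxies": ["fmp", "fred"], "exec_env": "interpreter"}
agent_proxy  = {"activate_proxies": ["fmp"], "timeout": 30.0}

merged = {**global_proxy, **agent_proxy}
# {'activate_proxies': ['fmp'], 'exec_env': 'interpreter', 'timeout': 30.0}
```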
### Full proxy config reference
```yaml
proxy:
  activate_proxies: [fmp, fred]   # which proxies to load; empty = all registered
  deploy_mode: false              # passed to proxy instances (e.g. for rate limiting)
  cutoff_date: "2024-01-01"       # ISO date; proxies can use this to filter data
  exec_env: interpreter           # "interpreter" | "jupyter" | null
  max_output_chars: 5000          # truncate run_python stdout; 0 = no limit
  truncation_indicator: "... (truncated)"
  timeout: 60.0                   # seconds per run_python call; interpreter only
  prompt_template: null           # custom template string; null = auto-select
```
### Interpreter vs Jupyter mode

Use `exec_env: jupyter` when you need notebook files as output artifacts, Matplotlib/Plotly visualizations, or cell-level execution history. In Jupyter mode, only `query_api_doc` is injected as a tool — the agent writes `<python_cell>` XML tags that your tactic runs in a `JupyterSession`.
| | Interpreter | Jupyter |
|---|---|---|
| Overhead | None (in-process exec) | Subprocess per kernel (~2–3s startup) |
| Parallel agents | Safe — isolated namespaces | Heavy — one subprocess per agent |
| Visualizations | Not available | Full matplotlib/plotly support |
| Output artifact | None | .ipynb notebook file |
| Audit trail | None | Cell-by-cell execution history |
| Best for | Data retrieval, computation, API orchestration | Exploratory analysis, charts, reports |
See the Proxies & Sandbox reference for a complete Jupyter mode example.
## Jupyter Notebook Sandbox — Step by Step

`JupyterSession` is LLLM's built-in sandbox, optimised for the proxy system. Code executes in a real kernel, plots are saved, variables persist across turns, and the session produces a `.ipynb` file as an output artifact.

Unlike interpreter mode (where LLLM runs code in-process), in Jupyter mode the split of responsibility is explicit: the agent writes `<python_cell>` / `<markdown_cell>` XML tags; your tactic extracts those cells and runs them in a `JupyterSession`; execution results are fed back so the agent can continue. This lets you intercept every cell, handle errors, and checkpoint state.
### Step 1: Write the system prompt and parser

The system prompt explains the notebook interface. A parser enforces the tag format:
```python
from lllm import Prompt
from lllm.utils import find_all_xml_tags_sorted
from lllm.core.const import ParseError

NOTEBOOK_SYSTEM = """
You are a data analyst working in a Jupyter Notebook.

## Interface
- Write Python in `<python_cell>...</python_cell>` tags.
- Write notes and summaries in `<markdown_cell>...</markdown_cell>` tags.
- When your analysis is complete, respond with `<TERMINATE_NOTEBOOK>` alone (no cells in that response).

Rules:
- Cells execute sequentially; variables persist across turns.
- If a cell fails you will see the traceback and must fix that cell only.
- Use `CALL_API(path, params)` to call data APIs — always call `query_api_doc` first.
"""

def notebook_parser(message: str, **kwargs):
    matches = find_all_xml_tags_sorted(message)
    terminate = "<TERMINATE_NOTEBOOK>" in message
    cells = [
        (m["tag"], m["content"])
        for m in matches
        if m["tag"] in ("python_cell", "markdown_cell")
    ]
    if not cells and not terminate:
        raise ParseError(
            "No cells found. Wrap code in <python_cell>...</python_cell> "
            "or markdown in <markdown_cell>...</markdown_cell>, "
            "or respond with <TERMINATE_NOTEBOOK>."
        )
    # If TERMINATE_NOTEBOOK is mixed with python cells, ignore the termination signal
    if terminate and any(tag == "python_cell" for tag, _ in cells):
        terminate = False
    return {"raw": message, "cells": cells, "terminate": terminate}

analyst_system = Prompt(
    path="analyst/system",
    prompt=NOTEBOOK_SYSTEM,
    xml_tags=["python_cell", "markdown_cell"],
    signal_tags=["TERMINATE_NOTEBOOK"],
    parser=notebook_parser,
)
```
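A quick sanity check of the parser on a sample message (this assumes `find_all_xml_tags_sorted` returns dicts with `tag` and `content` keys, which is what the parser above relies on):

```python
sample = (
    "<markdown_cell>Fetch data first.</markdown_cell>"
    "<python_cell>print(1 + 1)</python_cell>"
)
parsed = notebook_parser(sample)
# parsed["cells"]     -> [("markdown_cell", "Fetch data first."), ("python_cell", "print(1 + 1)")]
# parsed["terminate"] -> False
```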
### Step 2: Configure the agent

Set `exec_env: jupyter` so LLLM injects only `query_api_doc` (not `run_python` — the tactic drives execution):
```yaml
# config.yaml
global:
  model_name: gpt-4o
  model_args:
    temperature: 0.1

agent_configs:
  - name: analyst
    system_prompt_path: analyst/system
    proxy:
      activate_proxies: [weather]
      exec_env: jupyter
```
### Step 3: Write the tactic

The tactic drives the loop: send task → get cells → run cells → feed output back → repeat:
```python
from lllm import Tactic
from lllm.sandbox.jupyter import JupyterSession

class NotebookAnalysisTactic(Tactic):
    name = "notebook_analysis"
    agent_group = ["analyst"]

    def call(self, task: str, **kwargs) -> str:
        agent = self.agents["analyst"]

        # Initialise a JupyterSession — CALL_API is wired into the kernel automatically
        session = JupyterSession(
            name="analysis",
            dir="/tmp/notebooks",
            metadata={
                "proxy": {
                    "activate_proxies": ["weather"],
                    "deploy_mode": False,
                }
            },
        )
        session.init_session()  # injects CALL_API into the kernel namespace

        agent.open("main")
        agent.receive(task)

        for _ in range(10):  # max steps
            msg = agent.respond()
            parsed = msg.parsed
            if parsed["terminate"]:
                break

            # Execute each cell and collect output
            outputs = []
            for tag, code in parsed["cells"]:
                if tag == "python_cell":
                    output = session.run_cell(code)
                    outputs.append(f"[output]\n{output}")
                else:
                    session.add_markdown_cell(code)

            # Feed results back so the agent can continue
            if outputs:
                agent.receive("\n\n".join(outputs))

        return session.notebook_path  # path to the saved .ipynb file
```
The agent typically:
1. Calls `query_api_doc("weather")` via the injected tool to see endpoint specs.
2. Writes a `<python_cell>` that calls `CALL_API(...)` — the tactic runs it and returns stdout.
3. Receives the output, writes more cells to analyse and visualise the data.
4. Responds with `<TERMINATE_NOTEBOOK>` when done.
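A mid-analysis agent reply in this interface might look like this (illustrative only; actual model output will vary):

```xml
<markdown_cell>
## Current conditions
Fetch Tokyo's current weather via the Weather API.
</markdown_cell>
<python_cell>
data = CALL_API("weather/conditions", {"city": "Tokyo"})
print(data)
</python_cell>
```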
### Step 4: Handle cell errors

If a cell fails, feed the traceback back so the agent can fix just that cell:
```python
for tag, code in parsed["cells"]:
    if tag == "python_cell":
        output = session.run_cell(code)
        if session.last_cell_failed:
            # Agent sees the error and provides a replacement cell
            agent.receive(
                f"Cell failed with the following error:\n\n{output}\n\n"
                "Please fix the error. Provide one <python_cell> with the corrected code only."
            )
            fix_msg = agent.respond()
            _, fixed_code = fix_msg.parsed["cells"][0]
            output = session.run_cell(fixed_code)
        outputs.append(f"[output]\n{output}")
```
### Bringing your own execution environment

`JupyterSession` is the built-in sandbox, but the pattern — agent writes tags, tactic runs code — works with any executor. The prompt and parser stay unchanged; only the sandbox object changes:
```python
import subprocess

class DockerSandbox:
    """Minimal example: run cells in an isolated container."""

    last_cell_failed: bool = False  # set after run_cell if needed

    def run_cell(self, code: str) -> str:
        # Note: each `python3 -c` invocation is a fresh process, so unlike
        # JupyterSession, variables do not persist between cells here.
        result = subprocess.run(
            ["docker", "exec", "my_sandbox", "python3", "-c", code],
            capture_output=True, text=True, timeout=30,
        )
        return result.stdout or result.stderr

    def add_markdown_cell(self, text: str) -> None:
        pass  # no-op for non-notebook environments
```
```python
class DockerAnalysisTactic(Tactic):
    name = "docker_analysis"
    agent_group = ["analyst"]

    def call(self, task: str, **kwargs) -> str:
        agent = self.agents["analyst"]
        sandbox = DockerSandbox()

        agent.open("main")
        agent.receive(task)

        for _ in range(10):
            msg = agent.respond()
            parsed = msg.parsed
            if parsed["terminate"]:
                break

            outputs = []
            for tag, code in parsed["cells"]:
                if tag == "python_cell":
                    outputs.append(sandbox.run_cell(code))
                else:
                    sandbox.add_markdown_cell(code)

            if outputs:
                agent.receive("\n\n".join(outputs))

        return agent.current_dialog.tail.content
```
You could replace `DockerSandbox` with a remote code execution service, a WebAssembly sandbox, a database query runner — anything that accepts code strings and returns output strings.
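If you want to make that contract explicit, a `typing.Protocol` captures it (a sketch for documentation purposes; LLLM does not require any particular base class):

```python
from typing import Protocol

class Sandbox(Protocol):
    """The implicit interface the tactics above rely on (sketch, not an LLLM type)."""
    last_cell_failed: bool

    def run_cell(self, code: str) -> str: ...
    def add_markdown_cell(self, text: str) -> None: ...
```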
See the Proxies & Sandbox reference for the full `JupyterSession` API, `init_session` wiring details, and a complete Jupyter mode example.
## Summary
| Concept | API |
|---|---|
| Define a tool | `@tool(description=..., prop_desc={...})` |
| Attach to prompt | `Prompt(function_list=[my_fn])` |
| Control loop depth | `Agent(max_interrupt_steps=N)` |
| Custom result format | `@tool(processor=my_fn)` |
| Manual schema | `Function(name=..., properties={...})` |
| MCP server | `Prompt(mcp_servers_list=[MCP(...)])` |
| Define a proxy | `@ProxyRegistrator` + `@BaseProxy.endpoint` |
| Wire proxy to agent | `proxy: {activate_proxies: [...], exec_env: interpreter}` |
| Lazy API docs | `query_api_doc(proxy_name)` (auto-injected) |
| Execute code | `run_python(code)` with `CALL_API` (auto-injected) |
| Jupyter sandbox | `exec_env: jupyter` + `JupyterSession` in tactic |
| Custom sandbox | Replace `JupyterSession` with any object that has `run_cell()` |