LLM Executor¶
The LLM executor is a pluggable responder that handles llm.request events by calling a user-supplied LLM function, then emits llm.response events with the result.
LLMExecutorResponder¶
kando.responders.llm_executor.LLMExecutorResponder(llm_fn)
¶
Factory: returns a Responder that calls llm_fn for every llm.request event.
Usage
def my_llm(messages, model, max_tokens): ... # call Anthropic/OpenAI/etc return response_text, cost_usd
responders = create_kit() + [LLMExecutorResponder(my_llm)]
Source code in kando/responders/llm_executor.py
LLMFn signature¶
LLMFn = Callable[[list[dict], str, int], tuple[str, float]]
# messages model max_tokens -> (text, cost_usd)
The function receives:
| Parameter | Type | Description |
|---|---|---|
messages |
list[dict] |
Chat messages (OpenAI/Anthropic format) |
model |
str |
Model identifier from the llm.request event |
max_tokens |
int |
Max response tokens |
It must return a (response_text, cost_usd) tuple.
Built-in implementations¶
Anthropic¶
Uses the Anthropic SDK directly. Activated when ANTHROPIC_API_KEY is set.
from kando.responders.anthropic_llm import anthropic_llm
from kando.responders.llm_executor import LLMExecutorResponder
responders = create_kit() + [LLMExecutorResponder(anthropic_llm)]
OpenRouter¶
Routes through the OpenRouter API. Activated when OPENROUTER_API_KEY is set (takes priority over Anthropic).
from kando.responders.openrouter_llm import openrouter_llm
from kando.responders.llm_executor import LLMExecutorResponder
responders = create_kit() + [LLMExecutorResponder(openrouter_llm)]
Set OPENROUTER_MODEL to override the default model (anthropic/claude-haiku-4-5).
Usage¶
from kando.responders.llm_executor import LLMExecutorResponder
# Custom LLM function
def my_llm(messages: list[dict], model: str, max_tokens: int) -> tuple[str, float]:
# Call your LLM provider
response = call_api(messages=messages, model=model, max_tokens=max_tokens)
return response.text, response.cost
responders = create_kit() + [LLMExecutorResponder(my_llm)]
runtime = Runtime(ledger=store, responders=responders)
world = runtime.run(seed)
Caching¶
The executor checks world.context["cache"] (an LLMCache) before calling the LLM function. If a cached response exists for the same (messages, model, max_tokens) triple, it is served without an API call. New responses are written to the cache automatically.
How the CLI wires it¶
When you run kando run, the CLI auto-attaches an LLMExecutorResponder if an API key is available:
OPENROUTER_API_KEY→ usesopenrouter_llmANTHROPIC_API_KEY→ usesanthropic_llm- Neither set → no executor attached;
llm.requestevents are logged but unanswered