ADR-007: LLMExecutorResponder as a Pluggable Factory Function¶
Status: Accepted
Date: 2026-06-06
Context¶
The runtime needs to call LLMs but should not be coupled to a specific provider (Anthropic, OpenAI, local models, etc.). Several approaches were considered:
- Hardcode the Anthropic SDK in the executor.
- Environment-variable-driven provider selection (e.g.,
KANDO_LLM_PROVIDER=anthropic). - Pluggable
llm_fncallable injected at runtime construction time.
Decision¶
LLMExecutorResponder(llm_fn: LLMFn) -> Responder is a factory that wraps any callable matching:
LLMFn = Callable[[list[dict], str, int], tuple[str, float]]
# messages model max_tokens text cost_usd
The executor is a plain Responder — it is added to the responder list alongside kit responders:
The executor also consults the LLMCache from world.context["cache"] before calling llm_fn. Cache key is SHA-256 of {messages, model, max_tokens}. On cache hit, the cached (text, cost_usd) is returned without calling the API. On miss, the result is stored.
Consequences¶
Positive:
- Tests use a fake_llm function — no API calls, no mocking required.
- Provider can be swapped without touching any kit code.
- Cache integration is transparent to both kits and the llm_fn implementation.
- Adding retry logic, rate limiting, or observability is done in the llm_fn wrapper, not in the executor.
Negative:
- The LLMFn signature is synchronous. Async LLM SDKs require a sync wrapper (e.g., asyncio.run()). This is a consequence of the synchronous runtime loop (see ADR-011).
- cost_usd must be computed by the caller. If the caller doesn't know the cost (e.g., local models), it should return 0.0.
No Hard LLM Dependency in Core¶
kando/ has zero LLM SDK dependencies. The pyproject.toml optional extras are intentionally empty for LLM SDKs — users bring their own.