Adding a model proxy server
Our model calls are no longer handled directly in our code: they are now routed through a LiteLLM Proxy server. This lets us change models on the fly and gives us retries, fallbacks, budget tracking, and more.
Why change?
LiteLLM Proxy gives us a single, streamlined interface for working with many different large language models (LLMs). We can now manage model endpoints and configurations on the proxy server, instead of hard-coding them into our app's environment.
The benefits of this setup:
- Simplifies the codebase by centralizing model configuration
- Provides the flexibility to switch or update models without altering the core application logic - we can even add and remove models through the proxy's API (see the sketch after this list)!
- Multiple instances of AAQ can use the same model server
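To illustrate that last point about managing models at runtime, here is a minimal sketch of registering a new model alias through the proxy's management API. The route and payload shape follow the LiteLLM Proxy model-management endpoints as documented upstream, but the URL, master key, and model names below are placeholders for illustration - check them against your own deployment and LiteLLM version.

```python
import httpx

# Placeholder values: point these at your own proxy deployment and use the
# master key you configured for it.
PROXY_URL = "http://localhost:4000"
MASTER_KEY = "sk-my-master-key"

headers = {"Authorization": f"Bearer {MASTER_KEY}"}

# Register a new model alias at runtime (verify the exact route and payload
# for your LiteLLM Proxy version).
resp = httpx.post(
    f"{PROXY_URL}/model/new",
    headers=headers,
    json={
        "model_name": "generate-gpt4o",                # alias the app will request
        "litellm_params": {"model": "openai/gpt-4o"},  # underlying provider model
    },
)
resp.raise_for_status()
print(resp.json())
```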
We now configure models in a config.yaml file, which lets the proxy route requests to different LLMs (commercial or self-hosted - full list here). See deployment/docker-compose/litellm_config.yaml for an example.
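From the application side, this works because the proxy speaks the OpenAI API format: the app uses a standard OpenAI client pointed at the proxy and requests a model alias defined in litellm_config.yaml. The base URL, key, and alias in this sketch are assumptions for illustration, not our actual settings.

```python
from openai import OpenAI

# Assumed values: the proxy's address, a virtual key issued by the proxy,
# and a model alias that exists in litellm_config.yaml.
client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM Proxy instead of a provider URL
    api_key="sk-aaq-virtual-key",
)

response = client.chat.completions.create(
    model="default",  # alias resolved by the proxy, not a hard-coded provider model
    messages=[{"role": "user", "content": "Summarise our safety guidelines."}],
)
print(response.choices[0].message.content)
```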
The LiteLLM Proxy server also has some extra useful features:
- Consistent input/output formats across different models (see the sketch after this list)
- Fallback mechanisms for error handling
- Detailed logging, with integrations for Langfuse and other observability tools
- Tracking of token usage and spend
- Asynchronous request handling and caching
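As a small sketch of the first and fourth points: the same request body can be sent to different model aliases, and token counts come back in the standard OpenAI-format usage field, which the proxy also uses for spend tracking. The aliases here are hypothetical and would need to match entries in litellm_config.yaml.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-aaq-virtual-key")

# Hypothetical aliases; each would map to a different provider in the config.
for alias in ["default", "fallback-small"]:
    response = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    # Same response shape regardless of the underlying provider, including
    # the token counts used for usage and spend tracking.
    print(alias, response.usage.prompt_tokens, response.usage.completion_tokens)
```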
Any downsides?
Potential downsides include:
- Dependency on the LiteLLM project to add support for new models and parameters
- A possible increase in latency
- A new Docker container in our stack which, despite its name, is not "lite"!