LLM Proxy Server
What is it?
AAQ uses the LiteLLM Proxy Server to manage LLM calls, allowing you to use any LiteLLM-supported model, including self-hosted ones.
The proxy server runs as a separate Docker container and reads its configuration from a config.yaml file, where you set the appropriate model names and endpoints for each LLM task.
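For orientation, here is a minimal sketch of what the corresponding Docker Compose service could look like. The service name, image tag, mount path, and port are illustrative assumptions (the LiteLLM proxy listens on port 4000 by default); the actual definition used by AAQ lives in the Docker Compose setup linked at the end of this page.

services:
  litellm_proxy:
    image: ghcr.io/berriai/litellm:main-latest   # assumed image tag, for illustration only
    command: ["--config", "/app/config.yaml"]    # point the proxy at the mounted config
    volumes:
      - ./litellm_proxy_config.yaml:/app/config.yaml   # the config file described below
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}   # referenced as "os.environ/OPENAI_API_KEY" in the config
    ports:
      - "4000:4000"   # LiteLLM proxy's default port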
Example config
An example litellm_proxy_config.yaml file is shown below. In our backend code, we refer to models by their custom task model_name (e.g. "generate-response"); which actual LLM model each call is routed to is set in this file.
model_list:
  - model_name: embeddings
    litellm_params:
      model: text-embedding-ada-002
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: default
    litellm_params:
      model: gpt-4-0125-preview
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: generate-response
    litellm_params:
      model: gpt-4-0125-preview
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: detect-language
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: translate
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: paraphrase
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: safety
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: alignscore
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"

litellm_settings:
  num_retries: 3
  request_timeout: 100
  telemetry: False
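Because the backend only ever addresses models by their task model_name, pointing a task at a self-hosted model is just a config change. The snippet below is an illustrative sketch, assuming an Ollama server reachable at http://ollama:11434 serving a llama3 model; neither is part of AAQ's default setup.

  - model_name: generate-response
    litellm_params:
      model: ollama/llama3             # any LiteLLM-supported provider prefix works here
      api_base: http://ollama:11434    # endpoint of the self-hosted model server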
See the Contributing Setup and Docker Compose Setup for details on how this service is run in our stack.