Adding a model proxy server
Instead of being handled directly in our code, our model calls are now routed through a LiteLLM Proxy server. This lets us change models on the fly and have retries, fallbacks, budget tracking, and more.
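For illustration, here is roughly what a call through the proxy looks like from application code, using the OpenAI-compatible endpoint that LiteLLM Proxy exposes. The URL, API key, and model alias below are placeholders, not AAQ's actual configuration:

```python
from openai import OpenAI

# Placeholder proxy URL, key, and model alias - point these at your own
# LiteLLM Proxy deployment and the model names defined in its config.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-litellm-placeholder")

response = client.chat.completions.create(
    # The model name is resolved by the proxy, so it can be swapped, retried,
    # or fallen back to another provider without changing application code.
    model="gpt-4o",
    messages=[{"role": "user", "content": "How do I store drinking water safely?"}],
)
print(response.choices[0].message.content)
```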
We've switched to Material UI: it's cleaner, easier to build and maintain, and more familiar.
In our latest infrastructure update, we transitioned from Qdrant to pgvector for vector storage and search. This move is part of our ongoing effort to reduce cost and simplify AAQ’s architecture.
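As a rough sketch of what this looks like at the database level, here is a minimal pgvector example using the psycopg adapter. The table name, column names, and toy 3-dimensional embeddings are illustrative only, not AAQ's actual schema:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# Illustrative connection string and schema.
conn = psycopg.connect("postgresql://user:pass@localhost/aaq", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets us pass numpy arrays as vector parameters

conn.execute(
    "CREATE TABLE IF NOT EXISTS contents (id serial PRIMARY KEY, body text, embedding vector(3))"
)
conn.execute(
    "INSERT INTO contents (body, embedding) VALUES (%s, %s)",
    ("Boil water for at least one minute.", np.array([0.1, 0.2, 0.3])),
)

# <=> is pgvector's cosine-distance operator; the closest rows come first.
query_embedding = np.array([0.1, 0.2, 0.25])
rows = conn.execute(
    "SELECT body FROM contents ORDER BY embedding <=> %s LIMIT 5",
    (query_embedding,),
).fetchall()
print(rows)
```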
By swapping out Nginx for Caddy, we have substantially simplified the deployment steps and the architecture - which means fewer Docker containers to run and manage.
Last week we rolled out another safety feature - checking that the LLM's response is consistent with the content it was given to generate it. This should catch hallucinations, or cases where the LLM answers from its pre-training instead of the retrieved content. It also catches prompt injection or jailbreaking attempts that somehow got through our other checks.
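Conceptually, the check is an LLM-as-judge step: given the retrieved content and the draft answer, a model is asked whether every claim in the answer is supported by that content. The sketch below is a simplified illustration of the idea, not AAQ's exact prompt or code; `call_llm` stands in for whatever model-calling helper you have:

```python
import json

CONSISTENCY_PROMPT = """You are checking an AI answer against its source material.
Source content:
{context}

Answer to check:
{answer}

Reply with JSON: {{"consistent": true or false, "reason": "..."}}.
Mark "consistent" false if any claim in the answer is not supported by the source content."""


def check_alignment(answer: str, context: str, call_llm) -> bool:
    """Ask an LLM judge whether `answer` is grounded in `context`.

    `call_llm` is assumed to take a prompt string and return the model's
    text reply; the prompt and JSON schema above are illustrative.
    """
    reply = call_llm(CONSISTENCY_PROMPT.format(context=context, answer=answer))
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        return False  # fail closed if the judge's output can't be parsed
    return bool(verdict.get("consistent", False))
```

If the check fails, the response can be blocked or replaced with a safe fallback message rather than being returned to the user.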
First, we have added this section that you are currently reading. Each week we'll post what we've rolled out - new features, bug fixes, and performance improvements.
The rest of the docs have also been restructured to make them easier to navigate.