LiteLLM Proxy Integration
Enterprise developers often need a unified routing layer. With LiteLLM, you can easily load-balance and route requests between your local Privane models and fallback cloud APIs (like OpenAI or Anthropic) seamlessly.
This architecture ensures you get the absolute lowest latency and zero-cost inference for 90% of requests (handled locally by Privane), while seamlessly falling back to a massive 70B+ parameter cloud model only for complex queries.
Configuring LiteLLM
Install the LiteLLM proxy:
pip install litellmCreate a config.yaml that routes the default traffic to Privane, and complex traffic to OpenAI:
model_list:
# Route 1: Local Sovereign AI (Zero Cost, Zero Latency)
- model_name: "default-model"
litellm_params:
model: "openai/gemma-2b"
api_base: "http://localhost:8080/v1"
api_key: "privane"
# Route 2: Cloud Fallback
- model_name: "complex-model"
litellm_params:
model: "gpt-4o"
api_key: "os.environ/OPENAI_API_KEY"
router_settings:
routing_strategy: usage-based-routing
fallback_models: ["complex-model"]Running the Proxy
litellm --config config.yaml --port 4000Now, your application simply points to http://localhost:4000. LiteLLM will automatically route standard traffic to the Privane local server running on localhost:8080, vastly reducing your cloud API bills while maximizing data privacy.