San Francisco Compute has partnered with Modular to create Large Scale Inference, the best-priced OpenAI-compatible inference in the world. On most open-source models, we're 85%+ cheaper than other options. We built LSI in close partnership with a tier-1 AI lab to help them generate trillions of tokens of synthetic data, saving them tens of millions of dollars compared to the leading competitor.
Unlike other providers, we price inference on a market basis: the token price tracks the underlying compute cost on the SFC market & current load. In other words, we give you the best available price.
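Because LSI is OpenAI-compatible, existing OpenAI SDK code should work by swapping in a different base URL and API key. A minimal sketch, assuming the placeholder endpoint below and that models are addressed by their Hugging Face names (as listed in the table further down):

```python
from openai import OpenAI

# Standard OpenAI SDK pointed at LSI instead of api.openai.com.
# The base URL is a placeholder; use the endpoint you receive at onboarding.
client = OpenAI(
    base_url="https://lsi.example.com/v1",
    api_key="YOUR_LSI_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # Hugging Face name from the table below
    messages=[{"role": "user", "content": "Summarize KV caching in two sentences."}],
)
print(response.choices[0].message.content)
```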
LSI natively supports very large-scale batch inference, with far higher rate limits & throughput than other providers. Unlike other services, we don't force you to upload petabytes of data to us: our batch inference reads & writes to an S3-compatible object store, so your sensitive data isn't stored indefinitely on our servers. LSI also handles multimodal use cases natively, without requiring you to share public links to your content (see the multimodal example after the model table).
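A sketch of what the batch flow could look like from your side; the endpoint path, request fields, and bucket paths here are illustrative assumptions, not the final interface:

```python
import requests

# Hypothetical batch submission: LSI reads requests from, and writes results to,
# your own S3-compatible bucket, so the payload never has to live on our servers.
# The URL and JSON fields below are assumptions for illustration only.
resp = requests.post(
    "https://lsi.example.com/v1/batches",  # placeholder base URL
    headers={"Authorization": "Bearer YOUR_LSI_API_KEY"},
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        # Each line of the input file would be one chat request, e.g.
        # {"messages": [{"role": "user", "content": "..."}]}
        "input_uri": "s3://your-bucket/requests.jsonl",
        "output_uri": "s3://your-bucket/results/",
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. a batch ID to poll for completion
```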
LSI is designed for large-scale, mostly enterprise use cases. That lets us be more hands-on than traditional self-serve providers:
- Want a deployment behind your private network?
- Need to hit specific latency, throughput, or uptime requirements?
- Is there a model that performs better in your evals but that we aren't serving?
We currently serve the models below at an average cost lower than every other provider's. Exact prices, latency, and throughput depend on the use case & current market conditions. For a quote & technical demo, contact us.
Model | Hugging Face Name | Size |
---|---|---|
DeepSeek‑R1 | deepseek-ai/DeepSeek-R1 | 671B |
DeepSeek‑V3 | deepseek-ai/DeepSeek-V3 | 671B |
DeepSeek‑R1‑Distill‑Llama‑70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 70B |
Llama‑3‑70B‑chat | meta-llama/Llama-3-70b-chat-hf | 70B |
Llama‑3.1‑405B‑Instruct | meta-llama/Meta-Llama-3.1-405B-Instruct | 405B |
Llama‑3.1‑70B‑Instruct | meta-llama/Meta-Llama-3.1-70B-Instruct | 70B |
Llama‑3.1‑8B‑Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct | 8B |
Llama‑4‑Scout‑17B‑Instruct | meta-llama/Llama-4-Scout-17B-16E-Instruct | 109B |
Llama‑4‑Maverick‑17B‑128E‑Instruct | meta-llama/Llama-4-Maverick-17B-128E-Instruct | 400B |
Llama‑3.2‑11B‑Vision‑Instruct | meta-llama/Llama-3.2-11B-Vision-Instruct | 11B |
Mistral‑7B‑Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 7B |
Mixtral‑8x7B‑Instruct | mistralai/Mixtral-8x7B-Instruct-v0.1 | 46.7B |
Mistral‑Small‑24B‑Instruct | mistralai/Mistral-Small-24B-Instruct-2501 | 24B |
Qwen‑2.5‑72B‑Instruct | Qwen/Qwen2.5-72B-Instruct | 72.7B |
Qwen‑2.5‑7B‑Instruct | Qwen/Qwen2.5-7B-Instruct | 7B |
Qwen3‑14B | Qwen/Qwen3-14B | 14.8B |
Qwen3‑8B | Qwen/Qwen3-8B | 8.2B |
QwQ‑32B | Qwen/QwQ-32B | 32.5B |
InternVL3‑9B | OpenGVLab/InternVL3-9B | 9B |
InternVL3‑14B | OpenGVLab/InternVL3-14B | 14B |
InternVL3‑38B | OpenGVLab/InternVL3-38B | 38B |
InternVL3‑78B | OpenGVLab/InternVL3-78B | 78B |
Gemma‑3‑12B‑Instruct | google/gemma-3-12b-it | 12B |
Gemma‑3‑27B‑Instruct | google/gemma-3-27b-it | 27B |
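For the vision-capable models above (Llama 3.2 Vision, the InternVL3 family, Gemma 3), the standard OpenAI-compatible image message format should apply; passing the image as a base64 data URL means nothing has to be publicly linked. A minimal sketch, with the same placeholder base URL as before:

```python
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://lsi.example.com/v1",  # placeholder base URL
    api_key="YOUR_LSI_API_KEY",
)

# Encode the image inline as a data URL, so no publicly shared link is needed.
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```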
Need a custom setup? Contact Us