Mixtral-8x7B Instruct v0.1
Sparse Experts, Dense Power

Run Mixtral Efficiently
Cost Efficient
12.9B Active Params
Mixtral-8x7B Instruct v0.1 uses 46.7B total parameters, activates 12.9B per token for 5x less compute than dense 70B models.
Top Benchmarks
Beats GPT-3.5 Turbo
Scores 8.30 on MT-Bench, 70.6% MMLU, 1114 Arena ELO, outperforming GPT-3.5 and Claude 2.1.
Long Context
32K Token Window
Handles extended dialogues and document analysis via sparse expert routing without dense model memory costs.
Examples
See what Mixtral-8x7B Instruct v0.1 can create
Copy any prompt below and try it yourself in the playground.
Code Debug
“<s>[INST] Debug this Python function that calculates Fibonacci numbers inefficiently: def fib(n): if n <= 1: return n else: return fib(n-1) + fib(n-2) [/INST]”
Multilingual Chat
“<s>[INST] Explain quantum entanglement in French, then translate to English. Keep it simple for beginners. [/INST]”
Reasoning Task
“<s>[INST] Step-by-step: If a train leaves at 3 PM traveling 60 mph, and another at 4 PM at 80 mph, when does the second catch the first if 200 miles apart? [/INST]”
Story Continuation
“<s>[INST] Continue this sci-fi story: The last human awoke in a derelict spaceship, stars unfamiliar. Alarms blared as... [/INST]”
For Developers
A few lines of code.
Instruct chat. One call.
ModelsLab handles the infrastructure: fast inference, auto-scaling, and a developer-friendly API. No GPU management needed.
- Serverless: scales to zero, scales to millions
- Pay per token, no minimums
- Python and JavaScript SDKs, plus REST API
import requestsresponse = requests.post("https://modelslab.com/api/v7/llm/chat/completions",json={"key": "YOUR_API_KEY","prompt": "","model_id": ""})print(response.json())
Ready to create?
Start generating with Mixtral-8x7B Instruct v0.1 on ModelsLab.