Documentation
Supported Models
The Fikra API offers access to state-of-the-art open-weights models, accelerated by Groq LPU™ technology for ultra-fast inference. Use the exact string in the Model ID column when making your API requests.
| Model ID | Developer | Context Window | Specs & Best Use Cases |
|---|---|---|---|
| llama3-70b | Meta | 131,072 tokens |
Specs: 70 Billion parameters. High-precision instruction following and deep world knowledge. Best For: Enterprise-grade reasoning, complex RAG (Retrieval-Augmented Generation) pipelines, multi-step logical deduction, and advanced code generation. |
| llama3-8b | Meta | 131,072 tokens |
Specs: 8 Billion parameters. Extremely low latency and highly cost-effective compute. Best For: High-throughput tasks, real-time responsive chatbots, bulk text classification, rapid data extraction, and document summarization at scale. |
| qwen-3-32b | Alibaba Cloud | 131,072 tokens |
Specs: 32 Billion parameters. Superior multilingual capabilities and robust coding proficiency. Best For: Translating across diverse language sets, parsing massive financial or legal document dumps, and analyzing extensive codebases with high contextual awareness. |
| gpt-oss-20b | Open AI | 131,072 tokens |
Specs: 20 Billion parameters. Highly aligned instruction tuning and balanced architecture. Best For: General-purpose chatting, nuanced creative writing, safe and aligned content generation, and dynamic conversational agent personas. |
Exact API Payload Strings
When sending your HTTP POST requests, you must use the exact hardware routing string in the "model" parameter of your JSON payload. If you use the advertised name instead of the routing string, the gateway will return a 400 or 404 error.
| Advertised Model | Required Payload String |
|---|---|
| Llama 3 70B | "llama-3.3-70b-versatile" |
| Llama 3 8B | "llama-3.1-8b-instant" |
| Qwen 3 32B | "qwen-2.5-32b" |
| GPT OSS 20B | "openai/gpt-oss-20b" |