Fikra Python Package
While the Fikra Cloud API relies on server-side compute, the fikra Python package lets you run inference completely offline, locally on your own hardware.
Fikra is engineered around 1.58-bit ternary quantization. Instead of storing weights as 16-bit floating-point values, our models represent each neural weight using only three states: -1, 0, or +1. This eliminates the need for expensive GPU matrix multiplication, allowing you to run powerful AI models entirely on standard laptop or server CPUs.
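To see why ternary weights remove the need for multiplication hardware, consider a single dot product. The toy sketch below is illustrative only (it is not the fikra internals): with weights restricted to {-1, 0, +1}, every multiply collapses into an add, a subtract, or a skip.

```python
def ternary_dot(weights, activations):
    """Dot product with ternary weights in {-1, 0, +1}.

    No multiplications are needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it entirely (free sparsity).
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0: contributes nothing, so we skip the element
    return total

# Example: 0.5 (kept) - 1.5 (negated) + 3.0 (kept) = 2.0
print(ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, 3.0]))  # 2.0
```

This is the core trick behind running large models on commodity CPUs: a layer's matrix multiply becomes a stream of integer-friendly additions and subtractions.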
The fikra package currently supports Fikra Nano 1B, our flagship ternary-weight model. It requires only 1.2 GB of RAM and achieves up to 48 tokens/second on a standard Intel Core i5 processor.
Installation
Install the package directly via pip. Ensure you are using Python 3.10 or higher.
pip install fikra
Python SDK Usage
You can embed Fikra Nano 1B directly into your offline applications with just a few lines of code. The package handles weight downloading and execution automatically.
from fikra import Fikra

# 1. Initialize (automatically downloads the model to your machine)
brain = Fikra()

# 2. Reason (offline)
answer = brain.reason("If I have 3 apples and eat one, how many are left?")
print(answer)  # Output: "You have 2 apples."
🛠️ Manual Usage (llama.cpp)
If you prefer using llama.cpp directly without our Python SDK, you can download the GGUF weights and run inference straight from your terminal.
./main -m fikra-1b-nano-v0.2-q4_k_m.gguf -n 128 -p "User: Why is the sky blue?\nAnswer:"
Why Edge Inference?
Running models locally via the fikra package ensures total data sovereignty. Because the model executes physically on your CPU, sensitive corporate data, legal documents, and private customer chats never touch the internet. This also makes it well suited for low-bandwidth environments across the African continent.