Local Edge Inference
Fikra Nano 1B (Ternary)
Fikra Nano 1B is our custom-trained, ultra-efficient small language model. Built specifically for edge devices, it uses 1.58-bit ternary quantization to drastically reduce memory usage while preserving semantic reasoning capability.
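The exact quantization recipe isn't spelled out here, but as an illustration, the absmean ternary scheme popularized by BitNet b1.58 maps each weight to one of three values, {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight. A minimal sketch (of the general technique, not necessarily Fikra's exact recipe):

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet-b1.58-style sketch):
    scale by the mean absolute weight, then snap each weight
    to -1, 0, or +1."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    clamp = lambda x: max(-1, min(1, x))
    return [clamp(round(w / scale)) for w in weights], scale

q, s = ternary_quantize([0.8, -0.05, -1.2, 0.3])
print(q)  # each weight is now one of three values -> ~1.58 bits apiece
```

Large outliers clamp to ±1 and near-zero weights snap to 0, which is why the scheme keeps most of the model's expressive structure despite the extreme compression.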
Hugging Face Repositories
While you can use the fikra pip package to interact with Nano easily, advanced developers and researchers can access the raw model weights directly from our Hugging Face repositories to plug into custom backends like vLLM, llama.cpp, or Ollama.
- Base Model (Safetensors): lacesseapp/Fikra-1B-Nano-v0.2
- Quantized (GGUF): lacesseapp/Fikra-1B-Nano-v0.2-GGUF
Running with Llama.cpp
If you prefer using the highly optimized llama.cpp framework, you can download the GGUF version and run it directly in your terminal.
# Download the model
wget https://huggingface.co/lacesseapp/Fikra-1B-Nano-v0.2-GGUF/resolve/main/fikra-1b-nano-v0.2-q4_k_m.gguf

# Run via llama.cpp (newer llama.cpp builds ship this binary as llama-cli instead of ./main)
./main -m fikra-1b-nano-v0.2-q4_k_m.gguf -p "Explain quantum computing simply: " -n 256
Model Details
- Parameters: 1.1 Billion
- Architecture: Llama-based
- Context Window: 2,048 tokens
- RAM Requirement: ~1.2 GB
- License: Apache 2.0
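The ~1.2 GB figure can be sanity-checked with back-of-envelope arithmetic. The bits-per-weight values below are common rules of thumb, not official Fikra measurements:

```python
# Illustrative memory arithmetic (rules of thumb, not official figures).
params = 1.1e9  # 1.1 billion parameters

# Ternary weights carry log2(3) ~= 1.585 bits of information each.
ternary_gb = params * 1.585 / 8 / 1e9
print(f"ternary weight payload: ~{ternary_gb:.2f} GB")

# Q4_K_M GGUF files average roughly 4.5 effective bits per weight.
q4km_gb = params * 4.5 / 8 / 1e9
print(f"Q4_K_M weight payload:  ~{q4km_gb:.2f} GB")
```

The gap between the ~0.6 GB of Q4_K_M weights and the quoted ~1.2 GB RAM requirement plausibly covers the KV cache for the 2,048-token context window, activations, and runtime overhead.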