Local Edge Inference
Fikra Nano 1B (Ternary)
Fikra Nano 1B is our custom-trained, ultra-efficient small language model. Built specifically for edge devices, it uses 1.58-bit ternary quantization to drastically reduce memory usage while preserving semantic reasoning capability.
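The exact quantization recipe isn't spelled out here, but as an illustration, the absmean ternary scheme popularized by BitNet b1.58 maps each weight to one of three values, {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight. A minimal sketch (of the general technique, not necessarily Fikra's exact recipe):

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet-b1.58-style sketch):
    scale by the mean absolute weight, then snap each weight
    to -1, 0, or +1."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    clamp = lambda x: max(-1, min(1, x))
    return [clamp(round(w / scale)) for w in weights], scale

q, s = ternary_quantize([0.8, -0.05, -1.2, 0.3])
print(q)  # each weight is now one of three values -> ~1.58 bits apiece
```

Large outliers clamp to ±1 and near-zero weights snap to 0, which is why the scheme keeps most of the model's expressive structure despite the extreme compression.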
Hugging Face Repositories
While you can use the fikra pip package to interact with Nano easily, advanced developers and researchers can access the raw model weights directly from our Hugging Face repositories to plug into custom backends like vLLM, llama.cpp, or Ollama.
- Base Model (Safetensors): lacesseapp/Fikra-1B-Nano-v0.2
- Quantized (GGUF): lacesseapp/Fikra-1B-Nano-v0.2-GGUF
Running with Llama.cpp
If you prefer using the highly optimized llama.cpp framework, you can download the GGUF version and run it directly in your terminal.
# Download the model
wget https://huggingface.co/lacesseapp/Fikra-1B-Nano-v0.2-GGUF/resolve/main/fikra-1b-nano-v0.2-q4_k_m.gguf

# Run via llama.cpp (newer llama.cpp builds ship this binary as llama-cli instead of ./main)
./main -m fikra-1b-nano-v0.2-q4_k_m.gguf -p "Explain quantum computing simply: " -n 256
Model Details
- Parameters: 1.1 Billion
- Architecture: Llama-based
- Context Window: 2,048 tokens
- RAM Requirement: ~1.2 GB
- License: Apache 2.0
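The ~1.2 GB figure can be sanity-checked with back-of-envelope arithmetic. The bits-per-weight values below are common rules of thumb, not official Fikra measurements:

```python
# Illustrative memory arithmetic (rules of thumb, not official figures).
params = 1.1e9  # 1.1 billion parameters

# Ternary weights carry log2(3) ~= 1.585 bits of information each.
ternary_gb = params * 1.585 / 8 / 1e9
print(f"ternary weight payload: ~{ternary_gb:.2f} GB")

# Q4_K_M GGUF files average roughly 4.5 effective bits per weight.
q4km_gb = params * 4.5 / 8 / 1e9
print(f"Q4_K_M weight payload:  ~{q4km_gb:.2f} GB")
```

The gap between the ~0.6 GB of Q4_K_M weights and the quoted ~1.2 GB RAM requirement plausibly covers the KV cache for the 2,048-token context window, activations, and runtime overhead.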