[Guide] Running llama.cpp with Vulkan GPU Acceleration on Termux + Pixel 9 Pro XL (Mali-G715)
🚀 llama.cpp + Vulkan GPU Acceleration on Termux (Pixel 9 Pro XL / Mali-G715)
After lots of trial and error, I finally got llama.cpp running with real GPU acceleration on my Pixel 9 Pro XL. Here's the minimal working setup.
> TL;DR: Modern Termux + llama.cpp b9190 + `-ngl 99` = automatic Mali GPU acceleration. No manual driver extraction needed (for now).
✅ Prerequisites
- Pixel 9 Pro XL (or another Android 14+ device with a Mali-G715 GPU)
- Termux installed from F-Droid or GitHub
- Root access (optional, but helpful for GPU frequency tuning)
- ~4 GB of free storage for the model
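To confirm that the ~4 GB is actually available before downloading, here is a minimal sketch built on the standard `df -P -k` output, where the fourth column is available space in KiB. The function name `has_free_space` is hypothetical and not part of Termux.

```bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least NEED_MIB MiB free.
# Relies on POSIX `df -P -k`: one line per filesystem, column 4 = available KiB.
has_free_space() {
    dir=$1        # e.g. "$HOME"
    need_mib=$2   # e.g. 4096 for ~4 GB
    avail_kib=$(df -P -k "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kib" ] || return 2   # df failed or printed no data line
    [ $((avail_kib / 1024)) -ge "$need_mib" ]
}

# Usage:
#   has_free_space "$HOME" 4096 && echo "enough space" || echo "free up storage first"
```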
🟢 Step 1: Install Dependencies
```bash
pkg update && pkg upgrade -y
pkg install git cmake clang make vulkan-loader vulkan-tools vulkan-headers glslang -y
```
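After installing, you can quickly confirm that the expected tools are on `PATH`. This is a sketch, and the tool names are assumptions about what these packages provide (`vulkaninfo` from vulkan-tools, `glslangValidator` from glslang). As far as I know, llama.cpp's Vulkan CMake setup also looks for `glslc`, so it is worth checking too if configuration fails. `check_tools` is a hypothetical helper.

```bash
# Hypothetical helper: print each tool missing from PATH; succeed only if none are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage:
#   check_tools git cmake clang make glslangValidator vulkaninfo glslc
```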
🟢 Step 2: Build llama.cpp with Vulkan
```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Clean build with Vulkan enabled
rm -rf build
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j4
```
✅ Verify Vulkan was compiled:
```bash
grep "GGML_VULKAN:BOOL" build/CMakeCache.txt
# Should output: GGML_VULKAN:BOOL=ON
ls -lh build/bin/libggml-vulkan.so
# Should show the .so file exists
```
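The two checks can be combined into a single pass/fail function. This is a minimal sketch that assumes the default shared-library build layout used in this guide (`build/CMakeCache.txt` and `build/bin/libggml-vulkan.so`). A static build would not produce the `.so`. `verify_vulkan_build` is a hypothetical name.

```bash
# Hypothetical helper: verify the Vulkan backend was configured and built.
# BUILD_DIR defaults to "build" (the directory used by the cmake commands above).
verify_vulkan_build() {
    build_dir=${1:-build}
    if ! grep -q '^GGML_VULKAN:BOOL=ON$' "$build_dir/CMakeCache.txt" 2>/dev/null; then
        echo "FAIL: GGML_VULKAN is not ON in $build_dir/CMakeCache.txt"
        return 1
    fi
    if [ ! -f "$build_dir/bin/libggml-vulkan.so" ]; then
        echo "FAIL: $build_dir/bin/libggml-vulkan.so not found"
        return 1
    fi
    echo "OK: Vulkan backend built"
}

# Usage (from ~/llama.cpp):
#   verify_vulkan_build build
```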
🟢 Step 3: Download a Model
```bash
mkdir -p ~/models
cd ~/models
# Example: Gemma 2 2B (Q5 quantized, ~2GB)
wget https://huggingface.co/google/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q5_K_M.gguf
```
> 💡 Tip: Use hf-mirror.com if Hugging Face is slow in your region.
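Before loading a download, a quick sanity check can catch a file that is not actually a model, for example a truncated transfer or an HTML page saved from a mirror. GGUF files begin with the four ASCII bytes `GGUF`, and this sketch checks only that magic number, not the rest of the file. `is_gguf` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if FILE exists and starts with the GGUF magic bytes.
is_gguf() {
    [ -f "$1" ] || return 1
    [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage:
#   is_gguf ~/models/gemma-2-2b-it-Q5_K_M.gguf && echo "looks like GGUF" || echo "bad download?"
```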
🟢 Step 4: Run with GPU Acceleration
```bash
cd ~/llama.cpp
./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, introduce yourself in Chinese" \
  -n 100 \
  -ngl 99 \
  --jinja \
  --color auto
```
🔑 Key Parameters:
| Flag | Meaning |
|---|---|
| `-ngl 99` | Offload up to 99 layers to the GPU (Vulkan), which covers every layer of the models in this guide |
| `--jinja` | Enable the Jinja2 template engine (fixes chat format warnings) |
| `--color auto` | Color-coded output |
| `--verbosity 0` | Minimal logs (optional) |
🔍 Verify GPU is Actually Working
Method 1: Check startup logs (verbose mode)
```bash
./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 10 -ngl 99 --verbosity 3 2>&1 | grep -i "vulkan\|mali\|device\|offloaded"
```
✅ Look for lines such as:
- `using device Vulkan0 (Mali-G715)`
- `offloaded XX/XX layers to GPU` (both numbers should match)
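The layer check can be turned into a pass/fail result by piping the log through a small filter. This is a minimal sketch that assumes the log keeps the `offloaded X/Y layers to GPU` wording quoted above, which may change between llama.cpp builds. `all_layers_offloaded` is a hypothetical helper name.

```bash
# Hypothetical helper: read llama.cpp log text on stdin, print the first
# "offloaded X/Y" match, and succeed only if X equals Y (all layers on the GPU).
all_layers_offloaded() {
    sed -n 's|.*offloaded \([0-9][0-9]*\)/\([0-9][0-9]*\) layers to GPU.*|\1 \2|p' |
        awk 'NR == 1 { print "offloaded " $1 "/" $2; found = 1; ok = ($1 == $2) }
             END { if (found && ok) exit 0; exit 1 }'
}

# Usage:
#   ./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 10 -ngl 99 --verbosity 3 2>&1 \
#     | all_layers_offloaded && echo "fully offloaded"
```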
Method 2: Speed comparison (most reliable)
```bash
# GPU mode
./build/bin/llama-cli -m model.gguf -p "test" -n 50 -ngl 99 --verbosity 0
# CPU-only mode (for comparison)
./build/bin/llama-cli -m model.gguf -p "test" -n 50 -ngl 0 -t 8 --verbosity 0
```
📊 Expected results on Pixel 9 Pro XL:
| Mode | Speed |
|---|---|
| CPU-only (`-ngl 0 -t 8`) | ~1-2 t/s |
| Vulkan GPU (`-ngl 99`) | ~7-8 t/s ✅ |
> If GPU mode is 5-6x faster, Vulkan is working!
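To turn your own two readings into a speedup factor, shell arithmetic is not enough because it is integer-only, so this sketch hands the division to awk. Using the table's ranges, the factor runs from 7/2 = 3.5x to 8/1 = 8x, and the midpoints give 7.5/1.5 = 5x. `speedup` is a hypothetical helper.

```bash
# Hypothetical helper: print GPU_TPS / CPU_TPS with one decimal, e.g. "5.0x".
# Fails (and prints n/a) if the CPU reading is zero, negative, or empty.
speedup() {
    awk -v gpu="$1" -v cpu="$2" 'BEGIN {
        if (cpu <= 0) { print "n/a"; exit 1 }
        printf "%.1fx\n", gpu / cpu
    }'
}

# Usage with the midpoints of the table (7.5 t/s GPU vs 1.5 t/s CPU):
#   speedup 7.5 1.5    # prints 5.0x
```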
⚡ Optional: Boost GPU Frequency (Root Required)
Mali GPUs often run at conservative frequencies by default. To unlock more performance:
```bash
# Check current GPU frequency
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq
# Set to performance mode (max frequency)
su -c "echo performance > /sys/devices/platform/gpu0/devfreq/gpu0/governor"
# Verify change
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq
```
🚀 Expected improvement: 7-8 t/s → 12-15 t/s
> ⚠️ Warning: Higher frequency = more heat + battery drain. Reverts on reboot.
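Because the change otherwise lasts until reboot, a save/restore pair lets you switch back after benchmarking. This is a sketch that assumes your device exposes a standard Linux devfreq directory like the one used above. Besides `governor`, that interface normally also provides `available_governors`. The function names and the backup path are hypothetical, and the functions must run in a root shell.

```bash
# Hypothetical helpers for a devfreq directory DIR (e.g. /sys/devices/platform/gpu0/devfreq/gpu0).
# set_governor saves the current governor to SAVE_FILE, then switches to GOV.
# Call it once per session: a second call overwrites the saved value.
set_governor() {
    dir=$1
    gov=$2
    save_file=$3
    # Refuse governors the driver does not list as available.
    case " $(cat "$dir/available_governors") " in
        *" $gov "*) ;;
        *) echo "governor '$gov' not available in $dir" >&2; return 1 ;;
    esac
    cat "$dir/governor" > "$save_file" || return 1
    echo "$gov" > "$dir/governor"
}

# restore_governor writes the previously saved governor back.
restore_governor() {
    dir=$1
    save_file=$2
    [ -f "$save_file" ] || { echo "no saved governor in $save_file" >&2; return 1; }
    cat "$save_file" > "$dir/governor"
}

# Usage from a root shell (after sourcing a file that holds these functions):
#   set_governor /sys/devices/platform/gpu0/devfreq/gpu0 performance /data/local/tmp/gpu_governor.bak
#   ...run benchmarks...
#   restore_governor /sys/devices/platform/gpu0/devfreq/gpu0 /data/local/tmp/gpu_governor.bak
```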
🛠️ Troubleshooting
❌ "Permission denied" when loading model
```bash
chmod 644 ~/models/your-model.gguf
```
❌ Vulkan not detected / still using CPU
- Verify compilation: `grep GGML_VULKAN build/CMakeCache.txt`
- Rebuild with `-DGGML_VULKAN=ON`
- Ensure dependencies are installed: `pkg list-installed | grep vulkan`
❌ Chat template warnings
Add the `--jinja` flag to enable the Jinja2 template engine.
❌ Slow performance (<5 t/s with -ngl 99)
- Try reducing the context size: `-c 512` instead of `-c 2048`
- Enable KV cache quantization: `-ctk q8_0 -ctv q8_0` (`-ctv` alone quantizes only the V cache)
- Boost GPU frequency (see above)
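Both tips reduce the memory the KV cache needs, and a rough estimate shows why. Under a simplified model, the cache holds K and V for every layer and context position: bytes = 2 × n_layer × n_ctx × n_head_kv × head_dim × bytes_per_element. This sketch ignores model-specific details such as sliding-window layers. The hyperparameters in the usage lines are hypothetical, so read the real ones from your model's metadata. f16 uses 2 bytes per value, and q8_0 stores 32 values in 34 bytes (about 1.06 bytes per value).

```bash
# Hypothetical helper: estimated KV cache size in MiB.
# Args: n_layer n_ctx n_head_kv head_dim bytes_per_element
kv_cache_mib() {
    awk -v layers="$1" -v ctx="$2" -v kv_heads="$3" -v head_dim="$4" -v bpe="$5" 'BEGIN {
        bytes = 2 * layers * ctx * kv_heads * head_dim * bpe   # factor 2 = K and V
        printf "%.1f MiB\n", bytes / (1024 * 1024)
    }'
}

# Usage (hypothetical 26-layer model, 4 KV heads of size 256, f16 cache):
#   kv_cache_mib 26 2048 4 256 2    # prints 208.0 MiB
#   kv_cache_mib 26 512  4 256 2    # prints 52.0 MiB
```

Cutting the context from 2048 to 512 shrinks the cache by a factor of four, and switching both K and V from f16 to q8_0 roughly halves it again.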
📦 Recommended Models for Testing
| Model | Size | Expected Speed (Vulkan) |
|---|---|---|
| Qwen2.5-0.5B | ~0.4 GB | 20-30 t/s |
| Qwen2.5-1.5B | ~1.1 GB | 12-18 t/s |
| Gemma-2-2B | ~2.0 GB | 7-10 t/s |
| Llama-3-8B | ~4.9 GB | 4-7 t/s |
> Smaller = faster. Start with Qwen2.5-0.5B for testing.
🔄 Why This Works Now (No Manual Driver Extraction)
Older guides required manually copying vulkan.mali.so and setting LD_LIBRARY_PATH. On modern setups, this is often unnecessary because:
✅ Android 14+ has better Vulkan ICD discovery
✅ Termux's vulkan-loader package auto-detects system drivers
✅ llama.cpp b9190+ has improved Android Vulkan backend
> ⚠️ If auto-detection fails on your device, fall back to manual driver extraction guides.
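Before falling back to manual extraction, you can check whether the Vulkan loader sees the system driver at all. This sketch assumes that `vulkaninfo --summary` (from vulkan-tools) prints lines of the form `deviceName = <name>`. The exact layout may vary between vulkan-tools versions. `vk_device_names` is a hypothetical helper.

```bash
# Hypothetical helper: read `vulkaninfo --summary` output on stdin and
# print the value of each "deviceName = ..." line, one per line.
vk_device_names() {
    sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Usage:
#   vulkaninfo --summary 2>/dev/null | vk_device_names
# A Mali entry suggests the system driver was found; empty output suggests
# auto-detection failed on this device.
```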
🎯 Final Working Command (Copy-Paste)
```bash
cd ~/llama.cpp
./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, write a short poem about Android" \
  -n 150 \
  -ngl 99 \
  -c 1024 \
  --jinja \
  --color auto \
  --verbosity 0
```
💬 Questions?
Drop a comment if you run into issues! Happy to help troubleshoot.
Device: Pixel 9 Pro XL
Android: 15
Termux: Latest from F-Droid
llama.cpp: b9190 (main branch)
GPU: Mali-G715 (Vulkan)
Last tested: May 17, 2026