u/Rickx005x

[Guide] Running llama.cpp with Vulkan GPU Acceleration on Termux + Pixel 9 Pro XL (Mali-G715)

🚀 llama.cpp + Vulkan GPU Acceleration on Termux (Pixel 9 Pro XL / Mali-G715)

After lots of trial and error, I finally got llama.cpp running with real GPU acceleration on my Pixel 9 Pro XL. Here's the minimal working setup.

> TL;DR: Modern Termux + llama.cpp b9190 + -ngl 99 = automatic Mali GPU acceleration. No manual driver extraction needed (for now).


✅ Prerequisites

  • Pixel 9 Pro XL (or any Android 14+/Mali-G715 device)
  • Termux installed from F-Droid or GitHub
  • Root access (optional, but helpful for GPU frequency tuning)
  • ~4 GB free storage for model

🟢 Step 1: Install Dependencies

pkg update && pkg upgrade -y
pkg install git cmake clang make vulkan-loader vulkan-tools vulkan-headers glslang -y

🟢 Step 2: Build llama.cpp with Vulkan

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Clean build with Vulkan enabled
rm -rf build
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j4

✅ Verify Vulkan was compiled:

grep "GGML_VULKAN:BOOL" build/CMakeCache.txt
# Should output: GGML_VULKAN:BOOL=ON

ls -lh build/bin/libggml-vulkan.so
# Should show the .so file exists
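If you rebuild often, the two checks above can be wrapped in one shell function. This is a minimal sketch. It assumes the default shared-library build, where the backend ends up at build/bin/libggml-vulkan.so as shown above. `check_vulkan_build` is a hypothetical helper name, not part of llama.cpp.

```bash
# Report whether a llama.cpp build directory was configured and built with Vulkan.
# Usage: check_vulkan_build [build_dir]   (defaults to ./build)
check_vulkan_build() {
  local build_dir="${1:-build}"
  local cache="$build_dir/CMakeCache.txt"

  if [ ! -f "$cache" ]; then
    echo "no CMakeCache.txt in $build_dir (run cmake first)"
    return 2
  fi

  # CMake stores cache entries as NAME:TYPE=VALUE
  if ! grep -q '^GGML_VULKAN:BOOL=ON$' "$cache"; then
    echo "GGML_VULKAN is not ON; reconfigure with -DGGML_VULKAN=ON"
    return 1
  fi

  # Assumption: shared-library build, so the backend library lands in build/bin
  if [ ! -f "$build_dir/bin/libggml-vulkan.so" ]; then
    echo "GGML_VULKAN=ON but $build_dir/bin/libggml-vulkan.so is missing (build not finished?)"
    return 1
  fi

  echo "Vulkan backend built"
}
```

Run it from ~/llama.cpp as `check_vulkan_build build`.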

🟢 Step 3: Download a Model

mkdir -p ~/models
cd ~/models

# Example: Gemma 2 2B (Q5 quantized, ~2GB)
wget https://huggingface.co/google/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q5_K_M.gguf

> 💡 Tip: Use hf-mirror.com if Hugging Face is slow in your region.


🟢 Step 4: Run with GPU Acceleration

cd ~/llama.cpp

./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, introduce yourself in Chinese" \
  -n 100 \
  -ngl 99 \
  --jinja \
  --color auto

🔑 Key Parameters:

| Flag | Meaning |
|------|---------|
| -ngl 99 | Offload all layers to GPU (Vulkan) |
| --jinja | Enable Jinja2 template engine (fixes chat format warnings) |
| --color auto | Color-coded output |
| --verbosity 0 | Minimal logs (optional) |

🔍 Verify GPU is Actually Working

Method 1: Check startup logs (verbose mode)

./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 10 -ngl 99 --verbosity 3 2>&1 | grep -i "vulkan\|mali\|device"

✅ Look for lines such as "using device Vulkan0 (Mali-G715)" and "offloaded XX/XX layers to GPU".
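To check those two lines automatically, the log can be piped into a small function. This is a sketch that assumes the log wording quoted above, which may change between llama.cpp builds. `check_offload_log` is a hypothetical name. It fails if no Vulkan device is mentioned or if fewer than all layers were offloaded.

```bash
# Read llama.cpp log text on stdin and check that a Vulkan device was used
# and that every layer was offloaded (assumed log wording, see above).
check_offload_log() {
  local log frac done_layers total_layers
  log="$(cat)"

  if ! printf '%s\n' "$log" | grep -q 'using device Vulkan'; then
    echo "no Vulkan device in log"
    return 1
  fi

  # Pull "N/M" out of the first "offloaded N/M layers to GPU" line
  frac="$(printf '%s\n' "$log" \
    | sed -n 's/.*offloaded \([0-9][0-9]*\/[0-9][0-9]*\) layers to GPU.*/\1/p' \
    | head -n 1)"
  if [ -z "$frac" ]; then
    echo "no offload line in log"
    return 1
  fi

  done_layers="${frac%/*}"
  total_layers="${frac#*/}"
  if [ "$done_layers" -ne "$total_layers" ]; then
    echo "partial offload: $frac layers"
    return 1
  fi
  echo "all $total_layers layers on Vulkan"
}
```

Usage: `./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 10 -ngl 99 --verbosity 3 2>&1 | check_offload_log`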

Method 2: Speed comparison (most reliable)

# GPU mode
./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 50 -ngl 99 --verbosity 0

# CPU-only mode (for comparison)
./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 50 -ngl 0 -t 8 --verbosity 0

📊 Expected results on Pixel 9 Pro XL:

| Mode | Speed |
|------|-------|
| CPU-only (-ngl 0 -t 8) | ~1-2 t/s |
| Vulkan GPU (-ngl 99) | ~7-8 t/s ✅ |

> If GPU mode is 5-6x faster, Vulkan is working!
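Comparing the two runs by eye works, but the numbers can also be pulled out of the timing summary. This sketch assumes your build prints a `llama_perf_context_print` block with an `eval time = ... tokens per second)` line at exit. Newer builds may format timings differently, so treat the pattern as an assumption. `gen_tps` and `speedup` are hypothetical helper names.

```bash
# Extract generation speed (tokens/second) from llama.cpp timing output on stdin.
# The "prompt eval time" line is skipped so only generation speed is returned.
gen_tps() {
  grep 'eval time' | grep -v 'prompt eval time' \
    | sed -n 's/.*[^0-9.]\([0-9][0-9.]*\) tokens per second.*/\1/p' \
    | head -n 1
}

# Print how many times faster the GPU run was: speedup CPU_TPS GPU_TPS
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { if (cpu <= 0) exit 1; printf "%.1fx\n", gpu / cpu }'
}
```

Capture the full log of each run with 2>&1, and leave out --verbosity 0 in case it hides the timing lines. Pipe each log through `gen_tps`, then call `speedup "$cpu" "$gpu"`.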


⚡ Optional: Boost GPU Frequency (Root Required)

Mali GPUs often run at conservative frequencies by default. On a rooted phone, you can switch the GPU's devfreq governor to "performance" to hold it at maximum frequency:

# Check current GPU frequency
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq

# Set to performance mode (max frequency)
su -c "echo performance > /sys/devices/platform/gpu0/devfreq/gpu0/governor"

# Verify change
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq

🚀 Expected improvement: 7-8 t/s → 12-15 t/s

> ⚠️ Warning: Higher frequency = more heat + battery drain. Reverts on reboot.
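The governor stays changed until the next reboot, so a small save-and-restore pair lets you switch back without rebooting. This is a sketch under several assumptions. The sysfs path is the one used above and may differ on your device, and `performance` must be one of that device's available governors. The function names are hypothetical.

```bash
# Hypothetical helpers: switch the GPU devfreq governor and restore the old one later.
# GOV_FILE defaults to the path used above; it may differ between devices.
GOV_FILE="${GOV_FILE:-/sys/devices/platform/gpu0/devfreq/gpu0/governor}"
GOV_BACKUP="${GOV_BACKUP:-$HOME/.gpu_governor.bak}"

# Write a value to a file, going through su only when the current user cannot write it
_write_sysfs() {
  if [ -w "$2" ]; then
    printf '%s\n' "$1" > "$2"
  else
    su -c "echo '$1' > '$2'"
  fi
}

gpu_governor_set() {
  local new="${1:-performance}"
  cat "$GOV_FILE" > "$GOV_BACKUP" || return 1   # remember the current governor
  _write_sysfs "$new" "$GOV_FILE"
}

gpu_governor_restore() {
  [ -f "$GOV_BACKUP" ] || { echo "no saved governor"; return 1; }
  _write_sysfs "$(cat "$GOV_BACKUP")" "$GOV_FILE"
}
```

Call `gpu_governor_set` before a benchmark and `gpu_governor_restore` afterwards.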


🛠️ Troubleshooting

❌ "Permission denied" when loading model

chmod 644 ~/models/your-model.gguf

❌ Vulkan not detected / still using CPU

  1. Verify compilation: grep GGML_VULKAN build/CMakeCache.txt
  2. Rebuild with -DGGML_VULKAN=ON
  3. Ensure dependencies installed: pkg list-installed | grep vulkan

❌ Chat template warnings

Add the --jinja flag to enable the Jinja2 template engine.

❌ Slow performance (<5 t/s with -ngl 99)

  • Try reducing context: -c 512 instead of -c 2048
  • Enable KV cache quantization: -ctk q8_0 -ctv q8_0 (a quantized V cache requires flash attention; see the -fa option in llama-cli --help)
  • Boost GPU frequency (see above)

📦 Recommended Models for Testing

| Model | Size | Expected Speed (Vulkan) |
|-------|------|-------------------------|
| Qwen2.5-0.5B | ~0.4 GB | 20-30 t/s |
| Qwen2.5-1.5B | ~1.1 GB | 12-18 t/s |
| Gemma-2-2B | ~2.0 GB | 7-10 t/s |
| Llama-3-8B | ~4.9 GB | 4-7 t/s |

> Smaller = faster. Start with Qwen2.5-0.5B for testing.
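To follow the "start small" advice with whatever is already in ~/models, a short function can list the GGUF files from smallest to largest. This is a sketch. `list_models_by_size` is a hypothetical name, and sizes are printed in bytes.

```bash
# List GGUF files in a directory from smallest to largest ("bytes<TAB>path").
# The directory defaults to ~/models.
list_models_by_size() {
  local dir="${1:-$HOME/models}" f
  for f in "$dir"/*.gguf; do
    [ -f "$f" ] || continue                        # skip the literal pattern if nothing matched
    printf '%s\t%s\n' "$(wc -c < "$f")" "$f"       # size in bytes, tab, path
  done | sort -n
}
```

The first line of output is the quickest model to try.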


🔄 Why This Works Now (No Manual Driver Extraction)

Older guides required manually copying vulkan.mali.so and setting LD_LIBRARY_PATH. On modern setups, this is often unnecessary because:

✅ Android 14+ has better Vulkan ICD discovery
✅ Termux's vulkan-loader package auto-detects system drivers
✅ llama.cpp b9190+ has improved Android Vulkan backend

> ⚠️ If auto-detection fails on your device, fall back to manual driver extraction guides.


🎯 Final Working Command (Copy-Paste)

cd ~/llama.cpp
./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, write a short poem about Android" \
  -n 150 \
  -ngl 99 \
  -c 1024 \
  --jinja \
  --color auto \
  --verbosity 0
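If you switch between GPU and CPU runs or between models, the final command can be wrapped in a function with overridable defaults. This is a sketch, not part of llama.cpp. `run_llama` is a hypothetical name, as are the `MODEL`, `PROMPT`, `N_PREDICT`, `NGL`, `CTX`, `LLAMA_CLI` and `DRY_RUN` variables. The defaults simply copy the command above.

```bash
# Hypothetical wrapper around the final command; every default can be overridden
# through an environment variable. DRY_RUN=1 prints the command instead of running it.
run_llama() {
  local cmd=(
    "${LLAMA_CLI:-$HOME/llama.cpp/build/bin/llama-cli}"
    -m "${MODEL:-$HOME/models/gemma-2-2b-it-Q5_K_M.gguf}"
    -p "${PROMPT:-Hello, write a short poem about Android}"
    -n "${N_PREDICT:-150}"
    -ngl "${NGL:-99}"
    -c "${CTX:-1024}"
    --jinja --color auto --verbosity 0
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "${cmd[@]}"   # shell-quoted, so it can be copied and pasted
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, `NGL=0 run_llama` repeats the run on the CPU only. `DRY_RUN=1 run_llama` shows the exact command line without loading the model.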

💬 Questions?

Drop a comment if you run into issues! Happy to help troubleshoot.

Device: Pixel 9 Pro XL
Android: 15
Termux: Latest from F-Droid
llama.cpp: b9190 (main branch)
GPU: Mali-G715 (Vulkan)


Last tested: May 17, 2026

u/Rickx005x — 5 days ago
r/termux

u/Rickx005x — 5 days ago

Does anyone know how to use NNAPI in llama.cpp?

Hello everyone, my phone has an NPU and I'd like to use it through NNAPI, but I haven't found any official instructions or documentation on how to do that with llama.cpp. Does anyone know how?

u/Rickx005x — 6 days ago
r/pixel (+2 crossposts)

I use a Pixel phone to set up Gemma and run it locally

Pixel 9 Pro XL + Tensor G4 + 16GB LPDDR5X + Mali-G715: The Power of Tri-Synergy!

This tutorial is designed for **Rooted Android Phones + Termux + Mali GPU (e.g., Pixel, Xiaomi, Samsung, etc.)**.

---

# 📱 Complete Guide: Termux + Root + Mali GPU Accelerated llama.cpp

## ⚠️ Prerequisites

  1. **Phone must be rooted** (Magisk/KernelSU).

  2. **Install Termux** (recommended to download the latest version from F-Droid or GitHub).

  3. **Phone GPU must be Mali** (most MediaTek and Tensor chips use Mali; Qualcomm Snapdragon uses Adreno, which requires a different approach).

---

## 🟢 Phase One: Install Basic Environment & Compile llama.cpp

Run the following commands sequentially in Termux:

```bash
# 1. Update and install dependencies
pkg update && pkg upgrade -y
pkg install git cmake clang make vulkan-loader vulkan-tools vulkan-headers glslang -y

# 2. Clone llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# 3. Compile with the Vulkan backend enabled
#    (LLAMA_VULKAN is the old, renamed option; there is no separate NEON switch,
#    ARM64 builds use NEON automatically)
mkdir build && cd build
cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```

> 💡 **Tip**: Compilation may take 10-20 minutes; please be patient.

---

## 🔵 Phase Two: Extract System GPU Drivers (Core Step)

**This is the most critical step!** We need to "extract" the system drivers for Termux to use.

### 1. Create Storage Directories

```bash
mkdir -p /data/data/com.termux/files/usr/lib/vulkan
mkdir -p /data/data/com.termux/files/usr/share/vulkan/icd.d
```

### 2. Locate Driver File Paths

Paths vary by device. Run the following commands to search:

```bash
# Search for Vulkan driver
su -c "find /vendor /system -name 'vulkan*.so' 2>/dev/null"

# Search for GLES driver
su -c "find /vendor /system -name 'libGLES_mali.so' 2>/dev/null"

# Search for system Vulkan loader (critical!)
su -c "find /apex /system -name 'libvulkan.so' 2>/dev/null"
```
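The searches above walk all of /vendor and /system, which can be slow. A narrower sketch checks only the directories where the example paths below were found. That directory list is an assumption based on those Pixel paths, so fall back to the full `find` if it prints nothing. `find_gpu_drivers` is a hypothetical name. Its optional root argument lets you point it at a copy of the directory tree.

```bash
# Print candidate Vulkan/GLES driver files from the usual vendor/system directories.
# Usage (on the phone, as root): find_gpu_drivers
#        (against a copied tree):  find_gpu_drivers /path/to/copy
find_gpu_drivers() {
  local root="${1:-}" d
  for d in "$root/vendor/lib64/hw" "$root/vendor/lib64/egl" \
           "$root/vendor/lib64" "$root/system/lib64"; do
    [ -d "$d" ] || continue
    find "$d" -maxdepth 1 \
      \( -name 'vulkan*.so' -o -name 'libGLES_mali.so' -o -name 'libvulkan.so' \) \
      -print 2>/dev/null
  done
}
```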

**👇 Example paths found (for Pixel 9 / Tensor G4):**

* Vulkan ICD: `/vendor/lib64/hw/vulkan.mali.so`

* GLES Lib: `/vendor/lib64/egl/libGLES_mali.so`

* Config Lib: `/vendor/lib64/aconfig_gpu_flags_c_lib.so` (if present)

* System Loader: `/system/lib64/libvulkan.so`

### 3. Copy Drivers to Termux

**Please modify the commands below according to your actual paths found above!**

```bash
# Gain root access and copy files (adjust paths as needed)
su << 'EOF'
# 1. Copy Mali-specific driver
cp /vendor/lib64/hw/vulkan.mali.so /data/data/com.termux/files/usr/lib/vulkan/
cp /vendor/lib64/egl/libGLES_mali.so /data/data/com.termux/files/usr/lib/vulkan/

# 2. Copy potential dependency libraries (ignore if file doesn't exist)
cp /vendor/lib64/aconfig_gpu_flags_c_lib.so /data/data/com.termux/files/usr/lib/vulkan/ 2>/dev/null

# 3. [Critical] Copy system-level Vulkan Loader (resolves missing symbol issues)
cp /system/lib64/libvulkan.so /data/data/com.termux/files/usr/lib/

# 4. Copy C++ standard library
cp /vendor/lib64/libc++.so /data/data/com.termux/files/usr/lib/vulkan/

# 5. Set permissions
chmod 755 /data/data/com.termux/files/usr/lib/vulkan/*.so
chmod 755 /data/data/com.termux/files/usr/lib/libvulkan.so
exit
EOF
```
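The `2>/dev/null` in step 2 hides every error, including a typo in a path. As an alternative sketch, a small function copies only the files that exist and reports the ones it skipped. `copy_if_present` is a hypothetical name. A root shell does not see functions defined in your Termux session, so paste the function inside the `su << 'EOF'` block before calling it. It uses plain sh plus `local`, assuming the root shell supports `local` (mksh does).

```bash
# Copy each listed file into DEST and make it executable; report missing files
# instead of silently ignoring them.
# Usage: copy_if_present DEST FILE...
copy_if_present() {
  local dest="$1" f
  shift
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    if [ -f "$f" ]; then
      cp "$f" "$dest/" && chmod 755 "$dest/$(basename "$f")"
    else
      echo "skip (not found): $f" >&2
    fi
  done
  return 0
}
```

For example: `copy_if_present /data/data/com.termux/files/usr/lib/vulkan /vendor/lib64/hw/vulkan.mali.so /vendor/lib64/aconfig_gpu_flags_c_lib.so`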

---

## 🟠 Phase Three: Configure Environment Variables

### 1. Create ICD JSON Configuration File

Tell the Vulkan loader where to find the driver.

```bash
su -c "cat > /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json << 'JSONEOF'
{
  \"file_format_version\": \"1.0.1\",
  \"ICD\": {
    \"library_path\": \"/data/data/com.termux/files/usr/lib/vulkan/vulkan.mali.so\",
    \"api_version\": \"1.4.335\"
  }
}
JSONEOF"

# The file above is created by root, so change its permissions as root as well
su -c "chmod 644 /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json"
```
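Writing the JSON through `su -c "..."` needs every quote escaped, and it leaves a root-owned file. The icd.d directory was created from the normal Termux shell in Phase Two, so a sketch like the following can write the manifest without root and without escaping. `write_icd_json` is a hypothetical name. The default API version just copies the value above; adjust it to what your driver reports.

```bash
# Write a Vulkan ICD manifest pointing at a driver library.
# Usage: write_icd_json LIBRARY_PATH OUTPUT_FILE [API_VERSION]
write_icd_json() {
  local lib="$1" out="$2" api="${3:-1.4.335}"
  mkdir -p "$(dirname "$out")" || return 1
  cat > "$out" <<EOF
{
  "file_format_version": "1.0.1",
  "ICD": {
    "library_path": "$lib",
    "api_version": "$api"
  }
}
EOF
  chmod 644 "$out"
}
```

For example: `write_icd_json /data/data/com.termux/files/usr/lib/vulkan/vulkan.mali.so /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json`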

### 2. Set Environment Variables (Permanent)

```bash
# Append to ~/.bashrc
cat >> ~/.bashrc << 'RCPEOF'
# --- Vulkan GPU Acceleration for llama.cpp ---
export LD_LIBRARY_PATH=/system/lib64:/vendor/lib64:/vendor/lib64/egl:/data/data/com.termux/files/usr/lib/vulkan:/data/data/com.termux/files/usr/lib
export VK_ICD_FILENAMES=/data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json
# ---------------------------------------------
RCPEOF

# Apply changes immediately
source ~/.bashrc
```
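Exporting these variables from ~/.bashrc applies them to every program in the session. With /system/lib64 first in the path, other Termux tools may pick up Android's system libraries instead of Termux's own. One hedged alternative is to set the variables for a single command only. `with_vulkan_env` is a hypothetical name, and the paths copy the ones above.

```bash
# Run one command with the Vulkan-related variables set, leaving the shell untouched.
# Usage: with_vulkan_env COMMAND [ARGS...]
with_vulkan_env() {
  local p=/data/data/com.termux/files/usr
  LD_LIBRARY_PATH="/system/lib64:/vendor/lib64:/vendor/lib64/egl:$p/lib/vulkan:$p/lib" \
  VK_ICD_FILENAMES="$p/share/vulkan/icd.d/mali_icd.json" \
    "$@"
}
```

For example, `with_vulkan_env vulkaninfo` or `with_vulkan_env ./llama-cli -m your-model.gguf -ngl 99`.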

---

## 🟣 Phase Four: Verification & Testing

### 1. Verify Vulkan Recognizes the GPU

```bash
vulkaninfo | grep -E "deviceName|deviceType"
```

✅ **Success indicator**: You should see `Mali-G715` or `Immortalis` etc., NOT `llvmpipe`.
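The same check can be scripted so that it fails loudly when only a software renderer shows up. This sketch assumes vulkaninfo prints lines of the form `deviceName = ...`, which is what the grep above relies on. `check_vulkan_devices` is a hypothetical name. SwiftShader is included only as another common software implementation.

```bash
# Read vulkaninfo output on stdin, print the device names, and fail if every
# listed device is a software rasterizer (llvmpipe, SwiftShader) or none is listed.
check_vulkan_devices() {
  local names
  names="$(sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p')"
  if [ -z "$names" ]; then
    echo "no Vulkan devices listed" >&2
    return 1
  fi
  printf '%s\n' "$names"
  # Succeed only if at least one device is not a known software renderer
  printf '%s\n' "$names" | grep -viqE 'llvmpipe|swiftshader'
}
```

Usage: `vulkaninfo | check_vulkan_devices`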

### 2. Download a Model (e.g., Gemma-4)

Transfer the model file to your phone, e.g., place it in `/storage/emulated/0/Download/`. Termux can only read shared storage after you run `termux-setup-storage` once and grant the permission.

### 3. Run Test

```bash
cd ~/llama.cpp/build/bin
./llama-cli \
  -m /storage/emulated/0/Download/Gemma-4-E2B--Q5_K_P.gguf \
  -ngl 99 \
  --temp 1.0 -t 4 -c 4096 --no-mmap -n 20 --verbose 2>&1 | grep -E "Vulkan|GPU|device"
```

✅ **Success indicators**:

* You see `using device Vulkan0 (Mali-...)`

* You see `offloaded 36/36 layers to GPU`

---

## 🔴 Frequently Asked Questions (FAQ)

**Q1: What if my phone uses Qualcomm Snapdragon (Adreno GPU)?**

A: The process is the same, but the driver filename differs.

* Search for drivers: `su -c "find /vendor -name '*adreno*' -o -name '*kgsl*'"`

* The driver is typically named `vulkan.adreno.so`.

**Q2: Getting error `dlopen failed: library xxx not found`?**

A: This indicates missing dependency libraries.

* Use `readelf -d /path/to/vulkan.so | grep NEEDED` to identify missing libraries.

* Locate the corresponding `.so` files in `/vendor/lib64` or `/system/lib64` and copy them over.
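The two steps above can be combined by feeding `readelf -d` output into a function that prints each NEEDED library not found in your search path. This is a sketch. `missing_needed_libs` is a hypothetical name, and it defaults to the current LD_LIBRARY_PATH. The Android linker can also resolve some libraries from locations outside that path, so treat the output as a shortlist to check rather than a definitive list.

```bash
# Read `readelf -d` output on stdin and print each NEEDED library that is not
# present in any directory of a colon-separated search path.
# Usage: readelf -d LIB.so | missing_needed_libs [DIR1:DIR2:...]
missing_needed_libs() {
  local search="${1:-${LD_LIBRARY_PATH:-}}"
  sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p' |
  while IFS= read -r lib; do
    found=0
    IFS=:                      # split the search path on colons (loop runs in a subshell)
    for dir in $search; do
      [ -e "$dir/$lib" ] && { found=1; break; }
    done
    unset IFS
    [ "$found" -eq 1 ] || printf '%s\n' "$lib"
  done
}
```

For example: `readelf -d /data/data/com.termux/files/usr/lib/vulkan/vulkan.mali.so | missing_needed_libs`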

**Q3: No speed improvement, still running on CPU?**

A: Check the `llama-cli` output.

* If you see `assigned to device CPU`, the environment variables may be misconfigured.

* Ensure you executed `source ~/.bashrc` before running.

* Verify that `vulkaninfo` can detect your GPU.

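The verbose log output confirms that:

- ✅ **GPU Model**: Mali-G715 successfully recognized

- ✅ **GPU-accessible memory**: 15456 MiB (~15 GB) reported by the Vulkan device. Mali shares system RAM with the CPU, so this is the memory the GPU can address, not dedicated VRAM and not the model's usage; full offload is shown by the `offloaded 36/36 layers to GPU` line

- ✅ **Model Footprint**: 1641 MiB - Model weights residing in GPU-accessible memory

- ✅ **Compute Buffer**: 545 MiB - GPU computation buffers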

---

## 📊 Final Scorecard

| Metric | CPU Mode | GPU Mode | Improvement |
|--------|----------|----------|-------------|
| **Generation Speed** | 4.1 t/s | **7.3 t/s** | **+78%** 🚀 |
| **Processing Speed** | 8.4 t/s | **10.7 t/s** | **+27%** |
| **Device** | CPU | **Mali-G715** | ✅ |
| **Memory** | System RAM | **~15 GB GPU-accessible (shared with system RAM)** | ✅ |
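The Improvement column is the relative gain, (GPU speed / CPU speed - 1) x 100. A one-line awk helper reproduces it from any pair of measurements. `pct_gain` is a hypothetical name.

```bash
# Relative speed gain of GPU over CPU, rounded to a whole percent.
# Usage: pct_gain CPU_TPS GPU_TPS
pct_gain() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { if (cpu <= 0) exit 1; printf "%+.0f%%\n", (gpu / cpu - 1) * 100 }'
}
```

For the rows above, `pct_gain 4.1 7.3` prints +78% and `pct_gain 8.4 10.7` prints +27%.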

---

## 🎁 Now You Can:

### 1. **Create a Quick-Launch Script**

```bash
cat > ~/run-gemma.sh << 'EOF'
#!/data/data/com.termux/files/usr/bin/bash
cd ~/llama.cpp/build/bin
./llama-cli \
  -m /storage/emulated/0/Download/Gemma-4-E2B--Q5_K_P.gguf \
  -ngl 99 \
  --simple-io --jinja \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  -t 4 -c 4096 --no-mmap \
  --interactive-first
EOF

chmod +x ~/run-gemma.sh
```

From now on, simply run `~/run-gemma.sh` to launch!

### 2. **Try Larger Models**

Since the GPU can address about 15 GB of shared memory, you can experiment with:

- **Llama-3-8B** (Q4_K_M quantized)

- **Qwen2.5-7B** (Q5_K_M quantized)

- **Mistral-7B** (Q6_K quantized)

> 💡 **Pro Tip**: Larger models and higher-bit quantizations (like Q6_K) offer better quality but require more memory. Check the GPU's memory heaps with `vulkaninfo` and monitor RAM usage in Android's developer options to avoid out-of-memory crashes.
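Before copying a larger model over, a rough size check can save a failed load. This is a heuristic sketch, not how llama.cpp actually allocates memory. It adds a fixed headroom to the file size and compares the total with a memory budget in MiB that you supply. The default headroom of 1024 MiB is an assumption; for reference, the compute buffer alone was 545 MiB in the run above. `fits_in_memory` is a hypothetical name.

```bash
# Rough check: does a GGUF file plus headroom fit in a memory budget?
# Usage: fits_in_memory MODEL.gguf BUDGET_MIB [HEADROOM_MIB]
fits_in_memory() {
  local bytes need budget_mib="$2" headroom="${3:-1024}"
  bytes="$(wc -c < "$1")" || return 2
  need=$(( bytes / 1048576 + headroom ))   # file size in MiB plus headroom
  if [ "$need" -le "$budget_mib" ]; then
    echo "ok: needs ~${need} MiB of ${budget_mib} MiB"
  else
    echo "too big: needs ~${need} MiB, budget is ${budget_mib} MiB"
    return 1
  fi
}
```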

u/Rickx005x — 8 days ago
r/help

App login not working after 2FA enabled (u/Rickx005x)

Hello Reddit Support Team,

My account is u/Rickx005x. I have successfully enabled 2FA and can log in perfectly via the web (reddit.com) using my username, password, and 6-digit 2FA code.

But on the official Android Reddit app, I cannot log in at all:

  • It either gets stuck on a black loading screen,
  • Or forces me to use an email magic link instead of the 2FA flow,
  • And when I try to enter my username/password, it says "invalid credentials" even though they work on web.

I've already:

  1. Reinstalled the Reddit app to the latest version
  2. Used a stable US VPN
  3. Tried logging in with both username and email

Please help me fix this app login restriction. Thank you!

Best regards, Rickx005x

u/Rickx005x — 10 days ago
r/Pixel9Pro (+1 crosspost)

Hey everyone,

I’m currently on my Pixel 9 Pro XL and I've been intentionally blocking the April update (CP1A.260405.005) because of the widespread reports regarding eSIM functionality failure. My cellular connection is my lifeline, so I couldn't risk the "No SIM" bug.

Now that the May patch is rolling out, has anyone who previously had eSIM issues confirmed if this new build fixes it? I’m hesitant to hit that update button until I know for sure the modem firmware is stable this time.

Any feedback from those who have already updated would be much appreciated!

I am seeking confirmation regarding the modem/eSIM stability in the latest May update. I skipped the previous April build after encountering/reading about the critical eSIM bug that caused the device to fail to recognize digital SIMs.

Device: Pixel 9 Pro XL

Previous Issue: eSIM failing to activate/connect.

Does the May update include a specific fix for the radio/modem firmware related to eSIM? If you were affected by the previous bug, please let me know if your service was restored after this update. Thanks!

u/Rickx005x — 17 days ago