u/Rickx005x

[Guide] Running llama.cpp with Vulkan GPU Acceleration on Termux + Pixel 9 Pro XL (Mali-G715)

🚀 llama.cpp + Vulkan GPU Acceleration on Termux (Pixel 9 Pro XL / Mali-G715)

After lots of trial and error, I finally got llama.cpp running with real GPU acceleration on my Pixel 9 Pro XL. Here's the minimal working setup.

> TL;DR: Modern Termux + llama.cpp b9190 + -ngl 99 = automatic Mali GPU acceleration. No manual driver extraction needed (for now).

✅ Prerequisites

Pixel 9 Pro XL (or any Android 14+/Mali-G715 device)
Termux installed from F-Droid or GitHub
Root access (optional, but helpful for GPU frequency tuning)
~4 GB free storage for model

🟢 Step 1: Install Dependencies

pkg update && pkg upgrade -y
pkg install git cmake clang make vulkan-loader vulkan-tools vulkan-headers glslang -y

🟢 Step 2: Build llama.cpp with Vulkan

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Clean build with Vulkan enabled
rm -rf build
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j4

✅ Verify Vulkan was compiled:

grep "GGML_VULKAN:BOOL" build/CMakeCache.txt
# Should output: GGML_VULKAN:BOOL=ON

ls -lh build/bin/libggml-vulkan.so
# Should show the .so file exists
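
If you'd rather script the two checks above, here is a minimal sketch. `check_vulkan_build` is a made-up helper name (not part of llama.cpp), and it assumes the default shared-library build that produces `libggml-vulkan.so` under `build/bin`.

```shell
# Hypothetical helper: succeeds only if both manual checks pass for a build dir.
check_vulkan_build() {
  local build_dir="${1:-build}"

  # 1. CMake must have recorded the Vulkan backend as enabled.
  grep -q '^GGML_VULKAN:BOOL=ON$' "$build_dir/CMakeCache.txt" 2>/dev/null || {
    echo "GGML_VULKAN is not ON in $build_dir/CMakeCache.txt" >&2
    return 1
  }

  # 2. The Vulkan backend library must have been produced.
  [ -f "$build_dir/bin/libggml-vulkan.so" ] || {
    echo "missing $build_dir/bin/libggml-vulkan.so" >&2
    return 1
  }

  echo "Vulkan backend looks built in $build_dir"
}
```

Run it from the llama.cpp directory as `check_vulkan_build build`. A non-zero exit status means one of the two checks failed and a rebuild is probably needed.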

🟢 Step 3: Download a Model

mkdir -p ~/models
cd ~/models

# Example: Gemma 2 2B (Q5 quantized, ~2GB)
wget https://huggingface.co/google/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q5_K_M.gguf

> 💡 Tip: Use hf-mirror.com if Hugging Face is slow in your region.

🟢 Step 4: Run with GPU Acceleration

cd ~/llama.cpp

./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, introduce yourself in Chinese" \
  -n 100 \
  -ngl 99 \
  --jinja \
  --color auto

🔑 Key Parameters:

| Flag | Meaning |
|------|---------|
| `-ngl 99` | Offload all layers to GPU (Vulkan) |
| `--jinja` | Enable Jinja2 template engine (fixes chat format warnings) |
| `--color auto` | Color-coded output |
| `--verbosity 0` | Minimal logs (optional) |

🔍 Verify GPU is Actually Working

Method 1: Check startup logs (verbose mode)

./build/bin/llama-cli -m ~/models/your-model.gguf -p "test" -n 10 -ngl 99 --verbosity 3 2>&1 | grep -i "vulkan\|mali\|device"

✅ Look for lines like:

using device Vulkan0 (Mali-G715)
offloaded XX/XX layers to GPU
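
To avoid eyeballing the log, the "offloaded XX/XX" line can be checked mechanically. This is a rough sketch: `all_layers_offloaded` is a hypothetical helper, and it assumes the log wording shown above.

```shell
# Hypothetical helper: reads a llama-cli log on stdin and succeeds only if
# it reports every layer as offloaded ("offloaded N/N layers to GPU").
all_layers_offloaded() {
  awk '
    match($0, /offloaded [0-9]+\/[0-9]+ layers to GPU/) {
      # Split "offloaded N/M layers to GPU" on spaces and the slash:
      # w[2] is the offloaded count, w[3] is the total.
      split(substr($0, RSTART, RLENGTH), w, "[ /]")
      found = 1
      full = (w[2] == w[3])
    }
    END { exit (found && full) ? 0 : 1 }
  '
}
```

Pipe the Method 1 command into it in place of `grep`, for example `... --verbosity 3 2>&1 | all_layers_offloaded && echo "GPU offload OK"`. It stays silent and returns non-zero if the line is missing or only some layers were offloaded.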

Method 2: Speed comparison (most reliable)

# GPU mode
./build/bin/llama-cli -m model.gguf -p "test" -n 50 -ngl 99 --verbosity 0

# CPU-only mode (for comparison)
./build/bin/llama-cli -m model.gguf -p "test" -n 50 -ngl 0 -t 8 --verbosity 0

📊 Expected results on Pixel 9 Pro XL:

| Mode | Speed |
|------|-------|
| CPU-only (`-ngl 0 -t 8`) | ~1-2 t/s |
| Vulkan GPU (`-ngl 99`) | ~7-8 t/s ✅ |

> If GPU mode is 5-6x faster, Vulkan is working!
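
For reference, the speedup is just GPU tokens/s divided by CPU tokens/s. A small sketch of that arithmetic, using the midpoints of the ranges above (`speedup` is an illustrative helper name, not a llama.cpp tool):

```shell
# Hypothetical helper: prints how many times faster the GPU run was.
# $1 = CPU tokens/s, $2 = GPU tokens/s
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN {
    if (cpu <= 0) { print "cpu speed must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", gpu / cpu
  }'
}

speedup 1.5 7.5   # midpoints of the ranges above; prints 5.0
```

Plug in the two numbers your own runs report. A result close to 1.0 suggests the GPU run actually fell back to CPU.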

⚡ Optional: Boost GPU Frequency (Root Required)

Mali GPUs often run at conservative frequencies by default. Unlock performance:

# Check current GPU frequency
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq

# Set to performance mode (max frequency)
su -c "echo performance > /sys/devices/platform/gpu0/devfreq/gpu0/governor"

# Verify change
cat /sys/devices/platform/gpu0/devfreq/gpu0/cur_freq

🚀 Expected improvement: 7-8 t/s → 12-15 t/s

> ⚠️ Warning: Higher frequency = more heat + battery drain. Reverts on reboot.
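
Not every kernel exposes a `performance` governor, so it can be worth checking before writing to sysfs. A minimal sketch, assuming the node has the standard devfreq `available_governors` file. `governor_available` is a hypothetical helper, and the node path is the one used above, which may differ on other devices.

```shell
# Hypothetical helper: checks that a devfreq node offers a governor.
# Returns 0 if offered, 1 if not offered, 2 if the node cannot be read
# (reading sysfs may itself need root on some devices).
governor_available() {
  local node="$1" want="$2" g
  [ -r "$node/available_governors" ] || return 2
  for g in $(cat "$node/available_governors"); do
    [ "$g" = "$want" ] && return 0
  done
  return 1
}

# Example (needs root; path as used above):
#   node=/sys/devices/platform/gpu0/devfreq/gpu0
#   old=$(cat "$node/governor")     # remember it to restore without a reboot
#   governor_available "$node" performance \
#     && su -c "echo performance > $node/governor"
#   su -c "echo $old > $node/governor"   # restore when finished
```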

🛠️ Troubleshooting

❌ "Permission denied" when loading model

chmod 644 ~/models/your-model.gguf

❌ Vulkan not detected / still using CPU

Verify compilation: grep GGML_VULKAN build/CMakeCache.txt
Rebuild with -DGGML_VULKAN=ON
Ensure dependencies installed: pkg list-installed | grep vulkan

❌ Chat template warnings

Add --jinja flag to enable Jinja2 template engine.

❌ Slow performance (<5 t/s with -ngl 99)

Try reducing context: -c 512 instead of -c 2048
Enable KV cache quantization: -ctk q8_0 -ctv q8_0 (quantizing the V cache requires flash attention to be enabled)
Boost GPU frequency (see above)

📦 Recommended Models for Testing

| Model | Size | Expected Speed (Vulkan) |
|-------|------|-------------------------|
| Qwen2.5-0.5B | ~0.4 GB | 20-30 t/s |
| Qwen2.5-1.5B | ~1.1 GB | 12-18 t/s |
| Gemma-2-2B | ~2.0 GB | 7-10 t/s |
| Llama-3-8B | ~4.9 GB | 4-7 t/s |

> Smaller = faster. Start with Qwen2.5-0.5B for testing.
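
Before trying the bigger entries in the table, a quick sanity check is to compare the GGUF file size with the RAM Android currently reports as available. This is only a rough heuristic (it ignores the KV cache and compute buffers), and `model_fits` is a made-up helper name.

```shell
# Hypothetical helper: succeeds if the model file is smaller than MemAvailable.
# $1 = path to the .gguf file, $2 = meminfo file (defaults to /proc/meminfo)
model_fits() {
  local model="$1" meminfo="${2:-/proc/meminfo}" size_kb avail_kb
  size_kb=$(( $(wc -c < "$model") / 1024 ))
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
  echo "model: ${size_kb} kB, available RAM: ${avail_kb} kB"
  [ "$size_kb" -lt "$avail_kb" ]
}
```

Example: `model_fits ~/models/gemma-2-2b-it-Q5_K_M.gguf || echo "probably too big right now"`.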

🔄 Why This Works Now (No Manual Driver Extraction)

Older guides required manually copying vulkan.mali.so and setting LD_LIBRARY_PATH. On modern setups, this is often unnecessary because:

✅ Android 14+ has better Vulkan ICD discovery
✅ Termux's vulkan-loader package auto-detects system drivers
✅ llama.cpp b9190+ has improved Android Vulkan backend

> ⚠️ If auto-detection fails on your device, fall back to manual driver extraction guides.

🎯 Final Working Command (Copy-Paste)

cd ~/llama.cpp
./build/bin/llama-cli \
  -m ~/models/gemma-2-2b-it-Q5_K_M.gguf \
  -p "Hello, write a short poem about Android" \
  -n 150 \
  -ngl 99 \
  -c 1024 \
  --jinja \
  --color auto \
  --verbosity 0

💬 Questions?

Drop a comment if you run into issues! Happy to help troubleshoot.

Device: Pixel 9 Pro XL
Android: 15
Termux: Latest from F-Droid
llama.cpp: b9190 (main branch)
GPU: Mali-G715 (Vulkan)

Last tested: May 17, 2026

u/Rickx005x — 5 days ago

▲ 1 r/termux

▲ 2 r/LocalLLM

Does anyone know how to use NNAPI in llama.cpp?

Hello everyone, my phone has an NPU and I'd like to use it through NNAPI, but I haven't found any official instructions or documentation for this. Does anyone know how to do it?


u/Rickx005x — 6 days ago

▲ 1 r/pixel+2 crossposts

I use a Pixel phone to set up Gemma and run it locally

Pixel 9 Pro XL + Tensor G4 + 16GB LPDDR5X + Mali-G715: The Power of Tri-Synergy!

This tutorial is designed for **Rooted Android Phones + Termux + Mali GPU (e.g., Pixel, Xiaomi, Samsung, etc.)**.

---

# 📱 Complete Guide: Termux + Root + Mali GPU Accelerated llama.cpp

## ⚠️ Prerequisites

**Phone must be rooted** (Magisk/KernelSU).
**Install Termux** (recommended to download the latest version from F-Droid or GitHub).
**Phone GPU must be Mali** (most MediaTek and Tensor chips use Mali; Qualcomm Snapdragon uses Adreno, which requires a different approach).

---

## 🟢 Phase One: Install Basic Environment & Compile llama.cpp

Run the following commands sequentially in Termux:

```bash
# 1. Update and install dependencies
pkg update && pkg upgrade -y
pkg install git cmake clang make vulkan-loader vulkan-tools vulkan-headers glslang -y

# 2. Clone llama.cpp
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# 3. Compile with Vulkan enabled (NEON is used automatically on ARM64)
mkdir build && cd build
cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```

> 💡 **Tip**: Compilation may take 10-20 minutes; please be patient.

---

## 🔵 Phase Two: Extract System GPU Drivers (Core Step)

**This is the most critical step!** We need to "extract" the system drivers for Termux to use.

### 1. Create Storage Directories

```bash
mkdir -p /data/data/com.termux/files/usr/lib/vulkan
mkdir -p /data/data/com.termux/files/usr/share/vulkan/icd.d
```

### 2. Locate Driver File Paths

Paths vary by device. Run the following commands to search:

```bash
# Search for Vulkan driver
su -c "find /vendor /system -name 'vulkan*.so' 2>/dev/null"

# Search for GLES driver
su -c "find /vendor /system -name 'libGLES_mali.so' 2>/dev/null"

# Search for system Vulkan loader (critical!)
su -c "find /apex /system -name 'libvulkan.so' 2>/dev/null"
```

**👇 Example paths found (for Pixel 9 / Tensor G4):**

* Vulkan ICD: `/vendor/lib64/hw/vulkan.mali.so`

* GLES Lib: `/vendor/lib64/egl/libGLES_mali.so`

* Config Lib: `/vendor/lib64/aconfig_gpu_flags_c_lib.so` (if present)

* System Loader: `/system/lib64/libvulkan.so`

### 3. Copy Drivers to Termux

**Please modify the commands below according to your actual paths found above!**

```bash
# Gain root access and copy files (adjust paths as needed)
su << 'EOF'
# 1. Copy Mali-specific driver
cp /vendor/lib64/hw/vulkan.mali.so /data/data/com.termux/files/usr/lib/vulkan/
cp /vendor/lib64/egl/libGLES_mali.so /data/data/com.termux/files/usr/lib/vulkan/

# 2. Copy potential dependency libraries (ignore if file doesn't exist)
cp /vendor/lib64/aconfig_gpu_flags_c_lib.so /data/data/com.termux/files/usr/lib/vulkan/ 2>/dev/null

# 3. [Critical] Copy system-level Vulkan Loader (resolves missing symbol issues)
cp /system/lib64/libvulkan.so /data/data/com.termux/files/usr/lib/

# 4. Copy C++ standard library
cp /vendor/lib64/libc++.so /data/data/com.termux/files/usr/lib/vulkan/

# 5. Set permissions
chmod 755 /data/data/com.termux/files/usr/lib/vulkan/*.so
chmod 755 /data/data/com.termux/files/usr/lib/libvulkan.so
exit
EOF
```

---

## 🟠 Phase Three: Configure Environment Variables

### 1. Create ICD JSON Configuration File

Tell the Vulkan loader where to find the driver.

```bash
su -c "cat > /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json << 'JSONEOF'
{
  \"file_format_version\": \"1.0.1\",
  \"ICD\": {
    \"library_path\": \"/data/data/com.termux/files/usr/lib/vulkan/vulkan.mali.so\",
    \"api_version\": \"1.4.335\"
  }
}
JSONEOF"

# The file was created by root, so change its mode as root too
su -c "chmod 644 /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json"
```
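
A wrong `library_path` is an easy mistake here, so it may help to confirm that the ICD file points at a driver that actually exists. A minimal sketch: `icd_library_ok` is a hypothetical helper, and it assumes `library_path` sits on a single line as in the JSON above.

```shell
# Hypothetical helper: extracts "library_path" from an ICD JSON file and
# checks that the referenced driver file is readable.
icd_library_ok() {
  local json="$1" lib
  lib=$(sed -n 's/.*"library_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$json" | head -n 1)
  [ -n "$lib" ] || { echo "no library_path in $json" >&2; return 1; }
  [ -r "$lib" ] || { echo "driver not readable: $lib" >&2; return 1; }
  echo "ICD points at $lib"
}
```

Example: `icd_library_ok /data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json`.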

### 2. Set Environment Variables (Permanent)

```bash
# Append to ~/.bashrc
cat >> ~/.bashrc << 'RCPEOF'

# --- Vulkan GPU Acceleration for llama.cpp ---
export LD_LIBRARY_PATH=/system/lib64:/vendor/lib64:/vendor/lib64/egl:/data/data/com.termux/files/usr/lib/vulkan:/data/data/com.termux/files/usr/lib
export VK_ICD_FILENAMES=/data/data/com.termux/files/usr/share/vulkan/icd.d/mali_icd.json
# ---------------------------------------------
RCPEOF

# Apply changes immediately
source ~/.bashrc
```

---

## 🟣 Phase Four: Verification & Testing

### 1. Verify Vulkan Recognizes the GPU

```bash
vulkaninfo | grep -E "deviceName|deviceType"
```

✅ **Success indicator**: You should see `Mali-G715` or `Immortalis` etc., NOT `llvmpipe`.
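
The same check can be scripted. This is a sketch only: `real_gpu_listed` is a hypothetical helper, and it assumes `vulkaninfo` prints `deviceName` lines as it does in the command above.

```shell
# Hypothetical helper: reads vulkaninfo output on stdin and succeeds if at
# least one deviceName line is something other than the llvmpipe software
# rasterizer.
real_gpu_listed() {
  awk '
    /deviceName/ && tolower($0) !~ /llvmpipe/ { ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}
```

Example: `vulkaninfo 2>/dev/null | real_gpu_listed && echo "hardware GPU visible"`.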

### 2. Download a Model (e.g., Gemma-4)

Download a GGUF model, or have a friend transfer one to your phone, and place it in e.g. `/storage/emulated/0/Download/`.

### 3. Run Test

```bash
cd ~/llama.cpp/build/bin

# Line continuations must be directly followed by the next argument line
./llama-cli \
  -m /storage/emulated/0/Download/Gemma-4-E2B--Q5_K_P.gguf \
  -ngl 99 -p "test" \
  --temp 1.0 -t 4 -c 4096 --no-mmap -n 20 --verbose 2>&1 | grep -E "Vulkan|GPU|device"
```

✅ **Success indicators**:

* You see `using device Vulkan0 (Mali-...)`

* You see `offloaded 36/36 layers to GPU`

---

## 🔴 Frequently Asked Questions (FAQ)

**Q1: What if my phone uses Qualcomm Snapdragon (Adreno GPU)?**

A: The process is the same, but the driver filename differs.

* Search for drivers: `su -c "find /vendor -name '*adreno*' -o -name '*kgsl*'"`

* The driver is typically named `vulkan.adreno.so`.

**Q2: Getting error `dlopen failed: library xxx not found`?**

A: This indicates missing dependency libraries.

* Use `readelf -d /path/to/vulkan.so | grep NEEDED` to identify missing libraries.

* Locate the corresponding `.so` files in `/vendor/lib64` or `/system/lib64` and copy them over.
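
To see which of those `NEEDED` libraries are missing from the directories you plan to use, something like the following could work. It is a sketch: `missing_needed` is a made-up helper that parses the `readelf -d` output format, and the directory list is up to you.

```shell
# Hypothetical helper: stdin is the output of `readelf -d <lib>`; arguments
# are directories to search. Prints each NEEDED library found in none of them.
missing_needed() {
  local lib dir found
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | while read -r lib; do
    found=0
    for dir in "$@"; do
      [ -e "$dir/$lib" ] && { found=1; break; }
    done
    [ "$found" -eq 1 ] || echo "$lib"
  done
}
```

Example: `readelf -d /data/data/com.termux/files/usr/lib/vulkan/vulkan.mali.so | missing_needed /vendor/lib64 /system/lib64 /data/data/com.termux/files/usr/lib/vulkan`. Empty output means every dependency was found in at least one of the listed directories.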

**Q3: No speed improvement, still running on CPU?**

A: Check the `llama-cli` output.

* If you see `assigned to device CPU`, the environment variables may be misconfigured.

* Ensure you executed `source ~/.bashrc` before running.

* Verify that `vulkaninfo` can detect your GPU.

The log from a successful run shows that:

- ✅ **GPU Model**: Mali-G715 successfully recognized

- ✅ **Device Memory**: 15456 MiB (~15 GB) visible to Vulkan. Mali shares system RAM with the CPU, so this is the phone's memory rather than dedicated VRAM

- ✅ **Model Footprint**: 1641 MiB of model weights held in GPU-visible memory

- ✅ **Compute Buffer**: 545 MiB of GPU computation buffers

---

## 📊 Final Scorecard

| Metric | CPU | Vulkan GPU | Improvement |
|--------|----------|----------|-------------|
| **Generation Speed** | 4.1 t/s | **7.3 t/s** | **+78%** 🚀 |
| **Processing Speed** | 8.4 t/s | **10.7 t/s** | **+27%** |
| **Device** | CPU | **Mali-G715** | ✅ |

---

## 🎁 Now You Can:

### 1. **Create a Quick-Launch Script**

```bash
cat > ~/run-gemma.sh << 'EOF'
#!/data/data/com.termux/files/usr/bin/bash
cd ~/llama.cpp/build/bin

# Line continuations must be directly followed by the next argument line
./llama-cli \
  -m /storage/emulated/0/Download/Gemma-4-E2B--Q5_K_P.gguf \
  -ngl 99 \
  --simple-io --jinja \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  -t 4 -c 4096 --no-mmap \
  --interactive-first
EOF

chmod +x ~/run-gemma.sh
```

From now on, simply run `~/run-gemma.sh` to launch!

### 2. **Try Larger Models**

Since the GPU can address about 15 GB of memory shared with the CPU, you can experiment with:

- **Llama-3-8B** (Q4_K_M quantized)

- **Qwen2.5-7B** (Q5_K_M quantized)

- **Mistral-7B** (Q6_K quantized)

> 💡 **Pro Tip**: Larger models and higher-bit quantizations (like Q6_K) offer better quality but need more memory. Monitor your memory usage with `vulkaninfo` or Android's developer options to avoid out-of-memory crashes.

u/Rickx005x — 8 days ago

▲ 1 r/help

App login not working after 2FA enabled (u/Rickx005x)

Hello Reddit Support Team,

My account is u/Rickx005x. I have successfully enabled 2FA and can log in perfectly via the web (reddit.com) using my username, password, and 6-digit 2FA code.

But on the official Android Reddit app, I cannot log in at all:

It either gets stuck on a black loading screen,
Or forces me to use an email magic link instead of the 2FA flow,
And when I try to enter my username/password, it says "invalid credentials" even though they work on web.

I've already:

Reinstalled the Reddit app to the latest version
Used a stable US VPN
Tried logging in with both username and email

Please help me fix this app login restriction. Thank you!

Best regards, Rickx005x


u/Rickx005x — 10 days ago

▲ 1 r/Pixel9Pro+1 crossposts

Hey everyone,

I’m currently on my Pixel 9 Pro XL and I've been intentionally blocking the April update (CP1A.260405.005) because of the widespread reports regarding eSIM functionality failure. My cellular connection is my lifeline, so I couldn't risk the "No SIM" bug.

Now that the May patch is rolling out, has anyone who previously had eSIM issues confirmed if this new build fixes it? I’m hesitant to hit that update button until I know for sure the modem firmware is stable this time.

Any feedback from those who have already updated would be much appreciated!

I am seeking confirmation regarding the modem/eSIM stability in the latest May update. I skipped the previous April build after encountering/reading about the critical eSIM bug that caused the device to fail to recognize digital SIMs.

Device: Pixel 9 Pro XL

Previous Issue: eSIM failing to activate/connect.

Does the May update include a specific fix for the radio/modem firmware related to eSIM? If you were affected by the previous bug, please let me know if your service was restored after this update. Thanks!


u/Rickx005x — 17 days ago