u/GsxrGuy80s

Hey r/droidappshowcase — I’ve been building Pocket Node, an Android app that turns your phone into a local AI inference node.

It runs GGUF models directly on-device, so prompts stay local instead of being sent to a cloud API.

RC2 is live here:
https://github.com/Zero-Cloud-Tax/pocket-node-releases/releases/tag/v0.1.0-rc2

This is a free public prerelease APK. Public Pro key issuance/payment is not open yet.

What works in RC2:

Local GGUF model loading/inference on Android
Model Hub for downloading/importing models
Built-in Operator model download, about 1.68 GB
Device profiles for Fold 6, Snapdragon 8 Gen 3, and Tensor G3
OpenAI-compatible Edge API path included in the app
Streaming/non-streaming API compatibility fixes
top_p / top_k request passthrough

Requirements:

Android 9.0+ / API 28+
arm64-v8a device
Enough storage for GGUF model files
Wi-Fi recommended for model downloads

Tested device:

Samsung Galaxy Z Fold 6

Important RC2 notes:

This is a prerelease, so expect rough edges.
RC2 is free to test.
Some Pro/licensing paths exist in the app, but public Pro key issuance/payment is not open yet.
The Edge API has no authentication in RC2. Use it only on a trusted LAN. Do not expose it to the internet.
Auto-updater is disabled for RC2.
Vision/RAG are not implemented yet.
Community GGUF models may show as “Unverified” if their hash is not in the local registry.

SHA-256 — verify before installing:

f1fe2887dd9f7ab0f9bd62021857bae0efdcb98c090b489a626b960154964126

I’d love to hear what people think of the app idea, especially from anyone interested in local AI, offline-first Android apps, or using old/new phones as small edge compute nodes.

Built a small Android app called Pocket Node that runs llama.cpp inference

on-device. Here's what it actually does and what it doesn't.

**What it does**

* Loads a GGUF model (SmolLM3 Q4_0, ~1.1B params) directly on the Fold6

* Uses the Vulkan/OpenCL backend via llama.cpp — not CPU-only

* Streams tokens to a native Jetpack Compose UI

* Handles Stop during prefill, not just decode: tapping Stop during the

prefill phase sets the native abort flag, cancels the JNI call, resets

the UI, and lets you send a follow-up prompt normally

* SHA-256 verifies the model file against a local registry on first load;

if the hash doesn't match, inference is blocked and the UI shows a

recovery path (Rescan / Re-import / Choose another)

* Reports model state and health to a homelab monitoring stack so I can

see at a glance whether the phone is up and inference is ready

**The stack**

* App: Kotlin + Jetpack Compose, llama.cpp via JNI, Vulkan/OpenCL backend

* Model: SmolLM3 Q4_0 (1.1B) — SHA-256 verified on load

* Homelab side: Python monitoring service polls the phone's health endpoint

and includes it in a daily digest alongside the other nodes

* The phone exposes an OpenAI-compatible API on Tailscale — direct calls

work; it's not registered in the LiteLLM routing layer yet, so automatic

routing doesn't apply. That's the next config step.

* Debug build, Android 16

**What it doesn't do**

* Not a replacement for a desktop GPU or a Mac Studio. SmolLM3 at Q4_0

on a phone handles short tasks but context is limited and longer prompts

are slow.

* No persistent memory or RAG. Each conversation is independent.

* Battery and thermal: short runs are fine. Sustained generation heats the

device. Don't leave it in a benchmark loop.

* Not tested on other Android hardware. Vulkan driver quality varies by

device. I can't say it works on your phone.

* Not a public server. The API is Tailscale-gated, LAN only.

**Why bother**

For short tasks — quick classification, a local chat response that doesn't

need to leave the device — it works. The goal isn't to match a frontier

model on a phone. It's zero cloud cost for the tasks that don't need cloud.

The verification step mattered more than I expected. Knowing the model file

matches a known-good SHA-256 before running it is the kind of thing you

want when you're running a model you downloaded months ago.

**Screenshots in gallery:** chat UI with inference status, diagnostics, stop-in-progress state, P20 health digest.

Happy to answer questions about the llama.cpp JNI layer, the stop/prefill

handling, or the homelab monitoring side.

---

*Clarification pre-emptively: "Vulkan/OpenCL" means the backend llama.cpp

selects on this device. I'm not doing anything custom on the GPU side beyond

what llama.cpp exposes.*

Pocket Node: Local AI on Android

Galaxy Z Fold6 as a local inference node — llama.cpp/Vulkan, homelab telemetry, SHA-256 model verification