u/Ju8Code

ASUS ZenBook A16 Snapdragon X2 llama.cpp GGUF t/s
▲ 0 r/ASUS

Hey everyone,

I'm wondering what speed, in tokens per second (t/s), the ASUS ZenBook A16 reaches with llama.cpp on these GGUF models:

Gemma4-31B-it-GGUF: https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-UD-Q4_K_XL.gguf?download=true

and

Qwen3.6-35B-A3B-MTP-GGUF: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF?show_file_info=Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf

Maybe someone can run llama.cpp (https://github.com/ggml-org/llama.cpp) with these parameters:

Gemma4: llama-server -m gemma-4-31B-it-UD-Q4_K_XL.gguf -ngl 999 -fa on -t 8 -c 128000 -np 1 --no-mmap --reasoning on --reasoning-budget 16768 --mlock -ctk iq4_nl -ctv iq4_nl --temp 1 --top-p 0.95 --top-k 64 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 --port 8181

Qwen3.6: llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf -ngl 999 -fa on -t 8 -c 128000 -b 2048 -ub 1024 -np 1 --no-mmap --reasoning on --reasoning-budget 16768 --mlock --port 8181 -ctk iq4_nl -ctv iq4_nl --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0

Then open http://127.0.0.1:8181/ and use the same short prompt for both models, for example 'Do you know why --n-gpu-layers 999 --n-cpu-more 21 doesn't work in Llama?', and paste the token count, the time it took, and the speed.

For example:

https://preview.redd.it/e1jo8szegc1h1.png?width=205&format=png&auto=webp&s=fc1a44ea1a0f1e43484831a8fb8c7445004943da
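
If you'd rather not read the numbers off the web UI, here is a minimal Python sketch that sends the prompt to the running server and prints the same three values. It assumes the server's native `/completion` endpoint returns a `timings` object with `predicted_n` and `predicted_ms` fields, which may differ between llama.cpp builds. The helper names `run_prompt` and `summarize_timings` and the `n_predict` default are my own choices for illustration. Note that this endpoint sends the raw prompt without the chat template, so the output text can differ from the web UI even if the speed is comparable.

```python
import json
import urllib.request


def summarize_timings(timings: dict) -> str:
    """Format generated-token count, elapsed time, and speed from a timings dict.

    Assumes the keys "predicted_n" (tokens generated) and "predicted_ms"
    (generation time in milliseconds) are present.
    """
    n = timings["predicted_n"]
    seconds = timings["predicted_ms"] / 1000.0
    # Guard against division by zero if nothing was generated.
    tps = n / seconds if seconds > 0 else 0.0
    return f"{n} tokens, {seconds:.2f} s, {tps:.2f} t/s"


def run_prompt(prompt: str,
               url: str = "http://127.0.0.1:8181/completion",
               n_predict: int = 256) -> dict:
    """POST a prompt to a running llama-server and return the parsed JSON reply.

    The URL matches the --port 8181 used in the commands above; n_predict
    caps the number of generated tokens (an arbitrary value for this sketch).
    """
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `print(summarize_timings(run_prompt("your prompt here")["timings"]))` should print one line you can paste as a reply.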

Thanks for your help! I'm collecting information to find out which iGPU is best for local AI in real-world tests. It's kinda hard to rely on content creators, a.k.a. YouTubers, since you never know who paid them :) I'll do a post later, when all data is collected.

u/Ju8Code — 7 days ago

Galaxy Book 6 Ultra B390 llama.cpp GGUF t/s

Hey everyone,

I'm wondering what speed, in tokens per second (t/s), the Galaxy Book 6 Ultra B390 reaches with llama.cpp on these GGUF models:

Gemma4-31B-it-GGUF: https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/resolve/main/gemma-4-31B-it-UD-Q4_K_XL.gguf?download=true

and

Qwen3.6-35B-A3B-MTP-GGUF: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF?show_file_info=Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf

Maybe someone can run llama.cpp (https://github.com/ggml-org/llama.cpp) with these parameters:

Gemma4: llama-server -m gemma-4-31B-it-UD-Q4_K_XL.gguf -ngl 999 -fa on -t 8 -c 128000 -np 1 --no-mmap --reasoning on --reasoning-budget 16768 --mlock -ctk iq4_nl -ctv iq4_nl --temp 1 --top-p 0.95 --top-k 64 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 --port 8181

Qwen3.6: llama-server -m Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf -ngl 999 -fa on -t 8 -c 128000 -b 2048 -ub 1024 -np 1 --no-mmap --reasoning on --reasoning-budget 16768 --mlock --port 8181 -ctk iq4_nl -ctv iq4_nl --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0

Then open http://127.0.0.1:8181/ and use the same short prompt for both models, for example 'Do you know why --n-gpu-layers 999 --n-cpu-more 21 doesn't work in Llama?', and paste the token count, the time it took, and the speed.

For example:

https://preview.redd.it/e1jo8szegc1h1.png?width=205&format=png&auto=webp&s=fc1a44ea1a0f1e43484831a8fb8c7445004943da

Thanks for your help! I'm collecting information to find out which iGPU is best for local AI in real-world tests. It's kinda hard to rely on content creators, a.k.a. YouTubers, since you never know who paid them :) I'll do a post later, when all data is collected.

u/Ju8Code — 7 days ago