u/Gas-Ornery — reddlx

Interesting results when comparing ROCm, Vulkan and HIP builds across different models

https://preview.redd.it/9gqydh0lea6h1.png?width=1905&format=png&auto=webp&s=a38bd4ff6f709545bcd8fbf3e309507d9b1c56b8

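GPU: Radeon RX 7900 XT (20 GB).
Results differ a lot from one build to another; it seems some models are better optimized for a specific backend or version? For example, Vulkan gives the fastest generation on Mellum2 and all gemma-4 variants, ROCm is faster for Qwen3.6-35B-A3B generation, and the dense Qwen3.6-27B generates roughly two to four times slower on Vulkan than on ROCm.
Full results:

## Results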

| Model | Quant | Build | Backend | Size | Params | pp (t/s) | tg (t/s) |
|-------|-------|---------|---------|------|--------|----------|----------|
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 12.03 GiB | 12.15 B | 4736.46 ± 170.68 | 206.71 ± 0.15 |
| Mellum2-12B-A2.5B-Thinking | Q8_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 12.03 GiB | 12.15 B | 3815.50 ± 148.24 | 150.44 ± 0.84 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.48 GiB | 11.91 B | 1724.07 ± 8.31 | 69.10 ± 0.33 |
| gemma-4-12b-it-qat | q4_0 | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.48 GiB | 11.91 B | 1703.76 ± 79.93 | 63.78 ± 3.92 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.85 GiB | 11.91 B | 1589.32 ± 13.43 | 54.72 ± 0.08 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.48 GiB | 11.91 B | 1581.32 ± 20.40 | 74.00 ± 0.07 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.86 GiB | 11.91 B | 1577.04 ± 13.19 | 54.41 ± 0.10 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.85 GiB | 11.91 B | 1559.10 ± 51.10 | 55.12 ± 1.49 |
| gemma-4-12B-it | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.86 GiB | 11.91 B | 1542.48 ± 69.48 | 53.37 ± 1.25 |
| gemma-4-12B-it | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.86 GiB | 11.91 B | 1316.44 ± 11.77 | 68.02 ± 0.03 |
| gemma-4-12b-it | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.85 GiB | 11.91 B | 1314.98 ± 13.56 | 67.67 ± 0.10 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 14.62 GiB | 27.32 B | 810.95 ± 10.25 | 35.82 ± 0.04 |
| Qwen3.6-27B | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 14.62 GiB | 27.32 B | 807.37 ± 41.61 | 34.19 ± 0.19 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 15.92 GiB | 27.32 B | 745.51 ± 2.85 | 25.44 ± 0.04 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 17.43 GiB | 34.66 B | 730.58 ± 31.93 | 51.68 ± 0.66 |
| Qwen3.6-27B | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 15.92 GiB | 27.32 B | 724.65 ± 6.40 | 25.26 ± 0.20 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 17.43 GiB | 34.66 B | 614.89 ± 2.27 | 77.96 ± 0.10 |
| Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | IQ4_XS | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 17.43 GiB | 34.66 B | 446.64 ± 7.20 | 65.67 ± 0.41 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 16.67 GiB | 27.32 B | 207.42 ± 6.39 | 24.46 ± 0.30 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 16.67 GiB | 27.32 B | 201.31 ± 4.00 | 24.97 ± 0.03 |
| Qwen3.6-27B | IQ4_XS | llama-b9553-bin-win-vulkan-x64 | Vulkan | 14.62 GiB | 27.32 B | 177.47 ± 1.86 | 16.08 ± 0.08 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 15.92 GiB | 27.32 B | 79.56 ± 0.52 | 7.51 ± 0.02 |
| Qwen3.6-27B | UD-Q4_K_XL | llama-b9553-bin-win-vulkan-x64 | Vulkan | 16.67 GiB | 27.32 B | 69.99 ± 0.17 | 6.17 ± 0.03 |
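
To see which build wins per model instead of eyeballing the sorted list, here is a minimal Python sketch (not part of LlamaPilot) that parses rows of the table and picks the fastest build per model and quant, separately for prompt processing and generation. It assumes the 8-column layout above and "mean ± stddev" cells; the helper names are hypothetical and only a few rows are embedded for brevity.

```python
# Hypothetical helper: fastest build per (model, quant) from the results table.
# Assumes columns: model | quant | build | backend | size | params | pp | tg.
TABLE = """\
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 15.92 GiB | 27.32 B | 745.51 ± 2.85 | 25.44 ± 0.04 |
| Qwen3.6-27B | Q4_K_M | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 15.92 GiB | 27.32 B | 724.65 ± 6.40 | 25.26 ± 0.20 |
| Qwen3.6-27B | Q4_K_M | llama-b9553-bin-win-vulkan-x64 | Vulkan | 15.92 GiB | 27.32 B | 79.56 ± 0.52 | 7.51 ± 0.02 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-hip-radeon-x64 | ROCm | 6.48 GiB | 11.91 B | 1724.07 ± 8.31 | 69.10 ± 0.33 |
| gemma-4-12b-it-qat | q4_0 | llama-b1270-windows-rocm-gfx110X-x64 | ROCm | 6.48 GiB | 11.91 B | 1703.76 ± 79.93 | 63.78 ± 3.92 |
| gemma-4-12b-it-qat | q4_0 | llama-b9553-bin-win-vulkan-x64 | Vulkan | 6.48 GiB | 11.91 B | 1581.32 ± 20.40 | 74.00 ± 0.07 |
"""


def mean(cell: str) -> float:
    """Turn a '745.51 ± 2.85' cell into its mean value."""
    return float(cell.split("±")[0])


def parse_rows(text: str):
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip blank lines, a header row and the |---| separator row.
        if len(cells) != 8 or "±" not in cells[6]:
            continue
        model, quant, build, backend, _size, _params, pp, tg = cells
        rows.append({"model": model, "quant": quant, "build": build,
                     "backend": backend, "pp": mean(pp), "tg": mean(tg)})
    return rows


def best_per_model(rows, metric: str):
    """Return {(model, quant): (build, value)} with the highest `metric`."""
    best = {}
    for r in rows:
        key = (r["model"], r["quant"])
        if key not in best or r[metric] > best[key][1]:
            best[key] = (r["build"], r[metric])
    return best


rows = parse_rows(TABLE)
for metric in ("pp", "tg"):
    for (model, quant), (build, value) in sorted(best_per_model(rows, metric).items()):
        print(f"{metric}  {model} {quant}: {build} ({value:.2f} t/s)")
```

On these rows the HIP build wins prompt processing for both models, while the Vulkan build wins generation on gemma-4-12b-it-qat (74.00 vs 69.10 t/s), which is the kind of split the post is pointing at.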


u/Gas-Ornery — 1 day ago

▲ 28 r/ROCm

I made a Windows GUI to manage, benchmark and compare multiple llama.cpp builds — handy for AMD GPU users

https://preview.redd.it/1u1y97m5p96h1.png?width=2560&format=png&auto=webp&s=83bea3be57582b26587e1aee9a4f1bd98103de90

I have an AMD GPU and testing different llama.cpp builds (Vulkan, ROCm, HIP) across models and parameters was a mess. So I built LlamaPilot — a lightweight WPF app that lets you:

- Switch between multiple llama.cpp builds and models via dropdowns
- Configure all server parameters in a GUI (ngl, ctx-size, flash-attn, cache, sampling, speculative decoding…)
- Save/load profiles so you don't reconfigure every time
- Paste an existing command to auto-fill all fields
- Benchmark all model × build combos and get a sorted Markdown results table
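
The post does not show LlamaPilot's C# code, so the following is only a minimal Python sketch of the "paste an existing command to auto-fill all fields" idea, not the app's implementation: it splits a pasted `llama-server` command into option/value pairs a GUI could bind to fields, then rebuilds the argument list. It assumes every option is either a bare switch or followed by exactly one value; it has no knowledge of which llama.cpp flags take values, so a switch directly followed by a positional argument would be misread. The helper names and the model path are hypothetical.

```python
import shlex


def is_number(text: str) -> bool:
    """True for values like "99", "0.8" or "-1" (so "--seed -1" keeps its value)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def unquote(token: str) -> str:
    """Strip one pair of surrounding quotes, which shlex keeps in non-POSIX mode."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    return token


def parse_command(command: str) -> dict:
    """Split a pasted command into {"exe": ..., option: value or True}.

    posix=False keeps Windows backslashes in paths intact. An option
    followed by another option (or by nothing) is stored as a switch (True).
    """
    tokens = [unquote(t) for t in shlex.split(command, posix=False)]
    options = {"exe": tokens[0]}
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and (not nxt.startswith("-") or is_number(nxt)):
                options[token] = nxt
                i += 2
                continue
            options[token] = True
        i += 1  # also skips stray positional arguments
    return options


def to_args(options: dict) -> list:
    """Rebuild an argument list from the (possibly GUI-edited) fields."""
    args = [options["exe"]]
    for key, value in options.items():
        if key == "exe":
            continue
        args.append(key)
        if value is not True:
            args.append(str(value))
    return args


cmd = r'llama-server.exe -m "D:\models\model.gguf" -ngl 99 -c 8192 --seed -1 --jinja'
opts = parse_command(cmd)
print(opts)
print(to_args(opts))
```

Keeping the parsed options in an ordered dict makes the round trip (paste, edit in the GUI, relaunch) preserve the user's original flag order.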

C# / .NET 8 / Windows. Dark theme, live console, one-click start/stop.

GitHub: https://github.com/Hamrounmh/llamapilot

Feedback welcome!

Here are my best results with different llama.cpp builds:

https://preview.redd.it/0cp3craqv96h1.png?width=1905&format=png&auto=webp&s=c430e621d061969eb3b7701dee6273a8030a3113
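
As a rough illustration of the "benchmark all model × build combos and get a sorted Markdown results table" feature, here is a minimal Python sketch (not LlamaPilot's actual code). `run_bench` is a hypothetical callback: in a real setup it might launch each build's `llama-bench` (which can emit Markdown or JSON via `-o`) and return prompt-processing and generation speeds; here a stub returns the Q4_K_M numbers copied from the results table above rather than measuring anything.

```python
from itertools import product
from typing import Callable, NamedTuple, Tuple


class Result(NamedTuple):
    model: str
    build: str
    pp: float  # prompt processing speed, tokens/s
    tg: float  # text generation speed, tokens/s


def benchmark_matrix(models, builds, run_bench: Callable[[str, str], Tuple[float, float]]):
    """Benchmark every model x build pair; return results sorted by pp, fastest first."""
    results = [Result(m, b, *run_bench(m, b)) for m, b in product(models, builds)]
    return sorted(results, key=lambda r: r.pp, reverse=True)


def to_markdown(results) -> str:
    """Render results as a Markdown table."""
    lines = ["| Model | Build | pp (t/s) | tg (t/s) |", "|---|---|---|---|"]
    lines += [f"| {r.model} | {r.build} | {r.pp:.2f} | {r.tg:.2f} |" for r in results]
    return "\n".join(lines)


# Hypothetical stub standing in for a real benchmark run: the numbers are
# copied from the Q4_K_M rows of the results table, not measured here.
MEASURED = {
    ("gemma-4-12B-it", "llama-b9553-bin-win-hip-radeon-x64"): (1577.04, 54.41),
    ("gemma-4-12B-it", "llama-b9553-bin-win-vulkan-x64"): (1316.44, 68.02),
    ("Qwen3.6-27B", "llama-b9553-bin-win-hip-radeon-x64"): (745.51, 25.44),
    ("Qwen3.6-27B", "llama-b9553-bin-win-vulkan-x64"): (79.56, 7.51),
}


def fake_bench(model: str, build: str):
    return MEASURED[(model, build)]


MODELS = ["gemma-4-12B-it", "Qwen3.6-27B"]
BUILDS = ["llama-b9553-bin-win-hip-radeon-x64", "llama-b9553-bin-win-vulkan-x64"]
print(to_markdown(benchmark_matrix(MODELS, BUILDS, fake_bench)))
```

Passing the benchmark runner in as a function keeps the matrix and sorting logic testable without a GPU; swapping the stub for a real process launcher is the only part that touches llama.cpp.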


u/Gas-Ornery — 1 day ago

▲ 1 r/LocalLLM

I made a Windows GUI to manage, benchmark and compare multiple llama.cpp builds — handy for AMD GPU users


https://preview.redd.it/q4k43enqo96h1.png?width=2560&format=png&auto=webp&s=c8920bacfd8f373b8db71ed822b239ff1138e758


u/Gas-Ornery — 1 day ago

▲ 3 r/ollama

I built a Windows GUI launcher to benchmark and manage multiple llama.cpp builds (useful for AMD GPU users juggling Vulkan/ROCm/HIP builds)

https://preview.redd.it/u834au5n9a6h1.png?width=1802&format=png&auto=webp&s=bd69c7b5f6b86a19534e54c92bc356b9d6256683


https://preview.redd.it/g1fks71ln96h1.png?width=2560&format=png&auto=webp&s=afe803f45b854f10d6f4b0dc7dc45135c2089a23


u/Gas-Ornery — 1 day ago