u/Outrageous-Nobody-87

vLLM vs llama.cpp vs Ollama
▲ 16 r/Vllm

I thought I'd share some benchmark results I collected with the gpt-oss:20b and gemma-4-26b-qat models. I'm using a very budget setup (2 x RTX 5060 Ti 16GB).

Full article: Benchmarking AI Models | personal wiki

gpt-oss:20b

Edit: I decided to repeat the gpt-oss:20b test (on new hardware) and also added SGLang for comparison.

https://preview.redd.it/y0af28s4o28h1.png?width=1487&format=png&auto=webp&s=33ca2cf1456f924e05c88f363ab9f20283b12cd8

gemma-4-26b-qat

https://preview.redd.it/lr5r2t2ywg7h1.png?width=1252&format=png&auto=webp&s=c219444abdee9512484a805bac4e6a2784864a01

u/Outrageous-Nobody-87 — 13 days ago