
▲ 16 r/Vllm
vLLM vs llama.cpp vs Ollama
I thought I would share some benchmark results I gathered with the gpt-oss:20b and gemma-4-26b-qat models. I'm using a very budget setup (2 x RTX 5060 Ti 16GB).
Full article: Benchmarking AI Models | personal wiki
gpt-oss:20b
Edit: I decided to repeat the gpt-oss:20b test (on new hardware). I also added SGLang for comparison.
gemma-4-26b-qat
u/Outrageous-Nobody-87 — 13 days ago