u/Glittering_Wait_3552

Hi everyone! I finally managed to install llama.cpp. It was really difficult, mainly because of CUDA on my NVIDIA GTX 1060 Max-Q (6 GB VRAM, Pascal architecture). I'm not a techie, so it might be easy for others, but for me it was pretty hard. However, I can't get the nice results I've seen some people get. Could you help me a bit, please?

P.S. It's a bit weird, but I get better results in LM Studio. I want to use the LLM for agentic use cases (clearly I'm doing something wrong). Strangely, in llama.cpp the speed was 6 t/s at the beginning, but over time it gradually increased to 9.6 t/s. Thank you in advance for your help!!!

I have a Dell G5 15 5587 laptop:
* **CPU:** Intel Core i7-8750H
    * 6 cores / 12 threads
    * Base frequency: 2.2 GHz
    * Turbo: up to 4.1 GHz
* **GPU:** NVIDIA GTX 1060 Max-Q (6 GB VRAM), Pascal architecture
* **RAM:** 2 x 8 GB DDR4 = 16 GB
* **Storage:**
    * Disk C: SSD 239 GB NVMe (WD PC SN520)
    * Disk D: SSD 466 GB (CT500BX500SSD1)
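
As a rough check of how a 35B model at IQ3_S compares with this laptop's memory, here is a minimal sketch of the usual size estimate (parameters × bits per weight / 8). Assumptions: about 35 billion total parameters, as the "35B" in the model name suggests, and about 3.44 bits per weight, the nominal rate llama.cpp lists for IQ3_S. "UD" files mix several quantization types, so the real file size can differ, and the KV cache and runtime buffers need memory on top of the weights. The function names are illustrative only.

```python
# Rough memory budget for a quantized model on this laptop.
# Assumptions (not measured): ~35e9 total parameters ("35B" in the model name)
# and ~3.44 bits per weight, the nominal IQ3_S rate; "UD" quants mix types,
# so the actual GGUF file size will differ. GB vs GiB differences are ignored.

VRAM_GB = 6.0   # GTX 1060 Max-Q
RAM_GB = 16.0   # 2 x 8 GB DDR4


def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def remaining_gb(weights_gb: float, vram_gb: float = VRAM_GB,
                 ram_gb: float = RAM_GB) -> float:
    """Memory left for KV cache, buffers, the OS and other apps (may be negative)."""
    return vram_gb + ram_gb - weights_gb


if __name__ == "__main__":
    w = weights_size_gb(35e9, 3.44)
    print(f"estimated weights: {w:.1f} GB")
    print(f"left in VRAM + RAM: {remaining_gb(w):.1f} GB")
```

The size of the downloaded GGUF file is the more reliable number; the estimate is mainly useful for comparing quantization levels before downloading.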

This is the config:

    D:\IA\llama.cpp\build\bin\Release\llama-server.exe ^
      -m D:\IA\models\Qwen3.6-35B-A3B-UD-IQ3_S.gguf ^
      -c 45000 ^
      --n-gpu-layers 999 ^
      --n-cpu-moe 29 ^
      --prio 3 ^
      --prio-batch 3 ^
      --poll 100 ^
      --poll-batch 1 ^
      -Cr 0-6 ^
      -Crb 0-6 ^
      --cpu-strict 1 ^
      --cpu-strict-batch 1 ^
      --reasoning on ^
      -fa on ^
      -t 6 ^
      -tb 6 ^
      -np 1 ^
      --no-mmap ^
      --mlock ^
      -b 1024 -ub 512 ^
      --cache-type-k q4_0 ^
      --cache-type-v q4_0 ^
      --flash-attn on ^
      --cont-batching ^
      --threads 6 --threads-batch 6 ^
      --jinja ^
      --reasoning auto ^
      --ctx-checkpoints 10 ^
      --top-k 64 --top-p 0.75 ^
      --temp 0.7 ^
      --repeat-penalty 1.0 ^
      --cache-prompt
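
To compare runs (for example 6 t/s at the start versus 9.6 t/s later), a minimal sketch that reads the speed reported by llama-server itself. It assumes the server is on its default address (`http://127.0.0.1:8080`), that the non-streaming `/completion` endpoint is used, and that the response carries a `timings` object with `prompt_n`, `prompt_ms`, `predicted_n` and `predicted_ms`; check these names against your build. `SERVER_URL`, `measure` and the other helpers are hypothetical names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumptions: llama-server runs on its default address and, for non-streaming
# /completion requests, returns a "timings" object containing prompt_n,
# prompt_ms, predicted_n and predicted_ms. Check these names on your build.
SERVER_URL = "http://127.0.0.1:8080/completion"  # hypothetical default setup


def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and a duration in milliseconds into tokens/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return n_tokens / (elapsed_ms / 1000.0)


def speeds_from_timings(timings: dict) -> dict:
    """Prompt-processing and generation speed (tokens/s) from a timings dict."""
    # With prompt caching, few or no prompt tokens may need processing,
    # so prompt speed is reported as None when nothing was measured.
    prompt_tps = None
    if timings["prompt_n"] > 0 and timings["prompt_ms"] > 0:
        prompt_tps = tokens_per_second(timings["prompt_n"], timings["prompt_ms"])
    return {
        "prompt_tps": prompt_tps,
        "gen_tps": tokens_per_second(timings["predicted_n"], timings["predicted_ms"]),
    }


def measure(prompt: str, n_predict: int = 256, url: str = SERVER_URL) -> dict:
    """Send one non-streaming completion request and return both speeds."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        reply = json.load(response)
    return speeds_from_timings(reply["timings"])
```

With the server running, calling `measure("Explain what a KV cache is.", n_predict=128)` a few times in a row gives numbers that can be compared across settings. Generation speed (`gen_tps`) and prompt-processing speed (`prompt_tps`) are kept separate because they usually differ a lot.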

https://preview.redd.it/7nmmcrd0tw0h1.png?width=1920&format=png&auto=webp&s=549456aaac795a1b41ea747b821e5d561b520d25

https://preview.redd.it/in1rhy60pw0h1.png?width=1920&format=png&auto=webp&s=0ac15b95efe268c547928e0e7fc5be1785b9effa

https://preview.redd.it/p4k8ocx0pw0h1.png?width=1920&format=png&auto=webp&s=d43be91ae22af2a49edf91bba970cf72b0426458

https://preview.redd.it/ed10lfb4pw0h1.png?width=1920&format=png&auto=webp&s=f5e0eca03daea8c7f681cadf2e3d798e8c1f9579

https://preview.redd.it/adcb3so3rw0h1.png?width=1920&format=png&auto=webp&s=5551e0da69e581310745e7ab695be07b0bb016ef

https://preview.redd.it/mte0we4brw0h1.png?width=1920&format=png&auto=webp&s=095e9a76d2b66424de60a6ef6206eed748194912

I have another question: I would like to buy a PC, Mac, mini PC, Mac mini, etc. to run AI exclusively for agentic use cases, using fully local LLMs. What would you suggest nowadays for a budget of 2,500 to 5,500 USD? I'm from Colombia, so that would be between 10,000,000 and 20,000,000 COP. P.S. I don't have the money, but I need to show evidence of the return on investment (ROI) of the chosen option.
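
Because ROI evidence is part of the question, here is a minimal sketch of the arithmetic under stated assumptions: the machine's value is taken to be the cloud or API spending it would replace, minus its electricity cost, while maintenance, resale value, taxes and your own time are ignored. Every input is a hypothetical placeholder to be replaced with real quotes, and the function names are illustrative, not from any library.

```python
# Payback / ROI arithmetic for a local-LLM machine.
# All inputs are hypothetical placeholders: replace them with real quotes
# and with the cloud/API spending the machine would actually replace.
# Ignored on purpose: maintenance, resale value, taxes and your own time.


def power_cost_per_month(watts: float, hours_per_day: float,
                         price_per_kwh: float, days: int = 30) -> float:
    """Electricity cost per month for a machine drawing `watts` on average."""
    return watts / 1000 * hours_per_day * days * price_per_kwh


def monthly_savings(replaced_spend: float, power_cost: float) -> float:
    """Spending avoided per month, net of the machine's electricity."""
    return replaced_spend - power_cost


def payback_months(hardware_cost: float, savings: float) -> float:
    """Months until cumulative savings equal the purchase price."""
    if savings <= 0:
        return float("inf")  # never pays back under these inputs
    return hardware_cost / savings


def roi(hardware_cost: float, savings: float, months: int) -> float:
    """ROI over a horizon: (total savings - cost) / cost."""
    if hardware_cost <= 0:
        raise ValueError("hardware_cost must be positive")
    return (savings * months - hardware_cost) / hardware_cost


if __name__ == "__main__":
    cost = 5000.0      # placeholder: hardware price in USD
    api_spend = 300.0  # placeholder: monthly cloud/API spend replaced
    power = power_cost_per_month(watts=150, hours_per_day=8, price_per_kwh=0.20)
    net = monthly_savings(api_spend, power)
    print(f"net savings/month: {net:.2f} USD")
    print(f"payback: {payback_months(cost, net):.1f} months")
    print(f"ROI over 36 months: {roi(cost, net, 36):.0%}")
```

Running the same numbers for each candidate machine (price, average power draw) gives a like-for-like comparison; the weakest input is usually the estimate of spending actually replaced.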

Thank you all in advance!!!

I would like to buy a mini PC or Mac mini to run AI exclusively for agentic applications, using local LLMs. Could you help me? Thank you very much.

Optimizing Qwen3.6-35B-A3B on a Dell G5 15 5587 (RAM: 2 x 8 GB DDR4 = 16 GB; GPU: NVIDIA GTX 1060 Max-Q, 6 GB VRAM, Pascal architecture)