u/Best_Debt5223

I'm pretty new to this space and am really just trying to learn by doing.

I have a laptop (Dell G15). My specs are below:

Device name DESKTOP-VJQ24FC

Processor 12th Gen Intel(R) Core(TM) i5-12500H (2.50 GHz)

Installed RAM 8.00 GB (7.69 GB usable)

Graphics card NVIDIA GeForce RTX 3050 Laptop GPU (4 GB)

Intel(R) UHD Graphics (128 MB)

Storage 430 GB of 477 GB used

Device ID 042E6B54-4BC4-488E-8571-5C094F916860

Product ID 00356-24680-94609-AAOEM

System type 64-bit operating system, x64-based processor

Pen and touch No pen or touch input is available for this display

I know it's not a lot, and I'm not expecting to do anything complex either. What I really want is to run some models, play around with APIs, and get a good understanding of how this space works.

Here's what I've tried so far on ollama (model name, then download size):

qwen3.5:0.8b 1.0 GB
smollm:360m 229 MB
qwen3.5:2b 2.7 GB
gemma4:e2b 7.2 GB
gemma4:e4b 9.6 GB
gemma4:latest 9.6 GB

Other than smollm, nothing really worked, and even smollm got stuck in loops a few times.

My questions

My mental model so far was: if a model is 1 GB (like qwen3.5:0.8b), it would be reasonable to expect it to run if I have 8 GB of RAM and 4 GB of VRAM (assuming there's more than a gig free). However, that failed, so I probably got this wrong. How does it really work, and how exactly do I estimate whether a model will run on my hardware or not?
How are models really loaded? I mean, are they loaded into RAM first and then into VRAM? I've sometimes gotten messages saying memory is low, like "needed 3GB have 1GB", which reflects my RAM utilization, and other times I get errors saying "needed 4GB have 3.8GB", which aligns with my VRAM utilization.
I'm considering upgrading my RAM, but I'm not sure that's going to make much of a difference, as I can't really upgrade my VRAM. I'm a little low on funds, so I wanna make sure I'm making a sensible decision if I do upgrade my RAM.
I've seen some negative opinions around ollama, so I'm thinking of trying vllm. Is ollama partially the culprit behind my challenges?
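
To make my mental model from the first question concrete, here is roughly the check I had in my head, as a small Python sketch. The model sizes are the download sizes from my list above, and the free-memory numbers are just the "have" values from the error messages I quoted. The `overhead_gb` parameter is purely my assumption: a runtime needs some memory beyond the weights file (context cache, buffers), but I don't know how much, so the 1.5 GB below is a made-up placeholder and not a measured figure.

```python
# Download sizes in GB, copied from my ollama list above.
MODEL_SIZES_GB = {
    "qwen3.5:0.8b": 1.0,
    "smollm:360m": 0.229,
    "qwen3.5:2b": 2.7,
    "gemma4:e2b": 7.2,
    "gemma4:e4b": 9.6,
    "gemma4:latest": 9.6,
}


def fits(model_gb: float, free_gb: float, overhead_gb: float = 0.0) -> bool:
    """Naive check: weights plus an assumed overhead must fit in free memory."""
    return model_gb + overhead_gb <= free_gb


def placement(model_gb: float, free_vram_gb: float, free_ram_gb: float,
              overhead_gb: float = 0.0) -> str:
    """Where my mental model says the model should land: VRAM first, then RAM."""
    if fits(model_gb, free_vram_gb, overhead_gb):
        return "vram"
    if fits(model_gb, free_ram_gb, overhead_gb):
        return "ram"
    return "neither"


if __name__ == "__main__":
    # "have" values from the error messages: 3.8 GB VRAM, 1 GB RAM.
    free_vram_gb, free_ram_gb = 3.8, 1.0
    for name, size_gb in MODEL_SIZES_GB.items():
        naive = placement(size_gb, free_vram_gb, free_ram_gb)
        # overhead_gb=1.5 is a guess on my part, only to show how the answer shifts.
        padded = placement(size_gb, free_vram_gb, free_ram_gb, overhead_gb=1.5)
        print(f"{name}: naive={naive}, with assumed overhead={padded}")
```

With zero overhead this says qwen3.5:0.8b should fit comfortably in VRAM, which is not what I saw, so either the real overhead is much larger than I assume or my picture of where the model gets loaded is wrong.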

Sorry for loading in a bunch of questions, just wanted to ensure I'm covering all my thoughts around how hardware and models really interact with each other.

How does one estimate hardware requirements for a model?