u/MarcusAurelius68

Anyone else not watching episodes right away in S5?

For S2-S4 I’d eagerly await the release of a new episode and watch it that night.

With Ed’s passing and Margo out of the picture, I’ve found that for the last few episodes I’m no longer in a rush to watch. I watched the latest one last night, 4 days after it was released.

Anyone else doing the same? I WILL watch it, I am enjoying it, but it’s no longer “must see TV” for me.

reddit.com
u/MarcusAurelius68 — 2 days ago

LM Studio / Windows / Vulkan possible to prioritize GPU order?

With CUDA you can prioritize GPU usage, which worked well with a 3090ti and a 3060 12GB. Models under 24GB ran fastest (3090ti only), models under 36GB ran slower (spilling onto the 3060), and anything over 36GB moved some layers to CPU and ran slowest.
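The tiers above can be sketched as a priority-order fill. This is only an illustration of the idea, not how LM Studio or CUDA actually allocates memory: the `priority_fill` helper and the device list are hypothetical, and the model sizes are made-up round numbers that ignore context/KV-cache overhead.

```python
# Hypothetical sketch: fill devices in priority order, spill the rest to CPU.
# Capacities come from the post (3090ti = 24GB, 3060 = 12GB).

def priority_fill(model_gb, devices):
    """Return {device_name: GB placed}; any remainder lands on "CPU"."""
    placement = {}
    remaining = model_gb
    for name, capacity_gb in devices:
        if remaining <= 0:
            break
        used = min(remaining, capacity_gb)
        placement[name] = used
        remaining -= used
    if remaining > 0:
        placement["CPU"] = remaining
    return placement

GPUS = [("3090ti", 24), ("3060", 12)]  # highest priority first

for size_gb in (20, 30, 40):  # illustrative model sizes, not real models
    print(size_gb, priority_fill(size_gb, GPUS))
```

A 20GB model stays entirely on the 3090ti, a 30GB one spills onto the 3060, and a 40GB one pushes the remainder to CPU, which matches the fastest/slower/slowest pattern I saw.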

I just added an R9700, so while my total GPU VRAM has increased greatly to 68GB, I need to use Vulkan because I’m mixing green and red. The only option showing is to distribute layers across the cards, so now everything is a bit slower. It does work, however.

Aside from upgrading the 3060 to raise the speed of the slowest GPU, is there a way to prioritize GPUs in Vulkan?

u/MarcusAurelius68 — 11 days ago

3 GPUs in an ASUS PRIME B550-PLUS AC-HES?

I’m building out a local LLM server and have no problems with 2 GPUs, but I can’t get it to boot with 3.

I don’t care about x1 vs x4 vs x16, and the 3 cards I’m trying (3090ti, 3060, R9700) physically fit as long as I put the 3090ti in the bottom slot (otherwise it blocks other slots).

Power supply is a Corsair 1000W so that shouldn’t be an issue.

Suggestions? Thoughts?

u/MarcusAurelius68 — 12 days ago

Right now I have a 3060 12GB in the 2nd slot, giving me 36GB of VRAM at an acceptable speed. My system RAM is 128GB, so I have plenty of headroom for slow hybrid work. The 3090ti is in the x16 slot, where it covers up all but one x16/x1 slot for the 2nd GPU.

If I wanted to change out the 3060 (I can repurpose it elsewhere) I can think of a few scenarios:

  1. another 3090/3090ti. Advantage is it’s well-supported, disadvantage is $1000+ for a card that could have been worked hard for years.

  2. an RTX Pro 4000. Advantage is it’s new and another NVIDIA card, disadvantage is $1600 for 24GB. I could move the 3090ti to the bottom slot, which might free up a 3rd slot for later since the 4000 is 2 slots in size instead of 3.

  3. an R9700 with 32GB, which I can get for $1200. Can I mix and match it with the 3090ti easily in LM Studio?

  4. an Arc Pro B60 with 24GB for $600. Can I mix and match with the 3090ti easily in LM Studio?

  5. just keep what I have and overflow to system RAM.
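To compare the options above on cost alone, here is a quick back-of-the-envelope sketch. The prices and VRAM sizes are the ones I quoted in the list; the `dollars_per_gb` helper is just for illustration, and it ignores speed, driver support, and slot fit entirely.

```python
# Price and VRAM figures are the ones quoted in the list above.
options = {
    "3090/3090ti": (1000, 24),   # quoted as "$1000+", so this is a lower bound
    "RTX Pro 4000": (1600, 24),
    "R9700": (1200, 32),
    "Arc Pro B60": (600, 24),
}

def dollars_per_gb(price_usd, vram_gb):
    """Cost per GB of VRAM for one card."""
    return price_usd / vram_gb

# Cheapest per GB first.
ranked = sorted(options, key=lambda name: dollars_per_gb(*options[name]))

for name in ranked:
    price, vram = options[name]
    total = 24 + vram  # the new card replaces the 3060 alongside the 24GB 3090ti
    print(f"{name}: ${dollars_per_gb(price, vram):.2f}/GB, {total}GB total VRAM")
```

On price per GB the B60 comes out ahead and the RTX Pro 4000 last, but that only matters if the card actually works well next to the 3090ti.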

Thanks…

u/MarcusAurelius68 — 14 days ago

I’ve read a bit about how Q6 can be slightly better for coding, but how about for creative writing and research?

I just added a 3060 to my 3090ti and get around 70 t/s in LM Studio with Q6 and a reasonable context size (128K). If I go any bigger, it offloads some layers to CPU and performance plummets, as expected.

Apologies for the newbie question but for creative writing what does Q6 give me vs Q4 for my purposes? Are there other models and quantization levels I should consider to fit into 36GB VRAM?

I’m upgrading system RAM to 128GB tomorrow, so are there bigger models (for batch use, not interactive) that I should consider to fit into a total of 164GB (36GB VRAM plus 128GB RAM)?

I’m thinking of having 3 scenarios:

  1. 27B or 35B Q4_K_M that fits into the 3090ti 24GB VRAM for maximum token rate

  2. the best model that will fit into 36GB VRAM

  3. a slow best model that fits into the combined 164GB
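As a rough way to sanity-check the three scenarios above, weight size can be estimated as parameters times bits per weight divided by 8. This is a sketch under assumptions: the bits-per-weight values are approximate figures for these GGUF quant types (real files vary by model), the 4GB context allowance is a guess, and `weights_gb` and `smallest_budget` are hypothetical helper names.

```python
# Rough weight-size estimate: params (billions) * bits per weight / 8 = GB.
# The bits-per-weight values are approximate assumptions; actual GGUF files vary.
APPROX_BPW = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}

# Memory budgets from the three scenarios above.
BUDGETS_GB = {"3090ti only": 24, "both GPUs": 36, "GPUs + system RAM": 164}

def weights_gb(params_billion, quant):
    """Estimated size of the weights alone, in GB."""
    return params_billion * APPROX_BPW[quant] / 8

def smallest_budget(size_gb, headroom_gb=4):
    """First budget that fits the weights plus a guessed context allowance."""
    for name, budget in BUDGETS_GB.items():
        if size_gb + headroom_gb <= budget:
            return name
    return None  # does not fit anywhere

for params in (27, 35):
    for quant in APPROX_BPW:
        size = weights_gb(params, quant)
        print(f"{params}B {quant}: ~{size:.1f}GB -> {smallest_budget(size)}")
```

The headroom matters a lot at 128K context, so treat the output as a first filter rather than a guarantee that a given model will load.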

Thanks for any suggestions here.

u/MarcusAurelius68 — 15 days ago

OK, I've got my new research and writing assistant LLM server "Bob" up and running, with Qwen-3.6-35B-A3B at Q4_K_M as a test in LM Studio. Hardware: 3090ti, 5800XT, 48GB memory, 2TB SSD, and a GT720 that I had lying around in a box as a display adapter. I'll be at 128GB memory within a week after I move things around. Results so far are encouraging, and I added Blaze web search.

I just grabbed a cheap 3060 with 12GB (as in really cheap), which I can use in a couple of ways, either in this system or elsewhere.

If I use it in this system, I have enough PSU to drive it, and that would give me 36GB of VRAM, with the extra 12GB on the slower card. I'd take out the GT720.

In my initial test there's not a ton of VRAM left for context, so would I be able to prioritize the 3090ti and then use the 3060 as overflow? I see I can set the allocation strategy to prioritize the 3090ti.

For smaller models it will be faster and just use the 3090ti, and for ones that overflow it will likely be, what, 3x slower, but will still run if it fits into 48GB? Any suggested models or approaches to try? Maybe keep Qwen but move up to Q5? Anything else?

I realize I'm a newbie here so appreciate any feedback or thoughts.

u/MarcusAurelius68 — 20 days ago

The character seems like she’s always stoned on Xanax or something. Very flat affect.

It could be due to underlying issues that brought her to Mars and away from Earth, or just being personally jaded, but there’s no emotion.

u/MarcusAurelius68 — 20 days ago

I realize there are many good restaurants (I used to live in NYC) but many solid touristy Italian and Chinese places are loud.

I’m going to be traveling with my son and am looking for quieter places that are still amazing, for end of May. We will be staying in the Murray Hill area.

Thanks in advance for any suggestions.

u/MarcusAurelius68 — 23 days ago