u/cafedude

Anyone tried StepFun 3.5-flash on Strix Halo?

I tried a Q4 quant of StepFun 3.5-flash. It started out using 107GB (with 150K context), but with each prompt the memory use grew until it hit 120GB, and soon after it went OOM. Has anyone run this model for longer than about 20 minutes, and if so, what llama.cpp settings did you use?

reddit.com
u/cafedude — 6 days ago

License renewal - why is this so difficult?

I got my FRN from CORES, then went back to ULS and the system was down. I tried again the next day and it seemed to be broken in a different way. This morning I tried again, clicked the File Online link (here: https://www.fcc.gov/wireless/universal-licensing-system ), and got Access Denied. Is their system borked, or is there some hoop I haven't jumped through correctly?

Thing is, about a year ago I jumped through all the hoops and got to some kind of payment page, and I was pretty sure I paid, but apparently it never went through, so here I am again (the license expired last June, so I guess I've still got another year to figure it out ;-)

EDIT: OK, now I'm getting to a login page that asks for my FRN and password. I went through the ULS questions, then it sent me back to CORES to pay, where I found the payment link and submitted my credit card info. We'll see if it works this time. Like I said above, I went through all these hoops last year just before the expiration and it didn't take; I only found out recently that my license was still expired. This time I took a screenshot of the "Online Payment Transaction Initiated" page.

u/cafedude — 7 days ago

Will there be any more Qwen3.6 series models?

I'm still hoping we see a Qwen3.6-122B or a Qwen3.6-coder, but my hopes are dimming. Seems like we would have seen/heard something by now, even if just tantalizing hints from the Qwen folks.

u/cafedude — 10 days ago

I've got a 128GB Strix Halo box. Yesterday I wanted to try out Step-3.5-flash. It's a model that barely fits on my system as is: I found a bartowski Q4_XS quant that's 105GB, and with about 150K context it takes up about 108GB. That leaves about 20GB, minus what Linux is using, so more like 17GB free. I ran opencode --continue so that I could try the model out with the context from a previous session. What I noticed was that with each query the memory use (monitored in htop) bumped up but never fully dropped back to where it was before. After a while it was up to 120GB. I figured that doing a /compact might free up some of that memory, but no, it stayed at 120GB. I unloaded the model before the system ran out of memory.
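To put numbers on the per-query growth described above instead of eyeballing htop, one option is to log the resident memory of the process that holds the model. This is a minimal sketch, assuming Linux and that you know the PID of the process actually running llama.cpp (under LM Studio that may be a separate worker process rather than the GUI). The function names are hypothetical. It only reads that one process's RSS (the `VmRSS` field of `/proc/<pid>/status`); on an APU with unified memory, some Vulkan allocations may not be counted there, so htop's system-wide figure can differ.

```python
import sys
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found (kernel thread or exited process?)")


def read_rss_gib(pid: int) -> float:
    """Read the current resident set size of `pid` in GiB (Linux only)."""
    text = Path(f"/proc/{pid}/status").read_text()
    return parse_vmrss_kib(text) / (1024 * 1024)


def watch(pid: int, interval_s: float = 5.0) -> None:
    """Print RSS every `interval_s` seconds, plus growth since the first sample."""
    baseline = read_rss_gib(pid)
    print(f"baseline: {baseline:.2f} GiB")
    while True:
        time.sleep(interval_s)
        current = read_rss_gib(pid)
        print(f"rss: {current:.2f} GiB  (delta {current - baseline:+.2f} GiB)")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python watch_rss.py <pid-of-the-llama.cpp-process>
    watch(int(sys.argv[1]))
```

If the delta keeps climbing after each query and never comes back down, even after a /compact, that is the pattern worth reporting, along with the exact runtime version.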

I would have expected the memory use (weights + context) to be mostly fixed, so that it would stay under about 110GB. This gradual, ratcheting increase suggests a memory leak.
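The expectation that memory should stay roughly fixed follows from how context memory is sized. For standard attention, the KV cache needs 2 × layers × KV heads × head dim × context length × bytes per element. That number depends on the configured context length, not on how many prompts have been sent, and llama.cpp typically allocates it up front when the context is created. The sketch below does only that arithmetic. The architecture numbers in the example are hypothetical placeholders, not Step-3.5-flash's real config, and models that use sliding-window or other attention variants will need less than this formula gives. Compute buffers and OS overhead are ignored.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: float) -> float:
    """Size of a full-attention K+V cache preallocated for n_ctx tokens."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def total_gb(weights_gb: float, kv_bytes: float) -> float:
    """Weights plus KV cache in decimal GB (ignores compute buffers, OS use)."""
    return weights_gb + kv_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical architecture numbers, NOT the model's actual configuration.
    for name, bpe in [("f16", 2.0), ("q8_0", 34 / 32)]:  # q8_0: 34 bytes per 32 values
        kv = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128,
                            n_ctx=150_000, bytes_per_elem=bpe)
        print(f"{name}: KV {kv / 1e9:.1f} GB, total {total_gb(105.0, kv):.1f} GB")
```

Nothing in the formula grows with the number of queries, so steady growth past the initial load points at something other than the preallocated KV cache.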

I'm using the llama.cpp 2.13.0 Vulkan backend through LM Studio.

u/cafedude — 15 days ago