Anyone tried StepFun 3.5-flash on Strix Halo?
I tried a Q4 quant of StepFun 3.5-flash. It started out using 107GB (with 150K context), but memory use grew with each prompt until it hit 120GB, and soon after it ran out of memory (OOM). Has anyone run this model for longer than about 20 minutes, and if so, what llama.cpp settings did you use?