How do people run large models locally?
As the title says, the largest model I can run is 31B. But I'm curious how people manage to run larger models at respectable quants on seemingly modest hardware.
Or are those setups only "technically" able to run them, with slow text generation and prefill speeds?
I'd like to run models larger than 31B, so I'm looking for ways to do that.
Thanks!