
Hi everyone,
I was tired of seeing local AI becoming a 'rich man's game' requiring 48GB VRAM cards. So I developed NexaQuant, an inference engine designed from the ground up for extreme optimization on old CPUs and low-RAM devices.
Key Innovations:
- Zero-RAM Mapping: Deep integration with
mmapto treat the disk as a transparent RAM extension. - Multiplication-Free Kernels: Custom ternary kernels (1.58-bit) using only ADD/SUB operations, perfect for old CPUs.
- Dynamic Layer Offloading: Runs models 10x larger than your physical RAM by managing layers one-by-one.
- Peak Performance: >500,000 layers/sec on a standard old-gen CPU.
It's open-source (GPL v3) and I'd love to get some feedback from the community. Let's fix the RAM crisis together!
u/WeAreNex4_ — 13 days ago