r/u_WeAreNex4_

▲ 10 r/u_WeAreNex4_+3 crossposts

Hi everyone,

I was tired of seeing local AI becoming a 'rich man's game' requiring 48GB VRAM cards. So I developed NexaQuant, an inference engine designed from the ground up for extreme optimization on old CPUs and low-RAM devices.

Key Innovations:

  • Zero-RAM Mapping: Deep integration with mmap to treat the disk as a transparent RAM extension.
  • Multiplication-Free Kernels: Custom ternary kernels (1.58-bit) using only ADD/SUB operations, perfect for old CPUs.
  • Dynamic Layer Offloading: Runs models 10x larger than your physical RAM by managing layers one-by-one.
  • Peak Performance: >500,000 layers/sec on a standard old-gen CPU.

It's open-source (GPL v3) and I'd love to get some feedback from the community. Let's fix the RAM crisis together!

GitHub: https://github.com/Nexa1nc/NexaQuant

u/WeAreNex4_ — 13 days ago