u/Glimpglomb

▲ 29 r/Zig

Tensor manipulation lib in Zig

Hello, I've been working on my tensor lib and I thought maybe it's time to see what other people think:

https://github.com/RubyBit/aion

This is a project I've been working on for about half a year now, and I am really happy with its current state. My goal was to take the approach of ONNX, as in making portable models launchable anywhere, but without the multi-vendor approach they took, and also to provide a high-level API to build models, similar to PyTorch. It is currently an inference-only library, though; no training.

Most of my focus has been on the tiling strategy (to maximize cache locality) and on optimized kernels. Towards that goal, I really like Zig's portable SIMD (@Vector), as it lets me keep a high-level approach to writing the kernels and reuse them across CPUs.
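For anyone who hasn't run into tiling before, here is a rough sketch of the general idea in Python (picked for readability, it is not how the lib is written). `matmul_tiled`, `matmul_naive` and `TILE` are made-up names for illustration, and this is plain cache blocking, not the actual tiled tensor layout in src/storage:

```python
# Illustrative sketch only: a cache-blocked (tiled) matrix multiply.
# TILE, matmul_tiled and matmul_naive are hypothetical names, not part of aion.

TILE = 4  # assumed tile edge; a real kernel would tune this per cache level


def matmul_tiled(a, b, tile=TILE):
    """Multiply row-major lists of lists a (m x k) and b (k x n), tile by tile."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # The outer loops walk over tiles, so each tile of a, b and c is reused
    # while it is still likely to be in cache.
    for i0 in range(0, m, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, n, tile):
                # The inner loops are the "micro-kernel" over one tile; the j
                # loop is the part a SIMD vector type would process in lanes.
                for i in range(i0, min(i0 + tile, m)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ip * row_b[j]
    return c


def matmul_naive(a, b):
    """Reference triple loop, used only to check the tiled version."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

Both functions compute the same result; the tiled one only changes the order in which memory is touched, which is where the cache locality comes from.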

My landmark goals were to be able to run some models of interest to me, such as Gemma 4 (e2b) and silero-vad, with performance near ONNX and PyTorch (on CPU):

On my hardware (i9 14900HX, DDR5 5600 MT/s), the current version of the lib runs silero faster than ONNX and PyTorch (~65ns per chunk of 576 samples at 16 kHz on 1 thread, check out the examples, while ONNX runs at ~100ns). Gemma 4 q8_0 runs at ~13 tokens/s on 32 threads (tbh I haven't benched it with another lib yet). The next models I'm looking at will be some ASR ones of interest (and also Kolmogorov-Arnold networks / neural operators).

If you want to run the models yourself, I can post the .aion versions on HF (there are also scripts in the repo to convert them yourself).

I have also used AI in this project, as of course I wouldn't have been able to make so much progress in 6 months otherwise, so this post might be removed, but hopefully I pass the effort benchmark.

My question to you is mainly what you think of my tiling strategy/arch (check out the tiled tensor in src/storage and src/graph for details) and my AOT approach to optimized kernels, and also which other models would be of interest. Next steps for me are to add some control flow nodes to the runtime and improve the API (and maaaybe look at GPU support).

Check it out, and sorry for the top-level README (check out bindings/python); I haven't had time to write proper docs.

u/Glimpglomb — 3 days ago
▲ 9 r/WebRTC+2 crossposts

This is kind of a Revenge of the Sith post, since I made another post in this subreddit quite a few months ago about the same project, but I am really excited with how it has turned out. Last time, I didn't have any of the other DSP goodies such as noise suppression or gain control (2), but I have since added them.

In 0.2 (breaking) I've also added a graph-based builder, with the nodes being pretty much the processing steps of the audio. My last interface was good enough, but I had trouble with it in my own projects since it was very rigid, so I have discarded it for the aforementioned event-driven DAG one. I really like it, as it has made my own processing very elegant, but I've also included a generic linear pipeline that matches what the previous interface did, built on the new graph stuff.
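To give a feel for the shape of a graph builder with a linear pipeline on top, here is a minimal Python sketch (Python just to keep it short, the crate itself is Rust). Everything in it (`Graph`, `add_node`, `connect`, `run`, `linear_pipeline`) is a hypothetical name and not the aec3-rs API, and it runs one plain topological pass per frame instead of being event driven:

```python
# Illustrative sketch only: audio processing steps as nodes in a small DAG.
# Graph, add_node, connect, run and linear_pipeline are hypothetical names,
# not the aec3-rs API.
from collections import deque


class Graph:
    def __init__(self):
        self.nodes = {}   # name -> function taking a list of input frames
        self.edges = {}   # name -> list of downstream node names
        self.inputs = {}  # name -> list of upstream node names

    def add_node(self, name, fn):
        self.nodes[name] = fn
        self.edges[name] = []
        self.inputs[name] = []

    def connect(self, src, dst):
        self.edges[src].append(dst)
        self.inputs[dst].append(src)

    def run(self, frame):
        """Push one frame through the graph in topological order.

        Source nodes (no upstream) receive [frame]; every other node receives
        the outputs of its upstream nodes, in connection order.
        """
        pending = {name: len(ups) for name, ups in self.inputs.items()}
        ready = deque(name for name, count in pending.items() if count == 0)
        outputs = {}
        while ready:
            name = ready.popleft()
            ups = self.inputs[name]
            args = [outputs[u] for u in ups] if ups else [frame]
            outputs[name] = self.nodes[name](args)
            for dst in self.edges[name]:
                pending[dst] -= 1
                if pending[dst] == 0:
                    ready.append(dst)
        if len(outputs) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        return outputs


def linear_pipeline(steps):
    """Build a straight chain from (name, fn) pairs where fn maps frame -> frame."""
    graph = Graph()
    prev = None
    for name, fn in steps:
        # Wrap the single-input step so it fits the list-of-inputs signature.
        graph.add_node(name, lambda args, fn=fn: fn(args[0]))
        if prev is not None:
            graph.connect(prev, name)
        prev = name
    return graph
```

With stand-in steps, `linear_pipeline([("gain", boost), ("clip", limit)]).run(frame)` returns every node's output keyed by name, while branching setups (for example two processors feeding a mixer) use `Graph` directly.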

The next thing is a bit of a divergence from the actual WebRTC project: I want to develop a better noise suppression module, since the current one is very barebones (similar to WebRTC's upgrade from AGC to AGC2), but I'm happy with where the project is at.

Check it out: https://github.com/RubyBit/aec3-rs

u/Glimpglomb — 23 days ago