u/Hot_Consideration155 — reddlx

Jacob playing in NYC?

Jacob just received NYU’s doctorate. Is he still in Manhattan? Maybe he shows up playing somewhere?

If you've ever generated PRBS streams in an FPGA for testing, you know gzip is useless on that data — near-maximum Shannon entropy, looks like noise to any statistical compressor.

I built a codec that runs Berlekamp-Massey over GF(2⁸) to recover the minimal LFSR polynomial, then stores (coefficients + seed + sparse XOR residual) instead of raw bytes. For a pure m-sequence with L=1, that's 2 bytes of model for any length stream. For noisy data there's a multi-stage approximate pipeline covering up to ~28% byte error rate.

The --analyze flag is probably the most useful thing for FPGA work — given any binary file it identifies the generator polynomial, LFSR order, noise level, and period per segment. Useful for reverse engineering unknown PRBS patterns or verifying your LFSR implementation is producing the right sequence.

Benchmarks:

GF(2⁸) m-sequence (L=1, perfect): 333:1
Mixed LFSR (L=1/2/3, 8% noise) 1MB: 91:1
Falls back to raw passthrough for non-LFSR data

npm install pade-compress (needs gcc/clang for the C addon) GitHub: https://github.com/gonll/algex

reddit.com

u/Hot_Consideration155 — 18 days ago

▲ 0 r/compression

Shannon entropy is the wrong metric for compressibility. A GF(2⁸) m-sequence has ~8 bits/byte entropy — statistically indistinguishable from random noise. gzip stores 1MB of it in ~1MB. My compressor stores it in 3KB.

The approach: run Berlekamp-Massey over GF(2⁸) to find the shortest LFSR that generated the byte stream. If LFSR order L is small relative to sequence length N, you store (L coefficients + L seed bytes + sparse XOR residual) instead of raw bytes. For a pure m-sequence with L=1, that's 2 bytes of model and an empty residual — for any length stream.

For noisy data there's a multi-stage approximate pipeline: brute-force all 255 GF coefficients for L=1, voting over GF quadruples for L=2, quintuple-pair voting for L=3, sub-sequence BM voting for L=4/5.

Benchmarks on M-series Mac:

GF(2⁸) geometric sequence (L=1): 333:1
Mixed LFSR (L=1/2/3, 8% noise) 1MB: 91:1
/bin/ls: 5:1
Natural language: ~1:1 (detected, falls through to raw passthrough)

The structure detector (--analyze) identifies the generator polynomial, LFSR order, and noise level per segment — useful even when you're not compressing.

GitHub: https://github.com/gonll/algex | npm: npm install pade-compress

reddit.com

u/Hot_Consideration155 — 18 days ago