r/compression


How do you privately validate a novel compression architecture without burning patent rights?

I’m looking for advice from people with serious experience in data compression, information theory, technical diligence, or IP strategy.

I started building a deterministic CPU-based AI architecture a few years ago because mainstream probabilistic models did not give me the guarantees I needed and were too GPU-dependent for my goals. During development, it became clear that part of the architecture had compression implications. That led me into deeper research around information theory, Kolmogorov complexity, the pigeonhole principle, and compression benchmarks.

I believe I have developed a novel compression-related architecture that is not a conventional entropy encoder and not part of the usual LZ/Huffman/arithmetic/ANS/PPM/BWT family. I am intentionally not describing the mechanism, transformation structure, or internal method publicly because I am still working through patent protection and international novelty risk.

The problem is validation.

A public prize like the Hutter Prize would require source disclosure, but the source would expose the core mechanism. That same mechanism is also foundational to a broader deterministic AI system I am building. I do not want to create public prior art against myself or hand the method to larger companies before the IP position is protected.

I am looking for guidance on the safest credible path to private validation.

Specifically:

  1. How can a novel compression claim be evaluated privately without public source release?
  2. Are there reputable researchers, labs, attorneys, or technical diligence groups that handle this kind of review under NDA?
  3. Are there alternatives to public-code prizes for validating compression systems?
  4. What should I avoid saying publicly before patents are filed?
  5. Are there funding paths specifically for patent protection and private hard-tech validation?

I understand that extraordinary compression claims are usually met with skepticism, and rightly so. I am not asking anyone to accept the claim from a post. I am asking how to get the work reviewed and protected without accidentally disclosing the core invention.

The broader project includes deterministic AI and low-cost information infrastructure, but the immediate proof surface is compression because compression is measurable.
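Because the claim is measurable, a reviewer can run the core check on a black-box encoder/decoder pair without ever seeing the source. A minimal sketch of that measurement, with zlib standing in for the undisclosed codec (the function and parameter names are hypothetical): the round trip must be bit-exact, and the reported size can optionally include the decompressor itself, the way the Hutter Prize counts archive plus decompressor.

```python
import zlib
from typing import Callable

Codec = Callable[[bytes], bytes]

def evaluate_codec(data: bytes, compress: Codec, decompress: Codec,
                   decompressor_size: int = 0) -> dict:
    """Round-trip a sample through a black-box codec and report sizes.

    decompressor_size (hypothetical parameter) charges the size of the
    decompressor to the result, similar to Hutter Prize accounting.
    """
    packed = compress(data)
    if decompress(packed) != data:
        raise ValueError("round trip is not bit-exact")
    total = len(packed) + decompressor_size
    return {"original": len(data), "compressed": len(packed),
            "total": total, "ratio": len(data) / total}

if __name__ == "__main__":
    # zlib is only a stand-in; a reviewer would plug in the binary under test.
    sample = b"the quick brown fox " * 500
    print(evaluate_codec(sample, zlib.compress, zlib.decompress))
```

In a private review, the reviewer would choose the test corpus, so the claimant cannot tune for it.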

Any serious guidance on IP-safe validation paths would be appreciated.

u/Conscious_Quit_1805 — 2 days ago

Encoding raw image library into JXL

Hi all,

I'm an amateur photographer and over the years have accumulated a large number of raw files from my cameras. I'm now considering freeing up some disk space by converting my older raw files into high-quality photos.

I'm using RawTherapee for processing, but unfortunately it does not support JXL, so I'm planning to export uncompressed 16-bit TIFFs from RT. Is there a tool I could use to batch-convert the TIFFs into JXL in one go?

Lossless JXL might be overkill, and I think it might take even more space than the raw files themselves. In the past I shot with a point-and-shoot camera and a DSLR, which produced raw files of about 8MB and 17MB respectively.

Which JXL settings and quality levels would you recommend? When I exported to JPEG I used quality 90%, which produced files of about 3-4 MB. Are JPEG and JXL quality settings comparable at all?
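One way to batch-convert is to script `cjxl`, the reference encoder that ships with libjxl. As far as I know `cjxl` does not read TIFF, so this sketch assumes the RawTherapee exports are saved as 16-bit PNG instead (or converted to PNG first). According to `cjxl`'s help text, `-q` is mapped internally to a distance and its values roughly match libjpeg quality, so 90 is a reasonable starting point; `-e` sets encoder effort (slower, smaller). The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def cjxl_command(src: Path, dst: Path, quality: int = 90, effort: int = 7) -> list[str]:
    """Build one cjxl call: cjxl INPUT OUTPUT -q QUALITY -e EFFORT."""
    return ["cjxl", str(src), str(dst), "-q", str(quality), "-e", str(effort)]

def batch_convert(folder: str, quality: int = 90, effort: int = 7,
                  pattern: str = "*.png") -> int:
    """Convert every file matching pattern in folder; returns how many were converted."""
    if shutil.which("cjxl") is None:
        raise RuntimeError("cjxl not found on PATH (it is part of libjxl)")
    count = 0
    for src in sorted(Path(folder).glob(pattern)):
        dst = src.with_suffix(".jxl")
        if dst.exists():
            continue  # skip files converted in an earlier run
        subprocess.run(cjxl_command(src, dst, quality, effort), check=True)
        count += 1
    return count

if __name__ == "__main__":
    import sys
    n = batch_convert(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"converted {n} file(s)")
```

Comparing a few outputs at `-q 85`, `90` and `95` against the JPEG exports is a quick way to pick a level before running the whole library.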

Thanks!

u/UniversityUpstairs93 — 4 days ago

I built a client-side image compressor that never uploads your files

Most image compression tools send your files to a server. I wanted one that didn't, so I built nosend.io.

Everything runs in the browser using the Canvas API. JPG, PNG, WEBP, GIF, and HEIC are all supported. Nothing is transmitted anywhere; the compression happens locally on your device.

A few technical details for those curious:

- Compression uses the browser's native Canvas API (drawImage + toBlob with quality parameter)

- HEIC conversion uses heic2any, loaded on demand

- Batch processing supported, no file size limits

- Works offline after first load

Would be curious to hear from people in this sub about the quality tradeoffs - particularly around PNG (which is lossless via canvas) vs JPG at various quality settings. Happy to answer questions.

https://nosend.io

u/Anonimoste — 5 days ago

Trinity ATCL video compression

Hi everyone,

I’m preparing video footage for my Trinity ATCL digital exam.

My recording is 38 minutes long with an original size of around 4GB, and I need to compress it down to roughly 980MB to fit the official file size limit.

All common online compression tools only support files under 500MB, so I intend to transfer the video via Google Drive and finish compression on my computer instead.

One crucial requirement here:

The finished video has to be in standard MP4 or MOV format. It must be playable directly, without needing any decompression tools.

I wonder if it is feasible to shrink this video down to the target size whilst retaining decent visual and audio quality suitable for exam review.

Would anyone with video processing or Trinity exam submission experience kindly recommend some PC software or suitable compression settings? Many thanks in advance.
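The size target comes down to bitrate arithmetic: 980 MB over 38 minutes (2,280 s) is about 3.4 Mbit/s in total, which leaves roughly 3.2 Mbit/s for video after reserving 128 kbit/s for AAC audio and about 3% headroom. A two-pass H.264 encode in ffmpeg aims at that average bitrate and produces a normal MP4. A sketch that computes the bitrate and prints the two ffmpeg commands; the file names and the headroom value are placeholders.

```python
def target_video_kbps(target_mb: float, duration_s: float,
                      audio_kbps: float = 128, headroom: float = 0.97) -> float:
    """Average video bitrate (kbit/s) that fits target_mb after reserving audio.

    Uses 1 MB = 1,000,000 bytes; headroom keeps a few percent spare for
    container overhead and rate-control error.
    """
    total_kbits = target_mb * 8_000 * headroom  # MB -> kilobits
    return total_kbits / duration_s - audio_kbps

def ffmpeg_two_pass(src: str, dst: str, video_kbps: float,
                    audio_kbps: float = 128, null_sink: str = "/dev/null"):
    """Build the two ffmpeg commands for a two-pass H.264 encode.

    On Windows, pass null_sink="NUL" for the first pass.
    """
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-preset", "slow",
              "-b:v", f"{video_kbps:.0f}k"]
    pass1 = common + ["-pass", "1", "-an", "-f", "mp4", null_sink]
    pass2 = common + ["-pass", "2", "-c:a", "aac", "-b:a", f"{audio_kbps:.0f}k",
                      "-movflags", "+faststart", dst]
    return pass1, pass2

if __name__ == "__main__":
    kbps = target_video_kbps(980, 38 * 60)
    print(f"video bitrate: {kbps:.0f} kbit/s")
    for cmd in ffmpeg_two_pass("exam.mov", "exam.mp4", kbps):
        print(" ".join(cmd))
```

Whether 3.2 Mbit/s looks good depends on the resolution and how much motion the recording has; checking a short excerpt before encoding the full 38 minutes is cheap.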

u/randomgurlfromhk — 6 days ago

[Seeking Review] SPX: A Lossless Image Codec using RCT + MED + Sharding + rANS

Hi all,

I've spent the last few months developing a lossless image compressor called SPX that aims to balance compression density and encoding speed: a better compression ratio than WebP (m6), though below JXL (e7), with significantly faster encoding.

I did some testing: encoding performance seems consistent across most datasets, but the compression savings are less consistent.


I think I've hit my limit as a self-taught amateur developer who knows a little Python. I can't come up with any new ideas to improve it, so Gemini suggested asking here for professional advice.

It's an Apache 2.0 open source project. Any suggestion on how to improve compression rate without losing too much speed is highly appreciated! Thank you!

GitHub: https://github.com/nonkilife/SPX-Image-Lossless-Compression

Quick Start: pip install spx-codec

==

// The Architecture:

SPX isn't a fundamental breakthrough, but a streamlined 4-part pipeline designed for modern CPU throughput:

  1. RCT: Reversible Color Transform (Green-sub).
  2. MED: Branchless Median Edge Detector.
  3. Stateless Sharding: Pixels are assigned to 42 shards based on local gradient (v), luminance (i), and direction (t). These 3 parameters can be tuned for different types of images to obtain better performance.
  4. Entropy Coding: Rust-based 4-way Interleaved rANS.
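For readers less familiar with the first two stages, here is a small Python sketch of a green-subtract RCT and the MED predictor. The exact RCT form (R-G, G, B-G) is my reading of "Green-sub", and the zero-padding at image edges is an assumption; SPX's Rust/Python code may handle both differently. MED (from LOCO-I / JPEG-LS) equals the median of `a`, `b` and `a + b - c`, which can be written with `min`/`max` only, the usual basis for a branchless version.

```python
def rct_forward(r: int, g: int, b: int) -> tuple[int, int, int]:
    # Assumed "green-sub" form: keep G, store R and B as differences from G.
    # Differences of 8-bit values need 9 bits (range -255..255).
    return r - g, g, b - g

def rct_inverse(dr: int, g: int, db: int) -> tuple[int, int, int]:
    # Exact integer inverse, so the transform is lossless.
    return dr + g, g, db + g

def med_predict(a: int, b: int, c: int) -> int:
    """MED: a = left, b = above, c = upper-left.
    median(a, b, a + b - c) written as max(min, min(max, gradient))."""
    lo, hi = min(a, b), max(a, b)
    return max(lo, min(hi, a + b - c))

def _neighbours(plane: list[int], width: int, x: int, y: int) -> tuple[int, int, int]:
    # Pixels outside the image are treated as 0 (one of several possible edge rules).
    a = plane[y * width + x - 1] if x > 0 else 0
    b = plane[(y - 1) * width + x] if y > 0 else 0
    c = plane[(y - 1) * width + x - 1] if x > 0 and y > 0 else 0
    return a, b, c

def med_residuals(plane: list[int], width: int, height: int) -> list[int]:
    # Residual = actual pixel minus MED prediction, in raster order.
    return [plane[y * width + x] - med_predict(*_neighbours(plane, width, x, y))
            for y in range(height) for x in range(width)]

def med_reconstruct(residuals: list[int], width: int, height: int) -> list[int]:
    # Decoder side: neighbours come from pixels already rebuilt in raster order.
    plane = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            i = y * width + x
            plane[i] = residuals[i] + med_predict(*_neighbours(plane, width, x, y))
    return plane
```

The residuals from `med_residuals` are what the sharding and rANS stages would then classify and encode.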

// Customization & Extensibility:

  • Dynamic Sharding: The (i, v, t) boundaries for pixel classification are not hard-coded. They can be easily re-tuned to accommodate specialized image distributions.
  • Flexible Entropy Modeling: The rANS probability models are stored in .npz format. This allows users to swap or retrain templates for specific datasets without recompiling the core Rust engine.
  • Adaptive Framework: While the current design is a general-purpose solution, the architecture is intended as a "compression sandbox" for domain-specific needs.

// The Performance (Snapshot on AMD Ryzen 5 3500X):

  • Encoding Speed: ~12 MB/s on Kodak, peaking at 44 MB/s on standard synthetic sets.
  • Compression Ratio: Consistently 25-30% smaller than PNG; sits between WebP (M6) and JXL (E7) most of the time.
  • Validation: Bit-perfect verification (MSE = 0) with an integrated unified benchmark suite.
  • Target Data: Tested on CLIC, DIV2K, Tecnick, ICI, and Kodak (primarily natural photography).
  • Limitation: Validation on synthetic images is currently limited, so consistency in those specific domains remains a known unknown.
  • Comparative Benchmark: https://github.com/nonkilife/SPX-Image-Lossless-Compression/blob/main/technical/BENCHMARK.md

// The Bottleneck:

I've reached a point where manual optimizations (branchless logic, LUT, SIMD-friendly structures) are no longer yielding significant gains.

I've experimented with:

  • Predictors: Swapping MED for GAP or Paeth (MED still wins on speed/ratio balance).
  • Context: Adding UR, UU, LL pixel data to MED (speed tumbled, ratio improvement was negligible).
  • Sharding: Tested >5,000 shard combinations with up to ~60 shards using Monte Carlo simulation; the current 42-shard model seems to be the "sweet spot" for speed. Adaptive sharding based on per-image fingerprints (e.g., H-entropy, AAD, size, R:G:B proportions) was also tested, but the compression gain was minor and speed dropped significantly.
  • rANS PDF: After analyzing the CLIC 2021 dataset, high-bit modes proved too overhead-heavy for most shards.

While 90% of these approaches have failed, there is still unexplored territory:

  • 8-way Interleaving: I've considered scaling the rANS core to 8-way interleaving. However, initial analysis suggests my current Zen 2 architecture (3500X) might suffer from cache port contention or register pressure at that level. I've stuck with 4-way as a stable, high-efficiency baseline.
  • C++ & AVX-512: The current engine is a Python/Rust hybrid. I suspect a pure C++ implementation leveraging AVX-512 could push throughput slightly higher, but that is currently beyond my technical skill set.
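To make the interleaving question concrete, here is a small Python sketch of N-way interleaved rANS using the byte-wise renormalization scheme from Fabian Giesen's ryg_rans (32-bit state, lower bound L = 2^23). Symbol `i` is coded on lane `i % ways` and all lanes share one byte stream; in native code the lanes are independent dependency chains, which is where the throughput gain from 4-way or 8-way comes from. The fixed frequency table (summing to `2**scale_bits`) is an assumption for illustration; SPX's per-shard models would take its place, and this is not SPX's Rust engine.

```python
RANS_L = 1 << 23  # normalized state interval is [RANS_L, RANS_L << 8)

def build_table(freqs: dict) -> tuple[dict, int]:
    """Map each symbol to (cumulative start, frequency)."""
    table, start = {}, 0
    for sym, f in freqs.items():
        table[sym] = (start, f)
        start += f
    return table, start

def rans_encode(msg: list, freqs: dict, scale_bits: int, ways: int = 4) -> bytes:
    table, total = build_table(freqs)
    assert total == 1 << scale_bits, "frequencies must sum to 2**scale_bits"
    states = [RANS_L] * ways
    out = []  # bytes collected in reverse; flipped once at the end
    for i in reversed(range(len(msg))):  # rANS encodes back to front
        k = i % ways  # symbol i always uses lane i % ways
        start, f = table[msg[i]]
        x = states[k]
        x_max = ((RANS_L >> scale_bits) << 8) * f
        while x >= x_max:  # renormalize: shift low bytes out
            out.append(x & 0xFF)
            x >>= 8
        states[k] = ((x // f) << scale_bits) + (x % f) + start
    for k in reversed(range(ways)):  # flush so lane 0 is read first
        for shift in (0, 8, 16, 24):
            out.append((states[k] >> shift) & 0xFF)
    return bytes(reversed(out))

def rans_decode(data: bytes, n: int, freqs: dict, scale_bits: int, ways: int = 4) -> list:
    table, total = build_table(freqs)
    mask = total - 1
    states = [int.from_bytes(data[4 * k:4 * k + 4], "big") for k in range(ways)]
    pos = 4 * ways
    msg = []
    for i in range(n):
        k = i % ways
        x = states[k]
        cf = x & mask  # cumulative frequency slot selects the symbol
        sym = next(s for s, (st, f) in table.items() if st <= cf < st + f)
        start, f = table[sym]
        x = f * (x >> scale_bits) + cf - start
        while x < RANS_L:  # renormalize: shift bytes back in
            x = (x << 8) | data[pos]
            pos += 1
        states[k] = x
        msg.append(sym)
    return msg
```

Changing `ways` from 4 to 8 changes only the lane count here; whether that helps on Zen 2 is a question about registers and execution ports in the native implementation, which this sketch cannot answer.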
u/Nonkilife — 14 days ago

Built a tool to stop paying twice for the same LLM tokens

Six months of heavy API usage and my bills felt higher than they should be. Finally sat down and traced exactly where the tokens were going.

Turned out most of it was repetition. Every API call resends the full context window, the whole conversation history, the system prompt, all of it. The context resets each call. You're paying for the same information over and over, every single request.
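The repetition argument is easy to quantify: if every call resends the system prompt plus the whole history, billed input tokens grow quadratically with the number of turns, while the genuinely new content grows linearly. A sketch under the simplifying assumption that every turn adds the same number of tokens (function names are hypothetical; this does not show how ContextPilot itself compresses context).

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_prompt: int) -> int:
    """Input tokens billed when call k resends the system prompt plus k turns of history."""
    return sum(system_prompt + k * tokens_per_turn for k in range(1, turns + 1))

def unique_input_tokens(turns: int, tokens_per_turn: int, system_prompt: int) -> int:
    """Tokens of distinct content: the system prompt once, each turn once."""
    return system_prompt + turns * tokens_per_turn

if __name__ == "__main__":
    # 20 turns of 500 tokens each with a 1,000-token system prompt.
    print(cumulative_input_tokens(20, 500, 1000))  # → 125000
    print(unique_input_tokens(20, 500, 1000))      # → 11000
```

In that example the billed input is more than ten times the distinct content, which is the gap a context compressor targets.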

Built ContextPilot to fix it. It sits between your code and the API and compresses context before each call.

Saving around 60% on API costs at my usage level. MIT licensed, no account needed, works with OpenAI and Anthropic.

Still early, v0.2.2 on PyPI. Would genuinely appreciate feedback from anyone who gives it a try, especially on edge cases or integrations I haven't thought about.

github.com/msousa202/ContextPilot

contextpilot.org
u/Ok_Alternative_3007 — 11 days ago