u/truehaZker

I've created the Repairable AI Interchange Format for structured data that saves 10% tokens using vLLM plugin
▲ 1 r/Vllm+1 crossposts

I've created the Repairable AI Interchange Format for structured data that saves 10% tokens using vLLM plugin

I've recently been curious about how well JSON fits the LLM nature at all. Is it optimized for non-deterministic processes? I thought there should be a better approach and better format that can handle the LLM quirks and just be more efficient for LLMs instead of being made for humans.

And I've created RAIF. It's not only standard, but a multi-level system that has standard, LoRAs, and vLLM plugin.

On the benchmark that has a lot of different types of JSON, the avg token saving is 10%, but this number fluctuates based on the JSON type and tokenizer type. One of the best performing types is when JSON has a lot of repetitive data, and in this case, savings went up to 70% of tokens.

The coolest part in my opinion is that this RAIF thing is compatible with all the existing clients and harnesses, because the vLLM plugin converts RAIF that's an LLM's output to JSON deterministically before it reaches the client, so you're getting a fully compatible API as it was before.

The only problem I got is response_format streaming. It outputs plain RAIF, only if you turn off the streaming, the response_format will become JSON.

Also, for it to work, I've fine-tuned some models using LoRAs and created 3 of them for now:

Even the tiniest model performs nicely with RAIF.

There's an option to run RAIF without a plugin, so clients will get pure RAIF instead of JSON. For this, I've created light Python and TypeScript packages without any dependencies.

Right now, I really want to get the feedback, and I want to see how well it fits the existing needs. It's more like an experiment for self-education, so idk if there's a real use case, that's why I'm calling for the community's help.

https://reddit.com/link/1ugymq4/video/b7glacjcus9h1/player

reddit.com
u/truehaZker — 1 day ago