u/Former-Ad-5757 — reddlx

Any thoughts on adding support for speculators (https://github.com/vllm-project/speculators)? I haven't tested it yet, but I would think it could work as an add-on step to training.

If you train on a dataset, then I believe you could also automatically create a custom draft model with very good speculation (since it is based on the same dataset). You could then convert both models to gguf and run them on your own hardware.

I would imagine that even for general usage, a person could create a dataset from their own chats, then run a really shallow finetune on that dataset (just to set the personality and get a little speedup on the same sort of chat messages). Then run speculators over the fine-tuned model with the dataset from your chats, convert the result to gguf, and use it for local inference.
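For the "dataset from their own chats" step, here is a rough sketch of what I have in mind. It assumes the chats are already available as lists of (role, text) pairs and that the training tool accepts one JSON object per line with a "messages" list; both are assumptions on my part, and `chats_to_jsonl_lines` is just a made-up helper name, not anything from speculators.

```python
import json
from typing import Iterable, Iterator, List, Tuple

Chat = List[Tuple[str, str]]  # hypothetical format: [(role, text), ...]


def chats_to_jsonl_lines(chats: Iterable[Chat]) -> Iterator[str]:
    """Yield one JSON line per chat, skipping chats without an assistant reply."""
    for chat in chats:
        # Drop empty turns and trim whitespace from the rest.
        messages = [
            {"role": role, "content": text.strip()}
            for role, text in chat
            if text.strip()
        ]
        # A chat with no assistant turn gives the model nothing to learn from.
        if any(m["role"] == "assistant" for m in messages):
            yield json.dumps({"messages": messages}, ensure_ascii=False)


if __name__ == "__main__":
    example = [[("user", "hi"), ("assistant", "hello there")]]
    for line in chats_to_jsonl_lines(example):
        print(line)
```

The same file could then feed both the shallow finetune and the draft-model training, which is the whole point of reusing your own chats.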

That way everybody could immediately get a 3 to 4x speedup with a new model, as long as they chat in the same way as they did in the past. Everybody could build their own draft model. They might need better hardware than they have at home to train it, but you get a gguf at the end, so a user could rent a temporary RunPod instance or similar and, for about 10 dollars, make their local inference 3 to 4x faster.
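To sanity-check the 3 to 4x figure, here is a back-of-the-envelope estimate using the standard speculative decoding analysis. It assumes each drafted token is accepted independently with probability `alpha`, that the draft model proposes `k` tokens per step, and that one draft forward pass costs `draft_cost_ratio` of a target forward pass. Those three parameters and the function names are my own illustration, not measurements and not part of speculators.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens produced per target-model pass.

    Assumes every drafted token is accepted independently with
    probability alpha; the target always contributes one token.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost_ratio: float) -> float:
    """Rough speedup over plain decoding.

    One step costs k draft passes plus one target pass, measured in
    units of a single target pass.
    """
    step_cost = k * draft_cost_ratio + 1.0
    return expected_tokens_per_step(alpha, k) / step_cost


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=5, draft_cost_ratio=0.05), 2))
```

Under these assumptions the estimate only reaches the 3 to 4x range when the acceptance rate is high (around 0.9 with a cheap draft model), which fits the condition that people keep chatting the same way as in the data the draft model was trained on.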