u/EasyAbbreviations757 — reddlx

Basically I finetuned a model on a dataset that contained information related to general queries asked in a service center and the responses where how those procedures where performed and what were the policies. Now when I am chatting directly to this model, its asking relevant questions and not assuming things about the user. But, when I performed RAG to make sure the responses are accurate, it is hallucinating and assuming things about the user, plus sometimes even spitting the prompt in the chat itself for some reason. The model is meta llama 8b instruct, I finetuned it using unsloth and downloaded it and quantized it to Q6, and am using LM Studio to host it. Any suggestions or advice would be highly appreciated.