u/Hackerstreak — reddlx

▲ 2 r/LLM

I'm trying to build a chat agent with a persona that can perform RAG over a PDF (converted to chunked embeddings for easier search). I'm using Llama 3.2B and gave it a detailed system prompt covering the persona, the basic things the bot needs to know about itself, and how it should answer. Explicitly telling it to admit when it doesn't know something because the information isn't in the PDF content only works to an extent.

I read somewhere that apps like NotebookLM route the prompt by intent classification and apply strict mathematical gating to the information coming from RAG. So I started routing by classifying the user's intent first. If they say "hello there billy", the router sends it straight to the LLM for a response instead of doing RAG. But this breaks the bot's persona every now and then: when the user asks something like "how's the day feeling?", it gets wrongly routed to RAG and the bot ends up saying "I don't know", as instructed in the system prompt.
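To make the gating idea concrete, here is a minimal sketch of one way it could work. This is an assumption on my part, not how NotebookLM actually does it: retrieve first, then only take the RAG path when the best chunk clears a similarity threshold, and otherwise fall back to plain persona chat instead of refusing. The `embed`, `cosine`, and `route` names, the 0.3 threshold, and the sample chunks are all hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(query, chunks, threshold=0.3, top_k=2):
    """Return ("rag", best_chunks) when retrieval is confident, else ("chat", [])."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    hits = [c for score, c in scored[:top_k] if score >= threshold]
    return ("rag", hits) if hits else ("chat", [])

# Hypothetical PDF chunks, for illustration only.
chunks = [
    "The warranty covers battery defects for two years.",
    "Returns are accepted within 30 days of purchase.",
]
print(route("how long does the warranty cover the battery?", chunks))
print(route("how's the day feeling?", chunks))
```

With these toy chunks the warranty question goes down the "rag" path, while the small-talk question scores below the threshold and lands on "chat", so the persona prompt can answer it without the "I don't know" instruction kicking in. The threshold would need tuning against a real embedding model.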

I am new to this, and I'm asking here for suggestions after exploring a bunch of different system prompts and different models (Llama and Gemma, in various sizes under 8B). Is this a limitation of the model size itself? I get that NotebookLM might be using a million-token-context model, but should I go the route of Open-notebook or similar methods even for this simple conversational bot?


u/Hackerstreak — 25 days ago

▲ 82 r/computervision

Hey guys!

Visualizing the loss landscape of a neural network is notoriously tricky since we can't naturally comprehend million-dimensional spaces. We often rely on basic 2D contour analogies, which don't always capture the true geometry of the space or the sharpness of local minima.

I built an interactive browser experiment https://www.hackerstreak.com/articles/visualize-loss-landscape/ to help build better intuitions for this. It maps these spaces and lets you actually visualize the terrain.

To generate the 3D surface plots, I used the methodology from Li et al. (NeurIPS 2018). This is entirely a client-side web tool. You can adjust architectures (ranging from simple 1-layer MLPs up to ResNet-8 and LeNet-5), swap between synthetic or real image datasets, and render the resulting landscape.
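For anyone unfamiliar with that method, here is a rough sketch of the core idea as I understand it from the paper: pick two random directions, rescale each filter of a direction to the norm of the matching filter in the trained weights (filter normalization), then evaluate the loss on a 2D grid around the trained point. This is not the tool's actual code. It is plain Python on a toy linear layer with MSE loss, where each weight row is treated as one "filter", and every function name here is made up for illustration.

```python
import math
import random

def filter_normalize(direction, weights):
    """Rescale each row ("filter") of direction to the norm of the matching weight row."""
    out = []
    for d_row, w_row in zip(direction, weights):
        d_norm = math.sqrt(sum(x * x for x in d_row))
        w_norm = math.sqrt(sum(x * x for x in w_row))
        scale = w_norm / (d_norm + 1e-10)
        out.append([x * scale for x in d_row])
    return out

def random_direction(weights, rng):
    """Gaussian direction with the same shape as weights, filter-normalized."""
    raw = [[rng.gauss(0.0, 1.0) for _ in row] for row in weights]
    return filter_normalize(raw, weights)

def mse_loss(weights, xs, ys):
    """Mean squared error of a linear layer y = W x (toy stand-in for a network)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(xs)

def loss_surface(weights, xs, ys, steps=5, span=1.0, seed=0):
    """Evaluate loss(W + a*d1 + b*d2) on a steps x steps grid over [-span, span]."""
    rng = random.Random(seed)
    d1, d2 = random_direction(weights, rng), random_direction(weights, rng)
    coords = [-span + 2 * span * i / (steps - 1) for i in range(steps)]
    surface = []
    for a in coords:
        row = []
        for b in coords:
            shifted = [[w + a * u + b * v for w, u, v in zip(wr, ur, vr)]
                       for wr, ur, vr in zip(weights, d1, d2)]
            row.append(mse_loss(shifted, xs, ys))
        surface.append(row)
    return coords, surface

# Toy "trained" weights and data they fit exactly, so the grid center is the minimum.
weights = [[1.0, -2.0], [0.5, 3.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [[sum(w * xi for w, xi in zip(row, x)) for row in weights] for x in xs]
coords, surface = loss_surface(weights, xs, ys)
for row in surface:
    print(" ".join(f"{v:8.3f}" for v in row))
```

The printed grid is what would be handed to a surface or contour plotter. For a real network the same loop applies per layer, with each loss evaluation being a forward pass over the dataset.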

A known limitation of these dimensionality reductions is that 2D/3D projections can sometimes create geometric features that don't exist in the true high-dimensional space. I'd love to hear from anyone who studies optimization theory: how much stock do you actually put in these visual analyses when assessing model generalization or debugging?

u/Hackerstreak — 25 days ago

▲ 30 r/deeplearning

Hey guys!

I built an interactive browser experiment https://www.hackerstreak.com/articles/visualize-loss-landscape/ to help build better intuitions for this. It maps how different optimizers navigate these spaces and lets you actually visualize the terrain.

u/Hackerstreak — 25 days ago

▲ 49 r/machinelearningnews

Hey guys,

u/Hackerstreak — 25 days ago

▲ 0 r/technology

u/Hackerstreak — 25 days ago

▲ 46 r/learnmachinelearning

Hey guys!

u/Hackerstreak — 25 days ago

▲ 144 r/MachineLearning

Hey r/MachineLearning,

u/Hackerstreak — 25 days ago