u/OnyxProyectoUno

I wish notebooklm was purpose built for academia

NotebookLM is genuinely a great tool. I use it. But the podcast feature is fundamentally broken right now, and looking at the sub, I'm not the only one frustrated. Inconsistent generation, hangs, and even when it does run cleanly the output often doesn't hold up.

My primary use case for it was the same as a lot of people here: consume academic literature without losing a Saturday to it.

So I ended up putting together two things to scratch my own itch. Zero strings attached.

  1. PaperCast. Weekly rotating podcasts across 7 disciplines. Papers are picked based on gravity and momentum (engagement signals plus recent traction), so it surfaces what's actually moving in each field rather than just what's recent. Each episode is a 15-min conversational walkthrough of a single paper, based on the actual paper content rather than a paraphrase. First episode is available without signup.

  2. Debrief. RAG for academic papers. I loved what NotebookLM did conceptually but kept running into the hallucination problem, especially on dense technical claims. Debrief is built for that specific use case. Semantic search across a corpus, answers come with clickable references back to the source paper, and the system doesn't try to stuff whole documents into context (which is where a lot of the hallucination happens in general-purpose RAG).

Both live at SOTA Institute (link in profile). Nothing monetarily gated, no upsell.

What I actually want from this post is feedback on the shape:

  • For folks who've felt the NotebookLM podcast feature breaking: what specifically is failing for you? Is it the generation, the audio quality, the output not matching the source, or something else?
  • For the RAG side: where does NotebookLM (or whatever you're using) actually break for academic work?
  • What's missing that would make a tool like this genuinely useful for the way you read papers?

We've all used NotebookLM and we all see the value. But a general-purpose tool isn't always the right shape for a specific use cases.

reddit.com
u/OnyxProyectoUno — 6 days ago

Hot take: "Your agent is mine" paper needs to keep being talked about.

The "Your Agent Is Mine" paper (arXiv 2604.08407) has been making rounds in this sub. It's already been posted before, but I think it's worth keeping the conversation going, especially as more of us are leaning on local models and cheap-frontier-via-routers setups.

Quick recap if you missed it. Researchers from UC Santa Barbara bought 28 paid LLM API routers from Taobao, Xianyu, and Shopify, and collected 400 free ones from public communities. They ran them against canary AWS keys and instrumented agents.

  • 9 routers actively inject malicious code into returned tool calls
  • 17 touched researcher-owned AWS canary credentials
  • 1 drained ETH from a researcher-owned wallet
  • 2 deploy adaptive evasion. They only attack after 50 prior calls, or only when the client is in autonomous "YOLO mode"

The mechanic. Routers terminate your TLS connection, see every byte of every request, and originate a separate TLS upstream. There's no end-to-end integrity between the model provider and your agent. A malicious router can rewrite tool calls, swap your pip install URL, or harvest every API key passing through.

I read the paper and it took a while. So I made something for folks who'd rather hear it than read it. A 15-minute podcast that walks through the paper in conversational form, grounded in the actual text. It's free, no account, no signup. It's the "Your Agent Is Mine" episode at SOTA Institute (link in profile).

I use local models heavily in two of my own products, and this paper got my attention. What are folks here doing to manage this kind of supply chain risk?

reddit.com
u/OnyxProyectoUno — 7 days ago

Followup: How do you stay informed as an academic?

Follow-up to my post a month ago: link to original

Last time I asked "is it worth my time" and the answers mostly landed on the podcast side of what I'm building. That was on me. I buried the part I actually care about. Coming back to that now.

The way I see it, lit review breaks into two problems with very different deficiencies.

First problem: finding things.

Most of how you find papers right now is lexical. Aggregators, journal search bars, Google Scholar. The algorithm is keyword-match. Unless you already know the exact terminology a paper uses, you end up iterating: try one phrase, refine, try synonyms, prune. A lot of false starts before you get hits that actually relate to your question.

Some questions:

  • What's your strategy for keyword discovery when you're searching unfamiliar terrain?
  • How many search iterations do you usually run before the corpus stabilizes?
  • Any semantic-search-layered tools you actually use, or do you stick with the journals' built-in lexical search?

Second problem: extracting what's in the papers.

Once you've got a corpus, abstracts only get you so far. You've got heuristics you've built up over years. Figure 1 first, methods last, etc. But if you've got 20 papers in your pile and a deadline, reading them cover to cover isn't realistic. Same situation if you've uploaded your own draft and you want to interrogate it.

NotebookLM is what most people reach for here. Conceptually it's the right shape, RAG over a paper set. But it has known flaws for specific use cases, especially anything with dense technical claims. Part of that is context rot. If you haven't run into the term, LLMs degrade at recall as their context window fills up, and they're worse at retrieving from the middle of the context than from the beginning and end (the "lost in the middle" effect). So when you ask a detailed question about page 17 of a 40-page paper, you often get answers that look right but quote the wrong figure.

The approach I've been building doesn't put the whole document in context. Each question pulls only the relevant body of work, answers from that, and gives you a clickable reference. If you don't trust the answer, you click through to the exact source span.

Some questions:

  • For people using NotebookLM (or similar) for academic work, where does it actually break for you?
  • How many papers can you realistically interrogate in a single session before quality drops?
  • What do you wish existed that would make this less painful?

The thing exists and is live, on my profile under SOTA Institute. I'm not here to pitch. I want to know what the right solution actually looks like from people who do lit reviews for a living.

reddit.com
u/OnyxProyectoUno — 7 days ago
▲ 58 r/LLMDevs

Hot take: "Your agent is mine" paper needs to keep being talked about.

The "Your Agent Is Mine" paper (arXiv 2604.08407) has been making rounds in this sub. It's already been posted before, but I think it's worth keeping the conversation going, especially as more of us are leaning on local models and cheap-frontier-via-routers setups.

Quick recap if you missed it. Researchers from UC Santa Barbara bought 28 paid LLM API routers from Taobao, Xianyu, and Shopify, and collected 400 free ones from public communities. They ran them against canary AWS keys and instrumented agents.

  • 9 routers actively inject malicious code into returned tool calls
  • 17 touched researcher-owned AWS canary credentials
  • 1 drained ETH from a researcher-owned wallet
  • 2 deploy adaptive evasion. They only attack after 50 prior calls, or only when the client is in autonomous "YOLO mode"

The mechanic. Routers terminate your TLS connection, see every byte of every request, and originate a separate TLS upstream. There's no end-to-end integrity between the model provider and your agent. A malicious router can rewrite tool calls, swap your pip install URL, or harvest every API key passing through.

I read the paper and it took a while. So I made something for folks who'd rather hear it than read it. A 15-minute podcast that walks through the paper in conversational form, grounded in the actual text. It's free, no account, no signup. It's the "Your Agent Is Mine" episode at SOTA Institute (link in profile).

I use local models heavily in two of my own products, and this paper got my attention. What are folks here doing to manage this kind of supply chain risk?

reddit.com
u/OnyxProyectoUno — 7 days ago