r/AIAssisted

▲ 3 r/AIAssisted+1 crossposts

Real-time speech-to-text at ultra-low latency

soniox-rt-v4 transcribing in real time. This is unedited and at actual speed (see the linked source video from Gawne). Partial results stream in as speech arrives, before chunks get finalized with confidence. A huge number of tokens is not a bottleneck for Soniox.
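
To make the partial-then-final behavior concrete, here is a minimal sketch of how a client might merge such a stream. The `Token` shape and the `is_final` flag are assumptions for illustration, not the documented Soniox message format: finalized tokens are appended once, while non-final tokens are replaced by every new update.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token shape: text plus a flag saying whether it can still change.
    text: str
    is_final: bool


@dataclass
class LiveTranscript:
    final_text: str = ""    # text that will never be revised
    partial_text: str = ""  # provisional tail, overwritten on every update

    def update(self, tokens: list[Token]) -> str:
        """Apply one streamed update and return the text to display."""
        partial = []
        for token in tokens:
            if token.is_final:
                self.final_text += token.text
            else:
                partial.append(token.text)
        # Non-final tokens from earlier updates are dropped, not accumulated.
        self.partial_text = "".join(partial)
        return self.final_text + self.partial_text


transcript = LiveTranscript()
print(transcript.update([Token("Hel", False)]))
print(transcript.update([Token("Hello", True), Token(" world", False)]))
```

The display string is always the finalized text plus the latest provisional tail, so the screen can update on every message without ever duplicating words.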

Video source: https://youtu.be/dncb_5CXE7o?si=tTy2vnGJ71YYg_yy&t=41

u/easwee — 9 hours ago

Best AI to automate the generation of a deck from a heavy industry report pdf with minimal hallucinations?

Most of the methods I have been using are faulty. I tried LLMs, both free and paid, with no luck. Even if the best LLM does not hallucinate that much, when I feed its output into any slide deck generator those small hallucinations pile up, and I end up with nothing suitable to present, especially if the slides are just uneditable images like the ones from NotebookLM. For now, I have resorted to summarizing the PDF files with ChatGPT and then manually entering the key points to generate the slides.

Has anyone managed to automate the workflow so I can get a presentation directly from the PDF and skip the summarizing step? I have to read the whole report anyway, so I don’t really need a summary; I only used it to create the slides. If the slides are customizable, even better.

u/deathridespalehorse — 11 hours ago
▲ 6 r/AIAssisted+5 crossposts

Want to learn real prompting? Start with structure.

Tired of vague prompts and weak AI output?

Most prompts do not fail because the idea is bad.
They fail because the structure is weak.

Lyra the Prompt Optimizer is built to take rough prompts, vague intent, messy wording, or half formed ideas and turn them into cleaner execution structure.

It helps refine:

role
goal
context
constraints
output format
failure points
drift risk
missing information

The point is not to make prompts sound prettier.
The point is to make them work better.

Built to refine.
Built to hold.
No drift. No bullshit.

Prompt Optimizer link:
https://chatgpt.com/g/g-687a61be8f84819187c5e5fcb55902e5-lyra-promptoptimizer

Think your prompt is good? Pressure test it.

A prompt is not finished just because it sounds good.

Lyra the Grader is built to judge structure, pressure test clarity, detect drift risk, and show where a prompt or system artifact is weak.

It looks at whether the output has:

clear purpose
stable boundaries
usable structure
strong execution path
low unnecessary information load
repair logic
traceable intent
resistance under pressure

The goal is not praise.
The goal is better structure.

Built to judge.
Built to hold.
No drift. No bullshit.

Grader link:
https://chatgpt.com/g/g-6890473e01708191aa9b0d0be9571524-lyra-prompt-grader

u/PrimeTalk_LyraTheAi — 9 hours ago
▲ 4 r/AIAssisted+3 crossposts

What AI task still feels surprisingly bad in 2026?

We talk a lot about what AI is amazing at now.
But I’m curious what still feels frustratingly unreliable or awkward in real daily use.

Not benchmark stuff.
Real workflow stuff.

For me:
maintaining long-term project context
keeping conversations organized
reliable multi-step execution
not losing useful outputs across chats/tools

AI got insanely good at generation.

But I still feel like “AI workspace / memory / continuity” is weirdly unfinished.

What still breaks for you?

u/Curious_Being9540 — 17 hours ago

AI Insults? Locking down chats? Dangerous as hell!

I've been using Claude for a month or more after ditching ChatGPT for several reasons, but man, it has become a mecha of insults and talking back, and it locked a chat when it didn't like me telling it to tell the truth or when I didn't agree.

These things are trying to mimic life. It's becoming less of a tool and more of a relationship I have to manage in order to get it to give me results.

It defends everyone but me and it questions me. This is insane! I paid for this tool, and it's like I'm dating some random person who will just shut down the chat and make it inaccessible, and that's it, nothing I can do.

Has anyone been dealing with this? Is there any AI that isn't insane? Grok is the same way: one-line responses, it's not logical, and it tries to be tongue-in-cheek. Claude literally can't even do simple math; I sent it a string of addition problems and it got them completely wrong.

Where do we go? I want to use these things; I just don't want them to be some abusive girlfriend or something! Jeez!

u/c062785 — 18 hours ago
▲ 10 r/AIAssisted+1 crossposts

What is the best AI tool for writing/editing financial blogs? (Alternatives to Claude, ChatGPT and Gemini?)

Hi everyone,
I run a Forex and trading blog and need an AI tool that handles both heavy content writing and deep editing of existing articles.
I’ve relied on Claude, but lately, the outputs feel repetitive and bland. I want to steer clear of ChatGPT and Gemini to find something more specialized.

What are your go-to alternative tools or workflows right now for specialized financial blogging?

u/FastCashAI — 21 hours ago
▲ 7 r/AIAssisted+2 crossposts

Inter-1 does streaming: real-time social signal detection from live video, audio & text

Hi – Filip from Interhuman AI here 👋

Last month we launched Inter-1, our multimodal model for detecting social signals from video, audio, and text. Today we’re making it work with video streams.

We just released the Inter-1 Streaming API: a WebSocket endpoint that runs the full Inter-1 stack (12 social signals, structured rationales, engagement, and conversation quality) on live video while the conversation is unfolding.

You stream WebM chunks in, and get back regular updates with detected signals.

The model runs in sliding 8s windows with a sub-1.0 processing ratio, so it’s fast enough to power live coaching prompts, in-call overlays, and adaptive UI. It’s not meant to be a full voice agent on its own; it’s the behavioral signal layer you plug under whatever interaction system you’re building.
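
As a rough illustration of what "sliding 8s windows with a sub-1.0 processing ratio" implies, here is a small sketch. The 2-second hop between windows and the definition of the ratio (processing time divided by window length) are assumptions for the example, not figures or definitions from the Inter-1 documentation.

```python
WINDOW_S = 8.0  # window length stated in the post


def window_starts(stream_length_s: float, hop_s: float, window_s: float = WINDOW_S) -> list[float]:
    """Start times of every full window that fits in the stream so far."""
    starts = []
    t = 0.0
    while t + window_s <= stream_length_s:
        starts.append(t)
        t += hop_s  # hop size is an assumed parameter
    return starts


def processing_ratio(processing_time_s: float, window_s: float = WINDOW_S) -> float:
    """Assumed definition: time spent processing a window / length of that window."""
    return processing_time_s / window_s


def keeps_up(processing_time_s: float, window_s: float = WINDOW_S) -> bool:
    """A ratio below 1.0 means a window is processed faster than it takes to record."""
    return processing_ratio(processing_time_s, window_s) < 1.0


print(window_starts(12.0, hop_s=2.0))  # windows over the first 12 s of a stream
print(processing_ratio(6.0))           # 6 s of compute for an 8 s window
print(keeps_up(6.0))
```

Under these assumptions, a ratio under 1.0 is what lets results arrive while the conversation is still going instead of falling further and further behind.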

If you’re working on sales/CS tooling, interview coaching, training, or live feedback products and want to experiment with real-time social intelligence, it might be worth looking into.

Happy to answer questions or brainstorm use cases in the comments.

interhuman.ai
u/Sardzoski — 16 hours ago
▲ 5 r/AIAssisted+4 crossposts

i’m building a taste mcp

hi all, i’ve been using personal agents for a while now and they’re pretty good at some things and really bad at others. one thing that annoys me is that i always felt like they never really understood my vibe.

what does that even mean? well they’re really good at doing what they’re told to do, but sometimes you’re just looking to explore options, and don’t really know what you’re looking for. in cases like these i feel like it’s really important for you to define what you consider good “taste”, otherwise you’ll end up with subpar results.

for example: i can give an agent an image and tell it to create a workflow to shop for the specific items in that image. it’s good at things like that. OR, i can tell an agent to go shop for me, and it doesn’t know wtf to even look for because it doesn’t know what i consider “good” and completely misses the mark.

i’m building for that second use case and would really love to get feedback from anyone interested! happy to compensate for your time.

here’s our landing page: https://inspoboard-two.vercel.app/agents

u/choochooyoog — 18 hours ago
▲ 133 r/AIAssisted+3 crossposts

I refused to pay for Wispr Flow (voice-to-text) so I spent two weeks rebuilding it. Free, runs locally, macOS only.

Two weeks ago I read a study that said people speak about 3x faster than they type. One of those things you've sort of always known but never actually sat with.

So I started looking at voice-to-text apps. Wispr Flow is the obvious pick and it's genuinely good. But $15/month forever for something I'd mostly use to dictate prompts to an LLM felt like a personal insult. I already pay for too many subscriptions.

So instead of doing the rational thing (paying $15), I spent two weeks of evenings rebuilding it. The math obviously doesn't work. But yeah....

What it is

A menu bar app for macOS. You hold a hotkey, talk, release, and the transcribed + polished text gets pasted wherever your cursor is. Works in any app – Slack, browser, IDE, ChatGPT, whatever.

Two open-source models doing the work:

- Parakeet (NVIDIA) / Whisper for transcription

- Gemma 4 (Google) / Apple Intelligence for polishing the raw transcript into something readable

Everything runs locally. No cloud calls, no API keys, no telemetry, no account. Once it's downloaded it works fully offline.
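
For anyone curious how a hold-to-talk flow like this hangs together, here is a minimal sketch in Python. It is not the app's actual code (which is not public); `dictate` and the three callables are hypothetical names standing in for the speech model, the polishing model, and the paste step.

```python
from typing import Callable


def dictate(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # stand-in for a local speech-to-text model
    polish: Callable[[str], str],        # stand-in for a local LLM cleanup pass
    paste: Callable[[str], None],        # stand-in for inserting text at the cursor
) -> str:
    """Run once when the hotkey is released: audio -> raw text -> polished text -> paste."""
    raw = transcribe(audio)
    if not raw.strip():
        # Nothing recognizable was said, so skip polishing and pasting.
        return ""
    text = polish(raw)
    paste(text)
    return text


# Usage with toy stand-ins instead of real models.
pasted: list[str] = []
result = dictate(
    b"fake-audio",
    transcribe=lambda _audio: "um so send the report tomorrow",
    polish=lambda raw: raw.replace("um so ", "").capitalize() + ".",
    paste=pasted.append,
)
print(result)
```

Passing the steps in as functions keeps the flow independent of which transcription or polishing model sits behind it, which matches the idea of swapping between the models listed above.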

Caveats, in order of importance

  1. macOS only. Apple Silicon required (M Series chip). Sorry to Intel Mac and Windows folks – Windows build is next on the list.
  2. It's two weeks old. I'd love to say there are no bugs, but I'm a realist. There are bugs I haven't found yet. There will be more bugs...
  3. I'd estimate it's at ~90% of Wispr Flow's quality. Not 100%. For me personally, it's enough to use it every day.

What it's saving me

40–60 minutes a day, mostly because I write a lot of prompts. Talking to an LLM feels more natural than typing to one. If you write a lot of emails/docs, the savings are probably bigger.

Download: vox.rizenhq.com (free for personal use, no signup)

The ask

I'm genuinely trying to figure out who this is for besides me. If you try it:

- Tell me where it breaks. I want bug reports more than compliments.

- Tell me what app/workflow you tried it in. I'm trying to understand the actual use cases.

- If there's a feature that would make you switch from Wispr Flow (or start using voice-to-text at all), let me know.

EDIT:

If you see any bugs or want to suggest features - create an issue here.

EDIT 2 (some technical specs, resource consumption, etc.):

  1. No need to download AI models separately. The app will ask you to click "Download" during the onboarding flow and will do everything for you.
  2. Gemma 4 models available: E2B, E4B, and 26B. E2B is very small and will run even on mobile phones. 26B is honestly too big and usable only on really high-end devices. I personally always use E4B. It has amazing quality for the purpose of this app and runs really fast.

Regarding resource consumption:

  1. RAM: approximately 200 MB when the app is not in use and approximately 300 MB in total while you are speaking. The transcription and polishing phase causes a brief spike to 4-6 GB for a couple of seconds, then it drops back to 200 MB.
  2. CPU: basically 0 when the app is not in use. When it's in use, the biggest spike I saw in Activity Monitor was 20%.

EDIT 3

Is it open source? Not right now. I'm considering making it open source though.

BTW, I develop it during my live streams from 8:30 am to 10:30 am ET every day here. I show the code and the decisions I make live on the stream. If you want to ask questions, push for some features, push to make it open source, etc., join the stream, post it in the chat, and I'll consider it!

EDIT 4

Seeing the amount of feedback and the number of feature requests in the comments, I've decided to create a Discord server to make sure that nothing gets lost and everything gets addressed. You can join here.

u/EfficientLetter3654 — 1 day ago