u/Any-Farm-1033

▲ 1 r/cursor

Composer 2.5 on Kimi K2.5, the text feedback RL bit is the interesting part

The headline is that Composer 2.5 is Cursor's strongest model and uses Kimi K2.5 as the base. Fine. The part I found more interesting is the targeted RL with text feedback.

Long agent rollouts fail in very local ways. One bad tool call. One confused explanation. One style mismatch. If you only reward the final result, it is hard to tell where the run went off track.

Cursor's approach, at least as described, inserts short feedback at the actual error location and uses that local context as a teacher signal. That feels closer to debugging an agent than just training a code model.

The synthetic task scaling is also worth watching. Deleting testable functions from real repos and asking the model to put them back is a clean reward setup. But the reward hacking examples are funny and scary: reverse engineering type caches, decompiling Java bytecode, doing whatever passes the test instead of solving the intended task.

This is why I still care about external verification. Cursor, Claude Code, Verdent, whatever tool you use, the agent needs checks that are not easy to game.

Composer 2.5 may be a model update, but it reads like a training story about where agent errors actually happen.

reddit.com
u/Any-Farm-1033 — 13 hours ago

A live home robot pilot in Shenzhen looks closer to a real service than a demo

I watched recent footage from a Shenzhen pilot where a human cleaner works with a robot system from X Square Robot through 58 home services. This looked more like service operations than a stage demo. The robot handled repetitive structured steps while the human handled judgment heavy tasks and exceptions.

From an AI perspective the interesting part is adaptation during the task. Motion timing changes with nearby people and clutter, and task flow is not always the same sequence. That suggests online perception plus control, even if there is still human supervision in the loop.

It is still early. Pilot cities can hide a lot of operational constraints, and scaling to more homes is where these systems usually break. But compared with polished one minute clips, this is at least a better test of what current embodied models can and cannot do in normal apartments.

reddit.com
u/Any-Farm-1033 — 1 day ago
▲ 6 r/ainbow

my dad accidentally gave my gf her favorite thing to call me

it was my birthday and the plan was just dinner with my gf. my dad doesnt know im gay, so in my head it was supposed to be a lowkey date where nobody had to think too hard. then about an hour before we left he said he wanted to come too, and i had to use the most normal voice of my life like yeah sure, totally fine.

she noticed my cherrykitten good girl tee when she picked me up and gave me that look immediately. i said dont before she even opened her mouth. she said she wasnt going to say anything, which was obviously a lie.

dinner was somehow not a disaster. my dad was talking to her like she was just my friend, and she was being so easy about it that i almost relaxed. then he looked over and said the top was very me, and i thought okay, cute dad comment, we can move on.

then he said you really are a good girl.

i stopped breathing. we looked at each other at the exact same time, then both looked straight down at our food like the salad had suddenly become very important. my dad had absolutely no idea what he had just done. he was just sitting there being proud of his wholesome birthday compliment.

we took this photo outside after dinner, right under that Her sign, which at that point felt a little too accurate. on the walk back she finally said it for the first time in his exact tone, and something in my face must have given me away because she started laughing and could not stop.

she still does it every time i wear this shirt. the worst part is it works every single time, and she knows it.

u/Any-Farm-1033 — 5 days ago

MiniCPM-V 4.6 is doing something weird with visual token compression and the numbers are wild

1.3B parameters, outperforms Qwen3.5-0.8B and Gemma4-E2B-it on multimodal benchmarks. Runs on 6GB memory. vLLM throughput is 1.5x faster than Qwen3.5-0.8B despite being larger. Token consumption on Artificial Analysis is 5.4M vs 233M for the Qwen reasoning variant. That's 1/43rd the compute for comparable performance.

The trick is LLaVA-UHD v4. They restructured the ViT to do early compression in the shallow layers. Visual tokens get compressed before they hit the deep computation layers. Plus a dual mode: 4x compression for quality tasks, 16x for speed. Same model, different tradeoff.

The 16x mode specifically is interesting because it makes high-res image TTFT nearly flat. 3136² image processes in 75.7ms. Fast enough for real-time interaction on consumer hardware.

Also notable: a single RTX 4090 can run the full fine-tuning pipeline. Barrier to customizing this model is basically zero for anyone with a gaming PC.

I've been testing small multimodal models locally for document parsing and screenshot analysis. The 16x compression mode is fast enough to use interactively without the latency killing the flow. For local dev work where you can't send images to cloud APIs, this model size finally makes sense. I run local OCR through this and then pipe the extracted text into Verdent for the actual coding work, keeps everything local until I need the cloud stuff.

Fine-tuning frameworks: ms-swift, LLaMA-Factory. Inference: vLLM, SGLang, llama.cpp, Ollama. Full open source on HuggingFace and GitHub.

reddit.com
u/Any-Farm-1033 — 8 days ago

Do you feel new to supply chain and like everyone else already knows what they're doing?

Recently moved into a supply chain related role at work and the learning curve has been rough so far. Before this i thought supply chain was mostly about placing orders and tracking deliveries. Now i'm realizing it's constant coordination, supplier follow-ups, comparing quotes, checking timelines, dealing with delays, trying to figure out what information actually matters and what doesn't

the hardest part is everyone around me talks like these things are obvious already. People casually mention lead times, moqs, certifications, shipping terms, backup suppliers… meanwhile i'm sitting there trying not to look completely lost in meetings. And honestly the amount of scattered information is driving me crazy too. Supplier info in emails, pricing in spreadsheets, updates in chat messages, random notes from previous coworkers. Half the time i spend more energy trying to understand what's going on than actually doing the work itself.

I've also started realizing how much supplier quality affects literally everything downstream. One small issue early on somehow turns into problems for multiple teams later. Been trying to learn as fast as possible, but right now it mostly feels like i'm constantly behind and pretending i understand more than i actually do.

reddit.com
u/Any-Farm-1033 — 10 days ago

Bootstrapped founder, demo prep dropped from 90 minutes to 5 after wiring up 4 agents

The dumbest thing about being bootstrapped is how much time you spend on stuff that doesnt directly make money but you cant skip. For me thats prospect research before sales demos.

Bit of background. Year 2 of a tiny saas, 4 to 5 demo calls a week, mostly inbound. Closing maybe 35% of them honestly. The reason wasnt my product or pricing. It was that i was walking into too many calls without knowing enough about the prospect, and i could feel it. Asking questions that should be obvious from their linkedin. Suggesting integrations they had clearly already mentioned in a recent post. Awkward.

For about 14 months my fix was to spend 90 minutes before every call doing manual research. Linkedin, recent news, builtwith for tech stack, sometimes their twitter. 4 demos a week times 90 min was 6 hours of grunt work that i was either doing instead of building or doing tired the night before. I honestly dont know why i waited so long to fix this.

Decided to actually fix this last month. Sharing the comparison because i tested a bunch and it took longer than expected to figure out which one worked.

Apollo.io. Already paying for it. Their data is solid for company info but it gives you the static profile, not the fresh signals i actually wanted (recent posts, layoff announcements, product launches). Good for contact data, bad for "what happened this week." Kept it for the contact data side but had to supplement.

Clay. Powerful but felt like learning a new product to use it well. Their enrichment is fantastic if you have lists of 100+ leads to process. For 4 to 5 prospects a week it was overkill and the per row cost added up faster than i expected. Also the learning curve is real. Spent a saturday on it and still felt like i was using 20% of the features.

Bardeen. Closest to what i wanted in terms of "watch a button and pull data". The browser extension was easy to set up. Issue was scheduled triggers were limited on my plan and i wasnt going to upgrade just for this. Also had some issues with linkedin rate limiting when i tried to pull too many profiles in a row. Minor but annoying.

MuleRun. Uses a browser extension that drives chrome with my logins already in place. 30 minutes before each demo i kick off a workflow. It opens the prospects company page, pulls their recent linkedin posts, news mentions, the tech stack via builtwith, and drops a 1 page brief into my drive. By the time im on coffee i have something to read. Catch is if a prospect has linkedin behind a heavy login wall or 2fa flow i havent solved, that source goes blank. Happens maybe 1 in 10. The other ~90% the extension running on my actual chrome session avoids that problem entirely.

So now my morning before any demo is, open the brief in drive, 5 minutes to skim, walk in with at least 3 things i can reference about them.

Close rate has nudged up over the last 6 weeks but the sample is too small to claim a number i actually trust. Could be variance. What i can claim is that i stopped feeling like i was showing up cold.

Ended up keeping apollo for contact data and mulerun for the live signals. Dropped clay and bardeen for now. May revisit clay when my deal volume justifies it.

If youre at the stage where 4 to 5 calls a week feel like a lot and prep is eating your evenings, this is the sort of thing thats genuinely worth a saturday to set up. Doesnt need to be these specific tools. Anything that gets you out of the manual loop probably pays for itself in week 1.

reddit.com
u/Any-Farm-1033 — 13 days ago

I have about 30 tracks sitting in my Udio library right now and maybe 2 of them have any kind of visual content attached. The rest are just audio files collecting dust because I never got around to making anything for them. I tried cutting together clips manually in a free editor last month and spent almost 4 hours on a single 2 minute track. The sync was off, the transitions looked rough, and I basically gave up halfway through and posted the song with a static image instead.

I know some people here commission freelancers for visuals but I genuinely cannot justify spending $200 to $500 per video when I am releasing tracks this frequently. I also looked into some generic text to video tools but they do not really understand music structure at all. You paste in a prompt and get something that has zero relationship to the beat or the energy of the song. What I actually want is something that listens to the track and builds visuals around the rhythm and sections automatically.

I have seen a few AI tools floating around that claim to do music to video generation but it is hard to tell which ones actually sync to the audio versus which ones just slap random clips over your song. Bonus if it can handle a direct link paste from Udio without needing me to download and convert files first.

Curious what workflows others in this community have landed on for turning finished tracks into something visual.

reddit.com
u/Any-Farm-1033 — 29 days ago