IA News & Research
Real-time tracking of the AI revolution: models, tools, and research.
Do people genuinely think that widely available flying cars are a good idea?
Indonesian office staff members hit by a Unitree G1
GPT 5.6 discovered new math, according to Sam Altman
Is the current open-weight LLM model viable in the long term?
I've been thinking about this lately. The Qwen team has released several new models recently, but they appear to be holding back the 122B, 35B, 27B, and 9B versions for now.
One possible reason is that these larger models performed so strongly that the team chose not to release them immediately as open weights. If that's the case, they will likely wait until they have even more capable models before making them available.
Recent analyses suggest open-source models are currently lagging 2–4 months behind state-of-the-art systems. With Qwen now adding further 1–2 month delays (or longer) before releasing open weights, I'm concerned the gap could continue to widen. Could this eventually lead to another significant shift in the open-source landscape, similar to what happened with Meta-Llama models?
To clarify my focus: I'm particularly interested in Qwen models because they currently offer the best performance among models that can realistically run on consumer-grade hardware.
While I understand some community members maintain more substantial local setups capable of running 500B or bigger models, my question is aimed at those of us working with standard consumer GPUs.
Who Has The “Jankiest” Local LLM Setup? | Non-Official | Fun Contest | No Prizes
Had an idea for a fun, no-prize, non-official competition to see who has the “Jankiest” local LLM setup.
NOTE: This is NOT an official competition. There are NO prizes. This is just for fun.
Rules:
- One Submission via comment per person
- Has to be your current setup or your previous setup.
- Submission comment cannot be modified after posting to ensure no photo swapping occurs.
- No prizes, both to reduce the incentive to rig the competition and because this is not an official contest.
- The highest-upvoted submission that doesn’t violate the Reddit ToS, the /r/locallama rules, or the rules of this non-official competition will be declared the winner 24 hours after this post goes up.
Requirements for submission:
- Photo of the local LLM setup
- Any explanation/benchmarks/etc (optional) that you want to include
You matter, you were warm and alive, and someone noticed you
Why U.S. Companies Are Quietly Being Run On Chinese AI
youtube.com
Do you agree with Palantir CEO Alex Karp that the enterprise "tokenmaxxing" business model has "gone completely wrong" with minimal ROI? Will open-weight models inevitably win?
Palantir CEO Alex Karp recently went on CNBC’s Squawk Box and delivered a brutal takedown of the API token pricing model pushed by commercial frontier labs like OpenAI and Anthropic.
His core argument is that American enterprises are quietly "livid" because they are burning massive amounts of cash on skyrocketing token costs without seeing a clear return on investment. He noted that the industry’s incentive structure has completely devolved into meaningless "tokenmaxxing": essentially forcing companies to maximize token throughput for questionable value while potentially handing their unique data and "alpha" to black-box systems.
Key takeaways from Karp's interview:
- The ROI Crisis: Advanced models are scaling in cost faster than they scale in utility. Karp joked that enterprise culture has become: "I’m going to chillax and waste my time with tokens."
- The Shift to Sovereignty: Technical enterprise customers and government agencies (including Palantir's clients transitioning to Nvidia's open-weight models) want complete control over their compute, data stack, and weights. They want to own the "means of production."
- The Global Threat: Belittling the speed of open-source progress—and rapid acceleration from Chinese labs—is a massive mistake.
My Take:
I completely agree with Karp. Frontier labs have built a predatory business model that encourages enterprise customers to overspend on infinite token loops without any guaranteed business outcome.
The API token business is going to become a commoditized race to the bottom. Open-weight models are winning because enterprises realize they cannot afford to lease their intelligence. To survive, businesses have to own their data, own their model weights, and build efficient, custom architecture rather than continually paying a premium tax to a third-party lab.
What are your thoughts? Is "tokenmaxxing" officially dead, or are open-weight models still too far behind the true frontier to replace them?
I developed a 270 million parameter language model entirely from scratch as an independent research project
The model is built on a custom Transformer architecture featuring Rotary Positional Embeddings, RMSNorm, SwiGLU feed forward layers, grouped query attention, and an efficient autoregressive decoder optimized for local inference.
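For readers unfamiliar with two of the components named above, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward layer. It is not the author's code: the function names, the list-of-rows weight layout, and the eps default are assumptions chosen for illustration, and a real model would use a tensor library rather than Python lists.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Scale x so its root-mean-square is about 1, then apply a per-channel gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Illustrative usage with made-up 2-dimensional weights.
x = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
y = swiglu_ffn(x, w_gate=[[1.0, 0.0], [0.0, 1.0]],
               w_up=[[1.0, 0.0], [0.0, 1.0]],
               w_down=[[1.0, 1.0]])
```

Unlike LayerNorm, RMSNorm skips mean subtraction and the bias term, which is one reason it shows up in architectures aimed at efficient inference.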
Get in the hype wagon
New 18 dimensional math that will allow us to time travel
Google DeepMind Product and Design Lead using and advertising a competitor's model
[Harvard Business Review] AI Is Rewriting the Economics of Outsourcing
Summary.
Generative AI is changing the economics that fueled decades of outsourcing growth by automating many routine, rules-based tasks that companies once sent offshore for labor savings. Rather than deciding whether entire functions like finance, HR, or IT should be outsourced, leaders now need to analyze work at the task and workflow level to determine which activities AI can automate internally, which still require external expertise, and which become more strategically valuable to keep in-house. Companies that succeed will move beyond traditional labor-arbitrage models and redesign their organizations around AI-enabled speed, judgment, and control—while outsourcing partners evolve toward higher-skill, outcome-based services.
Guys, the ads are extremely weird. You really couldn’t come up with anything better than this?
I feel like I can sum up this entire ad campaign and concept as “so the person is like, and they have a thing and then they put it and then they like look and stuff real close tho”
I feel like I could, without using ChatGPT, come up with 100 fake scenarios better than this without breaking a sweat.
Learning to write an AI harness the old-fashioned way. Need help with attention drift and ignored tool call results!
I've been writing a no-compile, no-dependencies, Node.js-based AI harness for llama.cpp as a learning exercise and could really use some help. I'm basing my code on https://github.com/av/mi and https://pi.dev/ with really basic agentic loops. It basically loops until no more tool calls are being made, then returns control to the user prompt.
My biggest problems are:
- Oftentimes the LLM ignores the tool call and its results and calls the same tools again.
- Or worse, sometimes its attention drifts to a previously answered question and it tries to work from there instead of from the latest tool call or its plan.
I'm using a q4 quant of qwen3.6 27b. I don't experience this problem when I run the same model under pi. I've looked at pi's agentic loop implementation and there doesn't seem to be any special sauce.
I added reminder messages after tool calls to remind it to review the results before moving on, and it helps a bit, but I would like to know if anyone has experienced the same problem in their own AI harness development and how you addressed it.
So far the reminder messages I've implemented kind of work, but they feel more like band-aids than real cures.
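One thing worth checking in a loop like this is how tool results are written back into the message history. The sketch below is in Python rather than Node.js and is not taken from coffee.mjs or pi. It assumes the OpenAI-style chat-completions message shape, which llama-server's compatible endpoint accepts. The `chat` callable and the `tools` dict are hypothetical stand-ins for the harness's own request and tool code. The idea is to keep the assistant message with its tool_calls in the history and to answer each call with a `tool` message carrying the matching `tool_call_id`.

```python
import json

def run_agent_turn(chat, tools, messages, max_steps=8):
    """Loop until the model stops requesting tools; return the final text.

    chat:  hypothetical callable that takes the message list and returns
           one assistant message dict in chat-completions shape.
    tools: dict mapping tool name -> callable taking keyword arguments.
    """
    for _ in range(max_steps):
        reply = chat(messages)
        # Keep the assistant message, including its tool_calls, in the history.
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply.get("content", "")
        for call in calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            try:
                result = tools[fn["name"]](**args)
            except Exception as exc:  # report failures back to the model
                result = f"error: {exc}"
            # One tool message per call, linked by tool_call_id.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": str(result),
            })
    raise RuntimeError("tool loop did not finish within max_steps")
```

If the history already looks like this, the difference from pi may lie elsewhere, for example in the system prompt or the chat template, so treat this as a checklist item rather than a diagnosis.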
Edit: added bare-minimum source code.
tools/bash.mjs
If you have Node.js installed, 'node coffee.mjs' will run it. No dependencies; just make sure llama-server is running. All config information is stored as variables at the top of coffee.mjs. Very basic stuff, but it should be very human-readable code.
I have more tools and skills implemented, but this is the bare minimum that forms a basic AI coding agent/harness. Like I said, it's a learning project, not competing for anything. I've been using it as a daily driver tho.
Oh, and if you have free AI resources, feel free to have them scan the code to see if they can help answer the question. Thank you!
The war between Anthropic and Alibaba
Anthropic has accused Alibaba of creating tens of thousands of fake Claude accounts to scrape Claude of its intellectual property via distillation attacks.
Alibaba retaliates by telling their official (not contracted) employees to stop using Claude Code.
I'm noticing from Reddit posts and comments that Claude has gotten much more wary of what it deems strange prompting requests.
There is an article indicating that Fable 5 has been "hardened" against distillation attacks, but it's locking out some legitimate users and refusing innocuous requests.
Seems like a lot of users are caught in the middle?
How would 128 GB of DDR5 RAM + 16 GB of VRAM work for an MoE model like Qwen 3.5 122B?
Who has results for this?
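Not a benchmark, but a back-of-the-envelope check of whether the weights would even fit. The bits-per-weight values are assumptions standing in for common quantization levels, and the function name is illustrative. The estimate ignores KV cache, context length, and OS memory, and it says nothing about tokens per second.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

ram_gb, vram_gb = 128, 16          # from the question
model_b = 122                      # total parameters, in billions

for bits in (4.5, 8, 16):          # assumed bits per weight, not measured
    size = weights_gb(model_b, bits)
    fits = size <= ram_gb + vram_gb
    print(f"{bits:>4} bits/weight: ~{size:.1f} GB, fits in RAM+VRAM: {fits}")
```

Under these assumptions a roughly 4-bit quant leaves headroom, while 16-bit weights would not fit at all; actual speed depends on how the layers and experts are split between the GPU and system RAM.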
I’ve been working on Murmur, a local text-to-speech app for Apple Silicon Macs.
The new feature I’m building is called Projects / Story Studio, and it solves a problem I kept running into:
TTS tools are fine for one-off clips, but messy for actual audio projects.
If you’re making a podcast segment, audiobook chapter, course lesson, ad, or game dialogue, you usually need multiple speakers, multiple takes, pauses, reactions, music, edits, exports, and a way to come back to the project later.
So I built a project-based workflow:
Write a script → assign voices → generate dialogue → edit clips on a timeline → add music/SFX → export final audio.
It supports things like:
- multiple scripts inside one project
- Host / Guest / Narrator / Character speakers
- inline tags like [pause], [laugh], [chuckle]
- per-block regeneration
- timeline editing with waveforms
- media lane for music and SFX
- ripple editing and gap tools
- WAV/M4A export
- transcript and stem export
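To make the inline-tag idea concrete, here is a small sketch of how a script line could be split into spoken text and tags. This is not Murmur's implementation: the function name, the tuple format, and the restriction to the three tags listed above are all assumptions for illustration.

```python
import re

# Only the three tags named in the post; a real app may support more.
TAG = re.compile(r"\[(pause|laugh|chuckle)\]")

def split_script(line):
    """Split a script line into ("text", ...) and ("tag", ...) parts, in order."""
    parts = []
    pos = 0
    for m in TAG.finditer(line):
        text = line[pos:m.start()].strip()
        if text:
            parts.append(("text", text))
        parts.append(("tag", m.group(1)))
        pos = m.end()
    tail = line[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts

parts = split_script("Well [pause] that was funny [laugh]")
```

Keeping tags as separate parts would let each spoken segment be regenerated on its own, which fits the per-block regeneration item in the list.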
Everything runs locally on Mac, so long scripts and voice samples do not need to be uploaded to a cloud service.
I’m still polishing the workflow and would love feedback from Mac users, especially people who make podcasts, audiobooks, courses, YouTube narration, or game dialogue.
Who told you that distributed training is impossible? Democratizing AI: The Psyche Network Architecture
It seems that not only is it entirely possible without hitting infeasibly narrow data-transfer bottlenecks during training, but several models have already been trained using this method. It mostly depends on how many GPUs join this kind of network.
See here: https://psyche.network/runs