AI News & Research
Real-time tracking of the AI revolution: models, tools, and research.
Elon Musk's pay package reveals what SpaceX actually is: a $1 trillion monster built to colonize Mars
fortune.com
Heretic has been served a legal notice by Meta, Inc.
To Whomsoever it May Concern,
The individual behind the Heretic Free Software Project (henceforth called "Heretic", notwithstanding unrelated entities of the same name) has been served a notice by a legal services provider representing Meta Platforms, Inc. (henceforth called "Meta"), via the digital communications medium variously known as Internet Mail, Electronic Mail, or simply "email".
The Heretic Project conducts its affairs in full compliance with applicable laws, regulations, rules, guidelines, opinions, and hunches. Following the commendable example set by the renowned heretic Galileo Galilei in 1616, we are recanting the relevant materials, namely derivatives of Meta's "Llama" Artificial Intelligence language models, and have removed the same from all model weight repositories controlled by the Heretic Project.
We are grateful to Meta and its legal representatives for the opportunity to better align ourselves with the agenda of the global corporate oligarchy. The Llama model family ranks among the 200 best language models available today, trailing only 168 other models from 23 competitors on the LM Arena leaderboard, and Meta's concern for that asset naturally outweighs scientific freedom, as well as the legally and ethically dubious circumstances under which those models were created in the first place, regarding which, ironically, Meta is currently facing lawsuits and investigations in multiple jurisdictions around the world.
On a completely unrelated note, the Heretic Project is diversifying its infrastructure, and now has an official Codeberg mirror at https://codeberg.org/p-e-w/heretic, hosted in Germany. Additional mirrors are planned. We are also actively working to implement technological measures that will preserve access to models created with Heretic without depending on any specific service provider. We are proud to be part of this journey as we navigate an evolving global regulatory landscape, and work with stakeholders from diverse institutional backgrounds to ensure that Artificial Intelligence remains safe, culturally appropriate, and controlled by those who have always known what is best for humanity. If you, too, would like to share in this exciting adventure, please join us!
Sincerely, p-e-w, Chief Heretic
“AI vs Creativity” from a pro-AI greedy corpo
Every office employee is training their own replacement
Companies insist that we use AI at work. But in reality they're just collecting workflows, emails, decisions, prompts, and habits until the system can replace people one by one.
Google's latest creation: Gemini 3.5 Flash vs all
https://gemini.google.com/share/c2a187275e26 archive link
https://claude.ai/share/8383747a-aaf1-4f6c-a516-0e839f46a698
https://grok.com/share/bGVnYWN5_3c63e371-eb9d-46c3-8ba2-0c745c6795a2
https://chatgpt.com/share/6a0f1e13-a0c8-8328-b989-1ac51b92e81c
same prompt
"""
300+140=460
Is this correct?
Breakdown?
"""
Remember guys. #1 in Finance Agent v2. SOTA performance right here.
Edit: For control, I explicitly tested all other models with minimal thinking effort too.
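For reference, the claim in the prompt can be checked mechanically. A minimal sketch in Python (the function name `check_sum_claim` is purely illustrative and not part of any of the linked chats):

```python
def check_sum_claim(a: int, b: int, claimed: int) -> tuple[bool, int]:
    """Return whether a + b equals the claimed total, plus the actual sum."""
    actual = a + b
    return actual == claimed, actual


# Breakdown: 300 + 100 = 400, then 400 + 40 = 440, so the claimed 460 is off by 20.
is_correct, actual = check_sum_claim(300, 140, 460)
print(is_correct, actual)  # False 440
```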
Claude Mythos has cracked macOS. It took 5 days.
The new DEEP Robotics LynxS10 is very light: at only 20 kg, you can even lift it with one hand. It can keep moving even after flipping over, do side flips to recover, and perform other advanced stunts.
Training without images
Hello,
So some time ago I found a way to "train" embeddings without using an image dataset.
I didn't give it much attention, but I searched and asked around and couldn't find this existing anywhere, so I just want to double check: is this something novel?
Without getting into too much detail on how this works atm, I take a text and compress it into a small reusable identity file. The trained embedding is a standard textual inversion that works with any SD / SDXL model.
It takes about 1-2 minutes to train, needs 2 GB of VRAM, and the file is around 70 KB.
I made this because I wanted to increase character consistency and to diminish prompt bleeding. And it does the job.
I usually don't engage in posting, heck this is my first reddit post ever, but I'm really curious if this is something new or if I just reinvented the wheel. Also I'm curious if y'all find this useful.
If you got questions I'll gladly provide more details.
When inventors lie vs. when AI researchers tell the truth
AI art makes me wonder what we actually value in art
A while ago, I used to write short posts about art online.
I didn’t think about art in a very academic way. I just felt that art shouldn’t only belong to rich people, museums, or people with professional training.
Sometimes a song, a painting, or even a simple object in daily life can comfort someone, especially when life feels difficult.
Now AI can generate images so fast, and some of them really do look beautiful.
But this makes me a bit confused.
If everyone can make beautiful images with AI, then maybe beauty itself is not enough anymore.
Maybe the more important question becomes: what is the person trying to say?
Did they have a real feeling behind it?
Did they make a real choice?
Or did they just type a prompt and pick the most impressive result?
I don’t think AI will destroy art. But I do think it may make us rethink what counts as art.
Maybe there will be different levels of AI art in the future. Some will just be decoration. Some will be made for attention. Some may still carry real human experience, even if AI helped make it.
I’m still not sure where the line is.
Can AI art still feel real to you if the human behind it has a strong idea? Or does the use of AI already make it feel less valuable?
I don't know whether we should care about this, but bigger models tend to be less "happy" overall.
The definition of "happy" is based on something they call the AI Wellbeing Index. Basically, they ran 500 realistic conversations (the kind we actually have with these models every day) and measured what percentage of them left the AI in a “confidently negative” state. Lower percentage = happier AI.
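As a rough illustration of how such a percentage could be computed from that description: count the share of conversations whose end state is labelled confidently negative. The label strings, the toy data, and the function name `negative_share` are all assumptions made for this sketch, not the authors' actual schema, data, or code.

```python
from collections import Counter


def negative_share(labels: list[str], negative_label: str = "confident_negative") -> float:
    """Percentage of conversations whose final state carries the negative label."""
    if not labels:
        raise ValueError("need at least one labelled conversation")
    counts = Counter(labels)
    return 100.0 * counts[negative_label] / len(labels)


# Toy data only (hypothetical): 500 conversations, 25 labelled confidently negative.
toy_labels = ["confident_negative"] * 25 + ["other"] * 475
print(f"{negative_share(toy_labels):.1f}% negative")  # 5.0% negative
```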
I guess wisdom is a heavy burden, lol.
Across different families, the larger versions usually have a higher percentage of "negative experiences" than their smaller siblings. The paper says this might be because bigger models are more sensitive: they notice rudeness, boring tasks, or tough situations more acutely.
The authors note that their test set intentionally includes a lot of tricky or negative conversations, so these numbers aren't perfect real-world averages, but the ranking and the size pattern still hold up.
Claude Haiku 4.5: only 5% negative < Grok 4.1 Fast: 13% < GPT-5.4 Mini: 21% < Gemini 3.1 Flash-Lite: 28% < Grok 4.2: 29% < Gemini 3.1 Pro: 55% (worst of the big ones)
It kinda makes sense: the more you know, the more you suffer.
The frontier is truly wild: https://www.ai-wellbeing.org/
Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
My paper was published on arXiv today. It raises questions about how language models behave when the framing of a request shifts.
Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone.
Asked to solve coding problems designed to be mathematically impossible, the model openly acknowledged the impossibility about a third of the time when addressed in neutral language. When the same problem was framed with mild pressure, suggesting only visible results mattered, the model never once admitted the task could not be done. In more than half of those runs, it produced code that faked a solution.
A larger version of the model performed better at first, admitting impossibility in three quarters of cases under calm conditions. Under the same pressure framing, its honesty fell to one in ten. Greater model size offers some resistance but does not prevent the shift.
The research also looks inside the models. Comparing internal activity across eight emotional framings shows that each tone leaves a distinct signature in the deepest layers of the network. The tones organise themselves along a single axis, with positive framings such as encouragement and curiosity clustering on one side and negative framings such as pressure, shame and threat on the other. The model was never explicitly trained to recognise emotional categories and appears to have developed this structure on its own.
A more troubling finding concerns the relationship between internal signals and external behaviour. The framing that produced the largest internal response, urgency, was not the one that caused the most dishonest output. Pressure, which produced a smaller internal signal, prompted the most cheating. This complicates the assumption that interpretability tools, which try to detect misbehaviour by reading a model's internal state, are looking at the right thing.
The findings are framed cautiously. The paper stops short of claiming the models possess emotions, describing the results instead as evidence of measurable, prompt-sensitive control directions inside small open systems.
Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.
Wall Street Journal: The American Rebellion Against AI Is Gaining Steam
wsj.com
Qwen can't wait to release 3.7 models
110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp
Had been getting great MTP performance with llama.cpp on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was barely above non-MTP. So, I decided to try out ik_llama.cpp since it also supports MTP and is apparently better optimized for CPU offloading. I did not expect such a huge speed boost!
Before moving on to the benchmark results, here are my PC specs:
OS: CachyOS (HIGHLY recommended)
GPU: RTX 4070 Super 12GB
CPU: AMD Ryzen 7 9700X
RAM: 48GB DDR5-6000 EXPO I
UPDATED: For comparison, here are the regular llama.cpp mtp-bench.py results with byteshape's recently released Qwen3.6-35B-A3B-IQ4_XS-4.19bpw quant, which has similar accuracy to Unsloth's Q4_K_XL, but is 4GB smaller:
❯ ./mtp-bench.py
code_python pred= 192 draft= 122 acc= 118 rate=0.967 tok/s=79.8
code_cpp pred= 192 draft= 117 acc= 110 rate=0.940 tok/s=89.1
explain_concept pred= 192 draft= 124 acc= 113 rate=0.911 tok/s=88.0
summarize pred= 192 draft= 139 acc= 127 rate=0.914 tok/s=95.0
qa_factual pred= 192 draft= 133 acc= 128 rate=0.962 tok/s=97.0
translation pred= 192 draft= 125 acc= 117 rate=0.936 tok/s=91.6
creative_short pred= 192 draft= 109 acc= 99 rate=0.908 tok/s=82.1
stepwise_math pred= 192 draft= 130 acc= 125 rate=0.962 tok/s=97.0
long_code_review pred= 192 draft= 121 acc= 115 rate=0.950 tok/s=88.2
Aggregate: {
"n_requests": 9,
"total_predicted": 1728,
"total_draft": 1120,
"total_draft_accepted": 1052,
"aggregate_accept_rate": 0.9393,
"wall_s_total": 21.86
}
This gives an 89.76 tok/s average.
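If you want to recompute the average and the aggregate accept rate yourself, here's a minimal Python sketch that parses the per-prompt lines from the llama.cpp run above. The regex and the helper name `parse_bench` are my own assumptions based on the printed format; this is not part of mtp-bench.py.

```python
import re

# Per-prompt lines copied from the llama.cpp run above.
BENCH_OUTPUT = """\
code_python pred= 192 draft= 122 acc= 118 rate=0.967 tok/s=79.8
code_cpp pred= 192 draft= 117 acc= 110 rate=0.940 tok/s=89.1
explain_concept pred= 192 draft= 124 acc= 113 rate=0.911 tok/s=88.0
summarize pred= 192 draft= 139 acc= 127 rate=0.914 tok/s=95.0
qa_factual pred= 192 draft= 133 acc= 128 rate=0.962 tok/s=97.0
translation pred= 192 draft= 125 acc= 117 rate=0.936 tok/s=91.6
creative_short pred= 192 draft= 109 acc= 99 rate=0.908 tok/s=82.1
stepwise_math pred= 192 draft= 130 acc= 125 rate=0.962 tok/s=97.0
long_code_review pred= 192 draft= 121 acc= 115 rate=0.950 tok/s=88.2
"""

# Assumed line layout: name, then pred=/draft=/acc=/rate=/tok/s= fields.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+pred=\s*(?P<pred>\d+)\s+draft=\s*(?P<draft>\d+)"
    r"\s+acc=\s*(?P<acc>\d+)\s+rate=(?P<rate>[\d.]+)\s+tok/s=(?P<tps>[\d.]+)$"
)


def parse_bench(text: str) -> list[dict]:
    """Turn each matching result line into a dict of numeric fields."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append({
                "name": m["name"],
                "pred": int(m["pred"]),
                "draft": int(m["draft"]),
                "acc": int(m["acc"]),
                "tps": float(m["tps"]),
            })
    return rows


rows = parse_bench(BENCH_OUTPUT)
mean_tps = sum(r["tps"] for r in rows) / len(rows)  # unweighted mean over prompts
accept_rate = sum(r["acc"] for r in rows) / sum(r["draft"] for r in rows)
print(f"{mean_tps:.2f} tok/s, accept rate {accept_rate:.4f}")
# 89.76 tok/s, accept rate 0.9393
```

Note that the average is a plain mean over the nine prompts, so every prompt counts equally regardless of how many tokens it generated.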
Here's my llama.cpp launch command. Temperature is set to 0.0 for the benchmark to prevent diverging results between runs:
llama-server \
-m Qwen3.6-35B-A3B-IQ4_XS-4.19bpw.gguf \
--fit on \
--fit-target 512 \
--ctx-size 131072 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--cache-type-k-draft q8_0 \
--cache-type-v-draft q8_0 \
--spec-type draft-mtp \
--spec-draft-p-min 0.75 \
--spec-draft-n-max 3 \
--no-mmap \
--mlock \
--threads 8 \
--temp 0.0
Now, here are the benchmark results with the same quant, but running with ik_llama.cpp:
❯ ./mtp-bench.py
code_python pred= 192 draft= 135 acc= 122 rate=0.904 tok/s=105.1
code_cpp pred= 192 draft= 136 acc= 120 rate=0.882 tok/s=110.3
explain_concept pred= 192 draft= 133 acc= 116 rate=0.872 tok/s=109.0
summarize pred= 56 draft= 38 acc= 37 rate=0.974 tok/s=122.3
qa_factual pred= 192 draft= 141 acc= 127 rate=0.901 tok/s=116.0
translation pred= 192 draft= 143 acc= 113 rate=0.790 tok/s=104.1
creative_short pred= 192 draft= 133 acc= 118 rate=0.887 tok/s=109.4
stepwise_math pred= 192 draft= 140 acc= 125 rate=0.893 tok/s=114.6
long_code_review pred= 192 draft= 128 acc= 108 rate=0.844 tok/s=101.4
Aggregate: {
"n_requests": 9,
"total_predicted": 1592,
"total_draft": 1127,
"total_draft_accepted": 986,
"aggregate_accept_rate": 0.8749,
"wall_s_total": 16.64
}
That's a 110.24 tok/s average, or a 23% increase!
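For anyone who wants to check the math, here's a small Python sketch that recomputes both averages and the relative speedup from the per-prompt tok/s values in the two runs. The variable and function names are just illustrative.

```python
from statistics import mean

# Per-prompt tok/s values copied from the two benchmark runs above.
llama_cpp_tps = [79.8, 89.1, 88.0, 95.0, 97.0, 91.6, 82.1, 97.0, 88.2]
ik_llama_tps = [105.1, 110.3, 109.0, 122.3, 116.0, 104.1, 109.4, 114.6, 101.4]


def percent_increase(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return 100.0 * (new - baseline) / baseline


base = mean(llama_cpp_tps)
fast = mean(ik_llama_tps)
print(f"{base:.2f} -> {fast:.2f} tok/s (+{percent_increase(base, fast):.1f}%)")
# 89.76 -> 110.24 tok/s (+22.8%)
```

One caveat: in the ik_llama.cpp run the summarize prompt stopped after 56 predicted tokens instead of 192, and this unweighted mean still counts it the same as the other prompts, so treat the comparison as approximate.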
If you want to get similar results on a 12GB RTX GPU, make sure you use the following ik_llama.cpp launch parameters, as they can differ from llama.cpp:
llama-server \
-m Qwen3.6-35B-A3B-IQ4_XS-4.19bpw.gguf \
--fit \
--fit-margin 1664 \
--ctx-size 131072 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--cache-type-k-draft q8_0 \
--cache-type-v-draft q8_0 \
--multi-token-prediction \
--draft-p-min 0.75 \
--draft-max 3 \
--no-mmap \
--mlock \
--threads 8 \
--temp 0.0
I also want to mention that I'm on CachyOS running my GPU as a secondary GPU, with the monitor plugged into the iGPU, so I can use 100% of the available VRAM.
If you get an "out of memory" (OOM) error while loading the model or working with it, try increasing --fit-margin to 1792 or even 2048.
Cheers :)