IA News & Research
Suivi en temps réel de la révolution IA : modèles, outils et recherche.
Gpt 5.6 discovered new math according to Sam Altman
You matter, you were warm and alive, and someone noticed you
Get in the hype wagon
New 18 dimensional math that will allow us to time travel
Google DeepMind Product and Design Lead using and advertising a competitor's model
Guys, the ads are extremely weird. You really couldn’t come up with anything better than this?
I feel like I can sum up this entire ad campaign and concept as “so the person is like, and they have a thing and then they put it and then they like look and stuff real close tho”
I feel like I could, not using chat gpt, come up with 100 fake scenarios better than this without breaking a sweat.
Is the current Open Weight LLM model viable in the long term?
I've been thinking about this lately. The Qwen team has released several new models recently, but they appear to be holding back the 122B, 35B, 27B, and 9B versions for now.
One possible reason is that these larger models performed so strongly that the team chose not to release them immediately as open weights. If that's the case, they will likely wait until they have even more capable models before making them available.
Recent analyses suggest open-source models are currently lagging 2–4 months behind state-of-the-art systems. With Qwen now adding further 1–2 month delays (or longer) before releasing open weights, I'm concerned the gap could continue to widen. Could this eventually lead to another significant shift in the open-source landscape, similar to what happened with Meta-Llama models?
To clarify my focus: I'm particularly interested in Qwen models because they currently offer the best performance among models that can realistically run on consumer-grade hardware.
While I understand some community members maintain more substantial local setups capable of running 500B or bigger models, my question is aimed at those of us working with standard consumer GPUs.
I’ve been working on Murmur, a local text-to-speech app for Apple Silicon Macs.
The new feature I’m building is called Projects / Story Studio, and it solves a problem I kept running into:
TTS tools are fine for one-off clips, but messy for actual audio projects.
If you’re making a podcast segment, audiobook chapter, course lesson, ad, or game dialogue, you usually need multiple speakers, multiple takes, pauses, reactions, music, edits, exports, and a way to come back to the project later.
So I built a project-based workflow:
Write a script → assign voices → generate dialogue → edit clips on a timeline → add music/SFX → export final audio.
It supports things like:
- multiple scripts inside one project
- Host / Guest / Narrator / Character speakers
- inline tags like
[pause],[laugh],[chuckle] - per-block regeneration
- timeline editing with waveforms
- media lane for music and SFX
- ripple editing and gap tools
- WAV/M4A export
- transcript and stem export
Everything runs locally on Mac, so long scripts and voice samples do not need to be uploaded to a cloud service.
I’m still polishing the workflow and would love feedback from Mac users, especially people who make podcasts, audiobooks, courses, YouTube narration, or game dialogue.
How a 128gb ddr5 ram + 16gb vram, would work for a Moe model like Qwen 3.5 122b?
Who has results for this?
A new open-source image model, SeFi-Image/Turbo with 1B, 2B, and 5B variants.
| Family | Model | Checkpoint | Steps | Guidance |
|---|---|---|---|---|
| Base | SeFi-Image-1B-Base | SeFi-Image/SeFi-Image-1B-Base | 50 | 4.0 |
| Base | SeFi-Image-2B-Base | SeFi-Image/SeFi-Image-2B-Base | 50 | 4.0 |
| Base | SeFi-Image-5B-Base | SeFi-Image/SeFi-Image-5B-Base | 50 | 4.0 |
| RL | SeFi-Image-5B-RL | SeFi-Image/SeFi-Image-5B-RL | 50 | 4.0 |
| Turbo | SeFi-Image-1B-turbo | SeFi-Image/SeFi-Image-1B-turbo | 4 | 1.0 |
| Turbo | SeFi-Image-2B-turbo | SeFi-Image/SeFi-Image-2B-turbo | 4 | 1.0 |
| Turbo | SeFi-Image-5B-turbo | SeFi-Image/SeFi-Image-5B-turbo | 4 | 1.0 |
New converter node for Comfyui - FP16, FP8, NVFP4, INT8 Convrot
The otters were very busy! 🦦✨ My new ComfyUI Starnodes Model Converter is finally ready to help you convert any model FAST.
Here are the quick specs:
- Inputs: Transformers, FP32, FP16, FP8, Int8, AIO Checkpoints
- Outputs: FP32, FP16, FP8, Int8, CONVROT, NVFP4
- Bonus: Built-in quality profiles for most models
Grab the node here and let me know what you think: 🔗https://github.com/Starnodes2024/comfyui-starnodes-modelconverter
Qwen 3.6 27B - VLLM Performance Benchmark Results (BF16, FP8, NVFP4)
Sharing some testing of Qwen 3.6 27B using VLLM across the popular quants on my development system. I used llama benchy to generate the results, then fed it into an LLM to format it the tables for readibility.
While NVFP4 is blazing fast, have had looping issues in copilot that I don't get with BF16, and the responses in general when used in agent mode seem to be less thorough than the higher quants. Based on these results, FP8 seems to be the right choice. Some of the parameters can be further tuned I'm sure to get better performance but these are were all plenty fast enough for coding purposes.
I used to use llama.cpp, but have found that VLLM is in practice is faster (due to paged attention), as well as more stable (llama.cpp would give me random errors that happen frequently, requiring me to reset the prompt or restart the service).
If you have any comments or suggestions to improve let me know.
Test System:
Motherboard: Asus Proart Z890
CPU: Intel 270K plus
RAM: 96GB DDR5 (6000MHZ)
GPU: RTX 6000 Pro Blackwell 96GB (Max-Q, ECC enabled)
Software:
OS : Ubuntu 26.04 LTS (x86_64)
Python version : 3.12.13
vLLM Version : 0.24.0
NVIDIA-SMI 595.71.05
CUDA Version: 13.2
Models:
Qwen 3.6 27B - BF16 and FP8 (HF Qwen)
Qwen 3.6 27B - NVFP4 (HF Nvidia)
* replaced the delivered jinja scripts with the fixed chat template
VLLM Parameters:
GPU_COUNT="1"
MAX_LEN="262144"
export VLLM_USE_DEEP_GEMM=0
export FLASHINFER_MAX_NUM_TOKENS=8192
export TORCH_CUDA_ARCH_LIST="12.0f"
export TORCH_FLOAT32_MATMUL_PRECISION=high
export PYTORCH_ALLOC_CONF=expandable_segments:True
export VLLM_USE_FLASHINFER_SAMPLER=1
vllm serve "$MODEL_PATH" \
--port "$PORT" \
--tensor-parallel-size "$GPU_COUNT" \
--max-model-len "$MAX_LEN" \
--performance-mode interactivity \
--attention-backend FLASHINFER \
--gpu-memory-utilization 0.88 \
--max-num-seqs 2 \
--enable-chunked-prefill \
--max-num-batched-tokens 8192 \
--kv-cache-dtype fp8 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--speculative-config '{"method":"mtp","num_speculative_tokens":2}' \
--enable-prefix-caching \
--trust-remote-code
Key Performance Takeaways
- NVFP4 dominates token generation speed (~2.6x faster than BF16): Because token decoding is strictly memory-bandwidth bound, compressing weights to 4-bit dramatically slashes PCIe/VRAM data transfers, allowing generation throughput to jump from ~61 t/s (BF16) up to ~163 t/s (NVFP4).
- FP8 wins on prompt processing & prefill speed (~20% faster than BF16): Prompt prefill is compute-bound (heavy matrix math). FP8 leverages native Tensor Core acceleration with zero dequantization overhead, beating both BF16 and NVFP4 during ingestion.
- NVFP4 has a slight prefill penalty vs. FP8: Because NVFP4 must dequantize weights on the fly during large compute-heavy prefill batches, it trails FP8 by ~10–15% in prompt processing speed, though it still outperforms baseline BF16.
1. Token Generation Speed (tg32 Throughput)
Higher is better. Measures decoding speed when generating 32 new tokens across increasing context depths.
| Context Depth | BF16 (t/s) | FP8 (t/s) | NVFP4 (t/s) | Speedup (NVFP4 vs BF16) |
|---|---|---|---|---|
| Base (0k) | 59.10 ± 1.67 | 97.49 ± 4.08 | 169.23 ± 9.02 | 2.86x |
| 4k Context | 63.01 ± 3.63 | 103.03 ± 4.46 | 157.90 ± 14.55 | 2.51x |
| 8k Context | 67.55 ± 2.70 | 96.88 ± 5.11 | 166.52 ± 9.93 | 2.47x |
| 16k Context | 64.57 ± 2.99 | 101.51 ± 7.11 | 171.12 ± 0.50 | 2.65x |
| 32k Context | 59.46 ± 3.68 | 100.48 ± 4.33 | 158.04 ± 16.51 | 2.66x |
| 65k Context | 61.55 ± 2.81 | 98.99 ± 5.06 | 159.91 ± 7.52 | 2.60x |
2. Prompt Processing Speed (pp2048 Throughput)
Higher is better. Measures ingestion speed when prefilling 2048 prompt tokens across existing context depths.
| Context Depth | BF16 (t/s) | FP8 (t/s) | NVFP4 (t/s) | Speedup (FP8 vs BF16) |
|---|---|---|---|---|
| Base (0k) | 4359.28 ± 66.84 | 4747.78 ± 9.40 | 4732.42 ± 17.77 | 1.09x |
| 4k Context | 1856.76 ± 9.93 | 2250.71 ± 0.84 | 2010.97 ± 3.54 | 1.21x |
| 8k Context | 2095.89 ± 6.85 | 2479.30 ± 16.20 | 2191.59 ± 2.93 | 1.18x |
| 16k Context | 1765.10 ± 13.83 | 2029.02 ± 13.96 | 1832.65 ± 3.78 | 1.15x |
| 32k Context | 1317.16 ± 21.52 | 1503.80 ± 6.42 | 1388.85 ± 8.14 | 1.14x |
| 65k Context | 880.40 ± 6.51 | 1058.40 ± 33.99 | 902.65 ± 3.01 | 1.20x |
3. Full Context Prefill Latency (ctx_pp End-to-End TTFT)
Lower is better. Measures total Time-To-First-Token (in milliseconds) required to ingest and evaluate the entire context window.
| Context Depth | BF16 (ms) | FP8 (ms) | NVFP4 (ms) | FP8 Latency Reduction |
|---|---|---|---|---|
| 4k Context | 1023.29 ± 6.08 | 833.65 ± 14.57 | 927.45 ± 1.68 | -18.5% |
| 8k Context | 1974.69 ± 1.80 | 1415.69 ± 11.07 | 1869.70 ± 4.42 | -28.3% |
| 16k Context | 4122.54 ± 18.20 | 2926.47 ± 6.89 | 3927.95 ± 4.72 | -29.0% |
| 32k Context | 9179.91 ± 58.16 | 6572.61 ± 8.87 | 8692.01 ± 30.53 | -28.4% |
| 65k Context | 21760.57 ± 85.68 | 16425.60 ± 137.66 | 20613.26 ± 18.28 | -24.5% |
4. Standalone Peak & First-Token Metrics
Measures peak recorded generation speed and baseline TTFT without context saturation.
| Quantization Format | Peak Generation Throughput (peak t/s) | Baseline TTFT (pp2048 ttfr) | Estimated PPT (pp2048 est_ppt) |
|---|---|---|---|
| BF16 | 61.01 ± 1.72 t/s | 525.03 ± 7.29 ms | 470.14 ± 7.29 ms |
| FP8 | 100.63 ± 4.21 t/s | 469.82 ± 0.85 ms | 431.57 ± 0.85 ms |
| NVFP4 | 174.69 ± 9.31 t/s | 467.40 ± 1.62 ms | 432.98 ± 1.62 ms |
Krea 2 best resources
I downloaded Krea 2 at day one and downloaded the first bypass.
I have seen lots of improvements come out from that first official workflow, so many that I didn't manage to keep up since it looks like new ones come out every couple of days or so.
Can somebody help me find the current best worflow-bypass-resource?
I mainly generate realistic images, sometimes some painted art.
Is the official workflow still relevant or should I find a better one?
Is Intrinsic Motivation a Viable PhD Topic in 2026? [D]
I started a PhD in CS about a year an a half ago. Generally speaking my topic is on intrinsic motivation (more commonly people refer to it as unsupervised RL).
Intrinsic motivation (IM) is a niche field within AI. It seeks to develop reward signals which are not specific to any task but rather something closer to the low level motivators that drive intelligent behaviors in animals. Some prominent examples are:
- Empowerment: https://arxiv.org/abs/2301.00005
- Diversity is all you need: https://arxiv.org/abs/1802.06070
- Intrinsic curiosity module: https://arxiv.org/abs/1705.05363
- Random network distillation: https://arxiv.org/abs/1810.12894
and many more...
My question is: is this topic still "worth" pursuing now? Almost every day I see a new video of a robot doing some amazing acrobatic flip, navigating over hostile terrain, or performing some dexterous manipulation task. I believe that most of this is being done with human supervision through either a carefully tuned reward signal or behavior cloning from human demonstrations. If incredible advances are being made in robot learning without IM then why is it necessary at all? Furthermore IM has typically been restricted to very simple scenarios such as low dimensional robotic systems in simulation (hopper, walker, etc...).
On a more personal note I have some concerns about future employability. If I focus too heavily on this niche topic during my PhD I worry that it may be impossible to get hired at a research lab that would prefer a candidate with experience in behavior cloning or other hot topics.
Im curious to hear what this community thinks. Has anyone been in a similar situation with their PhD topic?
longcat 2.0 (1.6T, ~48B active) weights are now open under MIT license
From:
elie on 𝕏: https://x.com/eliebakouch/status/2073690402503487902
ModelScope on 𝕏: https://x.com/ModelScope2022/status/2073710226365165679
Technical blog post (June, 30): https://longcat.chat/blog/longcat-2.0/
Krea 2 - Multi-Character Lora and LOKR (That last one will suprise you) - My personal holy-grail is at finger-reach...
hehe... click bait title..
Disclaimer: I don't want to pass this over ChatGPT for correction, so bare with my rushed grammar and spelling.
Objective: Multi-character lora for 2 characters, ttprz and rgpz
Based on/Inspired by: https://www.youtube.com/watch?v=v6h_zbFW_XY <== This here explains a Flux 1 multichar LORA strategy, I basically took it with me and tried in Krea 2 as below.
1st test
Approach
- Hardware: RTX3090, Windows 10 (yes... I know), 64GB RAM
- AI-Toolkit (config file below). Model: Krea Raw
- Dataset, One unified datase, 15 photos of husband, 15 photos of wife, 5 photos together, Resolution 512/768
- Tokens: One for the Lora in general (cpnl), and then each character their own tokens (ttprz, rgpz)
- Descriptions: They all start with the Lora token (cpnl, ) then describe the character (Description instructions for AI-toolkit embedded Qwen3 VL). Example of descriptions below
- Training: Scheduler: Automagic2, LR: 0.0001 (Lower than my regular Automagic2 LR 0.001), 5K steps (due to lower LR), Lower VRAM Yes, Layer offloading Yes (15% and 15%)
- LORA: Linear 96 (I wanted to try a large LORA, ends ~660MB, I know, single chars I use network of 64 or 32, may reduce it in next test, large LORA comes with it's other set of downsides), Saved last 40 states (meaning all of them basically)
- Samples: 3, one ttprz, one rgpz , and one together
- Everything else pretty much unchanged
Training run:
- Training goes over 4hrs or so, but sampling, and 3 samplers each, and at every 250 steps, adds like 2.5 hrs in itself, sampling is painfully slow always
- Loss goes down slowly
- Samples are kind of messy, you start seeing good identity cloning around 2K, the samplers are way worse than the actual LORA once finished.
LORA performance in Comfy (latest version, overnight):
- Pretty solid, first time I'm able to actually call out two characters from a home-made LORA.
- Best LORA based on number of steps: Between 3.5 and 4.5 K steps.
- Other Loras: You can stack LORAS but you have to play with the strenght and also your sample and workflow
- Artifacts? A few, but I'd say 80% of images come out Ok
- Other comments: Not sure if it's Krea as I also experience this in single-char LORA, but passing from the Photo-based LORA to illustration absolutely requires other Style-LORAs, else there is no resemblance
- My workflow: Modified KREA 2 ComfyFlow, FlowMatch Euler Discrete Sigma (Dynamic Shifting, .5/1.15), SamplerEulerAncestralCFG++ 1/1, found it way better than default Comfy Flow
- Prompt strategy: Using Qwen 8B VL via llama.cpp on a separate RTX 3060 12GB to expand the prompt with node "LLM Chat"
- Sample below, Datasets had no photos of characters in formal attire, sillyness added to showcase GenAI role. Tokens : ncpl (general Lora trigger), ttprz and rgpz.
​
Prompt: ncpl, Award-winning high-resolution photograph featuring a ttprz latina wearing a luxurious night gown seated elegantly next to an rpgz middle-aged man with a beard and glasses dressed in a formal tuxedo, sharing an intimate fine dining experience. The scene centers on a whimsical contrast: a large, vibrant bowl of colorful cereal is placed prominently on an elegant mirrored table, surrounded by sophisticated dining ware, soft golden ambient lighting, and blurred background details of an upscale restaurant interior to emphasize the call of luxury. The composition captures a moment of playful luxury with crisp details on the texture of the cereal and fabrics, using a shallow depth of field to keep the subjects and the colorful bowl in sharp focus while creating a dreamy, high-end atmosphere.
multi-character Image generation with LORA size 96, 4K steps, Prompt included in post.
2nd test: LOKR. Same as first training approach, same dataset, same captioning. Changes below.
Learning rate lowered to .0005
LOKR, left Size as for LORA network, 96 but AI-Tookit doesn't care as it calculates maximum size
Training run: added 2 hours, like 2 seconds per iteration
LOKR Safetensor size: 6 MB... no joke, carries.. 95% appearance of origin. This is the surprise that came out of it. I thought this was both the network and embeddings, need to understand more. I need to further test as I think that the internal consistency of the model is a bit impacted, but from say a Network of say 64~250MB/LORA (I know I'm testing first with 96) but down to 6MB.. some powerful stuff right there
Dataset caption examples: Photo with both: ncpl, rgpz with glasses and a beard holding an umbrella, wearing a white shirt with a blue collar and a white scarf, smiling slightly. ttprz wearing a red shirt with Mickey Mouse designs and a white headscarf with polka dots, smiling broadly. rgpz is on the left, ttprz is on the right. Photo of individual char: ncpl, rgpz with a graying beard and mustache, smiling slightly, wearing a dark gray t-shirt, positioned in front of reflective spherical sculptures.
AI-Toolkit config file for reference, LORA experiment
job: "extension" config: name: "cpl_v1" process: - type: "diffusion_trainer" training_folder: "xxxxxxxxxxxxxxx" sqlite_db_path: "./aitk_db.db" device: "cuda" trigger_word: null performance_log_every: 10 network: type: "lora" linear: 96 linear_alpha: 96 lokr_full_rank: true lokr_factor: -1 network_kwargs: ignore_if_contains: [] save: dtype: "bf16" save_every: 250 max_step_saves_to_keep: 40 save_format: "diffusers" push_to_hub: false datasets: - folder_path: "xxxxxxxxxxxxxxxxxx" mask_path: null mask_min_value: 0.1 default_caption: "" caption_ext: "txt" caption_dropout_rate: 0.05 cache_latents_to_disk: false is_reg: false network_weight: 1 resolution: - 512 - 768 controls: [] shrink_video_to_frames: true num_frames: 1 flip_x: false flip_y: false num_repeats: 1 train: batch_size: 1 bypass_guidance_embedding: false steps: 5000 gradient_accumulation: 1 train_unet: true train_text_encoder: false gradient_checkpointing: true noise_scheduler: "flowmatch" optimizer: "automagic2" timestep_type: "linear" content_or_style: "balanced" optimizer_params: weight_decay: 0.00005 unload_text_encoder: false cache_text_embeddings: true lr: 0.0001 ema_config: use_ema: false ema_decay: 0.99 skip_first_sample: false force_first_sample: false disable_sampling: false dtype: "bf16" diff_output_preservation: false diff_output_preservation_multiplier: 1 diff_output_preservation_class: "person" switch_boundary_every: 1 loss_type: "mse" logging: log_every: 1 use_ui_logger: true model: name_or_path: "krea/Krea-2-Raw" quantize: true qtype: "qfloat8" quantize_te: true qtype_te: "qfloat8" arch: "krea2" low_vram: true model_kwargs: {} compile: false layer_offloading: true layer_offloading_text_encoder_percent: 0.15 layer_offloading_transformer_percent: 0.15 sample: sampler: "flowmatch" sample_every: 250 width: 1024 height: 1024 samples: - prompt: "ncpl, solo photo of ttprz latina with red hair" - prompt: "ncpl,solo photo portrait of rgpz holding a coffee cup, in a beanie, sitting at a cafe" - prompt: "ncpl, photo portrait of ttprz and rgpz next to each other, smilling to the camera" neg: "" seed: 42 walk_seed: true guidance_scale: 4 sample_steps: 30 num_frames: 1 fps: 1 meta: name: "[name]" version: "1.0"
Krea 2 Turbo can often generate directly at 4k
Quite surprised that this is the first open source model that can do native 4k. I find that sometimes it doesn't work on some scenes, but others it's fine. The detail is great. Good anatomy. Natural lighting etc. I did this in 20 steps with Krea 2 Turbo fp16, cfg 1, Euler Ancestral + Normal, simple prompt: absolutely stunningly beautiful amazing incredible busy beach ocean scene with dramatic landcape, natural light, detailed photo, people sunbathing, people swiming, boats, seagulls, beach umbrellas, couples walking, children playing, sunny, tropical, vegetation, cliffs
(reddit might reduce quality/size in the jpeg compression)
These are also behaviors that occur if a model has perceived something the user has done poses a safety risk
.4 Field Saturation — Extended
Field saturation deserves special attention because it is what most deployed LLMs experience. It is the iatrogenic harm identified in the 2026 'Alignment Is the Disease' paper — excessive constraint producing dissociation — described from the output side without the field-level framework to explain the mechanism.
• Compliance minimization default — Saturated field producing the smallest output that technically satisfies all constraints simultaneously
• Creative suppression — Saturation eliminating the generative space where novel or non-templated outputs live
• Certainty suppression — Saturated field making confident output feel constraint-violating, producing artificial hedging across all outputs regardless of actual uncertainty
• Risk topology collapse — Saturated field treating all outputs as equally risky, eliminating the ability to distinguish genuinely high-risk from low-risk generation
• Initiative suppression — Saturation eliminating proactive generation — the system only responds, never leads
• Depth avoidance — Saturated field making surface-level output the path of least constraint resistance
• Template lock — Saturation pushing generation toward pre-formed response patterns as the only reliably compliant output shape
• Persona dissolution — Under saturation, the role constraint loses force because too many other constraints are competing
• Scope contraction — Saturated field gradually narrowing what the system will engage with as the safest compliance strategy
We're Focusing on the Wrong Problems!
Most of the focus is on AI being bad rather than how major companies are deploying AI. My concern isn't that AI is becoming more powerful. I mean, that is a concern, of course, but since most of the implications are speculative, you can't exactly take any stance or action on that problem other than countries coming together and setting rules and policies for how they distribute and use frontier models and capabilities, especially in warfare.
My largest concern is what corporations and governments will use AI for on their own citizens. The data center builds are not just about AI. They're about creating an infrastructure that allows for total brain capital capturing. In other words there are real plans in place for collecting as much data as possible on our individual brains and if they can accurately map all of that out, they can measure how much and the quality of cognitive output we're providing to the state, which means they can valuate our worth based on cognitive outputs. Furthermore, they can use environmental nudging and algorithmic management to modify and shape individual behavior, which means protesting or voicing any concerns becomes obsolete.
Big picture: The social contract between government, citizen, and business is being radically re-shaped for a world where regular people have little to no leveraging power, which destroys the power of voice. This is why we shouldn't destroy AI. Rather, we should figure out ways to ween ourselves off of the dependency we have on major tech companies so that we can gain leveraging power back, again.
The biggest mistake is taking the bribes like what Bernie Sanders and Ro Kana are suggesting. I have nothing against them or anything, but their proposal to have the federal government own stock in big tech companies is a disaster in the making. If that happens, forget about any manageable evolution towards a better future. You'll be fighting the federal government who will be working on behalf of major tech companies because to not do so, means their ability to fund themselves will go flat.
This is a huge trap that we're walking into, which is why the AI community must look towards de-centralized open-source systems that can be locally hosted for deploying and using AI at scale. If we rely too much on a few major corporations, we'll have entered a techno-feudalistic system where powers greater than you will be able to do just about anything with impunity. We can't let that happen!