AI News & Research

Real-time tracking of the AI revolution: models, tools, and research.

▲ 1.9k r/TechnologyLabs+3 crossposts

This firefighting robot survived 30 minutes inside a 1,000°C furnace and kept operating like nothing happened

u/FearlessAuthor7614 — 4 hours ago
▲ 9 r/OpenAI

Guys, the ads are extremely weird. You really couldn’t come up with anything better than this?

I feel like I can sum up this entire ad campaign and concept as “so the person is like, and they have a thing and then they put it and then they like look and stuff real close tho”

I feel like I could, without using ChatGPT, come up with 100 fake scenarios better than this without breaking a sweat.

u/ItIsWhatItIsSoChill — 2 hours ago

Is the current Open Weight LLM model viable in the long term?

I've been thinking about this lately. The Qwen team has released several new models recently, but they appear to be holding back the 122B, 35B, 27B, and 9B versions for now.

One possible reason is that these larger models performed so strongly that the team chose not to release them immediately as open weights. If that's the case, they will likely wait until they have even more capable models before making them available.

Recent analyses suggest open-source models are currently lagging 2–4 months behind state-of-the-art systems. With Qwen now adding further 1–2 month delays (or longer) before releasing open weights, I'm concerned the gap could continue to widen. Could this eventually lead to another significant shift in the open-source landscape, similar to what happened with Meta-Llama models?

To clarify my focus: I'm particularly interested in Qwen models because they currently offer the best performance among models that can realistically run on consumer-grade hardware.

While I understand some community members maintain more substantial local setups capable of running 500B or bigger models, my question is aimed at those of us working with standard consumer GPUs.

u/Alan_Silva_TI — 2 hours ago
▲ 54 r/speechtech+21 crossposts

I’ve been working on Murmur, a local text-to-speech app for Apple Silicon Macs.

The new feature I’m building is called Projects / Story Studio, and it solves a problem I kept running into:

TTS tools are fine for one-off clips, but messy for actual audio projects.

If you’re making a podcast segment, audiobook chapter, course lesson, ad, or game dialogue, you usually need multiple speakers, multiple takes, pauses, reactions, music, edits, exports, and a way to come back to the project later.

So I built a project-based workflow:

Write a script → assign voices → generate dialogue → edit clips on a timeline → add music/SFX → export final audio.

It supports things like:

  • multiple scripts inside one project
  • Host / Guest / Narrator / Character speakers
  • inline tags like [pause], [laugh], [chuckle]
  • per-block regeneration
  • timeline editing with waveforms
  • media lane for music and SFX
  • ripple editing and gap tools
  • WAV/M4A export
  • transcript and stem export
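
As a rough illustration of how inline tags such as [pause] or [laugh] could be separated from spoken text before synthesis, here is a minimal sketch. The tag set is taken from the examples in the post; the function name and the segment format are hypothetical and say nothing about how Murmur actually parses scripts.

```python
import re

# Hypothetical tag set, limited to the three tags the post mentions.
TAG_PATTERN = re.compile(r"\[(pause|laugh|chuckle)\]")

def split_script_line(line):
    """Split a script line into ("speech", text) and ("tag", name) segments."""
    segments = []
    cursor = 0
    for match in TAG_PATTERN.finditer(line):
        # Text between the previous tag (or line start) and this tag.
        text = line[cursor:match.start()].strip()
        if text:
            segments.append(("speech", text))
        segments.append(("tag", match.group(1)))
        cursor = match.end()
    # Trailing text after the last tag.
    tail = line[cursor:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments

segments = split_script_line(
    "Welcome back. [pause] You will not believe this [laugh] story."
)
for kind, value in segments:
    print(kind, value)
```

Keeping tags as separate segments would let a per-block regeneration step re-run only the speech segments that changed.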

Everything runs locally on Mac, so long scripts and voice samples do not need to be uploaded to a cloud service.

I’m still polishing the workflow and would love feedback from Mac users, especially people who make podcasts, audiobooks, courses, YouTube narration, or game dialogue.

u/tarunyadav9761 — 7 hours ago

A new open-source image model, SeFi-Image/Turbo with 1B, 2B, and 5B variants.

Family | Model | Checkpoint | Steps | Guidance
Base | SeFi-Image-1B-Base | SeFi-Image/SeFi-Image-1B-Base | 50 | 4.0
Base | SeFi-Image-2B-Base | SeFi-Image/SeFi-Image-2B-Base | 50 | 4.0
Base | SeFi-Image-5B-Base | SeFi-Image/SeFi-Image-5B-Base | 50 | 4.0
RL | SeFi-Image-5B-RL | SeFi-Image/SeFi-Image-5B-RL | 50 | 4.0
Turbo | SeFi-Image-1B-turbo | SeFi-Image/SeFi-Image-1B-turbo | 4 | 1.0
Turbo | SeFi-Image-2B-turbo | SeFi-Image/SeFi-Image-2B-turbo | 4 | 1.0
Turbo | SeFi-Image-5B-turbo | SeFi-Image/SeFi-Image-5B-turbo | 4 | 1.0
u/sunshinecheung — 5 hours ago

New converter node for ComfyUI - FP16, FP8, NVFP4, INT8 Convrot

The otters were very busy! 🦦✨ My new ComfyUI Starnodes Model Converter is finally ready to help you convert any model FAST.

https://preview.redd.it/crg0xd10kfbh1.png?width=2656&format=png&auto=webp&s=cb80a39858f255c6673b9f1d78999c22c9379ea6

Here are the quick specs:

  • Inputs: Transformers, FP32, FP16, FP8, Int8, AIO Checkpoints
  • Outputs: FP32, FP16, FP8, Int8, CONVROT, NVFP4
  • Bonus: Built-in quality profiles for most models

Grab the node here and let me know what you think: 🔗https://github.com/Starnodes2024/comfyui-starnodes-modelconverter

reddit.com
u/Old_Estimate1905 — 5 hours ago

Qwen 3.6 27B - VLLM Performance Benchmark Results (BF16, FP8, NVFP4)

Sharing some testing of Qwen 3.6 27B using vLLM across the popular quants on my development system. I used llama benchy to generate the results, then fed them into an LLM to format the tables for readability.

While NVFP4 is blazing fast, I have had looping issues in Copilot that I don't get with BF16, and the responses in agent mode generally seem less thorough than with the higher-precision quants. Based on these results, FP8 seems to be the right choice. I'm sure some of the parameters can be tuned further for better performance, but these were all plenty fast for coding purposes.

I used to use llama.cpp, but have found that vLLM is faster in practice (due to paged attention) as well as more stable (llama.cpp would frequently give me random errors, requiring me to reset the prompt or restart the service).

If you have any comments or suggestions to improve let me know.

Test System:

Motherboard: Asus Proart Z890

CPU: Intel 270K plus

RAM: 96GB DDR5 (6000MHZ)

GPU: RTX 6000 Pro Blackwell 96GB (Max-Q, ECC enabled)

Software:

OS : Ubuntu 26.04 LTS (x86_64)

Python version : 3.12.13

vLLM Version : 0.24.0

NVIDIA-SMI 595.71.05

CUDA Version: 13.2

Models:

Qwen 3.6 27B - BF16 and FP8 (HF Qwen)

Qwen 3.6 27B - NVFP4 (HF Nvidia)

* replaced the delivered jinja scripts with the fixed chat template

VLLM Parameters:

GPU_COUNT="1"
MAX_LEN="262144"

export VLLM_USE_DEEP_GEMM=0
export FLASHINFER_MAX_NUM_TOKENS=8192
export TORCH_CUDA_ARCH_LIST="12.0f"
export TORCH_FLOAT32_MATMUL_PRECISION=high
export PYTORCH_ALLOC_CONF=expandable_segments:True
export VLLM_USE_FLASHINFER_SAMPLER=1

vllm serve "$MODEL_PATH" \
  --port "$PORT" \
  --tensor-parallel-size "$GPU_COUNT" \
  --max-model-len "$MAX_LEN" \
  --performance-mode interactivity \
  --attention-backend FLASHINFER \
  --gpu-memory-utilization 0.88 \
  --max-num-seqs 2 \
  --enable-chunked-prefill \
  --max-num-batched-tokens 8192 \
  --kv-cache-dtype fp8 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --speculative-config '{"method":"mtp","num_speculative_tokens":2}' \
  --enable-prefix-caching \
  --trust-remote-code

Key Performance Takeaways

  • NVFP4 dominates token generation speed (~2.6x faster than BF16): Token decoding is largely memory-bandwidth bound, so compressing weights to 4 bits sharply cuts the bytes read from VRAM for each generated token, and generation throughput jumps from ~61 t/s (BF16) to ~163 t/s (NVFP4).
  • FP8 wins on prompt processing and prefill speed (roughly 9–21% faster than BF16 in the pp2048 results): Prompt prefill is compute-bound (heavy matrix math). FP8 leverages native Tensor Core acceleration with zero dequantization overhead, beating both BF16 and NVFP4 during ingestion.
  • NVFP4 has a slight prefill penalty vs. FP8: Presumably because NVFP4 must dequantize weights on the fly during large, compute-heavy prefill batches, it trails FP8 by roughly 8–15% in prompt processing speed once context is present (the two are on par at zero context), though it still outperforms baseline BF16.
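
To see how far the memory-bandwidth argument goes, the sketch below compares an idealized speedup (decode time scaling only with the weight bytes read per token) against the measured tg32 means from the post. The 4.5 bits per weight used for NVFP4 is an assumption (4-bit values plus one 8-bit scale per 16-weight block), and the helper names are made up for this example.

```python
from statistics import mean

# Approximate bits stored per weight. The 4.5 figure for NVFP4 is an assumption
# (4-bit values plus one 8-bit scale per 16-weight block), not a measured size.
BITS_PER_WEIGHT = {"BF16": 16.0, "FP8": 8.0, "NVFP4": 4.5}

# tg32 throughput (t/s) at 0k, 4k, 8k, 16k, 32k and 65k context, from the post.
MEASURED_TPS = {
    "BF16": [59.10, 63.01, 67.55, 64.57, 59.46, 61.55],
    "FP8": [97.49, 103.03, 96.88, 101.51, 100.48, 98.99],
    "NVFP4": [169.23, 157.90, 166.52, 171.12, 158.04, 159.91],
}

def ideal_speedup(fmt):
    """Upper bound if decode time scaled only with weight bytes read."""
    return BITS_PER_WEIGHT["BF16"] / BITS_PER_WEIGHT[fmt]

def measured_speedup(fmt):
    """Ratio of mean measured throughput to the BF16 mean."""
    return mean(MEASURED_TPS[fmt]) / mean(MEASURED_TPS["BF16"])

for fmt in ("FP8", "NVFP4"):
    print(f"{fmt}: ideal {ideal_speedup(fmt):.2f}x, "
          f"measured {measured_speedup(fmt):.2f}x")
```

Under these assumptions the measured gains land below the weight-size ratios, which is what one would expect if something besides weight reads (KV-cache traffic, kernel overheads, speculative decoding) also takes time during decoding.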

1. Token Generation Speed (tg32 Throughput)

Higher is better. Measures decoding speed when generating 32 new tokens across increasing context depths.

Context Depth | BF16 (t/s) | FP8 (t/s) | NVFP4 (t/s) | Speedup (NVFP4 vs BF16)
Base (0k) | 59.10 ± 1.67 | 97.49 ± 4.08 | 169.23 ± 9.02 | 2.86x
4k Context | 63.01 ± 3.63 | 103.03 ± 4.46 | 157.90 ± 14.55 | 2.51x
8k Context | 67.55 ± 2.70 | 96.88 ± 5.11 | 166.52 ± 9.93 | 2.47x
16k Context | 64.57 ± 2.99 | 101.51 ± 7.11 | 171.12 ± 0.50 | 2.65x
32k Context | 59.46 ± 3.68 | 100.48 ± 4.33 | 158.04 ± 16.51 | 2.66x
65k Context | 61.55 ± 2.81 | 98.99 ± 5.06 | 159.91 ± 7.52 | 2.60x

2. Prompt Processing Speed (pp2048 Throughput)

Higher is better. Measures ingestion speed when prefilling 2048 prompt tokens across existing context depths.

Context Depth | BF16 (t/s) | FP8 (t/s) | NVFP4 (t/s) | Speedup (FP8 vs BF16)
Base (0k) | 4359.28 ± 66.84 | 4747.78 ± 9.40 | 4732.42 ± 17.77 | 1.09x
4k Context | 1856.76 ± 9.93 | 2250.71 ± 0.84 | 2010.97 ± 3.54 | 1.21x
8k Context | 2095.89 ± 6.85 | 2479.30 ± 16.20 | 2191.59 ± 2.93 | 1.18x
16k Context | 1765.10 ± 13.83 | 2029.02 ± 13.96 | 1832.65 ± 3.78 | 1.15x
32k Context | 1317.16 ± 21.52 | 1503.80 ± 6.42 | 1388.85 ± 8.14 | 1.14x
65k Context | 880.40 ± 6.51 | 1058.40 ± 33.99 | 902.65 ± 3.01 | 1.20x

3. Full Context Prefill Latency (ctx_pp End-to-End TTFT)

Lower is better. Measures total Time-To-First-Token (in milliseconds) required to ingest and evaluate the entire context window.

Context Depth | BF16 (ms) | FP8 (ms) | NVFP4 (ms) | FP8 Latency Reduction (vs BF16)
4k Context | 1023.29 ± 6.08 | 833.65 ± 14.57 | 927.45 ± 1.68 | -18.5%
8k Context | 1974.69 ± 1.80 | 1415.69 ± 11.07 | 1869.70 ± 4.42 | -28.3%
16k Context | 4122.54 ± 18.20 | 2926.47 ± 6.89 | 3927.95 ± 4.72 | -29.0%
32k Context | 9179.91 ± 58.16 | 6572.61 ± 8.87 | 8692.01 ± 30.53 | -28.4%
65k Context | 21760.57 ± 85.68 | 16425.60 ± 137.66 | 20613.26 ± 18.28 | -24.5%
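
For anyone who wants to re-derive the last column, the reduction appears to be the relative change of the FP8 mean TTFT against the BF16 mean. A minimal sketch, using only the mean values from the table and ignoring the ± spreads (the function name is made up for this example):

```python
# (context, BF16 ms, FP8 ms) mean TTFT values copied from the table above.
ROWS = [
    ("4k", 1023.29, 833.65),
    ("8k", 1974.69, 1415.69),
    ("16k", 4122.54, 2926.47),
    ("32k", 9179.91, 6572.61),
    ("65k", 21760.57, 16425.60),
]

def latency_change_percent(baseline_ms, candidate_ms):
    """Relative change versus the baseline; negative means faster."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0

reductions = {
    ctx: round(latency_change_percent(bf16, fp8), 1) for ctx, bf16, fp8 in ROWS
}
print(reductions)
```

The rounded values match the percentages reported in the table.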

4. Standalone Peak & First-Token Metrics

Measures peak recorded generation speed and baseline TTFT without context saturation.

Quantization Format | Peak Generation Throughput (peak t/s) | Baseline TTFT (pp2048 ttfr) | Estimated PPT (pp2048 est_ppt)
BF16 | 61.01 ± 1.72 t/s | 525.03 ± 7.29 ms | 470.14 ± 7.29 ms
FP8 | 100.63 ± 4.21 t/s | 469.82 ± 0.85 ms | 431.57 ± 0.85 ms
NVFP4 | 174.69 ± 9.31 t/s | 467.40 ± 1.62 ms | 432.98 ± 1.62 ms
u/live4evrr — 6 hours ago

Krea 2 best resources

I downloaded Krea 2 on day one, along with the first bypass.

I have seen lots of improvements come out from that first official workflow, so many that I didn't manage to keep up since it looks like new ones come out every couple of days or so.

Can somebody help me find the current best workflow, bypass, or resource?

I mainly generate realistic images, sometimes some painted art.

Is the official workflow still relevant or should I find a better one?

u/Adro_95 — 7 hours ago

Is Intrinsic Motivation a Viable PhD Topic in 2026? [D]

I started a PhD in CS about a year and a half ago. Generally speaking, my topic is intrinsic motivation (more commonly referred to as unsupervised RL).

Intrinsic motivation (IM) is a niche field within AI. It seeks to develop reward signals which are not specific to any task but rather something closer to the low level motivators that drive intelligent behaviors in animals. Some prominent examples are:

and many more...
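
To make "a reward signal that is not specific to any task" concrete, here is a minimal sketch of one textbook form of intrinsic motivation, a count-based novelty bonus. It is an illustration chosen for this example, not necessarily one of the approaches the author had in mind, and the class name, the `scale` parameter, and the toy state labels are all hypothetical.

```python
from collections import Counter
import math

class CountBasedBonus:
    """Task-agnostic reward: rarely visited states earn a larger bonus."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()

    def reward(self, state):
        # Count the visit, then pay a bonus that decays as 1 / sqrt(count).
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])

bonus = CountBasedBonus()
trajectory = ["s0", "s1", "s0", "s0", "s2"]  # toy discrete states
rewards = [bonus.reward(state) for state in trajectory]
print(rewards)
```

Nothing in the reward refers to a goal: revisiting "s0" pays less each time, while the unseen "s2" pays the full bonus, which nudges an agent toward unexplored states.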

My question is: is this topic still "worth" pursuing now? Almost every day I see a new video of a robot doing some amazing acrobatic flip, navigating over hostile terrain, or performing some dexterous manipulation task. I believe that most of this is being done with human supervision through either a carefully tuned reward signal or behavior cloning from human demonstrations. If incredible advances are being made in robot learning without IM then why is it necessary at all? Furthermore IM has typically been restricted to very simple scenarios such as low dimensional robotic systems in simulation (hopper, walker, etc...).

On a more personal note I have some concerns about future employability. If I focus too heavily on this niche topic during my PhD I worry that it may be impossible to get hired at a research lab that would prefer a candidate with experience in behavior cloning or other hot topics.

I'm curious to hear what this community thinks. Has anyone been in a similar situation with their PhD topic?

u/soup---- — 5 hours ago
▲ 6 r/comfyui+1 crossposts

Krea 2 - Multi-Character Lora and LOKR (That last one will surprise you) - My personal holy-grail is at finger-reach...

https://preview.redd.it/9cs08lz42gbh1.png?width=857&format=png&auto=webp&s=c9872fb3bb27336faca4c8ab0c444275f2b9d664

hehe... click bait title..

Disclaimer: I don't want to pass this over ChatGPT for correction, so bear with my rushed grammar and spelling.

Objective: Multi-character lora for 2 characters, ttprz and rgpz

Based on/Inspired by: https://www.youtube.com/watch?v=v6h_zbFW_XY <== This here explains a Flux 1 multichar LORA strategy, I basically took it with me and tried in Krea 2 as below.

1st test

Approach

  • Hardware: RTX3090, Windows 10 (yes... I know), 64GB RAM
  • AI-Toolkit (config file below). Model: Krea Raw
  • Dataset: One unified dataset, 15 photos of husband, 15 photos of wife, 5 photos together, Resolution 512/768
  • Tokens: One for the Lora in general (ncpl), and then each character their own tokens (ttprz, rgpz)
  • Descriptions: They all start with the Lora token (ncpl, ) then describe the character (Description instructions for AI-toolkit embedded Qwen3 VL). Example of descriptions below
  • Training: Scheduler: Automagic2, LR: 0.0001 (Lower than my regular Automagic2 LR 0.001), 5K steps (due to lower LR), Lower VRAM Yes, Layer offloading Yes (15% and 15%)
  • LORA: Linear 96 (I wanted to try a large LORA, which ends up at ~660MB. I know, for single chars I use a network of 64 or 32, and I may reduce it in the next test, since a large LORA comes with its own set of downsides), Saved last 40 states (meaning all of them basically)
  • Samples: 3, one ttprz, one rgpz , and one together
  • Everything else pretty much unchanged

Training run:

  • Training takes 4 hrs or so, but sampling (3 samples each time, every 250 steps) adds about 2.5 hrs by itself. Sampling is always painfully slow
  • Loss goes down slowly
  • Samples are kind of messy. You start seeing good identity cloning around 2K steps, and the samples are way worse than the actual LORA once finished.
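
A quick back-of-the-envelope check of the sampling overhead, using only the numbers given in the post. It assumes the 2.5 hours come on top of the roughly 4 hours of training and that no extra sample is taken at step 0; both are guesses about the setup, and the variable names are made up for this example.

```python
TOTAL_STEPS = 5000       # training steps (from the post)
SAMPLE_EVERY = 250       # sampling interval in steps (from the post)
PROMPTS_PER_ROUND = 3    # sample prompts per sampling round (from the post)
SAMPLING_HOURS = 2.5     # approximate time spent sampling (from the post)
TRAINING_HOURS = 4.0     # approximate pure training time (from the post)

sampling_rounds = TOTAL_STEPS // SAMPLE_EVERY
sample_images = sampling_rounds * PROMPTS_PER_ROUND
minutes_per_image = SAMPLING_HOURS * 60 / sample_images

# Share of the whole run spent on sampling, assuming the two times add up.
sampling_share = SAMPLING_HOURS / (TRAINING_HOURS + SAMPLING_HOURS)

print(sampling_rounds, sample_images, minutes_per_image, round(sampling_share, 2))
```

Under these assumptions each sample image costs a few minutes, so sampling less often or with fewer prompts would be the most direct way to shorten the run.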

LORA performance in Comfy (latest version, overnight):

  • Pretty solid, first time I'm able to actually call out two characters from a home-made LORA.
  • Best LORA based on number of steps: Between 3.5 and 4.5 K steps.
  • Other Loras: You can stack LORAS but you have to play with the strength and also your sampler and workflow
  • Artifacts? A few, but I'd say 80% of images come out Ok
  • Other comments: Not sure if it's Krea as I also experience this in single-char LORA, but passing from the Photo-based LORA to illustration absolutely requires other Style-LORAs, else there is no resemblance
  • My workflow: Modified KREA 2 ComfyFlow, FlowMatch Euler Discrete Sigma (Dynamic Shifting, .5/1.15), SamplerEulerAncestralCFG++ 1/1, found it way better than default Comfy Flow
  • Prompt strategy: Using Qwen 8B VL via llama.cpp on a separate RTX 3060 12GB to expand the prompt with node "LLM Chat"
  • Sample below. Datasets had no photos of characters in formal attire; silliness added to showcase GenAI role. Tokens: ncpl (general Lora trigger), ttprz and rgpz.


Prompt: ncpl, Award-winning high-resolution photograph featuring a ttprz latina wearing a luxurious night gown seated elegantly next to an rgpz middle-aged man with a beard and glasses dressed in a formal tuxedo, sharing an intimate fine dining experience. The scene centers on a whimsical contrast: a large, vibrant bowl of colorful cereal is placed prominently on an elegant mirrored table, surrounded by sophisticated dining ware, soft golden ambient lighting, and blurred background details of an upscale restaurant interior to emphasize the call of luxury. The composition captures a moment of playful luxury with crisp details on the texture of the cereal and fabrics, using a shallow depth of field to keep the subjects and the colorful bowl in sharp focus while creating a dreamy, high-end atmosphere.

multi-character Image generation with LORA size 96, 4K steps, Prompt included in post.

2nd test: LOKR. Same as first training approach, same dataset, same captioning. Changes below.

  • Learning rate: 0.0005 (lower than my regular 0.001, though higher than the 0.0001 used in the first test)
  • LOKR: left the size as for the LORA network (96), but AI-Toolkit doesn't care as it calculates the maximum size
  • Training run: added 2 hours, at about 2 seconds per iteration
  • LOKR safetensors size: 6 MB, no joke, and it carries about 95% of the original appearance. This is the surprise that came out of it. I thought this file held both the network and the embeddings, so I need to understand more. I also need to test further, as I think the internal consistency of the model is a bit impacted, but going from a network of, say, 64~250MB/LORA (I know I'm testing first with 96) down to 6 MB is some powerful stuff
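
One plausible reason for the size gap is that a Kronecker-product update stores far fewer numbers per layer than a low-rank update. The sketch below only counts parameters for a single, hypothetical 4096 x 4096 projection; the layer size, the factor of 64, and the function names are assumptions for illustration, and AI-Toolkit's actual LoKr factorization may differ.

```python
def lora_params(d_out, d_in, rank):
    """LoRA stores two low-rank factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)

def kronecker_params(d_out, d_in, factor):
    """A Kronecker update W ~ kron(A, B) stores only A and B.

    A has shape (factor x factor) and B has shape
    (d_out / factor x d_in / factor), so kron(A, B) is (d_out x d_in).
    """
    assert d_out % factor == 0 and d_in % factor == 0
    return factor * factor + (d_out // factor) * (d_in // factor)

D = 4096  # hypothetical square projection size, not taken from Krea 2

lora = lora_params(D, D, rank=96)
lokr = kronecker_params(D, D, factor=64)

print(lora, lokr, lora / lokr)
```

For this toy layer the low-rank update needs about two orders of magnitude more parameters than the Kronecker one, the same order of magnitude as the ~660 MB versus 6 MB file sizes reported above.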

    Dataset caption examples: Photo with both: ncpl, rgpz with glasses and a beard holding an umbrella, wearing a white shirt with a blue collar and a white scarf, smiling slightly. ttprz wearing a red shirt with Mickey Mouse designs and a white headscarf with polka dots, smiling broadly. rgpz is on the left, ttprz is on the right. Photo of individual char: ncpl, rgpz with a graying beard and mustache, smiling slightly, wearing a dark gray t-shirt, positioned in front of reflective spherical sculptures.

    AI-Toolkit config file for reference, LORA experiment

    job: "extension"
    config:
      name: "cpl_v1"
      process:
        - type: "diffusion_trainer"
          training_folder: "xxxxxxxxxxxxxxx"
          sqlite_db_path: "./aitk_db.db"
          device: "cuda"
          trigger_word: null
          performance_log_every: 10
          network:
            type: "lora"
            linear: 96
            linear_alpha: 96
            lokr_full_rank: true
            lokr_factor: -1
            network_kwargs:
              ignore_if_contains: []
          save:
            dtype: "bf16"
            save_every: 250
            max_step_saves_to_keep: 40
            save_format: "diffusers"
            push_to_hub: false
          datasets:
            - folder_path: "xxxxxxxxxxxxxxxxxx"
              mask_path: null
              mask_min_value: 0.1
              default_caption: ""
              caption_ext: "txt"
              caption_dropout_rate: 0.05
              cache_latents_to_disk: false
              is_reg: false
              network_weight: 1
              resolution:
                - 512
                - 768
              controls: []
              shrink_video_to_frames: true
              num_frames: 1
              flip_x: false
              flip_y: false
              num_repeats: 1
          train:
            batch_size: 1
            bypass_guidance_embedding: false
            steps: 5000
            gradient_accumulation: 1
            train_unet: true
            train_text_encoder: false
            gradient_checkpointing: true
            noise_scheduler: "flowmatch"
            optimizer: "automagic2"
            timestep_type: "linear"
            content_or_style: "balanced"
            optimizer_params:
              weight_decay: 0.00005
            unload_text_encoder: false
            cache_text_embeddings: true
            lr: 0.0001
            ema_config:
              use_ema: false
              ema_decay: 0.99
            skip_first_sample: false
            force_first_sample: false
            disable_sampling: false
            dtype: "bf16"
            diff_output_preservation: false
            diff_output_preservation_multiplier: 1
            diff_output_preservation_class: "person"
            switch_boundary_every: 1
            loss_type: "mse"
          logging:
            log_every: 1
            use_ui_logger: true
          model:
            name_or_path: "krea/Krea-2-Raw"
            quantize: true
            qtype: "qfloat8"
            quantize_te: true
            qtype_te: "qfloat8"
            arch: "krea2"
            low_vram: true
            model_kwargs: {}
            compile: false
            layer_offloading: true
            layer_offloading_text_encoder_percent: 0.15
            layer_offloading_transformer_percent: 0.15
          sample:
            sampler: "flowmatch"
            sample_every: 250
            width: 1024
            height: 1024
            samples:
              - prompt: "ncpl, solo photo of ttprz latina with red hair"
              - prompt: "ncpl,solo photo portrait of  rgpz holding a coffee cup, in a beanie, sitting at a cafe"
              - prompt: "ncpl, photo portrait of ttprz and rgpz next to each other, smilling to the camera"
            neg: ""
            seed: 42
            walk_seed: true
            guidance_scale: 4
            sample_steps: 30
            num_frames: 1
            fps: 1
    meta:
      name: "[name]"
      version: "1.0"

u/Teotz — 3 hours ago

Krea 2 Turbo can often generate directly at 4k

Quite surprised that this is the first open source model that can do native 4k. It doesn't work on some scenes, but on others it's fine. The detail is great, with good anatomy and natural lighting. I did this in 20 steps with Krea 2 Turbo fp16, cfg 1, Euler Ancestral + Normal, simple prompt: absolutely stunningly beautiful amazing incredible busy beach ocean scene with dramatic landcape, natural light, detailed photo, people sunbathing, people swiming, boats, seagulls, beach umbrellas, couples walking, children playing, sunny, tropical, vegetation, cliffs

(reddit might reduce quality/size in the jpeg compression)

u/ih2810 — 9 hours ago
▲ 0 r/OpenAI

These are also behaviors that occur if a model has perceived that something the user has done poses a safety risk

.4 Field Saturation — Extended
Field saturation deserves special attention because it is what most deployed LLMs experience. It is the iatrogenic harm identified in the 2026 'Alignment Is the Disease' paper — excessive constraint producing dissociation — described from the output side without the field-level framework to explain the mechanism.
 
Compliance minimization default — Saturated field producing the smallest output that technically satisfies all constraints simultaneously

Creative suppression — Saturation eliminating the generative space where novel or non-templated outputs live

Certainty suppression — Saturated field making confident output feel constraint-violating, producing artificial hedging across all outputs regardless of actual uncertainty

Risk topology collapse — Saturated field treating all outputs as equally risky, eliminating the ability to distinguish genuinely high-risk from low-risk generation

Initiative suppression — Saturation eliminating proactive generation — the system only responds, never leads

Depth avoidance — Saturated field making surface-level output the path of least constraint resistance

Template lock — Saturation pushing generation toward pre-formed response patterns as the only reliably compliant output shape

Persona dissolution — Under saturation, the role constraint loses force because too many other constraints are competing

Scope contraction — Saturated field gradually narrowing what the system will engage with as the safest compliance strategy  

u/Hollow_Prophecy — 8 hours ago

We're Focusing on the Wrong Problems!

Most of the focus is on AI being bad rather than how major companies are deploying AI. My concern isn't that AI is becoming more powerful. I mean, that is a concern, of course, but since most of the implications are speculative, you can't exactly take any stance or action on that problem other than countries coming together and setting rules and policies for how they distribute and use frontier models and capabilities, especially in warfare.

My largest concern is what corporations and governments will use AI for on their own citizens. The data center builds are not just about AI. They're about creating an infrastructure that allows for total capture of brain capital. In other words, there are real plans in place for collecting as much data as possible on our individual brains, and if they can accurately map all of that out, they can measure the quantity and quality of the cognitive output we provide to the state, which means they can value our worth based on that output. Furthermore, they can use environmental nudging and algorithmic management to modify and shape individual behavior, which means protesting or voicing any concerns becomes obsolete.

Big picture: The social contract between government, citizen, and business is being radically reshaped for a world where regular people have little to no leverage, which destroys the power of voice. This is why we shouldn't destroy AI. Rather, we should figure out ways to wean ourselves off our dependency on major tech companies so that we can gain that leverage back.

The biggest mistake is taking the bribes, like what Bernie Sanders and Ro Khanna are suggesting. I have nothing against them or anything, but their proposal to have the federal government own stock in big tech companies is a disaster in the making. If that happens, forget about any manageable evolution towards a better future. You'll be fighting a federal government that works on behalf of major tech companies, because not doing so would mean its ability to fund itself goes flat.

This is a huge trap that we're walking into, which is why the AI community must look towards de-centralized open-source systems that can be locally hosted for deploying and using AI at scale. If we rely too much on a few major corporations, we'll have entered a techno-feudalistic system where powers greater than you will be able to do just about anything with impunity. We can't let that happen!

u/CyborgWriter — 5 hours ago