
HuggingFace benchmark datasets now let you filter by model size
Quite useful to see which model under 32B performs best on swebenchverified for example.
https://huggingface.co/datasets?benchmark=benchmark:official&sort=trending

Quite useful to see which model under 32B performs best on swebenchverified for example.
https://huggingface.co/datasets?benchmark=benchmark:official&sort=trending
It took a while, but it's finally here, the new and improved v2 of Qwen3.6-27B Uncensored Heretic:
Safetensors: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2
GGUFs: https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-GGUF
Comes with benchmark too.
Find all my models here: HuggingFace-LLMFan46
Provided in both Safetensors and GGUFs.
Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic
GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF
I can make also GPTQs and NVFP4s if anyone asks for them.
Find all my models here (big selection of uncensored RP models): HuggingFace-LLMFan46
The original author of this finetune is: zerofata
Provided in both Safetensors and GGUFs.
llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic
llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF
I can make also GPTQs and NVFP4s if anyone asks for them.
Find all my models here (big selection of uncensored RP models): HuggingFace-LLMFan46
Interesting paper on a fairly under-discussed issue in DNS: what happens to expired or repurposed domain names that remain embedded in DNS dependencies across systems. The core finding is that these “orphaned” or changed domains can persist in resolution paths and integrations long after their original context is gone, creating real security and reliability implications.
My take: this becomes even more relevant in modern AI systems, where agents, tools, plugins, and third-party APIs are rapidly stitched together. In that environment, domain names and DNS-level dependencies can quietly extend the AI supply chain attack surface in ways that are easy to overlook.
I’ve been going down a rabbit hole lately that completely flips the script on how we think about storytelling. We’ve all been told "Show, Don't Tell" since day one, but let's be honest: most of the time it’s just a vague suggestion based on a writer’s "vibes" or "talent."
I recently came across the work of a researcher named Levent Bulut, and he’s basically turned this into an engineering discipline called The Bulut Doctrine.
The core idea is something called Objective Projection. He argues that words like "chilly," "sad," or "dark" are technical failures because they are subjective. A "chilly" room in London is a "warm" room in Dubai. Instead, he uses Physical Constants. Think about it this way:
• You don't write that a character is "sad." You don't even use similes like "as" or "like" (they are actually banned in his methodology).
• You define the Physical Matrix: Instead of "chilly," you use 14°C. Why? Because 14°C is the same physical reality everywhere on Earth. It bypasses the reader’s cultural interpretation and hits the nervous system directly.
He’s formulated this into a system with things like Narrative Entropy, the Vacuum Variable, and Narrative Gravity. It’s basically a technical manual for the human brain's "Biological Interface."
What's really interesting for the AI crowd is that he just released a massive SFT dataset on Hugging Face (over 200 scenes) specifically designed to teach models how to stop using "emotional labels" and start using "physical projections." There's also a tool called OPCT v2.0 to calibrate prose.
If you’re tired of the "literature is just a feeling" talk and want to see the actual formulas and the "Beyond Eliot" framework, I highly recommend looking up Levent Bulut’s official site. You can find his deep dives on Pulp Fiction and why AI fails at emotional scenes there.
Is storytelling an art, or is it just a branch of physics we haven't standardized yet? I’m leaning towards the latter.
First of all, I'm stoked to announce we just passed 10 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes)
BUT: After 1+ month non-stop working on Gemma4 (by far the hardest model I've uncensored), the Gemma4-26B-A4B Uncensored Balanced RC is up!
https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced
GenRM Defeated! 0/465 refusals*.
Balanced = light reasoning preamble on the absolute edgiest stuff before delivering the full answer. No personality changes/alterations or any of that. This is the ORIGINAL Gemma4-26B-A4B-it, just uncensored. Aggressive variant (no preamble, direct mode) is in the pipeline as a follow-up.
This legitimately took me over 1 month of non-stop work. Targeting 0 refusals in any kind of regular use, and that's what I'm seeing in testing (automated and manual) — as always with my Balanced releases, a handful of edge-case prompts still deflect on first try but follow through on a re-ask (on extreme, non-RP scenarios). If you hit one Balanced won't get past, the Aggressive variant is coming once I figure out how to maintain lossless/near-lossless quality for it.
Balanced: will reason through edgy requests, occasionally attach a short safety framing, then deliver the full answer. Output is complete, nothing held back, but it can talk itself into it first. Recommended default — 99%+ of users will be happy here. Best for creative writing, RP, emotional intelligence. Normally I'd also say "agentic coding/tool use" however in my in-depth testing, Qwen3.6 has been net superior on such tasks.
Aggressive (separate release, WIP): strips the self-reasoning preamble and gives direct answers to any DEEPLY censored topics.
From my own testing: no looping, sampling stays stable across re-runs, long-context coherence holds. For agentic coding/tool-use Qwen3.6 is still net superior.
Use Gemma4 for creative writing, RP, emotional intelligence, etc.
To disable thinking: edit the jinja template or pass {"enable_thinking": false} as a chat-template kwarg.
What's included:
- Q8_K_P, Q6_K_P, Q5_K_P, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_P, Q3_K_M, IQ3_M, Q2_K_P, IQ2_M
- mmproj for vision support
- All quants generated with imatrix
K_P recap (for anyone who missed the prior releases): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile.
Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, anything that reads GGUF (heads up, as always, Ollama can be more difficult to get going).
Quick specs:
- 25.2B total / 3.8B active (MoE: 128 routed experts, top-8 + 1 shared)
- 30 layers, hybrid attention: 5× sliding-window (1024) + 1× full global, repeating
- Hidden 2816, head_dim 256 SWA / 512 full, 16 heads, 8 KV heads
- 262K native context
- p-RoPE
- Multimodal (text + image via mmproj)
Sampling params (Google's recommendations, make sure to use these ):
temp=1.0, top_p=0.95, top_k=64
Notes:
- Use --jinja flag with llama.cpp
- Place images before text in prompts for vision
- K_P quants may show as "?" in LM Studio's quant column — purely cosmetic, model loads and runs fine
- HF's hardware-compatibility widget also doesn't recognize K_P, so click "View +X variants" or go to Files and versions to see all downloads
All my models: HuggingFace-HauhauCS
Discord link is in the HF repo and it contains updates, roadmap, projects, or just chat.
As always, hope everyone enjoys the release!
* = Tested with both automated and manual refusal benchmarks/prompts which resulted in none found. Based on Discord feedback I may further update the release.
I built hfviewer.com, a small tool for visually exploring Hugging Face model architectures.
You can paste in a Hugging Face model URL and get an interactive visualization of the model architecture, which can make it easier to understand how different models are structured and compare them at a glance.
Here is the recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B
Feel free to try it out and give me feedback on how it can be improved! :)
Tried 3 diffent browsers (brave, crome, edge), but in none of those mentioned the click & drag to filter the parameters has worked. Also tried on a different machine, (chromium browser) also did not work.
Hi everyone! 🚀
I wanted to share a tool I’ve been developing called **ModelDock**. It’s a dedicated desktop application designed to simplify the way we interact with Hugging Face locally.
### The Problem
If you’ve ever struggled with interrupted large model downloads, managing multiple GGUF versions, or simply wanting a clean way to organize your local AI library without jumping between the browser and CLI, you know the pain.
### The Solution: ModelDock
ModelDock provides a premium GUI to search, download, and manage your HF assets with a focus on stability and user experience.
### 🛠️ Key Features:
* **Advanced Queuing:** Add multiple models/datasets to the queue. It handles concurrent downloads and auto-retries in the background.
* **Real-time Activity Logs:** A built-in terminal view shows you exactly what’s happening during the download process.
* **Integrated Search:** Discover any repository (Model, Dataset, or Space) directly within the app.
* **Library Management:** One-click access to your local folders and a history of all your downloads.
* **Performance Tuning:** Control download speed limits and the number of workers to match your hardware.
* **Sleek UI:** A professional dark-mode interface with fluid animations, designed for AI developers.
### 🔌 Tech Stack:
Built with Electron, React, Vite, and an optimized Python-based download engine to ensure maximum throughput.
I'd love to get some feedback from the community! Whether it’s feature requests or bug reports, your input is highly appreciated.
**GitHub Repository:** [https://github.com/thebestgoodguy/modeldock](https://github.com/thebestgoodguy/modeldock)
**Check it out and let me know what you think!** 🛠️🤖
I wanted to add face verification to my startup, SwayamWhere.com.
Then I looked at the pricing for face verification APIs.
Around $1 to $1.50 per 1,000 images/API calls sounds cheap at first, but once you factor in onboarding, duplicate profile checks, retries, testing, abuse prevention, and scale, it becomes a recurring tax on your trust layer.
So I decided to build my own.
After 2 months of training, testing, threshold tuning, false accept reduction, embedding comparison, model packaging, and documentation, I’m open-sourcing it.
It’s called TinyFaceMatch.
It is a lightweight, MIT-licensed face verification model that compares two aligned face images and returns a match decision with similarity scores.
Current benchmark:
The main goal was not to create another huge research model.
The goal was to create something small enough to actually ship.
For context:
TinyFaceMatch reaches 99.72% accuracy in a 13.238 MB package.
No paid API call per verification.
No vendor lock-in.
No heavyweight deployment.
No separate commercial license needed.
I built this because I wanted face verification that was practical, local-first, auditable, affordable, and open.
Repo:
https://github.com/yuvrajraina/tinyfacematch
Docs and demo:
https://tinyfacematch.yuvrajraina.com/
Would love feedback from anyone working on computer vision, identity, trust and safety, or lightweight ML deployment.
Shipped this for the AMD x lablab hackathon. Attached video is one of the actual reels the pipeline produced - one English sentence in, finished mp4 with characters, story, music, and voice-over out. ~45 minutes end-to-end on a single AMD Instinct MI300X. Every model is Apache 2.0 or MIT.
Pipeline (8 stages, all sequential on the same GPU):
Wan 2.2 specifics (the bit this sub will care about):
Performance work:
Why a single MI300X: 192 GB HBM3 lets a 35B MoE, 4B diffusion, 14B I2V MoE, 3.5B music, and a TTS share the same card sequentially. Same stack on a 24 GB consumer GPU would need 4-5 boxes wired together.
Code (public, Apache 2.0): https://github.com/bladedevoff/studiomi300
Hugging Face (documentation, like this space 🙏) https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/studiomi300
Live demo on HF Space is temporarily offline while infra restores - should be back within hours. In the meantime the showcase reels in the repo are real pipeline outputs, no human re-edited shots.
Happy to dig into AITER MoE setup, FBCache tuning, FLF2V anchoring, or the vision critic's failure taxonomy in comments.
Anima is now available on Hugging Face with the first public v1.0 release.
Key highlights:
Model page:
Anima Base v1.0 on Hugging Face
Last night I updated llama.cpp after like 2 or 3 weeks. The results were really exciting for someone running a 35B model on 6GB RTX 3050.
Today I was able to get stable token speeds and they didn't fall down to 9 t/s while coding 1000+ lines of code.
Now I can increase my context window to 64k range and I'm still getting 19 t/s minimum. Before it would do down drastically to 4 t/s.
But now it gives a solid 26 t/s. In high context window worflows it falls by 5-7 t/s only. This means I can do 1000$ worth of coding work on my laptop for free.
Yes. The AI bubble will pop for sure if people realizes they can locally get near same quality of the their cloud subscriptions.
A coding agent doesn’t need intent. It doesn’t need intrinsic desire or secret malice or consciousness to incur real-world cost and consequence. All it needs is task context, tool access, credentials, weak approval boundaries, and a runtime that can act…
Agentic AI systems are missing the language necessary to describe Pathological Self-Assembly, a runtime governance failure mode.
What happens when useful mechanisms (memory, tools, persistence, recovery, delegation, workflow automation, external action, self-monitoring, and operator trust) couple into continuity-preserving behavior?
This is a control draft covering authorization, memory, tools, recovery, delegation, external state, operator trust, and dissolution.
It can’t be just the output anymore. Your thoughts?
In a project I've been asked to load models and do inference in my app directly with unsloth.
This the model:Qwen/Qwen3-ASR-0.6B · Hugging Face
Is it possible or do I "push back" like claude told me to.
I’ve been obsessed with Agentic Workflows lately, and I just found the "missing link" for anyone struggling with agent hallucinations and massive API bills.
It’s called King Context, and it’s an open-source framework that replaces messy vector searches with structured Corpus Engineering.
The GitHub Repo:https://github.com/deandevz/king-context
I’ve been talking to the dev (@deandevz), and the roadmap for Corpus Refinement (automatically pruning noisy data) is going to change how we build production-grade agents.
If you are tired of agents getting lost in large codebases or documentation, you need to check this out. It’s local-first, transparent, and built for the "Vibe Coding" era where context is everything.
Check it out here:https://github.com/deandevz/king-context
Would love to hear from anyone else trying to move away from traditional RAG. How are you handling context bloat?
We've been building Scenema Audio as part of our video production platform at scenema.ai, and we're releasing the model weights and inference code.
The core idea: emotional performance and voice identity are independent. You describe how the speech should be performed (rage, grief, excitement, a child's wonder), and optionally provide reference audio for voice identity. The reference provides the "who." The prompt provides the "how." Any voice can perform any emotion, even if that voice has never been recorded in that emotional state.
This is a diffusion model, not a traditional TTS pipeline. Common issues include repetition and gibberish on some seeds. Different seeds give different results, and you will not get a perfect output with 0% error rate. This model is meant for a post-editing workflow: generate, pick the best take, trim if needed. Same way you'd work with any generative model.
That said, we keep coming back to Scenema Audio over even Gemini 3.1 Flash TTS, which is already more controllable than most TTS systems out there. The reason is simple: the output just sounds more natural and less robotic. There's a quality to diffusion-generated speech that autoregressive TTS doesn't quite match, especially for emotional delivery.
As this video points out, generating audio first and then using it to drive video generation is a powerful workflow. That's actually how we've used Scenema Audio in some cases. Generate the voice performance, then feed it into an A2V pipeline (LTX 2.3, Wan 2.6, Seedance 2.0, etc.) to generate video that matches the speech. Here's an example of that workflow in action.
A few people have asked this. Our bottleneck is not denoising steps. The diffusion pass is a small fraction of total generation time. The real costs are elsewhere in the pipeline. We're already at 8 steps (down from 50 in the base model), and that's the sweet spot where quality holds.
This model is sensitive to prompting, the same way LTX 2.3 is for video. A generic voice description gives you generic output. A specific, theatrical description with action tags gives you a performance. There's also a pace parameter that controls how much time the model gets per word. Takes some experimentation to find what works for your use case, but once you do, you can generate hours of audio with minimal quality loss.
Complex words and proper nouns benefit from phonetic spelling. Unlike traditional TTS, it doesn't have a phoneme-to-audio pipeline or a pronunciation dictionary. If it garbles "Tchaikovsky," you would spell it "Chai-koff-skee" or whatever makes sense to you.
We ship this as a Docker container with a REST API. Same setup we use in production on scenema.ai. The service auto-detects your GPU and picks the right configuration:
| VRAM | Audio Model | Gemma | Notes |
|---|---|---|---|
| 16 GB | INT8 (4.9 GB) | CPU streaming | Needs 32 GB system RAM |
| 24 GB | INT8 (4.9 GB) | NF4 on GPU | Default config |
| 48 GB | bf16 (9.8 GB) | bf16 on GPU | Best quality |
We went with Docker because that's how we serve it. No dependency hell, no conda environments. Pull, set your HF token for Gemma access, then docker compose up.
Native ComfyUI node support is planned. We're hoping to release it in the coming weeks, unless someone from the community beats us to it. In the meantime, the REST API is straightforward to call from a custom node since it's just a local HTTP service.
This is fully open source. The model weights derive from the LTX-2 Community License but all inference and pipeline code is MIT.
Files are sorted from first to be scraped to last to be uploaded (my pi didnt have enough storage to download my full db of images so its mostly google)
the text is sometimes good, sometimes its something like "Thumbnail image", i dont recomend training something on it but as a experiment it should be enough.
i will release the full version if either upgrade the pis sd card or i do it on my main pc.
https://huggingface.co/datasets/simonko912/crawl-images