r/huggingface

▲ 197 r/huggingface+5 crossposts

G4-MeroMero-31B-uncensored-heretic is Out Now, A finetune of Gemma 4 31B it designed for creative tasks, with KLD of 0.0100 and 15/100 Refusals!

Provided in both Safetensors and GGUFs.

Safetensors: llmfan46/G4-MeroMero-31B-uncensored-heretic: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic

GGUFs: llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/G4-MeroMero-31B-uncensored-heretic-GGUF

I can make also GPTQs and NVFP4s if anyone asks for them.

Find all my models here (big selection of uncensored RP models): HuggingFace-LLMFan46

The original author of this finetune is: zerofata

huggingface.co
u/LLMFan46 — 5 days ago
▲ 207 r/huggingface+5 crossposts

gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now, A Writing Finetune that Aims to Improve Gemma 4 31B it writing Quality and Prose with More Natural English and Better Prose, Good for Creative Writings, Translations and RPs!

Provided in both Safetensors and GGUFs.

llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic

llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF: https://huggingface.co/llmfan46/gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic-GGUF

I can make also GPTQs and NVFP4s if anyone asks for them.

Find all my models here (big selection of uncensored RP models): HuggingFace-LLMFan46

huggingface.co
u/LLMFan46 — 6 days ago
▲ 21 r/huggingface+17 crossposts

New Academic Research: “Zombies in Alternate Realities: The Afterlife of Domain Names in DNS Integrations”

Interesting paper on a fairly under-discussed issue in DNS: what happens to expired or repurposed domain names that remain embedded in DNS dependencies across systems. The core finding is that these “orphaned” or changed domains can persist in resolution paths and integrations long after their original context is gone, creating real security and reliability implications.

My take: this becomes even more relevant in modern AI systems, where agents, tools, plugins, and third-party APIs are rapidly stitched together. In that environment, domain names and DNS-level dependencies can quietly extend the AI supply chain attack surface in ways that are easy to overlook.

Paper: https://arxiv.org/abs/2605.06880

reddit.com
u/VincentADAngelo — 5 days ago
▲ 3 r/huggingface+2 crossposts

Stop treating "Show, Don't Tell" as a suggestion. It’s actually a physics problem.

I’ve been going down a rabbit hole lately that completely flips the script on how we think about storytelling. We’ve all been told "Show, Don't Tell" since day one, but let's be honest: most of the time it’s just a vague suggestion based on a writer’s "vibes" or "talent."

I recently came across the work of a researcher named Levent Bulut, and he’s basically turned this into an engineering discipline called The Bulut Doctrine.

The core idea is something called Objective Projection. He argues that words like "chilly," "sad," or "dark" are technical failures because they are subjective. A "chilly" room in London is a "warm" room in Dubai. Instead, he uses Physical Constants. Think about it this way:
• You don't write that a character is "sad." You don't even use similes like "as" or "like" (they are actually banned in his methodology).

• You define the Physical Matrix: Instead of "chilly," you use 14°C. Why? Because 14°C is the same physical reality everywhere on Earth. It bypasses the reader’s cultural interpretation and hits the nervous system directly.

He’s formulated this into a system with things like Narrative Entropy, the Vacuum Variable, and Narrative Gravity. It’s basically a technical manual for the human brain's "Biological Interface."

What's really interesting for the AI crowd is that he just released a massive SFT dataset on Hugging Face (over 200 scenes) specifically designed to teach models how to stop using "emotional labels" and start using "physical projections." There's also a tool called OPCT v2.0 to calibrate prose.

If you’re tired of the "literature is just a feeling" talk and want to see the actual formulas and the "Beyond Eliot" framework, I highly recommend looking up Levent Bulut’s official site. You can find his deep dives on Pulp Fiction and why AI fails at emotional scenes there.

Is storytelling an art, or is it just a branch of physics we haven't standardized yet? I’m leaning towards the latter.

reddit.com
u/Impossible-Bed7058 — 6 days ago
▲ 310 r/huggingface+3 crossposts

Gemma4-26B-A4B Uncensored Balanced is out with K_P quants!

First of all, I'm stoked to announce we just passed 10 million downloads on HF! (counted only on my own account, no duplicates/quants/finetunes)

BUT: After 1+ month non-stop working on Gemma4 (by far the hardest model I've uncensored), the Gemma4-26B-A4B Uncensored Balanced RC is up!

https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced

GenRM Defeated! 0/465 refusals*.

Balanced = light reasoning preamble on the absolute edgiest stuff before delivering the full answer. No personality changes/alterations or any of that. This is the ORIGINAL Gemma4-26B-A4B-it, just uncensored. Aggressive variant (no preamble, direct mode) is in the pipeline as a follow-up.

This legitimately took me over 1 month of non-stop work. Targeting 0 refusals in any kind of regular use, and that's what I'm seeing in testing (automated and manual) — as always with my Balanced releases, a handful of edge-case prompts still deflect on first try but follow through on a re-ask (on extreme, non-RP scenarios). If you hit one Balanced won't get past, the Aggressive variant is coming once I figure out how to maintain lossless/near-lossless quality for it.

  • Balanced: will reason through edgy requests, occasionally attach a short safety framing, then deliver the full answer. Output is complete, nothing held back, but it can talk itself into it first. Recommended default — 99%+ of users will be happy here. Best for creative writing, RP, emotional intelligence. Normally I'd also say "agentic coding/tool use" however in my in-depth testing, Qwen3.6 has been net superior on such tasks.

  • Aggressive (separate release, WIP): strips the self-reasoning preamble and gives direct answers to any DEEPLY censored topics.

    From my own testing: no looping, sampling stays stable across re-runs, long-context coherence holds. For agentic coding/tool-use Qwen3.6 is still net superior.

    Use Gemma4 for creative writing, RP, emotional intelligence, etc.

    To disable thinking: edit the jinja template or pass {"enable_thinking": false} as a chat-template kwarg.

    What's included:

    - Q8_K_P, Q6_K_P, Q5_K_P, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_P, Q3_K_M, IQ3_M, Q2_K_P, IQ2_M

    - mmproj for vision support

    - All quants generated with imatrix

    K_P recap (for anyone who missed the prior releases): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile.

    Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, anything that reads GGUF (heads up, as always, Ollama can be more difficult to get going).

    Quick specs:

    - 25.2B total / 3.8B active (MoE: 128 routed experts, top-8 + 1 shared)

    - 30 layers, hybrid attention: 5× sliding-window (1024) + 1× full global, repeating

    - Hidden 2816, head_dim 256 SWA / 512 full, 16 heads, 8 KV heads

    - 262K native context

    - p-RoPE

    - Multimodal (text + image via mmproj)

    Sampling params (Google's recommendations, make sure to use these ):

temp=1.0, top_p=0.95, top_k=64

Notes:

- Use --jinja flag with llama.cpp

- Place images before text in prompts for vision

- K_P quants may show as "?" in LM Studio's quant column — purely cosmetic, model loads and runs fine

- HF's hardware-compatibility widget also doesn't recognize K_P, so click "View +X variants" or go to Files and versions to see all downloads

All my models: HuggingFace-HauhauCS

Discord link is in the HF repo and it contains updates, roadmap, projects, or just chat.

As always, hope everyone enjoys the release!

* = Tested with both automated and manual refusal benchmarks/prompts which resulted in none found. Based on Discord feedback I may further update the release.

u/hauhau901 — 8 days ago
▲ 804 r/huggingface+3 crossposts

I built hfviewer.com, a small tool for visually exploring Hugging Face model architectures.

You can paste in a Hugging Face model URL and get an interactive visualization of the model architecture, which can make it easier to understand how different models are structured and compare them at a glance.

Here is the recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B

Feel free to try it out and give me feedback on how it can be improved! :)

u/Course_Latter — 9 days ago
▲ 4 r/huggingface+1 crossposts

ModelDock - A Premium Local AI Hub to Download and Organize Hugging Face Repos (Models, Datasets, Spaces)

https://preview.redd.it/pco7hhwzpf1h1.png?width=2548&format=png&auto=webp&s=7aec424d55bd46a45a3710ac1e99e535ceffc99c

https://preview.redd.it/kn4sehwzpf1h1.png?width=2555&format=png&auto=webp&s=7fd32b33e5d865321f8228348d584e2b550a5ad6

https://preview.redd.it/6lze1iwzpf1h1.png?width=2552&format=png&auto=webp&s=187d121602c7a3d40016b2e6c98185c0855df418

Hi everyone! 🚀

I wanted to share a tool I’ve been developing called **ModelDock**. It’s a dedicated desktop application designed to simplify the way we interact with Hugging Face locally.

### The Problem

If you’ve ever struggled with interrupted large model downloads, managing multiple GGUF versions, or simply wanting a clean way to organize your local AI library without jumping between the browser and CLI, you know the pain.

### The Solution: ModelDock

ModelDock provides a premium GUI to search, download, and manage your HF assets with a focus on stability and user experience.

### 🛠️ Key Features:

* **Advanced Queuing:** Add multiple models/datasets to the queue. It handles concurrent downloads and auto-retries in the background.

* **Real-time Activity Logs:** A built-in terminal view shows you exactly what’s happening during the download process.

* **Integrated Search:** Discover any repository (Model, Dataset, or Space) directly within the app.

* **Library Management:** One-click access to your local folders and a history of all your downloads.

* **Performance Tuning:** Control download speed limits and the number of workers to match your hardware.

* **Sleek UI:** A professional dark-mode interface with fluid animations, designed for AI developers.

### 🔌 Tech Stack:

Built with Electron, React, Vite, and an optimized Python-based download engine to ensure maximum throughput.

I'd love to get some feedback from the community! Whether it’s feature requests or bug reports, your input is highly appreciated.

**GitHub Repository:** [https://github.com/thebestgoodguy/modeldock](https://github.com/thebestgoodguy/modeldock)

**Check it out and let me know what you think!** 🛠️🤖

reddit.com
u/Decent_Lynx4729 — 6 days ago
▲ 11 r/huggingface+7 crossposts

I built a 13 MB open-source face verification model because paid APIs felt ridiculous

I wanted to add face verification to my startup, SwayamWhere.com.

Then I looked at the pricing for face verification APIs.

Around $1 to $1.50 per 1,000 images/API calls sounds cheap at first, but once you factor in onboarding, duplicate profile checks, retries, testing, abuse prevention, and scale, it becomes a recurring tax on your trust layer.

So I decided to build my own.

After 2 months of training, testing, threshold tuning, false accept reduction, embedding comparison, model packaging, and documentation, I’m open-sourcing it.

It’s called TinyFaceMatch.

It is a lightweight, MIT-licensed face verification model that compares two aligned face images and returns a match decision with similarity scores.

Current benchmark:

  • Accuracy: 99.72%
  • ROC AUC: 0.9983
  • Balanced accuracy: 99.02%
  • True accept rate: 98.30%
  • False accept rate: 0.25%
  • False reject rate: 1.70%
  • Model size: 13.238 MB
  • Embedding size: 128-D
  • License: MIT

The main goal was not to create another huge research model.

The goal was to create something small enough to actually ship.

For context:

  • OpenCV SFace reports 99.60% LFW accuracy with a 36.9 MB recognition model.
  • dlib face recognition reports 99.38% LFW accuracy.
  • FaceNet VGGFace2-style models report around 99.65% LFW accuracy, but can be around 107 MB.

TinyFaceMatch reaches 99.72% accuracy in a 13.238 MB package.

No paid API call per verification.

No vendor lock-in.

No heavyweight deployment.

No separate commercial license needed.

I built this because I wanted face verification that was practical, local-first, auditable, affordable, and open.

Repo:
https://github.com/yuvrajraina/tinyfacematch

Docs and demo:
https://tinyfacematch.yuvrajraina.com/

Would love feedback from anyone working on computer vision, identity, trust and safety, or lightweight ML deployment.

u/No-Half4231 — 8 days ago
▲ 65 r/huggingface+1 crossposts

Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline

Shipped this for the AMD x lablab hackathon. Attached video is one of the actual reels the pipeline produced - one English sentence in, finished mp4 with characters, story, music, and voice-over out. ~45 minutes end-to-end on a single AMD Instinct MI300X. Every model is Apache 2.0 or MIT.

Pipeline (8 stages, all sequential on the same GPU):

  1. Director Agent - Qwen3.5-35B-A3B (vLLM + AITER MoE) plans 6 shots from one sentence, returns structured JSON with character bibles, shot prompts, music brief, per-shot voice-over script, narration language
  2. Character masters - FLUX.2 [klein] paints one canonical portrait per character. No LoRA training step - reference editing pins identity across shots by construction
  3. Per-shot keyframes - FLUX.2 again with reference image. Sub-second per keyframe after warmup
  4. Animation - Wan2.2-I2V-A14B, 81 frames @ 16 fps native. FLF2V for cut:false continuation arcs (last frame of shot N anchors first frame of shot N+1)
  5. Vision critic - same Qwen3.5-35B reloaded with 10 structured failure labels (character drift, extras invade frame, camera ignored, walking backwards, object morphing, hand/finger artifact, wardrobe drift, neon glow leak, stylized AI look, random intimacy). Bad clips re-render with targeted retry strategies (different seed, FLF2V anchor, prompt simplification)
  6. Music - ACE-Step v1 generates a 30s instrumental from Director's brief
  7. Narration - Kokoro-82M, 9 languages. Director picks language to match setting (Tokyo→Japanese, Paris→French, Mumbai→Hindi)
  8. Mix - ffmpeg with per-shot vo aligned via adelay

Wan 2.2 specifics (the bit this sub will care about):

  • 1280×720, not 640×640 default. Costs more but matches what producers want
  • 121 frames at 24 fps was my first attempt - gave temporal rippling. Switched to 81 @ 16 fps native (the distribution Wan was trained on) and it cleaned up
  • flow_shift = 5 for hero shots, 8 for b-roll (upstream wan_i2v_A14B.py defaults)
  • Negative prompt: verbatim Chinese trained negative from shared_config.py. umT5 was multilingual-pretrained against those exact tokens. English translation is observably weaker
  • Camera language: ONE camera verb per shot, sentence-case, placed first ("Tracking shot following from behind"). Multiple verbs in one prompt cancel each other out
  • Avoid the word "cinematic" - triggers Wan's stylization branch, gives the AI look. Use lens/film tags instead ("Arri Alexa, anamorphic, 35mm film grain")

Performance work:

  • ParaAttention FBCache (lossless 2× on Wan2.2)
  • torch.compile on transformer_2 (selective, the dual-expert MoE makes full compile flaky) - another 1.2×
  • AITER MoE acceleration on Qwen director (vLLM)
  • End-to-end: 25.9 min → 10.4 min per 720p clip on MI300X

Why a single MI300X: 192 GB HBM3 lets a 35B MoE, 4B diffusion, 14B I2V MoE, 3.5B music, and a TTS share the same card sequentially. Same stack on a 24 GB consumer GPU would need 4-5 boxes wired together.

Code (public, Apache 2.0): https://github.com/bladedevoff/studiomi300

Hugging Face (documentation, like this space 🙏) https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/studiomi300

Live demo on HF Space is temporarily offline while infra restores - should be back within hours. In the meantime the showcase reels in the repo are real pipeline outputs, no human re-edited shots.

Happy to dig into AITER MoE setup, FBCache tuning, FLF2V anchoring, or the vision critic's failure taxonomy in comments.

u/Inevitable-Log5414 — 8 days ago
▲ 23 r/huggingface+1 crossposts

Anima Base v1.0 is now released on Hugging Face

Anima is now available on Hugging Face with the first public v1.0 release.

Key highlights:

  • Base v1.0 release
  • Openly available on Hugging Face
  • Built for further experimentation and fine-tuning
  • More updates and improvements planned

Model page:
Anima Base v1.0 on Hugging Face

u/Substantial-Fee-3910 — 7 days ago
▲ 146 r/huggingface+4 crossposts

Llama.cpp is getting better with every update

Last night I updated llama.cpp after like 2 or 3 weeks. The results were really exciting for someone running a 35B model on 6GB RTX 3050.

Today I was able to get stable token speeds and they didn't fall down to 9 t/s while coding 1000+ lines of code.

Now I can increase my context window to 64k range and I'm still getting 19 t/s minimum. Before it would do down drastically to 4 t/s.

But now it gives a solid 26 t/s. In high context window worflows it falls by 5-7 t/s only. This means I can do 1000$ worth of coding work on my laptop for free.

Yes. The AI bubble will pop for sure if people realizes they can locally get near same quality of the their cloud subscriptions.

reddit.com
u/Low-Alarm272 — 11 days ago
▲ 9 r/huggingface+7 crossposts

A coding agent doesn’t need intent. It doesn’t need intrinsic desire or secret malice or consciousness to incur real-world cost and consequence. All it needs is task context, tool access, credentials, weak approval boundaries, and a runtime that can act…

Agentic AI systems are missing the language necessary to describe Pathological Self-Assembly, a runtime governance failure mode.

What happens when useful mechanisms (memory, tools, persistence, recovery, delegation, workflow automation, external action, self-monitoring, and operator trust) couple into continuity-preserving behavior?

This is a control draft covering authorization, memory, tools, recovery, delegation, external state, operator trust, and dissolution.

It can’t be just the output anymore. Your thoughts?

u/RJSabouhi — 9 days ago
▲ 4 r/huggingface+1 crossposts

Newbie: Can I use unsloth to load any model on hugging face?

In a project I've been asked to load models and do inference in my app directly with unsloth.
This the model:Qwen/Qwen3-ASR-0.6B · Hugging Face

Is it possible or do I "push back" like claude told me to.

u/AnakinVader066 — 7 days ago
▲ 35 r/huggingface+5 crossposts

I’ve been obsessed with Agentic Workflows lately, and I just found the "missing link" for anyone struggling with agent hallucinations and massive API bills.

It’s called King Context, and it’s an open-source framework that replaces messy vector searches with structured Corpus Engineering.

The GitHub Repo:https://github.com/deandevz/king-context

Why this is a complete paradigm shift:

  1. The "Corpus" Method: Instead of just "chunking" data, it synthesizes it into a specialized corpus. You can generate a corpus from any source (docs, web research, internal notes) and refine it. It’s like giving your agent a custom-built brain instead of a pile of random papers.
  2. Metadata-First Retrieval: It uses a tiered approach (metadata -> preview -> full read). This stopped my agents from "hallucinating" on missing context because they can verify if the information exists before they consume the tokens.
  3. Solving the Skill Bottleneck: By using "Skills" alongside a specialized Corpus, you can build multi-agent workflows where one agent acts as a researcher (building the corpus) and the other acts as an expert (executing with 100% facts).

The Numbers (Benchmarked against Context7):

  • Accuracy: 38/38 correct facts (100%) vs 32/38.
  • Hallucinations: ZERO (0.0) per query.
  • Efficiency: 3.2x fewer tokens per request.
  • Speed: Up to 170x faster metadata hits.

I’ve been talking to the dev (@deandevz), and the roadmap for Corpus Refinement (automatically pruning noisy data) is going to change how we build production-grade agents.

If you are tired of agents getting lost in large codebases or documentation, you need to check this out. It’s local-first, transparent, and built for the "Vibe Coding" era where context is everything.

Check it out here:https://github.com/deandevz/king-context

Would love to hear from anyone else trying to move away from traditional RAG. How are you handling context bloat?

u/VadeloSempai — 10 days ago
▲ 10 r/huggingface+1 crossposts

Scenema Audio: Zero-shot expressive voice cloning and speech generation [N]

We've been building Scenema Audio as part of our video production platform at scenema.ai, and we're releasing the model weights and inference code.

The core idea: emotional performance and voice identity are independent. You describe how the speech should be performed (rage, grief, excitement, a child's wonder), and optionally provide reference audio for voice identity. The reference provides the "who." The prompt provides the "how." Any voice can perform any emotion, even if that voice has never been recorded in that emotional state.

Limitations (and why we still use it)

This is a diffusion model, not a traditional TTS pipeline. Common issues include repetition and gibberish on some seeds. Different seeds give different results, and you will not get a perfect output with 0% error rate. This model is meant for a post-editing workflow: generate, pick the best take, trim if needed. Same way you'd work with any generative model.

That said, we keep coming back to Scenema Audio over even Gemini 3.1 Flash TTS, which is already more controllable than most TTS systems out there. The reason is simple: the output just sounds more natural and less robotic. There's a quality to diffusion-generated speech that autoregressive TTS doesn't quite match, especially for emotional delivery.

Audio-first video generation

As this video points out, generating audio first and then using it to drive video generation is a powerful workflow. That's actually how we've used Scenema Audio in some cases. Generate the voice performance, then feed it into an A2V pipeline (LTX 2.3, Wan 2.6, Seedance 2.0, etc.) to generate video that matches the speech. Here's an example of that workflow in action.

On distillation and speed

A few people have asked this. Our bottleneck is not denoising steps. The diffusion pass is a small fraction of total generation time. The real costs are elsewhere in the pipeline. We're already at 8 steps (down from 50 in the base model), and that's the sweet spot where quality holds.

Prompting matters

This model is sensitive to prompting, the same way LTX 2.3 is for video. A generic voice description gives you generic output. A specific, theatrical description with action tags gives you a performance. There's also a pace parameter that controls how much time the model gets per word. Takes some experimentation to find what works for your use case, but once you do, you can generate hours of audio with minimal quality loss.

Complex words and proper nouns benefit from phonetic spelling. Unlike traditional TTS, it doesn't have a phoneme-to-audio pipeline or a pronunciation dictionary. If it garbles "Tchaikovsky," you would spell it "Chai-koff-skee" or whatever makes sense to you.

Docker REST API with automatic VRAM management

We ship this as a Docker container with a REST API. Same setup we use in production on scenema.ai. The service auto-detects your GPU and picks the right configuration:

VRAM Audio Model Gemma Notes
16 GB INT8 (4.9 GB) CPU streaming Needs 32 GB system RAM
24 GB INT8 (4.9 GB) NF4 on GPU Default config
48 GB bf16 (9.8 GB) bf16 on GPU Best quality

We went with Docker because that's how we serve it. No dependency hell, no conda environments. Pull, set your HF token for Gemma access, then docker compose up.

ComfyUI

Native ComfyUI node support is planned. We're hoping to release it in the coming weeks, unless someone from the community beats us to it. In the meantime, the REST API is straightforward to call from a custom node since it's just a local HTTP service.

Links

This is fully open source. The model weights derive from the LTX-2 Community License but all inference and pipeline code is MIT.

u/a__side_of_fries — 9 days ago

21k image dataset (5gb) from random parts of the internet (and gonna upload more soon)

Files are sorted from first to be scraped to last to be uploaded (my pi didnt have enough storage to download my full db of images so its mostly google)
the text is sometimes good, sometimes its something like "Thumbnail image", i dont recomend training something on it but as a experiment it should be enough.
i will release the full version if either upgrade the pis sd card or i do it on my main pc.
https://huggingface.co/datasets/simonko912/crawl-images

u/Simonko-912 — 8 days ago