r/FunMachineLearning

I built a tool that shows you what GPT-2 is "thinking" in real time as it generates: a 3D graph of concept activations per token
▲ 9 r/FunMachineLearning+1 crossposts

Been going down a mechanistic interpretability rabbit hole for the past few weeks and ended up building this thing called AXON.

The idea: every time GPT-2 generates a token, its residual stream gets passed through a Sparse Autoencoder (Joseph Bloom's pretrained SAE). The SAE decomposes it into human-interpretable features, things like "European geography", "capital cities", and "French language", and the backend streams those to the browser over WebSocket, where they show up as a live 3D force graph.

Nodes = SAE features. Edges = features that fired together on the same token. Node brightness = activation strength. The whole graph evolves token by token.
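To make the node/edge rule concrete, here is a minimal sketch of how a co-activation graph like this could be built up token by token. It is not AXON's actual code: the feature names, the `k=3` cutoff, and the helper names `top_k_features` and `update_graph` are all assumptions for illustration, and plain dicts stand in for real SAE activations.

```python
from collections import defaultdict
from itertools import combinations


def top_k_features(activations, k=3):
    """Return the k strongest (feature_id, strength) pairs, skipping inactive features."""
    active = [(fid, a) for fid, a in activations.items() if a > 0]
    return sorted(active, key=lambda pair: pair[1], reverse=True)[:k]


def update_graph(nodes, edges, activations, k=3):
    """Update the graph with the features that fired on one token.

    nodes: feature_id -> latest activation strength (drives node brightness)
    edges: (feature_a, feature_b) -> number of tokens on which both fired
    """
    top = top_k_features(activations, k)
    for fid, strength in top:
        nodes[fid] = strength
    for (a, _), (b, _) in combinations(top, 2):
        edges[tuple(sorted((a, b)))] += 1


# Hypothetical per-token activations (stand-ins for real SAE output).
nodes, edges = {}, defaultdict(int)
update_graph(nodes, edges, {"geography": 2.0, "capital": 1.5, "french": 0.0, "noun": 0.4})
update_graph(nodes, edges, {"geography": 1.0, "capital": 3.0})
```

On the "meaningful or just noise" question: counting co-firings like this makes it easy to compare an edge's count against how often each feature fires on its own, which is one rough way to spot edges that only exist because both features are very common.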

What surprised me most: type "The capital of France is" and you can literally watch geography features, proper noun features, and completion-pattern features light up before the word "Paris" even gets generated. It's not what the model outputs that's interesting; it's what's happening right before it decides.

Stack: TransformerLens + SAELens on the backend, FastAPI WebSocket for streaming, Three.js + 3d-force-graph on the frontend. Runs on CPU (~800ms/token) or GPU (~35ms on a 4050). Labels come from Neuronpedia's API and get cached locally.

You can also swap in other models — GPT-2 medium/large/xl, Pythia variants, Gemma-2-2B — as long as there's a pretrained SAE for it in SAELens.

GitHub: https://github.com/09Catho/axon

Would love feedback (and stars), especially from anyone who's worked with SAEs before. I'm curious whether the co-activation edges are actually meaningful or just noise at this layer.

u/Financial_World_9730 — 3 days ago
▲ 2 r/FunMachineLearning+1 crossposts

Is it possible to be self-taught in machine learning while pursuing a college degree?

Hello, I am a student and will be entering college next month.
Sadly, I didn't get the course I wanted, so I will be joining a lower-ranked branch, because the college itself is really good, competitive, and offers good exposure.

But the branch I am choosing doesn't have much scope in my country, nor would I want to go all in on it. I have always been into computers.

I want to learn machine learning on my own so that I can hopefully land a job in it in the future or pursue it further. I guess it will be hard without a college degree. Is there any way I can learn machine learning myself, the way it is taught in colleges? I don't know how to start or what to do.
If any of you is an ML engineer who is self-taught from an online course or anything similar, can you please guide me? Please.

Thank You

u/coderbiee — 6 days ago
▲ 126 r/FunMachineLearning+17 crossposts

Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline

Shipped this for the AMD x lablab hackathon. Attached video is one of the actual reels the pipeline produced - one English sentence in, finished mp4 with characters, story, music, and voice-over out. ~45 minutes end-to-end on a single AMD Instinct MI300X. Every model is Apache 2.0 or MIT.

Pipeline (8 stages, all sequential on the same GPU):

  1. Director Agent - Qwen3.5-35B-A3B (vLLM + AITER MoE) plans 6 shots from one sentence, returns structured JSON with character bibles, shot prompts, music brief, per-shot voice-over script, narration language
  2. Character masters - FLUX.2 [klein] paints one canonical portrait per character. No LoRA training step - reference editing pins identity across shots by construction
  3. Per-shot keyframes - FLUX.2 again with reference image. Sub-second per keyframe after warmup
  4. Animation - Wan2.2-I2V-A14B, 81 frames @ 16 fps native. FLF2V for cut:false continuation arcs (last frame of shot N anchors first frame of shot N+1)
  5. Vision critic - same Qwen3.5-35B reloaded with 10 structured failure labels (character drift, extras invade frame, camera ignored, walking backwards, object morphing, hand/finger artifact, wardrobe drift, neon glow leak, stylized AI look, random intimacy). Bad clips re-render with targeted retry strategies (different seed, FLF2V anchor, prompt simplification)
  6. Music - ACE-Step v1 generates a 30s instrumental from Director's brief
  7. Narration - Kokoro-82M, 9 languages. Director picks language to match setting (Tokyo→Japanese, Paris→French, Mumbai→Hindi)
  8. Mix - ffmpeg with per-shot vo aligned via adelay
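For stage 8, a rough sketch of how per-shot voice-over could be aligned with ffmpeg's `adelay` filter. This is not the repo's code: it assumes each shot is exactly 81 frames at 16 fps, that shots play back to back, and that each voice-over clip should start at the beginning of its shot. The helper names and stream labels are made up for illustration.

```python
FRAMES_PER_SHOT = 81  # from the post: 81 frames per clip
FPS = 16              # from the post: 16 fps native


def shot_offsets_ms(num_shots, frames=FRAMES_PER_SHOT, fps=FPS):
    """Start time of each shot in whole milliseconds, assuming back-to-back shots."""
    return [i * frames * 1000 // fps for i in range(num_shots)]


def build_vo_filter(num_shots, first_vo_input=1):
    """Build an ffmpeg -filter_complex string that delays each voice-over
    input to the start of its shot and mixes them into one [vo] stream.

    Assumes input 0 is the video and inputs first_vo_input.. are the
    per-shot voice-over files (a hypothetical input layout).
    """
    parts, labels = [], []
    for i, ms in enumerate(shot_offsets_ms(num_shots)):
        label = f"vo{i}"
        # adelay takes one delay per channel, separated by '|'
        parts.append(f"[{first_vo_input + i}:a]adelay={ms}|{ms}[{label}]")
        labels.append(f"[{label}]")
    # amix rescales its inputs by default, so levels may need adjusting afterwards
    parts.append(f"{''.join(labels)}amix=inputs={num_shots}[vo]")
    return ";".join(parts)


print(build_vo_filter(6))
```

The resulting string would go to `ffmpeg -filter_complex`, with `[vo]` then mixed against the music track. With FLF2V continuation shots the real offsets may differ by a frame, so treat the arithmetic as approximate.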

Wan 2.2 specifics (the bit this sub will care about):

  • 1280×720, not 640×640 default. Costs more but matches what producers want
  • 121 frames at 24 fps was my first attempt - gave temporal rippling. Switched to 81 @ 16 fps native (the distribution Wan was trained on) and it cleaned up
  • flow_shift = 5 for hero shots, 8 for b-roll (upstream wan_i2v_A14B.py defaults)
  • Negative prompt: the verbatim Chinese negative prompt Wan was trained with, taken from shared_config.py. umT5 was multilingual-pretrained against those exact tokens. An English translation is observably weaker
  • Camera language: ONE camera verb per shot, sentence-case, placed first ("Tracking shot following from behind"). Multiple verbs in one prompt cancel each other out
  • Avoid the word "cinematic" - triggers Wan's stylization branch, gives the AI look. Use lens/film tags instead ("Arri Alexa, anamorphic, 35mm film grain")
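The camera and "cinematic" rules above could be enforced with a small prompt builder along these lines. This is only a sketch: `build_shot_prompt` is a hypothetical helper, not something from the repo, and the way it joins the parts is an assumption.

```python
import re

# Lens/film tags quoted in the post, used in place of the word "cinematic".
LENS_TAGS = "Arri Alexa, anamorphic, 35mm film grain"


def build_shot_prompt(camera, description, lens_tags=LENS_TAGS):
    """Assemble a shot prompt with a single camera instruction placed first.

    camera:      exactly one camera phrase, e.g. "tracking shot following from behind"
    description: what happens in the shot; any "cinematic" is stripped out
    """
    # Remove the word "cinematic" (any casing) plus a trailing comma/space.
    cleaned = re.sub(r"\bcinematic\b,?\s*", "", description, flags=re.IGNORECASE)
    cleaned = cleaned.strip().rstrip(".")
    # Capitalize the first letter only, leaving the rest as written.
    camera = camera.strip().rstrip(".")
    camera = camera[:1].upper() + camera[1:]
    return f"{camera}. {cleaned}. {lens_tags}"


prompt = build_shot_prompt(
    "tracking shot following from behind",
    "A cinematic view of a woman walking through Tokyo at dusk",
)
print(prompt)
```

Taking the camera phrase as its own argument is the design point: a second camera verb has no slot to go into unless someone writes it into the description by hand.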

Performance work:

  • ParaAttention FBCache (lossless 2× on Wan2.2)
  • torch.compile on transformer_2 (selective, the dual-expert MoE makes full compile flaky) - another 1.2×
  • AITER MoE acceleration on Qwen director (vLLM)
  • End-to-end: 25.9 min → 10.4 min per 720p clip on MI300X
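A quick sanity check on those numbers, assuming the two per-step speedups simply multiply (that assumption is mine; the post doesn't say how they compose):

```python
fbcache_speedup = 2.0   # ParaAttention FBCache, from the post
compile_speedup = 1.2   # selective torch.compile, from the post

# If the two gains compose multiplicatively:
combined = fbcache_speedup * compile_speedup

# Speedup implied by the end-to-end timings in the post:
measured = 25.9 / 10.4

print(f"combined estimate: {combined:.2f}x, measured: {measured:.2f}x")
```

The two figures land close to each other (about 2.4× versus about 2.5×), so the quoted per-step gains are roughly consistent with the end-to-end timing.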

Why a single MI300X: 192 GB HBM3 lets a 35B MoE, 4B diffusion, 14B I2V MoE, 3.5B music, and a TTS share the same card sequentially. Same stack on a 24 GB consumer GPU would need 4-5 boxes wired together.

Code (public, Apache 2.0): https://github.com/bladedevoff/studiomi300

Hugging Face (documentation; please like the Space 🙏): https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/studiomi300

Live demo on HF Space is temporarily offline while infra restores - should be back within hours. In the meantime the showcase reels in the repo are real pipeline outputs, no human re-edited shots.

Happy to dig into AITER MoE setup, FBCache tuning, FLF2V anchoring, or the vision critic's failure taxonomy in comments.

u/Inevitable-Log5414 — 9 days ago
▲ 11 r/FunMachineLearning+4 crossposts

ML with Finance

Hi, I am an MTech student in computer science, and I want to work in the finance domain with machine learning. Can you suggest some research topics I could work on for my final-year thesis? During my MTech, my main focus has been on machine learning and deep learning topics, but I am also interested in finance, and I have done some projects, such as one on market regimes based on https://github.com/Zdong104/FNSPID_Financial_News_Dataset. Now I am looking for a solid research topic for my final year. Are there any suggestions?

u/Gullible_Space_4070 — 7 days ago
▲ 8 r/FunMachineLearning+3 crossposts

Nobody told me that building a fintech brand in 2025 meant playing a game where the rules change every 3 months 😭

Okay real talk.

When I started Astra I thought the hard part would be the product. Regulations, compliance, building trust in a skeptical market. And yeah — that was hard.

But this? Trying to stay visible when Google is changing, AI search is exploding, and half your customers now make decisions based on what ChatGPT tells them?

Nobody warned me about this part.

We're currently showing up in about 8% of AI responses for our key queries. Which sounds small because it is small. I want to get Astra to 30% by Q3 and I have no idea if that's realistic or delusional.

Been looking at getting proper help. Absolute Digital Media, Impression Digital, and Growthner keep coming up in my research. Has anyone worked with them? Are they actually solid or is this another case of paying for confidence and getting spreadsheets?

Also genuinely curious — what % AI citation share are others sitting at? Is 8% embarrassing or actually normal for a brand our size? Help me feel better or worse, I can take it

u/Plenty-Shelter654 — 9 days ago
▲ 11 r/FunMachineLearning+4 crossposts

ZERO-VRAM-SPEC: speeds up code generation by 1.3x without taking any extra VRAM

https://github.com/neerajdad123-byte/zero-vram-spec
I replaced the draft model entirely with a Python rule-based AST predictor, which seems to work well at predicting grammar-forced tokens and indentation.
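For anyone unfamiliar with the idea, here is a toy sketch of speculative decoding with a rule-based drafter in place of a draft model. It is not the repo's implementation: the single rule (a `:` forces a newline plus a deeper indent), the token strings, and the `target_next` callable standing in for the real model are all assumptions. A real system would verify the whole draft in one forward pass; the sequential loop here is only for clarity.

```python
def current_indent(tokens):
    """Indentation of the current line: the whitespace token after the last newline."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] == "\n":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            return nxt if nxt.strip() == "" and nxt != "\n" else ""
    return ""


def draft_tokens(tokens):
    """Hypothetical grammar rule: after ':' propose a newline and one more indent level."""
    if tokens and tokens[-1] == ":":
        return ["\n", current_indent(tokens) + "    "]
    return []


def speculative_step(tokens, target_next):
    """Greedy verification: keep drafted tokens while the target model agrees,
    then append one token from the target. Always yields at least one token."""
    accepted = []
    for tok in draft_tokens(tokens):
        expected = target_next(tokens + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(tok)
    accepted.append(target_next(tokens + accepted))
    return accepted


# Stand-in "model" that replays a fixed token sequence.
script = ["def", " f", "(", ")", ":", "\n", "    ", "return", " 1"]
target_next = lambda toks: script[len(toks)]
new_tokens = speculative_step(script[:5], target_next)
print(new_tokens)
```

When the rule fires and the target agrees, one step emits three tokens instead of one, and because the drafter is plain Python it needs no GPU memory. A wrong draft costs nothing in correctness, since the target's own token is used at the first mismatch.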

While doing this project I learned a lot about implementing all types of speculative decoding, how tokens work, and MTP (multi-token prediction).

I'm looking for an internship; my passion is building things.
Please leave a star, it would help me a lot.

u/PangolinLegitimate39 — 9 days ago