u/Comfortable-Edge-915


Built a browser-based molecule viewer with a voice layer on top of Mol*. An OpenFold3 prediction comes back from NVIDIA BioNeMo as mmCIF + confidence scores, gets rendered with Mol*, and a Web Speech API layer accepts commands like "walk through chain A", "focus glutamate 4", or "start guided tour".

No install, no login. Works in Chromium-based browsers such as Chrome and Edge. iOS Safari is click-only (voice input isn't supported there).

Stack

  • Prediction: OpenFold3 on NVIDIA AI Enterprise / BioNeMo. Demo ships two pre-computed complexes (a zinc-finger–DNA binder and λ Cro bound to operator DNA).
  • Rendering: Mol* 4.9.0 via CDN. Cartoon + residue-index coloring for proteins, default nucleic-acid representation for DNA, black canvas.
  • Voice: Web Speech API — SpeechRecognition for commands, SpeechSynthesis for narration.
  • Residue parsing: normalizes number words (three → 3), spelled letters (dee gee three → DG3), full amino-acid names, and common Chrome mishears (glue → GLU, tire → TYR, isle → ILE, trip → TRP). Falls back to Levenshtein with a length-sensitive threshold.
  • Camera: a custom focusFacing(). Mol*'s default camera.focus(center, radius) dollies along the current view vector, which drops the camera inside the structure when focusing on a back-facing residue. focusFacing() first orbits to the side of the structure the residue sits on, outside the structure relative to its centroid, then focuses.
  • Tour orchestration: narration + camera moves + auto-spin between steps + mic pause during TTS (otherwise you get a feedback loop).
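
For anyone curious how the residue parsing step might look, here is a minimal sketch, not the viewer's actual code. The table contents, the function names (resolveResidueName, parseResidue), and the exact threshold rule (exact match only for words of 4 letters or fewer, 1 edit up to 7 letters, 2 edits beyond that) are all assumptions made for illustration. Spelled letters like "dee gee three" are left out to keep it short.

```typescript
// Illustrative tables only (hypothetical contents); a real alias list would be much longer.
const NUMBER_WORDS: Record<string, number> = {
  one: 1, two: 2, three: 3, four: 4, five: 5,
  six: 6, seven: 7, eight: 8, nine: 9, ten: 10,
};

// Full names and speech-to-text mishears, mapped to three-letter codes.
const RESIDUE_ALIASES: Record<string, string> = {
  glutamate: "GLU", glue: "GLU",
  tyrosine: "TYR", tire: "TYR",
  isoleucine: "ILE", isle: "ILE",
  tryptophan: "TRP", trip: "TRP",
  lysine: "LYS", lice: "LYS", like: "LYS",
  threonine: "THR", thor: "THR",
};

// Own-property check, so words like "constructor" never match by accident.
const has = (table: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(table, key);

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

function resolveResidueName(word: string): string | null {
  const w = word.toLowerCase();
  if (has(RESIDUE_ALIASES, w)) return RESIDUE_ALIASES[w];
  // Assumed length-sensitive threshold: short words must match exactly,
  // longer words tolerate one or two edits.
  const maxDist = w.length <= 4 ? 0 : w.length <= 7 ? 1 : 2;
  let best: string | null = null;
  let bestDist = maxDist + 1;
  for (const key of Object.keys(RESIDUE_ALIASES)) {
    const d = levenshtein(w, key);
    if (d < bestDist) {
      bestDist = d;
      best = RESIDUE_ALIASES[key];
    }
  }
  return best;
}

interface ResidueRef { name: string; index: number }

// Parses a two-word phrase such as "glutamate 4" or "glue four".
function parseResidue(phrase: string): ResidueRef | null {
  const tokens = phrase.toLowerCase().trim().split(/\s+/);
  if (tokens.length !== 2) return null;
  const [nameWord, numberWord] = tokens;
  const name = resolveResidueName(nameWord);
  let index: number | null = null;
  if (has(NUMBER_WORDS, numberWord)) index = NUMBER_WORDS[numberWord];
  else if (/^\d+$/.test(numberWord)) index = parseInt(numberWord, 10);
  if (name === null || index === null) return null;
  return { name, index };
}
```

With these tables, parseResidue("glue four") resolves through the alias table, while a misspelling like "tryptofan 12" only resolves through the edit-distance fallback. Keeping short words on exact match is the point of the length-sensitive threshold: one edit on a four-letter word is too easy to hit by accident.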
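
The geometry behind the camera fix can be sketched without touching Mol* at all. This is an assumption about how such a helper could work, not the actual focusFacing() and not Mol*'s API: the Vec3 type, facingPose(), and the fallback direction are hypothetical names. The idea is to put the camera on the ray from the structure centroid through the residue, so the residue is always between the camera and the bulk of the structure.

```typescript
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const add = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const scale = (a: Vec3, s: number): Vec3 => [a[0] * s, a[1] * s, a[2] * s];
const norm = (a: Vec3): number => Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);

interface CameraPose {
  position: Vec3; // where the camera sits
  target: Vec3;   // what it looks at
}

// Hypothetical helper: returns a pose that views the residue from outside.
function facingPose(
  centroid: Vec3,
  residueCenter: Vec3,
  distance: number,
  fallbackDir: Vec3 = [0, 0, 1],
): CameraPose {
  const outward = sub(residueCenter, centroid);
  const len = norm(outward);
  // A residue sitting on the centroid has no outward direction,
  // so fall back to a fixed axis instead of dividing by zero.
  const dir: Vec3 = len > 1e-6
    ? [outward[0] / len, outward[1] / len, outward[2] / len]
    : fallbackDir;
  return {
    position: add(residueCenter, scale(dir, distance)),
    target: residueCenter,
  };
}
```

For a residue at (0, 0, -10) with the centroid at the origin and a viewing distance of 20, this puts the camera at (0, 0, -30), looking back toward the residue. Animating from the current pose to this one (the "orbit" part) would be handled by whatever camera transition the renderer provides.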

Features that were annoying to build

  • Chrome transcribes spoken 3-letter codes as homophones: LYS (lysine) → lice/like, TRP (tryptophan) → trip, THR (threonine) → thor. Had to hand-curate an alias table.
  • playsInline + muted autoplay differences across iOS Safari vs Chromium.
  • Coordinating the SpeechRecognition state machine with Mol*'s render loop during tours — the mic has to stop before TTS starts, restart after, and the start button also has to act as a stop-tour button.
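
The mic/TTS coordination is easier to reason about as a small state machine with the browser calls injected. The sketch below is an illustration under assumptions, not the viewer's code: TourController, VoiceIO, and the three state names are hypothetical. In a browser, startMic/stopMic would wrap SpeechRecognition start() and stop(), speak would wrap speechSynthesis.speak() with the utterance's end event calling onEnd, and cancelSpeech would wrap speechSynthesis.cancel(). Camera moves and auto-spin between steps are left out.

```typescript
type TourState = "idle" | "listening" | "narrating";

// Injected side effects, so the logic can run without a browser.
interface VoiceIO {
  startMic(): void;
  stopMic(): void;
  speak(text: string, onEnd: () => void): void;
  cancelSpeech(): void;
}

class TourController {
  state: TourState = "idle";
  private io: VoiceIO;
  private steps: string[] = [];
  private nextStep = 0;
  private run = 0; // bumped on every start/stop to invalidate stale callbacks

  constructor(io: VoiceIO) {
    this.io = io;
  }

  // One button: starts a tour, or stops the tour in progress.
  toggle(steps: string[]): void {
    if (this.state === "narrating") {
      this.stop();
      return;
    }
    this.run++;
    this.steps = steps;
    this.nextStep = 0;
    this.io.stopMic(); // mic off before any TTS, to avoid the feedback loop
    this.state = "narrating";
    this.narrateNext(this.run);
  }

  stop(): void {
    this.run++;
    this.io.cancelSpeech();
    this.listen();
  }

  private narrateNext(run: number): void {
    if (run !== this.run) return; // callback from a cancelled tour
    if (this.nextStep >= this.steps.length) {
      this.listen(); // tour finished, mic back on
      return;
    }
    const text = this.steps[this.nextStep++];
    this.io.speak(text, () => this.narrateNext(run));
  }

  private listen(): void {
    this.state = "listening";
    this.io.startMic();
  }
}
```

The run counter is the design choice worth noting: cancelling speech can still deliver a late end callback for the interrupted utterance, and without the counter that callback would push the next step of a tour the user already stopped.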

What it doesn't do (yet)

  • No prediction job submission — hardcoded to the two pre-computed outputs.
  • No MSA handling.
  • pLDDT per-residue comes back in the JSON but isn't painted on the surface yet. Trivial to add via Mol*'s plddt-confidence theme — just haven't.
  • No ligands. OpenFold3 supports them; I haven't added a non-polymer representation.
  • No export: neither PNG snapshots nor mmCIF downloads are wired up.
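
On the pLDDT point, the per-residue scores can be bucketed before any rendering decision is made. The sketch below uses the four-band convention and colors popularized by the AlphaFold Database; it is not Mol*'s theme implementation. It also assumes the scores arrive as a plain array of numbers on a 0 to 100 scale, which is a guess about the JSON shape, and plddtBand() is a hypothetical name.

```typescript
interface ConfidenceBand {
  label: string;
  color: string; // hex color as used in the AlphaFold Database legend
}

// Maps one pLDDT score (assumed 0-100) to a confidence band.
function plddtBand(plddt: number): ConfidenceBand {
  if (plddt > 90) return { label: "very high", color: "#0053D6" };
  if (plddt > 70) return { label: "confident", color: "#65CBF3" };
  if (plddt > 50) return { label: "low", color: "#FFDB13" };
  return { label: "very low", color: "#FF7D45" };
}

// Example input: hypothetical per-residue scores for a short chain.
const scores: number[] = [96.2, 88.4, 61.0, 42.7];
const labels: string[] = scores.map((s) => plddtBand(s).label);
```

A band summary like this could also drive narration ("residues 1 through 40 are high confidence") without painting anything, which may be a lighter-weight option for non-specialists.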

Try it
https://sheldonbarnes.com/tools/ai-voice-guided-molecule-viewer

Click the icon in the bottom-right to activate the mic. Say "what can I do" for a 30-second narrated capability demo. If voice isn't your thing, the sidebar has a residue picker and per-chain walkthrough buttons.

Looking for feedback

  • Voice command grammar — what commands would actually be useful in a real bioinformatics workflow vs demo territory?
  • Is pLDDT painted on the surface worth the cognitive load for non-specialists, or does it overwhelm the initial read?
  • Export needs — does anyone here actually want to render → download, or is the goal always to link out to the underlying structure?
  • Teaching — anyone using anything like this in an undergrad biochem or structural-bio course? Interested in what the gaps look like there.
u/Comfortable-Edge-915 — 1 month ago