r/DNA

Allelix: an open-source CLI that annotates raw genotype files against ClinVar/PharmGKB/GWAS/SNPedia, with full source attribution so you can verify every call

Allelix is a free, open-source tool that takes a raw data file (23andMe, Ancestry, FTDNA, MyHeritage, Living DNA, Tempus) and annotates the variants against ClinVar, PharmGKB/CPIC, GWAS Catalog, and SNPedia. It runs entirely on your own machine, much closer in spirit to the original local Promethease than the current web version. AGPL, no uploads, no account.

Every single annotation is attributed to its source.

For example, the report says "ClinVar classifies this as pathogenic," never "this is pathogenic."
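A minimal sketch of how source-first wording can be enforced at the data level. It assumes a hypothetical Annotation record; the field names and the render function are illustrative and are not Allelix's actual code. Because the source is a required field and is always the grammatical subject, the renderer cannot produce a bare "this is pathogenic."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One database assertion about one variant (hypothetical structure)."""

    source: str          # database that made the call, e.g. "ClinVar"
    rsid: str
    classification: str  # the source's own wording, e.g. "pathogenic"


def render(ann: Annotation) -> str:
    # The source is always the subject of the sentence, so the tool never
    # states a classification in its own voice.
    if not ann.source:
        raise ValueError("every annotation must name its source")
    return f"{ann.source} classifies {ann.rsid} as {ann.classification}"


if __name__ == "__main__":
    # rs_example is a placeholder ID, not a real variant.
    print(render(Annotation("ClinVar", "rs_example", "pathogenic")))
```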

There is also an extract command that prints the raw diploid genotype for any rsID (or list of rsIDs), so you can check an individual ClinVar or PharmGKB hit yourself.
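The extract idea fits in a few lines of Python. This sketch assumes the 23andMe raw-data layout ('#' comment lines, then tab-separated rsid, chromosome, position, genotype). The function name is hypothetical, and other vendors' formats would need their own parsers.

```python
import io
from typing import Dict, Iterable, Set


def extract(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Return {rsid: diploid genotype} for the requested rsIDs.

    Assumes 23andMe-style lines: rsid<TAB>chromosome<TAB>position<TAB>genotype,
    with '#' marking comment/header lines.
    """
    found: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, the header, and blank lines
        rsid, _chrom, _pos, genotype = line.rstrip("\r\n").split("\t")[:4]
        if rsid in wanted:
            found[rsid] = genotype
    return found


if __name__ == "__main__":
    # Made-up genotypes, for illustration only.
    sample = io.StringIO(
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs1801133\t1\t11856378\tAG\n"
        "rs4680\t22\t19951271\tGG\n"
    )
    print(extract(sample, {"rs1801133", "rs4680"}))
```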

SNPedia is included under CC BY-NC-SA with attribution and a commercial-mode switch that disables it.

A few more features:

  • Build is detected from position data, not the header. Some exports label themselves "build 37.1" while shipping GRCh38 coordinates, which would otherwise produce false pathogenic calls.
  • Known artifacts are documented, not hidden. For example, a homozygous PKD1 stop-gain reads as "pathogenic" but is biologically implausible (the condition is autosomal dominant and homozygosity is lethal), so it is almost certainly a chip artifact.
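One way to document rather than hide such artifacts is to attach a flag to the annotation instead of dropping it. The sketch below assumes a hypothetical hand-curated gene list (seeded only with the PKD1 example above) and a simple two-letter genotype string. It is not necessarily how Allelix implements this.

```python
from typing import Optional

# Hypothetical curated list: genes where a homozygous call for a pathogenic
# loss-of-function allele is considered implausible in a living adult.
IMPLAUSIBLE_WHEN_HOMOZYGOUS = {"PKD1"}


def artifact_note(gene: str, genotype: str, alt: str, classification: str) -> Optional[str]:
    """Return a warning to show next to the annotation, or None if no flag applies."""
    homozygous_alt = len(genotype) == 2 and genotype[0] == genotype[1] == alt
    if (
        classification.lower() == "pathogenic"
        and homozygous_alt
        and gene in IMPLAUSIBLE_WHEN_HOMOZYGOUS
    ):
        return (
            f"Homozygous {gene} call is biologically implausible; "
            "likely a genotyping-array artifact."
        )
    return None
```

The annotation stays in the report with the note attached, which keeps the source's classification visible while telling the reader why it should not be taken at face value.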

There are two focused modes: pharmacogenomics (PharmGKB + CPIC drug-gene interactions) and methylation (MTHFR, MTR, MTRR, COMT, CBS and related genes).

CADD and VCF support are planned next.

I'd appreciate critique from this sub specifically, especially on false-positive handling and anything that's wrong.

Allelix aims to make local-first, privacy-first, open-source genetic analysis accessible to everyone.

Repo: https://github.com/dial481/allelix

Allelix v1.4.0 sample report (html)

reddit.com
u/PenfieldLabs — 13 hours ago
r/MTHFR

Is this methylation gene panel complete, or am I missing important ones?

I'm currently working on an open-source tool that checks methylation-related genes.

Here's the current panel: MTHFR, MTR, MTRR, COMT, CBS, VDR, BHMT, SHMT, PEMT.

Would you add anything?

u/PenfieldLabs — 2 days ago

Free, open-source tool to analyze your raw 23andMe data locally (no upload required)

Allelix is a fully open source, completely free genotype analysis toolkit.

It runs on your own computer; your data never leaves your machine. It pulls from actively maintained public databases (ClinVar, PharmGKB, CPIC, gnomAD, GWAS Catalog) and also includes SNPedia (no longer updated). Allelix is a free alternative to Promethease with many additional data sources.

What it does: takes your raw data file, cross-references your variants against clinical and pharmacogenomic databases, and generates an HTML report you can open in any browser.

Works with 23andMe, AncestryDNA, and MyHeritage formats.

Python CLI. Free, no account, no cloud.

Happy to answer questions.

Link in first comment.

Allelix v1.4.0 sample report

u/PenfieldLabs — 2 days ago

Free, offline, open-source alternative to Promethease - no upload, no $12

Allelix is a Promethease-like tool that doesn't require uploading your genome to a third party or paying per report.

It takes your raw data file from 23andMe, AncestryDNA, FTDNA, LivingDNA, or MyHeritage and generates a report annotating your variants against ClinVar, PharmGKB (pharmacogenomics), GWAS Catalog, and SNPedia. Similar to what Promethease does, but:

  • Free. AGPL open source, no cost, no account.
  • Offline. Your genotype file never leaves your computer. The tool downloads public databases (ClinVar, PharmGKB, etc.) once, caches them locally, and runs everything on your machine.
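  • CLI-based. Three commands from zero to report:

      pip install git+https://github.com/dial481/allelix.git
      allelix db update
      allelix analyze your_file.txt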

Reports come out as HTML (like the screenshot), JSON, or directly in the terminal.

  • Auto-detection. Allelix detects the file format (23andMe vs AncestryDNA vs ...) and the genome build (GRCh37 vs GRCh38) automatically.
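Format detection can be done from the column header rather than the file name. A minimal sketch, assuming the commonly seen 23andMe (rsid/chromosome/position/genotype) and AncestryDNA (rsid/chromosome/position/allele1/allele2) headers. Other vendors are left out, and the post doesn't show which exact strings Allelix checks.

```python
from itertools import islice
from typing import Iterable


def detect_format(lines: Iterable[str], max_lines: int = 100) -> str:
    """Guess the vendor format from the header line among the first lines of a file."""
    for line in islice(lines, max_lines):
        # 23andMe prefixes its header with '#'; AncestryDNA does not.
        cols = [c.strip().lower() for c in line.lstrip("#").strip().split("\t")]
        if cols[:4] == ["rsid", "chromosome", "position", "genotype"]:
            return "23andme"
        if cols[:5] == ["rsid", "chromosome", "position", "allele1", "allele2"]:
            return "ancestrydna"
    return "unknown"
```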

It's not a 1:1 Promethease clone - the report format is different and it doesn't have Promethease's custom wiki content. What it does have is direct annotation against the primary source databases with full attribution, so you can see exactly where each classification comes from and verify it yourself. If you run allelix db update before analysis, you'll always have the latest information.

Pharmacogenomics mode (allelix pharmacogenomics your_file.txt) gives you a focused drug-gene interaction report from PharmGKB + CPIC data.

Methylation mode (allelix methylation your_file.txt) gives you a focused report on methylation pathway genes - MTHFR, MTR, MTRR, COMT, CBS, and related variants.
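A focused mode can be thought of as a gene-panel filter over the full set of annotations. The sketch below uses only the five genes named above; the Hit record and the function are hypothetical, not Allelix's API.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Hit(NamedTuple):
    rsid: str
    gene: str
    source: str   # e.g. "ClinVar", "PharmGKB"
    summary: str


# Genes named in the methylation mode description; "related variants" are not listed here.
METHYLATION_PANEL: FrozenSet[str] = frozenset({"MTHFR", "MTR", "MTRR", "COMT", "CBS"})


def focused_report(hits: Iterable[Hit], panel: FrozenSet[str] = METHYLATION_PANEL) -> List[Hit]:
    """Keep only hits in panel genes, ordered by gene and then rsID."""
    return sorted((h for h in hits if h.gene in panel), key=lambda h: (h.gene, h.rsid))
```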

Extract (allelix extract your_file.txt --snps rs1801133,rs4680) prints the raw diploid genotype for specific rsIDs - useful for spot-checking a ClinVar or PharmGKB hit against what the array actually called.

GitHub: https://github.com/dial481/allelix

Allelix v1.2.0 Sample Report

u/PenfieldLabs — 4 days ago

Allelix - open-source CLI for genotype annotation against ClinVar, PharmGKB, GWAS Catalog, and SNPedia

Allelix is an open-source tool that takes a raw genotype file (23andMe, AncestryDNA, FTDNA, LivingDNA, MyHeritage, or any tab-delimited genotype format) and annotates it against ClinVar, PharmGKB, GWAS Catalog, and SNPedia. Outputs HTML, JSON, or terminal reports.

Some things this sub may find of interest:

Build detection. The tool auto-detects the genome build (GRCh36/37/38) from the genotype data itself, with no user input required. It samples rsID positions and checks them against known build-specific coordinates. GRCh36 files trigger a safety warning that ClinVar has dropped GRCh36 support, so annotations may be incomplete.
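The position-based check described above can be sketched as a vote: look up a handful of reference SNPs in the user's file and count which build's coordinates they match. The two SNPs and their coordinates are for illustration only (verify them against dbSNP before reuse). A real detector would sample many more sites and include GRCh36. The function name and thresholds are assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Coord = Tuple[str, int]  # (chromosome, 1-based position)

# Illustrative reference coordinates for two well-known SNPs; check against dbSNP.
REFERENCE: Dict[str, Dict[str, Coord]] = {
    "GRCh37": {"rs1801133": ("1", 11856378), "rs4680": ("22", 19951271)},
    "GRCh38": {"rs1801133": ("1", 11796321), "rs4680": ("22", 19963748)},
}


def detect_build(
    observed: Dict[str, Coord],
    reference: Dict[str, Dict[str, Coord]] = REFERENCE,
    min_hits: int = 2,
) -> Optional[str]:
    """Return the build whose coordinates best match the file, or None if unclear.

    The file's header is deliberately ignored: only positions are compared.
    """
    scores: Counter = Counter()
    for build, sites in reference.items():
        for rsid, coord in sites.items():
            if observed.get(rsid) == coord:
                scores[build] += 1
    ranked = scores.most_common(2)
    if not ranked or ranked[0][1] < min_hits:
        return None  # too few matching sites to decide
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie between builds: ambiguous
    return ranked[0][0]
```

A file whose header claims build 37 but whose positions match the GRCh38 table is classified as GRCh38 here, which is exactly the mislabeling case the tool needs to catch.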

Multi-source annotation. Each variant is checked against ClinVar (both GRCh37 and GRCh38 databases), PharmGKB clinical annotations with CPIC allele function data for pharmacogenomics, GWAS Catalog associations, and SNPedia. Sources are attributed per annotation: the tool surfaces what the databases say and does not make independent classifications.
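Annotating against build-matched ClinVar data amounts to a keyed lookup in the local cache. This sketch assumes a hypothetical SQLite table clinvar(build, chrom, pos, ref, alt, classification); the post does not describe Allelix's real schema. Lookups here are by position rather than rsID, which is why the detected build matters.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS clinvar (
    build TEXT, chrom TEXT, pos INTEGER,
    ref TEXT, alt TEXT, classification TEXT
);
CREATE INDEX IF NOT EXISTS clinvar_site ON clinvar (build, chrom, pos);
"""


def lookup(conn: sqlite3.Connection, build: str, chrom: str, pos: int) -> List[Tuple[str, str, str]]:
    """Return (ref, alt, classification) rows for one site in the given build."""
    rows = conn.execute(
        "SELECT ref, alt, classification FROM clinvar "
        "WHERE build = ? AND chrom = ? AND pos = ? ORDER BY alt",
        (build, chrom, pos),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows: positions and classifications are made up.
    conn.executemany(
        "INSERT INTO clinvar VALUES (?, ?, ?, ?, ?, ?)",
        [("GRCh37", "1", 1000, "C", "T", "pathogenic"),
         ("GRCh38", "1", 1000, "G", "A", "benign")],
    )
    print(lookup(conn, "GRCh37", "1", 1000))
```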

Database freshness. Database refresh detects whether the remote source has changed (ETag/Last-Modified for ClinVar and PharmGKB, content hash for GWAS Catalog) and updates automatically. This can be disabled with --no-update. Everything is cached locally as SQLite.
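Keeping the refresh decision separate from the network call makes it easy to test. A minimal sketch, assuming the cache stores the last ETag, the last Last-Modified value, and a SHA-256 digest; those field names are hypothetical. In practice the headers would come from an HTTP HEAD request (for example urllib.request.Request with method="HEAD").

```python
import hashlib
from typing import Mapping, Optional


def validators_changed(cached: Mapping[str, Optional[str]], headers: Mapping[str, str]) -> bool:
    """Compare ETag first, then Last-Modified; refresh if neither can be compared.

    `headers` is assumed to use canonical capitalization ("ETag", "Last-Modified").
    """
    etag = headers.get("ETag")
    if etag is not None and cached.get("etag") is not None:
        return etag != cached["etag"]
    modified = headers.get("Last-Modified")
    if modified is not None and cached.get("last_modified") is not None:
        return modified != cached["last_modified"]
    return True  # no usable validator: assume the source changed


def content_changed(cached_sha256: Optional[str], body: bytes) -> bool:
    """Fallback for sources without reliable validators: compare a content hash."""
    return hashlib.sha256(body).hexdigest() != cached_sha256
```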

No upload. Everything runs locally. Your genotype file(s) never leave your machine.

It handles all the common consumer genotyping formats and auto-detects which one you have. To install, run pip install git+https://github.com/dial481/allelix.git, then allelix db update, then allelix analyze myfile.txt. Depending on your connection speed, the first analysis should finish within 5-15 minutes. Most later analyses take 1-3 minutes for a typical 600k-900k variant file, depending on your system resources.

AGPL-3.0. Feedback and scrutiny welcome - especially on the annotation logic and the non-finding filter (Allelix suppresses ClinVar "benign" and PharmGKB non-findings using a combination of ClinVar REF checks and CPIC allele function classification).
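Here is a rough sketch of the two non-finding checks described above. It assumes single-base REF alleles reported on the same strand as the array genotype (strand handling is out of scope). It also treats ClinVar's benign-family labels and CPIC's "Normal function" term as the suppression criteria. The exact label sets Allelix uses may differ.

```python
from typing import Iterable

# Assumed suppression sets; ClinVar also uses combined labels such as "Benign/Likely benign".
CLINVAR_BENIGN = {"benign", "likely benign", "benign/likely benign"}
CPIC_NORMAL = {"normal function"}


def clinvar_non_finding(classification: str, genotype: str, ref: str) -> bool:
    """True if the ClinVar hit carries no finding for this person.

    Either ClinVar calls the variant benign, or the person is homozygous for the
    reference allele and therefore does not carry the variant at all.
    """
    homozygous_ref = len(ref) == 1 and genotype == ref * 2
    return classification.strip().lower() in CLINVAR_BENIGN or homozygous_ref


def cpic_non_finding(allele_functions: Iterable[str]) -> bool:
    """True if every called allele has CPIC 'Normal function'."""
    functions = [f.strip().lower() for f in allele_functions]
    return bool(functions) and all(f in CPIC_NORMAL for f in functions)
```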

GitHub: https://github.com/dial481/allelix

https://preview.redd.it/utreszr0at5h1.png?width=1848&format=png&auto=webp&s=d7b204d78b0697edab6d144df06042156f94f022

u/PenfieldLabs — 4 days ago

Many of us have the same problem. Thousands of notes, PDFs, docs, maybe some audio or video - and no real structure connecting any of it.

PENgram is a free, open source, MIT licensed pipeline that takes that mess and extracts a typed knowledge graph from it. You point it at a folder, it reads what's already there, and outputs an Obsidian vault with typed relationships between everything it finds.

It does not write notes for you. It works on your existing content.

It handles: code, markdown, PDFs, text files, YouTube channels (captions/subtitles), audio and video via local Whisper transcription, and images. It supports local models for the privacy conscious.

The relationship vocabulary is the same 24 types from our Wikilink Types plugin - supersedes, contradicts, supports, evolution_of, parent_of, and so on. If you're already using Wikilink Types, the output plugs straight in.

Three passes:

  1. Deterministic - tree-sitter for code, no LLM, no tokens burned
  2. Local - faster-whisper for audio/video on CPU or GPU, no API required
  3. LLM - entity and relationship extraction per document, then a typing pass to assign one of the 24 relationship types to every edge

Everything is content-hashed so re-runs only reprocess what changed. Crashes resume cleanly.
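Content-hash-based incremental processing can be sketched with hashlib and a JSON manifest. Everything here (the manifest file, the process callback, hashing whole files) is an assumption for illustration, not PENgram's actual implementation. Saving the manifest after each file is what lets a crashed run resume without redoing finished work.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def file_digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_run(root: Path, manifest_path: Path, process: Callable[[Path], None]) -> List[Path]:
    """Process only files whose content hash changed since the last successful run."""
    manifest: Dict[str, str] = (
        json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    )
    processed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if manifest.get(key) == digest:
            continue  # unchanged since it was last processed
        process(path)
        manifest[key] = digest
        # Persist after every file so a crash only loses the file in progress.
        manifest_path.write_text(json.dumps(manifest, indent=2))
        processed.append(path)
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes, manifest = Path(tmp, "notes"), Path(tmp, "manifest.json")
        notes.mkdir()
        (notes / "a.md").write_text("first draft")
        print(len(incremental_run(notes, manifest, print)))  # first run
        print(len(incremental_run(notes, manifest, print)))  # nothing changed
```

The manifest lives outside the scanned folder so it does not hash itself.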

Runs fully locally with local LLMs, or with API keys if you prefer. Your choice. MIT.

Quick start:

pip install 'pengram[all]'
pengram run ./your-notes
open pengram-out/graph.html

Graph view of a PENgram-generated Obsidian vault

Output is a graph.html file you can open in any browser, plus an Obsidian vault export when you want it.

GitHub: https://github.com/penfieldlabs/pengram

Happy to answer questions.

u/PenfieldLabs — 1 month ago