r/genomics

▲ 96 r/genomics+46 crossposts

Most people who followed $CYDY remember March 30, 2021. The FDA publicly stated that CytoDyn's claims about leronlimab were "misleading and not supported by the data" and that no benefit had been shown in COVID-19 treatment trials. The stock dropped 25%+ that day.

What happened afterward was a class action lawsuit covering investors who held $CYDY between March 27, 2020 and March 30, 2022.

A $500,000 settlement has been reached and terms are now submitted to the court for approval.

Who qualifies?

Anyone who held $CYDY during the class period and suffered losses from the alleged misrepresentations about leronlimab's effectiveness for HIV and COVID-19.

Can I still apply?

Yes, you can submit your application now and it will be processed once claims filing officially opens after court approval.

If you were damaged by this, don't forget to check your eligibility. GL!

u/JuniorCharge4571 — 1 day ago

AI-Assisted Oncology Variant Reconciliation Platform — Seeking Technical & Clinical Feedback

Hi everyone,

I’m organizing a small team project for an AI/healthcare innovation competition focused on oncology molecular data interoperability and reconciliation.

Our proposed project is:

OncoReconcile AI

An AI-assisted platform designed to standardize and reconcile oncology genomic information across:

  • VCF files
  • molecular pathology PDF reports
  • vendor-specific biomarker formats
  • structured clinical/genomic data

The goal is to transform fragmented molecular oncology data into explainable, standardized, and interoperable outputs that could support:

  • molecular tumor board workflows
  • cohort generation
  • downstream analytics
  • clinical research
  • interoperability pipelines

Current Technical Direction

We are exploring a hybrid architecture combining:

  • HGNC gene normalization
  • HGVS variant normalization
  • ontology-grounded mappings
  • biomedical NLP / entity extraction
  • LLM-assisted reconciliation
  • explainable confidence scoring
  • human-in-the-loop review workflows

Potential standards/tools under evaluation include:

  • HL7 FHIR / mCODE
  • ClinVar / ClinGen
  • HGVS
  • BioBERT / SciSpacy
  • RAG-based architectures
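
To make the gene and variant normalization step more concrete, here is a rough Python sketch of what it might look like. It is only an illustration under stated assumptions: the alias table is a tiny hand-written sample (a real system would load the HGNC dataset), the function names are hypothetical, and only simple protein-level substitutions such as L858R are handled. Anything else returns `None` so it can be routed to human review.

```python
import re

# Tiny illustrative alias table (assumption: a real pipeline would load
# the full HGNC dataset instead of hard-coding entries).
GENE_ALIASES = {"ERBB1": "EGFR", "HER1": "EGFR", "KRAS2": "KRAS"}

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_LETTER = set(ONE_TO_THREE.values())

# Matches simple substitutions only, e.g. "L858R", "p.L858R", "p.Leu858Arg".
SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Za-z]{1,3})(\d+)([A-Za-z]{1,3})$")


def normalize_gene(symbol):
    """Upper-case the symbol and map known aliases to one canonical symbol."""
    cleaned = symbol.strip().upper()
    return GENE_ALIASES.get(cleaned, cleaned)


def _to_three_letter(residue):
    """Convert a one- or three-letter residue code to three-letter form."""
    if len(residue) == 1:
        return ONE_TO_THREE.get(residue.upper())
    candidate = residue.capitalize()
    return candidate if candidate in THREE_LETTER else None


def normalize_protein_change(text):
    """Return a 'p.Leu858Arg'-style string, or None if the input is not a
    simple substitution this sketch understands."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    ref = _to_three_letter(match.group(1))
    alt = _to_three_letter(match.group(3))
    if ref is None or alt is None:
        return None
    return f"p.{ref}{match.group(2)}{alt}"


def normalize_variant(gene, change):
    """Combine both steps; None means 'send to human review'."""
    protein = normalize_protein_change(change)
    if protein is None:
        return None
    return f"{normalize_gene(gene)}:{protein}"


print(normalize_variant("HER1", "L858R"))
print(normalize_variant("EGFR", "exon 19 del"))
```

With this sketch, two vendor reports that write the same finding as `HER1 L858R` and `EGFR p.Leu858Arg` collapse to one key, while free-text findings such as `exon 19 del` are left unresolved rather than guessed at.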

Current MVP Scope

To keep the project realistic for a small team and limited timeline, we are likely focusing on:

  • NSCLC initially
  • a limited hotspot gene set (EGFR, KRAS, ALK, BRAF, etc.)
  • 2–3 molecular vendor formats
  • PDF + VCF reconciliation workflows

Feedback We Are Looking For

We would greatly appreciate feedback from people working in:

  • oncology informatics
  • molecular pathology
  • bioinformatics
  • clinical genomics
  • healthcare interoperability
  • biomedical NLP
  • precision medicine platforms

Especially around:

  1. Common real-world reconciliation pain points
  2. Vendor-specific genomic reporting inconsistencies
  3. Explainability and validation expectations
  4. Existing open-source tools/frameworks we should evaluate
  5. Clinical workflow considerations we may overlook
  6. FHIR/mCODE/genomics interoperability best practices
  7. Public datasets suitable for realistic MVP development

We are intentionally positioning this as:

  • AI-assisted,
  • explainable,
  • standards-aligned,
  • human-reviewed,

rather than fully autonomous interpretation.

Thanks in advance for any guidance, references, or suggestions.

u/Few-Bullfrog3807 — 3 days ago

I got frustrated with my lab's organization

I'm a biology and public health undergraduate who's been doing wet lab research for four years. When I first started it was overwhelming. Protocols full of terms I didn't know, a PI who was too busy to answer every question, and no good way to troubleshoot when something went wrong. I'd reread the same protocol five times and still feel lost.

At some point I started wondering why every other field has integrated tech into its workflows but research still runs on printed protocols, scattered files, and troubleshooting knowledge that lives in people's heads and gets passed down informally.

So, I built something as a side project. A tool that helps with protocol guidance, experiment troubleshooting, and keeping lab resources organized in one place. I built it for myself first. Then showed a few people and they found it useful too.

Not promoting anything. I’m just sharing something I made out of genuine frustration. If you want to try it and give me honest feedback on whether it actually solves a real problem or completely misses the mark, PM me.

u/SuspiciousAide9461 — 5 days ago
▲ 3 r/genomics+1 crossposts

VCF file to annotation

Can someone help me build a pipeline for annotating variants in a VCF file? I only know the basics of Linux.
If someone knows how, please help me!
Thanks in advance.

u/boundbyhabits — 8 days ago

Do you know of reliable Direct-to-Consumer Whole Genome Sequencing (WGS)?

I am interested in doing whole genome sequencing (WGS). Does anyone here have any experience, positive or negative, with current DTC providers?

Prior recommendations no longer seem like a great idea. Nebula has a huge backlog and a dubious financial position. Dante Labs also seems to be collapsing. Sequencing.com uses Chinese labs currently blacklisted by the DOD. Invitae was bought by LabCorp and is no longer DTC. Research programs like All of Us seem to have stopped providing people with their WGS results.

Some names that do come up that I am curious about: Psomagen, YSEQ, tellmeGen, SelfDecode, Nucleus Genomics, Sano Genetics.

Disclaimer: This is already in collaboration with my doctor. We are looking for some specific things and having them all go through clinical genomic testing is far more expensive than a DTC 30x WGS test. I do not need any assistance with data interpretation, just need reliable raw data. If a major health risk is flagged, I am prepared to do confirmatory clinical testing.

u/MatchaManiak — 9 days ago

Getting sequencing data and insights

I recently had a stillbirth at 24 weeks, and one of the issues associated with the timing of the preterm birth is cervical insufficiency, which could be genetic for some people (i.e., collagen deficiencies). It’s really hard to tell, though, because there are many things associated with preterm birth. However, I am curious and want to dig further by looking into my genetics.

My friends have talked about how they uploaded their 23andMe data to ChatGPT and got some findings that resonated with them, which prompted them to take supplements, eat differently, or pay attention to different things.

I’m hoping to learn something about my genetic health risks so that my next pregnancy can be the best it can be (of course, I will also see an MFM high-risk doctor). I’m wondering what kind of sequencing I should do. I’m worried about doing WGS because it’s too much data for ChatGPT to process. Should I do something smaller? Is 23andMe even around still? What do you guys recommend?

u/Aardvark_Adorable — 10 days ago
▲ 7 r/genomics+1 crossposts

Random Forest Classifier Training for population structure identification QC in a GWAS analysis

Hello,

I am currently performing a GWAS and am at the quality control stage, more precisely at the "ancestry" analysis. My goal is to select a homogeneous subpopulation to prevent population stratification during the subsequent statistical analysis.

To achieve this, I followed the plinkQC tutorial titled "Training a Random Forest Classifier for Population Structure Identification", using the HapMap Phase III dataset (as suggested in the tutorial).

https://meyer-lab-cshl.github.io/plinkQC/articles/AncestryCheck.html

I trained my model using 77 individuals per subpopulation, which corresponds to the size of the least represented group (MXL).

https://preview.redd.it/f6ved33thl0h1.png?width=564&format=png&auto=webp&s=d815f571391c0ddcc3fcc7cc47d7e2ae5e0bc18d

I chose this approach to avoid class imbalance, which could bias the classifier. However, the estimated OOB (Out-of-Bag) error rate after training is 22.67%, which is too high (I plan to select the CEU subpopulation).

https://preview.redd.it/ptdx80mvhl0h1.png?width=652&format=png&auto=webp&s=50d63b8bcc84d1053e0f22c76e0aeb9096b1a5c3

To improve accuracy, I have explored several approaches:

- Principal Component Analysis: I observed that the accuracy of my model increases as I include more PCs.

https://preview.redd.it/meb314rmhl0h1.png?width=2880&format=png&auto=webp&s=d7f840f96358c75b62a9276d75d4a2c1b4aa2dd9

- Sampling Strategy: Using an equivalent proportion per subpopulation rather than a fixed count to maximize the total number of individuals used for training.

- Reference Panel Upgrade: Replacing HapMap III with 1000 Genomes Project Phase III data, which offers a significantly larger sample size (this is my current focus).
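
To make the two sampling strategies concrete, here is a minimal Python sketch using only the standard library. It is not taken from the plinkQC tutorial: the function name `sample_per_population` is hypothetical, and the CEU and YRI group sizes are made up for illustration (only the 77 for MXL comes from the numbers above).

```python
import random
from collections import Counter, defaultdict


def sample_per_population(labels, fixed_n=None, fraction=None, seed=0):
    """Pick training indices per population.

    Exactly one of `fixed_n` (same count from every group) or `fraction`
    (same proportion of every group) must be given.
    """
    if (fixed_n is None) == (fraction is None):
        raise ValueError("set exactly one of fixed_n or fraction")
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group sample indices by their population label.
    by_population = defaultdict(list)
    for index, population in enumerate(labels):
        by_population[population].append(index)

    chosen = []
    for population, members in sorted(by_population.items()):
        if fixed_n is not None:
            k = fixed_n
        else:
            k = max(1, round(fraction * len(members)))
        k = min(k, len(members))  # never ask for more than the group has
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy reference panel: made-up sizes except MXL = 77.
labels = ["CEU"] * 160 + ["YRI"] * 150 + ["MXL"] * 77

fixed = sample_per_population(labels, fixed_n=77)
proportional = sample_per_population(labels, fraction=0.8)

print(Counter(labels[i] for i in fixed))
print(Counter(labels[i] for i in proportional))
```

The fixed-count option gives balanced classes but discards samples from the larger groups; the proportional option keeps more individuals overall but preserves the original imbalance, so the per-class error is worth checking in addition to the overall OOB error.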

My questions:

1 - Would using 1000 Genomes Phase III data significantly improve the classifier's accuracy compared to HapMap III?

2 - Are there other reference datasets available that might further enhance the model's accuracy?

3 - Is using a proportion of individuals per subpopulation rather than a fixed count considered a valid practice, and does it effectively improve accuracy?

Note: I should clarify that I am not an ML engineer; I am a Master 2 bioinformatics student. My ultimate objective is to identify variants associated with a specific population through statistical analysis, rather than to achieve a perfectly optimized classifier. While I understand that QC is the most critical stage of a GWAS, unfortunately my current deadline does not allow me to spend excessive time on this specific step. Thank you for taking this into consideration in your response!

u/Mathyato_ — 10 days ago
▲ 2 r/genomics+1 crossposts

Disclaimer! Illustrative DNA does not use official Davidski G25 coordinates.

This is a follow‑up post regarding the Kenyan Kalenjin results. After getting inconsistent outcomes from Illustrative DNA, I decided to dig deeper. It appears they use scaled coordinates, which in my case were highly inaccurate.

After doing my due diligence, I purchased the official Davidski G25 coordinates, and the results aligned perfectly with my ethnic background. Honestly, charging $30 for scaled coordinates feels excessive, especially since similar data can be found free of charge on various DNA sites.

If you genuinely want to understand your ethnic background, I strongly recommend buying the official Davidski G25 coordinates and analyzing them with tools like Vahaduo or DNA Genics — it’s cheaper and far more accurate.

For those on a budget, LM Genetics K47 is an excellent alternative, particularly for individuals with African heritage. It’s the only calculator that closely matches my raw G25 results.

Honorable mention: Eurogenes K36 also performs decently.

For comparison, the second slide is the result of using raw Davidski G25 coordinates on Vahaduo (Global G25 PCA), which is very precise considering my Kalenjin ancestry. The third slide uses the same official coordinates on the AfroGeno Modern (Unscaled) IY8 calculator on DNA Genics G25 Studio.

TLDR: If you want official coordinates, request them directly from Davidski.

If you come from a well‑referenced population, Illustrative DNA might still be relatively accurate — but not for the price.

https://preview.redd.it/d6zco4kxmb0h1.jpg?width=1290&format=pjpg&auto=webp&s=914a2e7af38d6d878bace4a041e3091319dfa140

https://preview.redd.it/hofowfdzmb0h1.png?width=515&format=png&auto=webp&s=9516f51fed177eb2f6a73833b17ba6cb7dd9c295

https://preview.redd.it/31ov3s5hnb0h1.png?width=796&format=png&auto=webp&s=7931e193254538a7e4a2e5ade5e5c30a072eb977

u/genealogykenya — 11 days ago

Nucleus Genomics experience

I just did whole genome sequencing with Nucleus for me and a few family members. Solid A/A- experience - the whole process took a few weeks (fewer than expected) and we received high quality files that I was able to run through a local genomics pipeline to get detailed analysis for the family.

The minus here is for the Nucleus probability reports and analyses, which are OK but keep the detailed information hidden and are presented in too "risk forward" a way. They also missed a few things that my local genomics pipeline caught.

In any case, for anyone looking to do WGS for their family, as a first-timer who is technical, I thought this was a very solid offering.

u/baalzephon — 13 days ago