AI-Assisted Oncology Variant Reconciliation Platform — Seeking Technical & Clinical Feedback
Hi everyone,
I’m organizing a small team project for an AI/healthcare innovation competition focused on oncology molecular data interoperability and reconciliation.
Our proposed project is:
OncoReconcile AI
An AI-assisted platform designed to standardize and reconcile oncology genomic information across:
- VCF files
- molecular pathology PDF reports
- vendor-specific biomarker formats
- structured clinical/genomic data
The goal is to transform fragmented molecular oncology data into explainable, standardized, and interoperable outputs that could support:
- molecular tumor board workflows
- cohort generation
- downstream analytics
- clinical research
- interoperability pipelines
Current Technical Direction
We are exploring a hybrid architecture combining:
- gene symbol normalization to HGNC-approved symbols
- variant normalization to HGVS nomenclature
- ontology-grounded mappings
- biomedical NLP / entity extraction
- LLM-assisted reconciliation
- explainable confidence scoring
- human-in-the-loop review workflows
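To make the direction above more concrete, here is a minimal Python sketch of how normalization, explainable scoring, and a review flag could fit together. Everything in it is an assumption for illustration: the alias table is a tiny hand-written sample (a real system would load the HGNC data), the 0.5/0.5 weights and the 0.8 review threshold are placeholders, and the one-letter protein form is used only as an internal comparison key (HGVS itself prefers the three-letter form).

```python
from dataclasses import dataclass

# Illustrative alias sample only; not a substitute for the HGNC dataset.
GENE_ALIASES = {"ERBB1": "EGFR", "HER1": "EGFR", "KRAS2": "KRAS"}

AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def normalize_gene(symbol: str) -> str:
    """Map a reported symbol to its approved symbol via the alias table."""
    s = symbol.strip().upper()
    return GENE_ALIASES.get(s, s)


def protein_key(change: str) -> str:
    """Reduce 'p.Leu858Arg', 'p.(Leu858Arg)' or 'L858R' to one comparison key."""
    s = change.strip()
    if s.startswith("p."):
        s = s[2:]
    s = s.strip("()")
    for three, one in AA3_TO_1.items():
        s = s.replace(three, one)
    return "p." + s


@dataclass
class Reconciliation:
    gene: str
    protein_change: str
    confidence: float
    evidence: list  # human-readable reasons behind the score
    needs_review: bool


def reconcile(vcf_call: dict, report_call: dict, review_threshold: float = 0.8) -> Reconciliation:
    """Compare two calls and record why each part of the score was given."""
    evidence = []
    score = 0.0

    g_vcf, g_rep = normalize_gene(vcf_call["gene"]), normalize_gene(report_call["gene"])
    if g_vcf == g_rep:
        score += 0.5  # hypothetical weight
        evidence.append(f"gene match after normalization: {g_vcf}")
    else:
        evidence.append(f"gene mismatch: {g_vcf} vs {g_rep}")

    p_vcf, p_rep = protein_key(vcf_call["protein"]), protein_key(report_call["protein"])
    if p_vcf == p_rep:
        score += 0.5  # hypothetical weight
        evidence.append(f"protein change match: {p_vcf}")
    else:
        evidence.append(f"protein change mismatch: {p_vcf} vs {p_rep}")

    # Anything below the threshold is routed to a human reviewer.
    return Reconciliation(g_vcf, p_vcf, score, evidence, score < review_threshold)


agree = reconcile({"gene": "ERBB1", "protein": "p.Leu858Arg"},
                  {"gene": "EGFR", "protein": "L858R"})
disagree = reconcile({"gene": "KRAS", "protein": "p.Gly12Cys"},
                     {"gene": "KRAS", "protein": "G12D"})
```

The point of keeping an evidence list next to the score is that a reviewer can see which comparison failed instead of only a number.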
Potential standards/tools under evaluation include:
- HL7 FHIR / mCODE
- ClinVar / ClinGen
- HGVS
- BioBERT / SciSpacy
- RAG-based architectures
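As a rough idea of what a standards-aligned output might look like, the Python sketch below builds a simplified FHIR Observation for one variant as a plain dictionary. This is an assumption-laden illustration, not a conformant mCODE instance: it omits the subject, specimen, and profile declarations, the function name `build_variant_observation` is hypothetical, and the LOINC codes and code system URLs follow our reading of the HL7 genomics reporting guidance and should be checked against the current implementation guides before use.

```python
import json


def build_variant_observation(gene_symbol: str, hgnc_id: str, c_hgvs: str, p_hgvs: str) -> dict:
    """Return a simplified, non-validated Observation for a detected variant."""

    def loinc(code: str, display: str) -> dict:
        return {"coding": [{"system": "http://loinc.org", "code": code, "display": display}]}

    return {
        "resourceType": "Observation",
        "status": "final",
        "code": loinc("69548-6", "Genetic variant assessment"),
        "valueCodeableConcept": loinc("LA9633-4", "Present"),
        "component": [
            {
                "code": loinc("48018-6", "Gene studied [ID]"),
                "valueCodeableConcept": {"coding": [{
                    "system": "http://www.genenames.org/geneId",
                    "code": hgnc_id,
                    "display": gene_symbol,
                }]},
            },
            {
                "code": loinc("48004-6", "DNA change (c.HGVS)"),
                "valueCodeableConcept": {"coding": [{
                    "system": "http://varnomen.hgvs.org",
                    "code": c_hgvs,
                }]},
            },
            {
                "code": loinc("48005-3", "Amino acid change (pHGVS)"),
                "valueCodeableConcept": {"coding": [{
                    "system": "http://varnomen.hgvs.org",
                    "code": p_hgvs,
                }]},
            },
        ],
    }


# Example values for EGFR L858R; treat the transcript version as illustrative.
obs = build_variant_observation("EGFR", "HGNC:3236", "NM_005228.5:c.2573T>G", "p.Leu858Arg")
obs_json = json.dumps(obs, indent=2)
```

In practice the output would need to be validated against the relevant profiles with a FHIR validator rather than trusted as built.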
Current MVP Scope
To keep the project realistic for a small team on a limited timeline, we will likely focus on:
- NSCLC initially
- a limited driver gene set (EGFR, KRAS, ALK, BRAF, etc.)
- 2–3 molecular vendor formats
- PDF + VCF reconciliation workflows
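For the PDF + VCF workflow in the scope above, the first step is getting both sources into the same record shape. The Python sketch below shows one way that could look, under several assumptions: the `GENE` and `HGVSp` INFO keys are hypothetical (real annotators such as VEP or SnpEff use their own structured fields), the PDF text is assumed to be already extracted to a string, and the regex covers only simple missense notation for the four genes listed. It does not handle negation, so a phrase like "KRAS G12C not detected" is still captured as a mention and would need downstream filtering or review.

```python
import re


def parse_vcf_line(line: str) -> dict:
    """Parse one tab-separated VCF data line into a flat record."""
    chrom, pos, _id, ref, alt, _qual, flt, info = line.rstrip("\n").split("\t")[:8]
    # INFO is a ';'-separated list; flag-style entries without '=' are skipped.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return {
        "source": "vcf",
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filter": flt,
        "gene": info_map.get("GENE"),      # hypothetical annotation key
        "protein": info_map.get("HGVSp"),  # hypothetical annotation key
    }


# Gene symbol followed by a missense change in three-letter or one-letter form.
REPORT_PATTERN = re.compile(
    r"\b(EGFR|KRAS|ALK|BRAF)\b\s+(?:p\.)?\(?"
    r"([A-Z][a-z]{2}\d+[A-Z][a-z]{2}|[A-Z]\d+[A-Z])\)?"
)


def extract_report_mentions(text: str) -> list:
    """Return every gene/protein-change mention found in report text."""
    return [
        {"source": "report", "gene": m.group(1), "protein": m.group(2)}
        for m in REPORT_PATTERN.finditer(text)
    ]


vcf_record = parse_vcf_line(
    "7\t55191822\t.\tT\tG\t.\tPASS\tGENE=EGFR;HGVSp=p.Leu858Arg"
)
report_mentions = extract_report_mentions(
    "Detected: EGFR p.L858R (VAF 32%). KRAS G12C not detected."
)
```

Note that the two sources still disagree on notation here (`p.Leu858Arg` vs `L858R`), which is exactly the kind of gap the normalization layer is meant to close.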
Feedback We Are Looking For
We would greatly appreciate feedback from people working in:
- oncology informatics
- molecular pathology
- bioinformatics
- clinical genomics
- healthcare interoperability
- biomedical NLP
- precision medicine platforms
We are especially interested in input on:
- Common real-world reconciliation pain points
- Vendor-specific genomic reporting inconsistencies
- Explainability and validation expectations
- Existing open-source tools/frameworks we should evaluate
- Clinical workflow considerations we may overlook
- FHIR/mCODE/genomics interoperability best practices
- Public datasets suitable for realistic MVP development
We are intentionally positioning this as AI-assisted, explainable, standards-aligned, and human-reviewed, rather than as fully autonomous interpretation.
Thanks in advance for any guidance, references, or suggestions.