Brainstorming new project ideas that involve ML, bio, and visualization. Any ideas?
Hey everyone, I'm an MS student in Applied Data Science with a computational biology background (wet lab → dry lab pipeline). I'm looking to build some portfolio projects that actually make me more employable and aren't just interesting to me.
I've worked with:
Python, R, Altair, Streamlit, Nextflow, nf-core, AlphaFold, FoldSeek, DIAMOND-blastp, Kmerseek, Sourmash, PyMC, SQLite, PostgreSQL, Deno KV, SPARQL, scikit-learn, pandas, numpy, bash, git, AWS EC2, Docker, Jupyter, ggplot2, tidyverse, Opentrons, ImageJ.
I'm genuinely interested in biological data visualization, for example applying Tufte's design principles or the grammar of graphics alongside ML. I also enjoy running models of molecular mechanisms and making the outputs intuitive and easy to communicate to wet lab scientists.
I've been thinking about:
- End-to-end Nextflow pipeline on a public proteomics dataset with an Altair-based QC/annotation dashboard as output
- ESM-2 protein embeddings visualization colored by functional annotation or taxonomy
- Benchmarking k-mer vs alignment-based annotation approaches with interactive visual output
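
To make the benchmarking idea a bit more concrete, here's a rough sketch of what the core comparison could look like on toy data. Everything in it is an assumption for illustration: the two sequences are made up, the function names (`kmers`, `jaccard`, `identity`) are hypothetical, and the position-wise identity is only a stand-in for a real aligner. It is not how Sourmash, Kmerseek, or DIAMOND compute their scores.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Alignment-free similarity: shared k-mers divided by all k-mers seen."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def identity(a: str, b: str) -> float:
    """Stand-in for an alignment score: fraction of matching positions.

    Assumes the two sequences are already aligned and equal in length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 0.0


# Made-up toy peptides that differ by a single substitution (I -> L).
query = "MKTAYIAKQR"
ref = "MKTAYLAKQR"

print(f"k-mer Jaccard (k=3): {jaccard(query, ref):.3f}")
print(f"position identity:   {identity(query, ref):.3f}")
```

Even on this toy pair the two scores diverge, because one substitution knocks out every k-mer that overlaps it. Plotting that gap across k and across real annotated proteins is roughly what the interactive output would show.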
Would love to hear what you'd prioritize, or what problems you're actually running into that a tool like the ones I'm describing could help with. Open to totally different directions too!