r/bioinformatics
Finding protein sequence clusters and motifs
I have about 100,000 sequences, each 20-30 amino acids long, and I want to find clusters and motifs such as A-X-P-G-X-N. Each cluster or motif must have at least 100 members. What is the best way to go about it?
ChatGPT suggested MMseqs2 then MEME. I already converted the Excel file to CSV and then to FASTA, and I think the clustering with MMseqs2 worked, but now I’m struggling to extract the clusters and pass them to MEME.
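For reference, here is a rough sketch of what the extraction step could look like in Python. It assumes the clustering produced a two-column, tab-separated file (cluster representative, then member ID, one pair per line, with the representative also listed as a member of its own cluster), which is what MMseqs2 normally writes as its cluster TSV. The file names `clusterRes_cluster.tsv` and `input.fasta`, the output directory, and all function names are made up for illustration. It also assumes the IDs in the TSV match the first word of each FASTA header.

```python
from collections import defaultdict
from pathlib import Path


def parse_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def group_clusters(tsv_lines):
    """Return {representative_id: [member_ids]} from 'rep<TAB>member' lines."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        clusters[parts[0]].append(parts[1])
    return dict(clusters)


def large_clusters(clusters, min_size=100):
    """Keep only clusters with at least min_size members."""
    return {rep: members for rep, members in clusters.items()
            if len(members) >= min_size}


def write_cluster_fastas(clusters, seqs, out_dir):
    """Write one FASTA file per cluster and return the list of paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (rep, members) in enumerate(sorted(clusters.items()), start=1):
        path = out_dir / f"cluster_{i:04d}.fasta"
        with open(path, "w") as fh:
            for member in members:
                fh.write(f">{member}\n{seqs[member]}\n")
        paths.append(path)
    return paths


# Example usage (hypothetical file names):
#   with open("input.fasta") as fa, open("clusterRes_cluster.tsv") as tsv:
#       seqs = parse_fasta(fa)
#       big = large_clusters(group_clusters(tsv), min_size=100)
#   write_cluster_fastas(big, seqs, "clusters_for_meme")
```

Each resulting `cluster_NNNN.fasta` file can then be given to MEME as a separate input, using its `-protein` option for amino acid sequences. The 100-member cutoff is applied to cluster size only. Whether a motif found inside a cluster actually occurs in 100 or more sequences would still need to be checked in MEME's output.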
u/Auto6890 — 13 days ago