u/Auto6890

Finding protein sequence clusters and motifs

I have about 100,000 protein sequences, each 20-30 amino acids long, and I want to find clusters and motifs like A-X-P-G-X-N (or anything similar). Each cluster/motif must have at least 100 members. What is the best way to go about this?
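For a pattern you already have in mind, checking whether it reaches the 100-member threshold takes only a regex scan. Below is a minimal sketch, assuming the sequences are already loaded into a Python dict of ID to sequence, and that patterns use only single residues and `X` (any residue) separated by dashes. The helper names (`pattern_to_regex`, `sequences_with_motif`, `patterns_with_support`) are hypothetical. This only tests candidate patterns. It does not discover new motifs; that is the job of a tool like MEME.

```python
import re

MIN_MEMBERS = 100  # threshold from the post: at least 100 members per motif


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a dash-separated pattern like 'A-X-P-G-X-N' into a regex.

    Assumes each token is a single residue letter or 'X' (any residue);
    bracketed classes such as [ST] are not handled in this sketch.
    """
    parts = ["." if token == "X" else re.escape(token) for token in pattern.split("-")]
    return re.compile("".join(parts))


def sequences_with_motif(seqs: dict, pattern: str) -> list:
    """Return the IDs of sequences containing the pattern at least once."""
    regex = pattern_to_regex(pattern)
    return [sid for sid, seq in seqs.items() if regex.search(seq)]


def patterns_with_support(seqs: dict, patterns: list, min_members: int = MIN_MEMBERS) -> dict:
    """Keep only the patterns matched by at least `min_members` sequences."""
    result = {}
    for pattern in patterns:
        hits = sequences_with_motif(seqs, pattern)
        if len(hits) >= min_members:
            result[pattern] = hits
    return result


if __name__ == "__main__":
    # Made-up toy sequences; the threshold is lowered to 2 so the demo has a hit.
    toy = {"s1": "MKAGPGLNQ", "s2": "MKAVPGTNR", "s3": "MKLLLLLLL"}
    print(patterns_with_support(toy, ["A-X-P-G-X-N", "L-L-L"], min_members=2))
```

With the real data, `min_members` would stay at its default of 100.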

ChatGPT suggested MMseqs2 followed by MEME. I already converted the Excel file to CSV and then to FASTA, and I think the MMseqs2 clustering worked, but now I'm struggling to extract the clusters and pass them to MEME.
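One way to bridge the two tools is to split the MMseqs2 result into one FASTA file per cluster, keeping only clusters with at least 100 members, and give each file to MEME as input. Below is a minimal sketch, assuming the clustering produced the usual two-column TSV (representative ID, then member ID, one pair per line), such as the `_cluster.tsv` file from `easy-cluster` or the output of `createtsv`, and that those IDs match the first word of each header in the FASTA that was clustered. The function names and the `cluster_00000.fasta` naming scheme are hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

MIN_MEMBERS = 100  # threshold from the post: keep clusters with >= 100 members


def read_fasta(path):
    """Map each sequence ID (first word of the header) to its full sequence."""
    seqs, name, chunks = {}, None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def read_clusters(tsv_path):
    """Group member IDs under their representative from a two-column TSV."""
    clusters = defaultdict(list)
    with open(tsv_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                clusters[fields[0]].append(fields[1])
    return clusters


def write_cluster_fastas(tsv_path, fasta_path, out_dir, min_members=MIN_MEMBERS):
    """Write one FASTA per cluster with >= min_members members, largest first."""
    seqs = read_fasta(fasta_path)
    clusters = read_clusters(tsv_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    by_size = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    for _rep, members in by_size:
        if len(members) < min_members:
            break  # list is sorted by size, so all remaining clusters are smaller
        path = out / f"cluster_{len(written):05d}.fasta"
        with open(path, "w") as fh:
            for member in members:
                fh.write(f">{member}\n{seqs[member]}\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # Tiny demo with made-up IDs; real inputs are the MMseqs2 TSV and the FASTA.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "seqs.fasta").write_text(">a\nMKAGPGLN\n>b\nMKAVPGTN\n>c\nMKLLLLLL\n")
        (tmp / "clu.tsv").write_text("a\ta\na\tb\nc\tc\n")
        files = write_cluster_fastas(tmp / "clu.tsv", tmp / "seqs.fasta", tmp / "out", min_members=2)
        print([f.name for f in files])
```

Sorting largest-first means `cluster_00000.fasta` is always the biggest cluster, which makes it easy to start MEME on the most populated groups.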

reddit.com
u/Auto6890 — 13 days ago