A phenome-wide association study of CNVs genotyped from genome sequencing read depth in the UK Biobank
Summary
We developed a read-depth-based approach for accurate, scalable copy-number genotyping from genome sequencing data. It captures mosaic, recurrent, and multiallelic copy-number variants (CNVs) that are difficult to genotype using other methods. We genotyped each 5-kb segment throughout the genome in the UK Biobank cohort and performed phenome-wide association studies (PheWASs) of 13,215 traits under three association models, identifying 501 CNVs associated with 1,537 traits. Of these, almost 75% were not found by comparable single-nucleotide variant (SNV)-based PheWASs. We detected signals at multiallelic CNVs, including a coding repeat within MUC1 (mucin 1, cell-surface associated) associated with stomach/duodenal polyps (p = 7.7 × 10⁻²⁴), copy number of AMY1 (amylase alpha) genes associated with denture use (p = 2.4 × 10⁻²⁹), and a multiallelic coding CNV within NEB, which encodes a muscle sarcomere protein, associated with muscle mass (p = 9.7 × 10⁻²⁴). We also identified intergenic CNVs with effects on traits known to be regulated by nearby genes. For example, carriers of rare non-coding deletions ∼100 kb upstream of MC4R were, on average, ∼14 kg heavier than control subjects; coding mutations in MC4R are the most common cause of monogenic obesity. In some cases, non-coding CNVs encompassed regulatory elements of the adjacent candidate gene. Using burden tests, we identified an excess of rare damaging non-coding SNVs within some of these regulatory elements associated with the same traits observed in CNV carriers. Our study provides a detailed map of functional CNVs, including complex loci that are recalcitrant to other methods, and offers numerous insights into their effects on human traits.
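To make the idea of read-depth genotyping in fixed 5-kb segments concrete, the following is a minimal illustrative sketch, not the method used in this study. It assumes that read start positions are already available for one chromosome, that the median bin is diploid, and that depth scales linearly with copy number. It ignores GC content, mappability, and batch effects, which any real pipeline would need to correct for. The function names `count_reads_per_bin` and `estimate_copy_number` are hypothetical.

```python
from statistics import median

BIN_SIZE = 5_000  # 5-kb segments, matching the segment size in the text


def count_reads_per_bin(read_starts, n_bins, bin_size=BIN_SIZE):
    """Count reads whose 0-based start position falls in each bin."""
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if 0 <= idx < n_bins:  # silently skip reads outside the binned region
            counts[idx] += 1
    return counts


def estimate_copy_number(counts, ploidy=2):
    """Scale per-bin depth so that the median bin equals `ploidy`.

    Assumption (illustrative only): the median bin is copy-neutral.
    Returns fractional copy numbers; values far from an integer could
    hint at mosaicism, but this sketch makes no such call.
    """
    baseline = median(counts)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [ploidy * c / baseline for c in counts]


if __name__ == "__main__":
    # Toy depths: one bin at half depth, one at 1.5x depth.
    depths = [100, 100, 50, 100, 150, 100]
    print(estimate_copy_number(depths))  # [2.0, 2.0, 1.0, 2.0, 3.0, 2.0]
```

In this toy example the half-depth bin is estimated at one copy (a heterozygous deletion under the stated assumptions) and the 1.5× bin at three copies. Keeping the estimate fractional rather than rounding leaves room for multiallelic and mosaic states.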