Affiliations 

  • 1 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
  • 2 European Molecular Biology Laboratory, Genome Biology Unit, 69117, Heidelberg, Germany
  • 3 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  • 4 The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
  • 5 European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, AV, NL-9713, The Netherlands
  • 6 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
  • 7 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  • 8 The School of Life Science and Technology of Xi'an Jiaotong University, 710049, Xi'an, China
  • 9 Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA, 02114, USA
  • 10 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
  • 11 Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
  • 12 Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  • 13 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  • 14 Yale University Medical School, Computational Biology and Bioinformatics Program, New Haven, CT, 06520, USA
  • 15 Biochemistry and Molecular Medicine, University of California Davis, Davis, CA, 95616, USA
  • 16 USTAR Center for Genetic Discovery and Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT, 84112, USA
  • 17 Pacific Biosciences, Menlo Park, CA, 94025, USA
  • 18 Bionano Genomics, San Diego, CA, 92121, USA
  • 19 Beyster Center for Genomics of Psychiatric Diseases, Department of Psychiatry University of California San Diego, La Jolla, CA, 92093, USA
  • 20 10X Genomics, Pleasanton, CA, 94566, USA
  • 21 Illumina Clinical Services Laboratory, Illumina, Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
  • 22 Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, 92093, USA
  • 23 DNA Link, Seodaemun-gu, Seoul, South Korea
  • 24 TreeCode Sdn Bhd, Bandar Botanic, 41200, Klang, Malaysia
  • 25 Ludwig Institute for Cancer Research, La Jolla, CA, 92093, USA
  • 26 School of Biomedical Engineering, Drexel University, Philadelphia, PA, 19104, USA
  • 27 Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77225, USA
  • 28 Department of Medicine, McDonnell Genome Institute, Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, 63108, USA
  • 29 High Impact Research, University of Malaya, 50603, Kuala Lumpur, Malaysia
  • 30 Institute for Human Genetics, University of California-San Francisco, San Francisco, CA, 94143, USA
  • 31 MOE Key Lab for Intelligent Networks & Networks Security, School of Electronics and Information Engineering, Xi'an Jiaotong University, 710049, Xi'an, China
  • 32 Center for Bioinformatics, Saarland University and the Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
  • 33 European Molecular Biology Laboratory, Genome Biology Unit, 69117, Heidelberg, Germany.
  • 34 Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
  • 35 The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
Nat Commun. 2019 Apr 16;10(1):1784.
PMID: 30992455 DOI: 10.1038/s41467-018-08148-z

Abstract

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and dataset presented serve as a gold standard for the scientific community, allowing us to make recommendations for maximizing structural variation sensitivity in future genome sequencing studies.
