Research on the role of copy number variations (CNVs) in the genetic risk of disease in Asian populations has been hampered by a relative lack of reference CNV maps for Asian populations other than East Asians. In this article, we report the population characteristics of CNVs in the Chinese, Malay, and Asian Indian populations of Singapore. Using the Illumina Human 1M BeadChip array, we identify 1,174 CNV loci in these populations that were corroborated when the same samples were typed on the Affymetrix 6.0 platform. We identify 441 novel loci not previously reported in the Database of Genomic Variants (DGV). We observe a considerable number of previously unreported loci that span all three populations, as well as population-specific loci that are quite common in their respective populations. From these data, we find that the distribution of CNVs in the Asian Indian population differs considerably from those in the Chinese and Malay populations. About half of the deletion loci and three-quarters of the duplication loci overlap UCSC genes. Dozens of loci show population differentiation and overlap genes previously associated with genetic risk of disease. One of these is the CYP2A6 deletion, previously linked to reduced susceptibility to lung cancer.
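Statements such as "overlap UCSC genes" or "not previously reported in DGV" reduce to interval-overlap tests between CNV loci and an annotation track. The sketch below is a minimal illustration, assuming half-open, BED-style coordinates and an any-overlap criterion; the study's actual overlap rules are not stated here, and the function names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, BED-style)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_overlapping(cnvs: list[Interval], genes: list[Interval]) -> float:
    """Fraction of CNV loci overlapping at least one gene (brute force, O(n*m))."""
    if not cnvs:
        return 0.0
    hits = sum(any(overlaps(c, g) for g in genes) for c in cnvs)
    return hits / len(cnvs)


def novel_loci(cnvs: list[Interval], catalogue: list[Interval]) -> list[Interval]:
    """CNV loci with no overlap in a reference catalogue such as a DGV export."""
    return [c for c in cnvs if not any(overlaps(c, r) for r in catalogue)]


# Hypothetical coordinates, for illustration only
cnvs = [Interval("chr1", 100, 200), Interval("chr1", 500, 600), Interval("chr2", 0, 50)]
genes = [Interval("chr1", 150, 300)]
catalogue = [Interval("chr2", 40, 45)]
print(fraction_overlapping(cnvs, genes))  # one of the three loci overlaps a gene
print(novel_loci(cnvs, catalogue))
```

For genome-scale data, a sorted sweep or an interval tree would replace the brute-force scan; the overlap predicate stays the same.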
Genetic diseases are a pressing global health problem, and countering them requires comprehensive access to basic clinical and genetic data. The creation of regional and international databases that can be easily accessed by clinicians and diagnostic labs will greatly improve our ability to accurately diagnose and treat patients with genetic disorders. The Human Variome Project (HVP) is currently working with human genetics societies to achieve this by establishing systems that collect every mutation reported by a diagnostic laboratory, clinic, or research laboratory in a country and store it in a national repository, or HVP Country Node. Nodes have already been initiated in Australia, Belgium, China, Egypt, Malaysia, and Kuwait. Each is examining how to systematically collect and share genetic, clinical, and biochemical information in a country-specific manner that is sensitive to local ethical and cultural issues. This article gathers examples of genetic data collection within countries and draws on recommendations from the global community to develop a procedure for countries wishing to establish their own collection system as part of the Human Variome Project. We hope this will lead to standard practices that facilitate global data collection and allow efficient use of the data in clinical practice, research, and therapy.
There is a need for country- or population-specific databases because population-specific mutations for single-gene disorders are well documented, and there is also good evidence for ethnic differences in the frequencies of genetic variants involved in complex disorders. The Singapore Human Mutation/Polymorphism Database (SHMPD) was therefore created to give clinicians and scientists access to a central genetic database for the Singapore population. The database catalogues mutations identified in Singapore for Mendelian diseases, as well as frequencies of polymorphisms investigated in either healthy controls or samples associated with specific phenotypes. Data from journal articles, identified through searches of PubMed and other online resources, and from personal communications with researchers were compiled into a single database. Genes are listed alphabetically and are also searchable by name and disease. For each variant of a gene, the database provides the encoded protein, the phenotype association, the gender, size, and ethnic origin of the sample, the reported genotype and allele frequencies, and direct links to the corresponding abstracts on PubMed. Our database will facilitate molecular diagnosis of Mendelian disorders and improve study designs for complex traits. It will be useful not only for researchers in Singapore, but also for those in countries and regions with similar ethnic backgrounds, such as China, Taiwan, Hong Kong, Indonesia, and Malaysia.
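When a database stores both genotype counts and allele frequencies for a polymorphism, the two can be cross-checked. A minimal sketch for a biallelic site; the function names are hypothetical and are not part of SHMPD, and the Hardy-Weinberg comparison is an optional extra check rather than something the database is stated to perform.

```python
def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (p, q), the frequencies of alleles A and B, from biallelic genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # each AA carries two A alleles, each AB carries one
    return p, 1 - p


def hwe_expected_counts(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float, float]:
    """Genotype counts expected under Hardy-Weinberg equilibrium from the observed allele frequencies."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    return p * p * n, 2 * p * q * n, q * q * n


# Hypothetical genotype counts for one polymorphism in one ethnic group
print(allele_frequencies(36, 48, 16))
print(hwe_expected_counts(36, 48, 16))
```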
The mutation spectrum of the BRCA1 gene among Asian ethnic groups has not been well studied. We investigated the frequency of BRCA1 mutations among Malay breast cancer patients from Singapore, irrespective of family history. Using the protein truncation test (PTT) and direct sequencing, we detected BRCA1 mutations in 6 of 49 (12.2%) unrelated patients. Four novel missense mutations in exon 11, T557A (1788A>G), T582A (1863A>G), N656S (2086A>G), and P684S (2169C>T), were identified in one patient. Two patients carried the exon 23 missense mutation V1809A (5545T>C), which has previously been detected in individuals from Central and Eastern Europe. Three unrelated patients had the deleterious 2846insA frameshift mutation in exon 11. Methylation-specific PCR (MSP) of the BRCA1 promoter region detected hypermethylation of tumor DNA in an additional 2 patients. Haplotype analysis using the microsatellite markers D17S855, D17S1323, and D17S1325 revealed a common haplotype shared by the three unrelated patients and three of their relatives carrying the 2846insA mutation. These findings suggest that 2846insA, the most common deleterious mutation in this study, may be a founder mutation in breast cancer patients of Malay ethnic background.
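The nucleotide positions above (e.g. 1788A>G) use the legacy BIC convention for BRCA1, whereas current reports use HGVS c. numbering. A minimal sketch of the conversion, assuming the standard BIC offset (the A of the ATG start codon is nucleotide 120, so c. = BIC - 119); it also serves as a consistency check between each nucleotide change and the reported codon. The helper names are hypothetical.

```python
# Assumption: legacy BIC numbering for BRCA1 starts at the first nucleotide of
# GenBank U14680, so the A of the ATG start codon is nucleotide 120.
BIC_OFFSET = 119


def bic_to_c(bic_pos: int) -> int:
    """Convert a legacy BIC nucleotide position to an HGVS c. coding position."""
    return bic_pos - BIC_OFFSET


def codon_of(c_pos: int) -> tuple[int, int]:
    """Return (codon number, position 1-3 within the codon) for a c. position."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# Missense changes reported above: protein change -> BIC nucleotide position
reported = {"T557A": 1788, "T582A": 1863, "N656S": 2086, "P684S": 2169, "V1809A": 5545}
for protein_change, bic in reported.items():
    c_pos = bic_to_c(bic)
    codon, position = codon_of(c_pos)
    assert codon == int(protein_change[1:-1]), protein_change  # numbering is consistent
    print(f"{protein_change}: BIC {bic} = c.{c_pos}, codon {codon}, codon position {position}")
```

The within-codon positions also fit the amino acid changes: first position for the Thr>Ala (ACN>GCN) and Pro>Ser (CCN>TCN) substitutions, second position for Asn>Ser (AAY>AGY) and Val>Ala (GTN>GCN).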
We screened 38 G6PD-deficient male Chinese neonates for known G6PD mutations using established PCR-based techniques. Of these, 50.0% (19 of 38) carried the 1376G>T mutation, 34.2% (13 of 38) carried 1388G>A, 5.3% (2 of 38) carried 95A>G, and 2.6% (1 of 38) carried 1024C>T. In 7.9% (3 of 38) of the cases the mutation remained uncharacterised. Sixty-three percent (24 of 38) of the G6PD-deficient neonates had neonatal jaundice, and 28.9% (11 of 38) developed moderate to severe hyperbilirubinemia. Neonates with the 1388G>A mutation showed the highest incidence of moderate to severe hyperbilirubinemia requiring phototherapy and/or exchange transfusion. The majority (70%) of the G6PD-deficient neonates showed severe enzyme deficiency. However, there was no meaningful association between the level of enzyme activity and the severity of neonatal jaundice. In summary, four mutations account for more than 90% of the G6PD deficiency cases among the Chinese in Malaysia, and the distribution of the molecular variants is similar to that found among the Chinese in Taiwan and southern mainland China. Our findings also suggest a possible association of the 1388G>A mutation with severe neonatal jaundice.
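Each percentage above is a count divided by the cohort of 38, so the figures can be recomputed directly from the reported counts. A minimal sketch; the `percentages` helper is hypothetical, and the counts are those given in the text.

```python
def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the cohort in each category, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the cohort size")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts among the 38 G6PD-deficient Chinese neonates, as reported above
counts = {"1376G>T": 19, "1388G>A": 13, "95A>G": 2, "1024C>T": 1, "uncharacterised": 3}
for name, pct in percentages(counts, 38).items():
    print(f"{name}: {pct}%")

# Share of cases explained by the four characterised mutations
known = 38 - counts["uncharacterised"]
print(f"four known mutations: {known}/38 = {100 * known / 38:.1f}%")
```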
Beta-thalassemia major is one of the most common genetic disorders in South-East Asia, yet the spectrum of beta-thalassemia mutations among the various ethnic sub-populations on the island of Borneo is unknown. We studied 20 Dusun children from the East Malaysian state of Sabah (North Borneo) with a severe beta-thalassemia major phenotype, using a combination of Southern blot analysis, polymerase chain reaction (PCR) analysis, and direct sequencing. The children were homozygous for a large deletion with a 5' breakpoint at position -4279 relative to the cap site of the beta-globin gene (HBB) and a 3' breakpoint located within an L1 family repetitive sequence at an unknown distance from the beta-globin gene. This deletion is similar to a large beta-thalassemia deletion recently described in unrelated beta-thalassemia heterozygotes of Filipino descent. This report describes the first 20 families with homozygosity for the deletion causing a severe phenotype, and it provides the first information on the molecular epidemiology of beta-thalassemia in Sabah. The finding has implications for the population genetics of beta-thalassemia major and for preventive strategies affecting nearly 300 million individuals in South-East Asia.
We performed DNA analysis on cord blood samples from 86 male Malay neonates diagnosed with G6PD deficiency at the National University of Malaysia Hospital, using a combination of rapid PCR-based techniques, single-strand conformation polymorphism (SSCP) analysis, and DNA sequencing. We found that 37.2% carried 871G>A (G6PD Viangchan), 26.7% carried 563C>T (G6PD Mediterranean), and 15.1% carried 487G>A (G6PD Mahidol), followed by 4.7% with 1376G>T (G6PD Canton), 3.5% with 383T>C (G6PD Vanua Lava), 3.5% with 592C>T (G6PD Coimbra), 2.3% with 1388G>A (G6PD Kaiping), 2.3% with 1360C>T (G6PD Union), 2.3% with 1003G>A (G6PD Chatham), 1.2% with 131C>G (G6PD Orissa), and 1.2% with 1361G>A (G6PD Andalus). Seventy-one (82.6%) of the 86 G6PD-deficient neonates had neonatal jaundice. Fifty-seven (80%) of the 71 jaundiced neonates required phototherapy, and only one progressed to severe hyperbilirubinemia (serum bilirubin >340 µmol/L) requiring exchange transfusion. There was no significant difference among the three common variants in the incidence of neonatal jaundice, mean serum bilirubin level, mean age at peak serum bilirubin, percentage of babies requiring phototherapy, or mean number of days of phototherapy. In conclusion, the molecular basis of G6PD deficiency in Malays is heterogeneous, and G6PD Viangchan, Mediterranean, and Mahidol together account for about 79% of the cases. Our findings support the observation that G6PD Viangchan and Mahidol are common Southeast Asian variants; their presence in the Malays suggests a common ancestral origin with the Cambodians, Laotians, and Thais. Together with other preliminary data on the presence of the Mediterranean variant in this region, our findings provide evidence of strong Arab influence in the Malay Archipelago.
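How much of the cohort a set of common variants "accounts for" is a running sum of the reported percentages. A minimal sketch using only the percentages quoted above (the helper names are hypothetical, and the list is assumed to be ordered from most to least common, as in the text); with these figures the three common variants reach 79.0%, and a fourth variant is needed to pass 80%.

```python
from itertools import accumulate

# Variant shares (%) among the 86 Malay neonates, in the order reported above
shares = [
    ("Viangchan", 37.2), ("Mediterranean", 26.7), ("Mahidol", 15.1),
    ("Canton", 4.7), ("Vanua Lava", 3.5), ("Coimbra", 3.5), ("Kaiping", 2.3),
    ("Union", 2.3), ("Chatham", 2.3), ("Orissa", 1.2), ("Andalus", 1.2),
]


def cumulative_shares(shares: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Running total of the percentages, rounded to one decimal place."""
    running = accumulate(pct for _, pct in shares)
    return [(name, round(total, 1)) for (name, _), total in zip(shares, running)]


def variants_needed(shares: list[tuple[str, float]], target_pct: float) -> int:
    """Smallest number of leading (most common) variants whose shares reach target_pct."""
    for k, (_, total) in enumerate(cumulative_shares(shares), start=1):
        if total >= target_pct:
            return k
    raise ValueError("target exceeds the total of the listed shares")


for name, total in cumulative_shares(shares):
    print(f"up to {name}: {total}%")
```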
The prevalence and spectrum of germline mutations in BRCA1 and BRCA2 have been reported in single populations, with most reports focused on White populations in Europe and North America. The Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA) has assembled data on 18,435 families with BRCA1 mutations and 11,351 families with BRCA2 mutations ascertained from 69 centers in 49 countries on six continents. This study comprehensively describes the characteristics of the 1,650 unique BRCA1 and 1,731 unique BRCA2 deleterious (disease-associated) mutations identified in the CIMBA database. We observed substantial variation in mutation type and frequency by geographical region and race/ethnicity. In addition to known founder mutations, we identified mutations of relatively high frequency in specific racial/ethnic or geographic groups that may reflect founder mutations and could be used in targeted (panel) first-pass genotyping for specific populations. Knowledge of the population-specific mutational spectrum in BRCA1 and BRCA2 could inform efficient strategies for genetic testing and may justify broader-based oncogenetic testing in some populations.
Many algorithms for detecting copy number variations (CNVs) from exome sequencing (ES) data have been reported and evaluated for sensitivity, specificity, reproducibility, and precision. However, operational optimization of such algorithms for better performance has not been fully addressed. We performed ES on 1,199 samples, including 763 patients with various disease profiles, and analyzed the data for CNVs using both the eXome Hidden Markov Model (XHMM) and a modified version of Nord's method. To detect rare CNVs efficiently, we aimed to reduce sequencing biases by jointly analyzing, as one batch, all unrelated samples sequenced on the same flow cell, and to eliminate sex effects on X-linked CNVs by analyzing female and male samples separately. We also applied several filtering steps for more efficient CNV selection. The average number of CNVs detected per sample was fewer than five. This optimization, together with targeted CNV analysis by Nord's method, identified pathogenic or likely pathogenic CNVs in 34 of 763 patients (4.5%). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%), whereas the previous protocol identified them in only 14 (9.9%). Thus, batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
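The batching strategy described above can be pictured as a grouping step that runs before CNV calling. A minimal sketch, assuming autosomes are batched by flow cell and chrX additionally by sex; the `Sample` fields and `make_batches` are hypothetical names, the exclusion rationale in the comment is our assumption, and the XHMM calling itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    flow_cell: str
    sex: str  # "F" or "M"


def make_batches(samples: list[Sample], unrelated_ids: set[str]):
    """Group unrelated samples into analysis batches.

    Autosomal batches: all unrelated samples from one flow cell.
    chrX batches: the same samples, further split by sex.
    """
    autosomal: dict[str, list[str]] = defaultdict(list)
    chrx: dict[tuple[str, str], list[str]] = defaultdict(list)
    for s in samples:
        if s.sample_id not in unrelated_ids:
            # Assumption: relatives sharing a rare CNV could make it look common in the batch
            continue
        autosomal[s.flow_cell].append(s.sample_id)
        chrx[(s.flow_cell, s.sex)].append(s.sample_id)
    return dict(autosomal), dict(chrx)


# Hypothetical run: S4 is a relative of S1 and is left out of batching
samples = [
    Sample("S1", "FC1", "F"), Sample("S2", "FC1", "M"), Sample("S3", "FC1", "F"),
    Sample("S4", "FC1", "M"), Sample("S5", "FC2", "M"),
]
autosomal, chrx = make_batches(samples, unrelated_ids={"S1", "S2", "S3", "S5"})
print(autosomal)
print(chrx)
```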
We report heterozygous CELF2 (NM_006561.3) variants in five unrelated individuals: Individuals 1-4 exhibited developmental and epileptic encephalopathy (DEE), and Individual 5 had intellectual disability and autistic features. CELF2 encodes a nucleocytoplasmic shuttling RNA-binding protein that has multiple roles in RNA processing and is involved in the embryonic development of the central nervous system and heart. Whole-exome sequencing identified the following CELF2 variants: two missense variants, c.1558C>T:p.(Pro520Ser) in unrelated Individuals 1 and 2 and c.1516C>G:p.(Arg506Gly) in Individual 3; one frameshift variant in Individual 4, c.1562dup:p.(Tyr521Ter), which removes the last amino acid of CELF2 and may escape nonsense-mediated mRNA decay (NMD); and one canonical splice site variant in Individual 5, c.272-1G>C, which probably leads to NMD. The variants in Individuals 1, 2, 4, and 5 were de novo, whereas the variant in Individual 3 was inherited from her mosaic mother. Notably, all identified variants except c.272-1G>C clustered within the 20 C-terminal amino acid residues, a region that might act as a nuclear localization signal. We demonstrated extranuclear mislocalization of mutant CELF2 protein in cells transfected with mutant CELF2 complementary DNA plasmids. Our findings indicate that CELF2 variants that disrupt its nuclear localization are associated with DEE.
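The clustering claim can be checked by mapping each c. position to its codon and measuring the distance from the C-terminus. A minimal sketch, assuming (as "removes the last amino acid" implies) that Tyr521 is the final residue of the NM_006561.3 product; the helper names are hypothetical. The splice site variant c.272-1G>C is intronic, has no codon, and is therefore omitted.

```python
# C-terminal CELF2 (NM_006561.3) coding variants reported above, keyed by c. position
VARIANTS = {"c.1516C>G": 1516, "c.1558C>T": 1558, "c.1562dup": 1562}

# Assumption: Tyr521 is the final residue, as "removes the last amino acid" implies
LAST_RESIDUE = 521
WINDOW = 20  # C-terminal residues of the putative nuclear localization signal


def codon_of(c_pos: int) -> int:
    """Codon (amino acid) number for an HGVS c. coding position."""
    return (c_pos - 1) // 3 + 1


def in_c_terminal_window(c_pos: int, last: int = LAST_RESIDUE, window: int = WINDOW) -> bool:
    """True if the affected codon lies within the last `window` residues."""
    return last - codon_of(c_pos) < window


for name, pos in VARIANTS.items():
    print(name, "codon", codon_of(pos), "in C-terminal window:", in_c_terminal_window(pos))
```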
The human amylase gene locus at chromosome 1p21.1 is structurally complex. This region contains two pancreatic amylase genes, AMY2B and AMY2A, and a salivary amylase gene, AMY1. The AMY1 gene harbors extensive copy number variation (CNV), and recent studies have implicated this variation in adaptation to starch-rich diets and in association with obesity in European and Asian populations. In this study, we show that combining quantitative PCR and digital PCR, coupled with careful experimental design and calibration, improves the resolution of genotyping CNV with high copy numbers (CNs). In the two Asian populations studied, of Chinese and Malay ethnicity, we observed a distinctive non-normal distribution of AMY1 diploid CN genotypes, with an even:odd CN ratio of 4.5 (3.3-4.7), and an association between the common AMY2A CN = 2 genotype and odd AMY1 CNs that could be explained by the underlying haplotype structure. In two further case-control cohorts (n = 932 Chinese and n = 145 Malays), we did not observe the previously reported association of AMY1 with obesity or body mass index. Improved methods for accurately genotyping multiallelic CNV loci, and a better understanding of haplotype complexity at the AMY1 locus, are necessary for population genetics and association studies.
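A common way to turn digital PCR partition counts into a copy-number estimate is the Poisson correction, which accounts for partitions holding more than one template copy; this matters most at high copy numbers such as those of AMY1. A minimal sketch, assuming a duplex assay in which target and reference share the same partitions and the reference gene has two copies per diploid genome. This is the generic technique, not necessarily the calibration the authors used, and the partition counts are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - positive/total)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; saturated runs cannot be corrected")
    return -math.log(1 - positive / total)


def diploid_copy_number(target_pos: int, reference_pos: int, total: int,
                        reference_copies: int = 2) -> float:
    """Target copy number relative to a reference gene with `reference_copies` diploid copies."""
    ref = copies_per_partition(reference_pos, total)
    if ref == 0:
        raise ValueError("no positive reference partitions")
    return reference_copies * copies_per_partition(target_pos, total) / ref


def even_odd_ratio(copy_numbers: list[int]) -> float:
    """Ratio of individuals with even to odd integer diploid copy numbers."""
    even = sum(1 for cn in copy_numbers if cn % 2 == 0)
    odd = len(copy_numbers) - even
    return even / odd if odd else math.inf


# Hypothetical duplex run: 1,000 AMY1-positive and 200 reference-positive of 20,000 partitions
print(round(diploid_copy_number(1000, 200, 20000), 2))  # Poisson-corrected; a naive ratio gives 10.0
print(even_odd_ratio([2, 4, 6, 8, 3]))
```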
Accurate and consistent interpretation of sequence variants is integral to the delivery of safe and reliable diagnostic genetic services. To standardize the interpretation process, in 2015 the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published a joint guideline based on a set of shared standards for classifying variants in Mendelian diseases. The generality of these standards and their subjective interpretation between laboratories have prompted efforts to reduce discordance in variant classification, with a focus on expert specification of the ACMG/AMP guidelines for individual genes or diseases. Herein, we describe our experience as a ClinGen Variant Curation Expert Panel adapting the ACMG/AMP criteria to classify variants in three globin genes (HBB, HBA2, and HBA1) related to recessively inherited hemoglobinopathies, and we present five evidence categories as use cases that demonstrate the specification process and its underlying rationale.
The discovery of high-risk breast cancer susceptibility genes, such as Breast cancer associated gene 1 (BRCA1) and Breast cancer associated gene 2 (BRCA2), has enabled accurate identification of individuals for risk management and targeted therapy. The rapid decline in sequencing costs has greatly increased the number of individuals undergoing genetic testing worldwide. However, given the substantial differences in population-specific variants, interpreting the results of these tests can be challenging, especially for novel genetic variants in understudied populations. Here we report the characterization of novel variants in the Malaysian and Singaporean populations, which comprise different ethnic groups (Malays, Chinese, Indians, and other indigenous groups). We evaluated the functional significance of 14 BRCA2 variants of uncertain clinical significance using multiple in silico prediction tools and examined their frequency in a cohort of 7,840 breast cancer cases and 7,928 healthy controls. In addition, we used a mouse embryonic stem cell (mESC)-based functional assay to assess the impact of these variants on BRCA2 function. The variants were functionally indistinguishable from wild-type BRCA2: they fully rescued the lethality of Brca2-null mESCs and showed no sensitivity to six different DNA-damaging agents, including a poly(ADP-ribose) polymerase (PARP) inhibitor. Our findings strongly suggest that all 14 evaluated variants are functionally neutral and should be valuable in the risk assessment of individuals carrying them.
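Case-control frequencies such as those examined in the 7,840 cases and 7,928 controls are commonly summarized as an odds ratio with a Woolf (log-scale) confidence interval. A minimal sketch of that standard calculation; the function name and the carrier counts in the example are hypothetical, not results from this study.

```python
import math


def odds_ratio_ci(case_carriers: int, n_cases: int, control_carriers: int,
                  n_controls: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf confidence interval, from carrier counts."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the OR and its standard error are defined
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical carrier counts for one variant in a 7,840-case / 7,928-control cohort
print(odds_ratio_ci(12, 7840, 10, 7928))
```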
Although the spliceogenic nature of the BRCA2 c.68-7T>A variant has been demonstrated, its association with cancer risk remains controversial. In this study, we used real-time PCR and digital PCR (dPCR) to accurately quantify the BRCA2 isoforms retaining or lacking exon 3. In addition, we estimated the combined odds ratio for causality of the variant using genetic and clinical data, and estimated its associated cancer risk by case-control analysis in 83,636 individuals. Co-occurrence in trans with pathogenic BRCA2 variants was assessed in 5,382 families. The exon 3 exclusion rate was 4.5-fold higher in variant carriers (13%) than in controls (3%), indicating an exclusion rate for the c.68-7T>A allele of approximately 20%. The posterior probability of pathogenicity was 7.44 × 10^-115. There was no evidence either of increased breast cancer risk (OR 1.03; 95% CI 0.86-1.24) or of a deleterious effect of the variant when co-occurring with pathogenic variants. Our data provide the first robust evidence of the nonpathogenicity of BRCA2 c.68-7T>A. Together, genetic and quantitative transcript analyses inform the threshold for the ratio of functional to altered BRCA2 isoforms that is compatible with normal cell function. These findings might be exploited to assess the cancer-risk relevance of other spliceogenic BRCA2 variants.
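The step from carrier and control exclusion rates to a per-allele rate can be made explicit with a simple equal-expression mixing model: a heterozygous carrier's transcripts come half from the variant allele and half from a wild-type allele that skips exon 3 at the control rate. This is a sketch under that assumption (the function name is hypothetical), and the study's own calculation may use additional information. With the rounded 13% and 3% figures the model gives roughly 23%, of the same order as the reported ~20%.

```python
def allele_exclusion_rate(carrier_rate: float, control_rate: float) -> float:
    """Exon 3 exclusion rate attributable to the variant allele.

    Simplifying assumption: both alleles are expressed equally in a heterozygous
    carrier and the wild-type allele skips exon 3 at the control rate, so
    carrier_rate = (allele_rate + control_rate) / 2.
    """
    allele_rate = 2 * carrier_rate - control_rate
    if not 0 <= allele_rate <= 1:
        raise ValueError("rates are inconsistent with the equal-expression model")
    return allele_rate


print(allele_exclusion_rate(0.13, 0.03))  # roughly 0.23 with the rounded rates
```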
Skipping of BRCA2 exon 3 (∆E3) is a naturally occurring splicing event, which complicates the clinical classification of variants that may alter ∆E3 expression. This study used multiple evidence types to assess the pathogenicity of 85 variants in or near BRCA2 exon 3. Variants bioinformatically predicted to be spliceogenic underwent mRNA splicing analysis using minigenes and/or patient samples, and ∆E3 levels were quantified. A mouse embryonic stem cell (mESC)-based assay was used to determine the impact of 18 variants on mRNA splicing and protein function. For each variant, population frequency, bioinformatic predictions, clinical data, and existing mRNA splicing and functional results were collated. Variant class was assigned using a gene-specific adaptation of the ACMG/AMP guidelines, following a recently proposed points-based system. Combined mRNA and mESC analysis identified six variants with transcript and/or functional profiles interpreted as loss of function. For acceptor site variants, use of a cryptic splice site generated a transcript encoding a shorter protein that retains activity. Overall, 69/85 (81%) variants were classified using the points-based approach. Our analysis shows the value of applying gene-specific ACMG/AMP guidelines with a points-based approach and highlights the need to consider cryptic splice site usage when assigning PVS1 code strength.
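A points-based ACMG/AMP classifier reduces to summing weighted evidence codes and applying thresholds. The sketch below assumes the point scale of Tavtigian et al. (2020): supporting 1, moderate 2, strong 4, very strong 8, with benign evidence subtracted, and cut-offs of 10 or more (pathogenic), 6 to 9 (likely pathogenic), 0 to 5 (uncertain), -1 to -6 (likely benign), and -7 or less (benign). Whether the study used exactly these values is an assumption, stand-alone codes such as BA1 are not modeled, and the example variant is hypothetical. It shows why PVS1 strength matters: downgrading PVS1 from very strong to strong can move a variant from likely pathogenic to uncertain.

```python
# Assumed point values per evidence strength (Tavtigian et al., 2020)
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def total_points(pathogenic: list[str], benign: list[str]) -> int:
    """Sum pathogenic evidence points and subtract benign evidence points."""
    return sum(POINTS[s] for s in pathogenic) - sum(POINTS[s] for s in benign)


def classify(points: int) -> str:
    """Map a point total to a five-tier ACMG/AMP category."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "Uncertain significance"
    if points >= -6:
        return "Likely benign"
    return "Benign"


# Hypothetical acceptor-site variant: PVS1 at full strength plus one supporting code,
# then PVS1 downgraded to strong because a cryptic acceptor yields an active protein
for pvs1 in ("very_strong", "strong"):
    pts = total_points([pvs1, "supporting"], [])
    print(pvs1, pts, classify(pts))
```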
A large number of variants identified through clinical genetic testing of disease susceptibility genes are of uncertain significance (VUS). Following the recommendations of the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP), the frequency of a variant in case-control datasets (PS4 criterion) can inform its interpretation. We present a novel case-control likelihood ratio-based method that incorporates gene-specific age-related penetrance, and we demonstrate its utility in the analysis of simulated and real datasets. In the analyses of simulated data, the likelihood ratio method was more powerful than other methods. Likelihood ratios were calculated for a case-control dataset of BRCA1 and BRCA2 variants from the Breast Cancer Association Consortium (BCAC) and compared with logistic regression results. More variants reached evidence in favor of pathogenicity, and a substantial number had evidence against pathogenicity; neither finding would have been reached using other case-control analysis methods. Our novel method provides greater power to classify rare variants than classical case-control methods. As an initiative of the ENIGMA Analytical Working Group, we provide user-friendly scripts and pre-formatted Excel calculators for implementing the method for rare variants in BRCA1, BRCA2, and other high-risk genes with known penetrance.
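The core idea of a case-control likelihood ratio can be illustrated with a deliberately simplified model: for a rare variant, conditional on the total number of carriers, the number who are cases is binomial, and the likelihood under a fixed odds ratio is compared with the likelihood under neutrality (OR = 1). This sketch assumes a single fixed odds ratio and ignores the age-related penetrance that is central to the method described above, so it is not the ENIGMA implementation; the function name and counts are hypothetical.

```python
import math


def case_control_lr(case_carriers: int, control_carriers: int,
                    n_cases: int, n_controls: int, odds_ratio: float) -> float:
    """Likelihood ratio: variant confers `odds_ratio` versus variant is neutral (OR = 1).

    For a rare variant, conditional on the total number of carriers, the number of
    carriers who are cases is binomial with success probability
    OR * n_cases / (OR * n_cases + n_controls). The binomial coefficient cancels.
    """
    def p_case(or_: float) -> float:
        return or_ * n_cases / (or_ * n_cases + n_controls)

    p1, p0 = p_case(odds_ratio), p_case(1.0)
    log_lr = (case_carriers * math.log(p1 / p0)
              + control_carriers * math.log((1 - p1) / (1 - p0)))
    return math.exp(log_lr)


# Hypothetical counts in a cohort of 7,000 cases and 7,000 controls, tested against OR = 5
print(case_control_lr(10, 1, 7000, 7000, odds_ratio=5.0))  # favors pathogenicity (> 1)
print(case_control_lr(2, 10, 7000, 7000, odds_ratio=5.0))  # against pathogenicity (< 1)
```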
The multifactorial likelihood analysis method has demonstrated utility for quantitative assessment of variant pathogenicity for multiple cancer syndrome genes. Independent data types currently incorporated in the model for assessing BRCA1 and BRCA2 variants include a clinically calibrated prior probability of pathogenicity based on variant location and bioinformatic prediction of variant effect, co-segregation, family cancer history profile, co-occurrence with a pathogenic variant in the same gene, breast tumor pathology, and case-control information. Research and clinical data for multifactorial likelihood analysis were collated for 1,395 predominantly intronic and missense BRCA1/2 variants, enabling classification based on posterior probability of pathogenicity for 734 variants: 447 variants were classified as (likely) benign and 94 as (likely) pathogenic, and 248 classifications were new or considerably altered relative to ClinVar submissions. Classifications were compared with information not yet included in the likelihood model, and evidence strengths were aligned to those recommended for ACMG/AMP classification codes. Altered mRNA splicing or function relative to known nonpathogenic variant controls was moderately to strongly predictive of variant pathogenicity. Variant absence from population datasets provided supporting evidence for pathogenicity. These findings have direct relevance for BRCA1 and BRCA2 variant evaluation and justify gene-specific calibration of the evidence types used for variant classification.
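In a multifactorial model, likelihood ratios from independent data types multiply onto the prior odds of pathogenicity, and the resulting posterior probability is mapped to a class. A minimal sketch, assuming the data types are independent and using the IARC five-class thresholds (Plon et al., 2008) as the assumed mapping; the likelihood ratios in the example are hypothetical, and the calibration of each component in the ENIGMA model is not reproduced.

```python
def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior probability of pathogenicity with independent likelihood ratios.

    Posterior odds = prior odds x product of the likelihood ratios.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


def iarc_class(posterior: float) -> int:
    """IARC five-class category for a posterior probability (assumed thresholds)."""
    if posterior > 0.99:
        return 5  # pathogenic
    if posterior >= 0.95:
        return 4  # likely pathogenic
    if posterior >= 0.05:
        return 3  # uncertain
    if posterior >= 0.001:
        return 2  # likely not pathogenic
    return 1      # not pathogenic


# Hypothetical variant: prior from location and bioinformatics, then LRs for
# co-segregation, family history, and tumor pathology
p = posterior_probability(0.1, [0.5, 0.2, 0.05])
print(round(p, 5), iarc_class(p))
```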