Affiliations 

  • 1 Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, 50603, Malaysia
  • 2 Institute of Mathematical Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, 50603, Malaysia. [email protected]
BMC Bioinformatics, 2017 Dec 28;18(Suppl 16):575.
PMID: 29297307 DOI: 10.1186/s12859-017-1974-4

Abstract

BACKGROUND: Current statistical methods for calling differentially expressed genes in RNA-Seq experiments assume that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes; the resulting normalized gene counts are then used as input for parametric or non-parametric differential gene expression tests. However, many different true gene counts, each with its own probability, can give rise to the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis.
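
As an illustration of the library-size normalization step mentioned above, the following minimal sketch scales raw counts to counts per million (CPM). CPM is only one common choice and is not necessarily the normalization used by any of the tools named in this abstract; the function name, gene names, and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale the raw gene counts of one sample to counts per million (CPM).

    `counts` maps gene name -> raw read count for a single sample.
    CPM is an illustrative choice of normalization, not the paper's method.
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Two made-up samples with the same composition but different library sizes.
sample_a = {"geneA": 50, "geneB": 150, "geneC": 800}     # 1,000 reads in total
sample_b = {"geneA": 100, "geneB": 300, "geneC": 1600}   # 2,000 reads in total

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
```

After scaling, the two samples have identical CPM values even though their raw counts differ by a factor of two. The normalized value is still a single point estimate per gene, which is the limitation the abstract raises.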

RESULTS: We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. In simulations and in experiments with real unreplicated data, our method performs better than or comparably to NOISeq and GFOLD. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data.
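
To make the idea of a posterior distribution over the true gene count concrete, here is a minimal sketch under assumptions that are not taken from the abstract: sequencing coverage is treated as the fraction of molecules that are sequenced, the observed count is modelled as Binomial(true count, coverage), and the prior on the true count is flat. This is not the CORNAS model, and all function names are hypothetical.

```python
import math


def posterior_true_count(observed, coverage, max_true=None):
    """Posterior P(N | observed) for the true count N.

    ASSUMED model (illustrative only): observed ~ Binomial(N, coverage),
    flat prior on N >= observed. The support is truncated at `max_true`
    and renormalized. Returns a dict {N: probability}.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    if max_true is None:
        # Heuristic cut-off placed far beyond the bulk of the posterior.
        max_true = int(10 * (observed + 1) / coverage) + 10

    # Log binomial likelihood for each candidate N (the flat prior adds nothing).
    log_w = {}
    for n in range(observed, max_true + 1):
        log_choose = (math.lgamma(n + 1) - math.lgamma(observed + 1)
                      - math.lgamma(n - observed + 1))
        log_w[n] = (log_choose + observed * math.log(coverage)
                    + (n - observed) * math.log1p(-coverage))

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_w.values())
    weights = {n: math.exp(lw - peak) for n, lw in log_w.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def posterior_mean(posterior):
    """Expected true count under the posterior."""
    return sum(n * p for n, p in posterior.items())


def credible_interval(posterior, mass=0.95):
    """Equal-tailed credible interval (lower, upper) for the true count."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for n in sorted(posterior):
        cumulative += posterior[n]
        if lower is None and cumulative >= tail:
            lower = n
        if cumulative >= 1.0 - tail:
            upper = n
            break
    return lower, upper


# Example: 10 reads observed at an assumed coverage of 0.1.
post = posterior_true_count(observed=10, coverage=0.1)
mean = posterior_mean(post)
low, high = credible_interval(post)
```

Under these assumptions the posterior has a closed form: the excess N - k follows a negative binomial distribution, and the posterior mean is (k + 1)/p - 1 for observed count k and coverage p. The sketch shows how a single observed count maps to a range of plausible true counts, and how that range widens as coverage decreases.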

CONCLUSIONS: Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
