In metabolomics, identification of metabolic pathways altered by disease, genetics, or environmental perturbations is crucial to uncover the underlying biological mechanisms. A number of pathway analysis methods are currently available, which are generally based on equal-probability, topological-centrality, or model-separability approaches. In brief, prior identification of significant metabolites is needed for the first two types of methods, while each pathway is modeled separately in the model-separability-based methods. None of these methods takes interactions between metabolic pathways into consideration. The current study aims to develop a novel metabolic pathway identification method based on multi-block partial least squares (MB-PLS) analysis that includes all pathways in a single global model to facilitate biological interpretation. The detected metabolites are first assigned to pathway blocks based on their roles in metabolism as defined by the KEGG pathway database. The metabolite intensity or concentration data matrix is then reconstructed as data blocks according to these metabolite subsets, and an MB-PLS model is built on the data blocks. A new metric, named the pathway importance in projection (PIP), is proposed to evaluate the significance of each metabolic pathway for group separation. A simulated dataset was generated by imposing artificial perturbations on four pre-defined pathways in the healthy control group of a colorectal cancer study. The performance of the proposed method was evaluated and compared with seven other commonly used methods using both an actual metabolomics dataset and the simulated dataset. For the real metabolomics dataset, most of the significant pathways identified by the proposed method were consistent with the published literature. For the simulated dataset, the significant pathways identified by the proposed method were highly consistent with the pre-defined pathways.
The experimental results demonstrate that the proposed method is effective for identification of significant metabolic pathways, which may facilitate biological interpretation of metabolomics data.
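As a rough illustration of the block construction described above, the following Python sketch assigns metabolite columns to pathway blocks and scores each block with a one-component PLS weight vector. The function name `block_importance`, the toy pathway map, and the block score (sum of squared weights within a block, after dividing each block by the square root of its size) are assumptions made for illustration. This is not the published PIP formula, and a full MB-PLS model would typically use several components and a dedicated multi-block algorithm.

```python
import numpy as np

def block_importance(X, y, blocks):
    """Score pathway blocks with a one-component PLS weight vector.

    X: (samples, metabolites) intensities; y: group labels coded 0/1;
    blocks: dict mapping pathway name -> list of column indices.
    Illustrative stand-in only, NOT the published PIP metric.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Autoscale each metabolite and center the response.
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Xs = Xc / sd
    yc = y - y.mean()

    # Build one data block per pathway; a metabolite shared by several
    # pathways is simply copied into each block. Dividing by sqrt(block size)
    # keeps large pathways from dominating the global model.
    names, parts = [], []
    for name, idx in blocks.items():
        parts.append(Xs[:, idx] / np.sqrt(len(idx)))
        names.append((name, len(idx)))
    Xcat = np.hstack(parts)

    # First PLS weight vector for a single response: w is proportional to X'y.
    w = Xcat.T @ yc
    w /= np.linalg.norm(w)

    # Block score = share of the squared weight carried by that block.
    scores, start = {}, 0
    for name, size in names:
        scores[name] = float(np.sum(w[start:start + size] ** 2))
        start += size
    return scores

# Toy data (hypothetical): columns 0-2 are perturbed in group 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[:, :3] += 2.0 * y[:, None]
blocks = {"pathway_A": [0, 1, 2], "pathway_B": [2, 3, 4]}
scores = block_importance(X, y, blocks)
```

Because the weight vector has unit length, the block scores sum to one, so each score can be read as the fraction of the discriminating direction attributed to that pathway under these simplifying assumptions.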
Metabolite set enrichment analysis (MSEA) has gained increasing research interest for identification of perturbed metabolic pathways in metabolomics. The method incorporates predefined metabolic pathway information into the analysis, and metabolite sets are typically assumed to be mutually exclusive. However, metabolic pathways are known to share common metabolites and intermediates. This situation, along with limitations in metabolite detection or coverage, leads to overlapping, incomplete metabolite sets in pathway analysis. For overlapping metabolite sets, MSEA tends to produce many false positives because improper weights are allocated to the overlapping metabolites. Here, we propose an extended partial least squares (PLS) model with a new sparse scheme for overlapping metabolite set enrichment analysis, named overlapping group PLS (ogPLS) analysis. The weight vector of the ogPLS model was decomposed into pathway-specific subvectors, and a group lasso penalty was then imposed on these subvectors to achieve a proper weight allocation for the overlapping metabolites. Two strategies were adopted in the proposed ogPLS model to identify the perturbed metabolic pathways. The first strategy is debiasing regularization, which was used to reduce inequalities amongst the predefined metabolic pathways. The second strategy is stable selection, which was used to rank pathways while avoiding the nuisance of model parameter optimization. Both simulated and real-world metabolomic datasets were used to evaluate the proposed method and to compare it with two other MSEA methods, namely Global-test and the multiblock PLS (MB-PLS)-based pathway importance in projection (PIP) method. Using a simulated dataset with known perturbed pathways, the average true discovery rate of the ogPLS method was found to be higher than those of the Global-test and MB-PLS-based PIP methods.
Analysis with a real-world metabolomics dataset also indicated that the developed method was less prone to selecting pathways with highly overlapped detected metabolite sets. Compared with the two other methods, the proposed method has higher accuracy, a lower false-positive rate, and greater robustness when applied to overlapping metabolite set analysis. The developed ogPLS method may serve as an alternative MSEA method to facilitate biological interpretation of metabolomics data for overlapping metabolite sets.
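The decomposition of the weight vector into pathway-specific subvectors with a group lasso penalty can be sketched as follows. This is a minimal illustration under stated assumptions, not the ogPLS algorithm itself: it treats a hypothetical covariance vector `c` (standing in for X'y) as the target, writes the weight vector as the sum of one subvector per pathway, and minimizes 0.5*||c - w||^2 + lam * sum_g ||v_g|| by block coordinate descent with group soft-thresholding. The function names are invented for this example, and the debiasing regularization and stable selection steps are omitted.

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrink the whole subvector,
    or set it to zero when its norm does not exceed lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def overlapping_group_weights(c, groups, lam, n_iter=100):
    """Latent overlapping group lasso on a target vector c.

    c: hypothetical covariance vector (e.g. X'y), one entry per metabolite;
    groups: dict mapping pathway name -> list of metabolite indices
            (indices may appear in several pathways);
    returns the combined weight vector w and the pathway subvectors.
    """
    c = np.asarray(c, dtype=float)
    sub = {g: np.zeros(len(idx)) for g, idx in groups.items()}
    w = np.zeros_like(c)
    for _ in range(n_iter):
        for g, idx in groups.items():
            w[idx] -= sub[g]  # remove this pathway's current contribution
            # Fit what the other pathways leave unexplained, then shrink.
            sub[g] = group_soft_threshold(c[idx] - w[idx], lam)
            w[idx] += sub[g]  # add the updated contribution back
    return w, sub

# Toy example: metabolite 2 belongs to both pathways, but only pathway A
# carries signal on its other members.
c = np.array([3.0, 3.0, 3.0, 0.0, 0.0])
groups = {"A": [0, 1, 2], "B": [2, 3, 4]}
w, sub = overlapping_group_weights(c, groups, lam=1.0)
```

In this toy case the weight of the shared metabolite is attributed entirely to pathway A, and the subvector of pathway B is shrunk to zero, which is the kind of weight allocation the penalty is meant to encourage.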
In mass spectrometry (MS)-based metabolomics, missing values (NAs) may arise from various causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and to the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers than the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method that may facilitate biological interpretation of metabolomics data.
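The general idea of NMF-based imputation can be sketched with a masked factorization: fit non-negative factors W and H to the observed entries only, then fill each NA with the corresponding entry of W @ H. The sketch below uses standard masked multiplicative updates and is only a baseline illustration; it does not reproduce the combination of global and local information used by the proposed method, and the function name `nmf_impute` and its parameters are assumptions.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, seed=0, eps=1e-9):
    """Fill NaNs in a non-negative matrix using masked NMF.

    Only observed entries contribute to the fit; missing entries are
    replaced by the low-rank reconstruction W @ H.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)
    M = mask.astype(float)
    X0 = np.where(mask, X, 0.0)  # observed values, zeros at missing positions

    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, p)) + eps

    for _ in range(n_iter):
        # Multiplicative updates restricted to observed entries.
        W *= (X0 @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ X0) / (W.T @ (M * (W @ H)) + eps)

    return np.where(mask, X, W @ H)

# Toy example: an exactly rank-1 intensity table with one missing value.
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X = truth.copy()
X[0, 0] = np.nan
X_imp = nmf_impute(X, rank=1)
```

In practice the rank would need to be chosen from the data, and real profiles are noisy, so recovery would not be exact as it is in this idealized rank-1 case.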
Drug combinations are commonly used to treat various diseases to achieve synergistic therapeutic effects or to alleviate drug resistance. Nevertheless, some drug combinations might lead to adverse effects, and thus, it is crucial to explore the mechanisms of drug interactions before clinical treatment. Generally, drug interactions have been studied using nonclinical pharmacokinetics, toxicology, and pharmacology. Here, we propose a complementary strategy based on metabolomics, which we call interaction metabolite set enrichment analysis, or iMSEA, to decipher drug interactions. First, a digraph-based heterogeneous network model was constructed to represent the biological metabolic network based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Second, treatment-specific influences on all detected metabolites were calculated and propagated across the whole network model. Third, pathway activity was defined and enriched to quantify the influence of each treatment on the predefined functional metabolite sets, i.e., metabolic pathways. Finally, drug interactions were identified by comparing the pathway activity enriched by the drug combination treatments with that enriched by the single drug treatments. A data set consisting of hepatocellular carcinoma (HCC) cells that were treated with oxaliplatin (OXA) and/or vitamin C (VC) was used to illustrate the effectiveness of the iMSEA strategy for evaluation of drug interactions. The sensitivity and parameter settings of the iMSEA strategy were also assessed using synthetic noise data. The iMSEA strategy highlighted synergistic effects of combined OXA and VC treatments, including alterations in the glycerophospholipid metabolism pathway and the glycine, serine, and threonine metabolism pathway. This work provides an alternative method to reveal the mechanisms of drug combinations from the viewpoint of metabolomics.
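The propagation and pathway-activity steps can be illustrated with a random walk with restart on a small directed graph. This is a sketch under assumptions: the source does not specify the propagation rule or the definition of pathway activity, so the restart formulation, the summed-score activity, and the names `propagate` and `pathway_activity` are hypothetical choices for illustration.

```python
import numpy as np

def propagate(adj, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph.

    adj[i, j] > 0 means an edge from metabolite i to metabolite j;
    seed_scores holds non-negative treatment influences, with zeros for
    undetected metabolites.
    """
    adj = np.asarray(adj, dtype=float)
    seed_scores = np.asarray(seed_scores, dtype=float)
    if seed_scores.sum() <= 0:
        raise ValueError("at least one metabolite needs a positive seed score")

    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1.0   # nodes without outgoing edges keep zero rows
    P = adj / out_deg             # row-normalized transition matrix

    p0 = seed_scores / seed_scores.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = restart * p0 + (1.0 - restart) * (P.T @ p)
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

def pathway_activity(scores, members):
    """Hypothetical activity: total propagated score over pathway members."""
    return float(np.sum(scores[members]))

# Toy chain 0 -> 1 -> 2, where only metabolite 0 was detected and perturbed.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
p = propagate(adj, np.array([1.0, 0.0, 0.0]))
activity = pathway_activity(p, [1, 2])
```

Running the same propagation with the influences measured under each single-drug treatment and under the combination would give one activity value per pathway and treatment, which could then be compared; how that comparison is scored is left open here.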
Metabolic pathways are regarded as functional and basic components of the biological system. In metabolomics, metabolite set enrichment analysis (MSEA) is often used to identify the altered metabolic pathways (metabolite sets) associated with phenotypes of interest (POI), e.g., disease. However, in most studies, MSEA suffers from the limitation of low metabolite coverage. Random walk (RW)-based algorithms can be used to propagate the perturbation of detected metabolites to the undetected metabolites through a metabolite network model prior to MSEA. Nevertheless, most of the existing RW-based algorithms run on a general metabolite network constructed from public databases, such as KEGG, without taking into consideration the potential influence of POI on the metabolite network, which may reduce the phenotypic specificity of the MSEA results. To solve this problem, a novel pathway analysis strategy, namely, differential correlation-informed MSEA (dci-MSEA), is proposed in this paper. Statistically, differential correlations between metabolites are used to evaluate the influence of POI on the metabolite network, so that a phenotype-specific metabolite network is constructed for RW-based propagation. The experimental results show that dci-MSEA outperforms the conventional RW-based MSEA in identifying the altered metabolic pathways associated with colorectal cancer. In addition, by incorporating the individual-specific metabolite network, the dci-MSEA strategy is easily extended to disease heterogeneity analysis. Here, dci-MSEA was used to decipher the heterogeneity of colorectal cancer. The results reveal clusters of colorectal cancer samples, each with its own cluster-specific set of differential pathways, and demonstrate the feasibility of dci-MSEA in heterogeneity analysis. Taken together, the proposed dci-MSEA may provide insights into disease mechanisms and determination of disease heterogeneity.
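One common way to quantify differential correlation between two groups is the Fisher z-test on pairwise Pearson correlations, sketched below. Whether dci-MSEA uses this particular statistic is not stated in the abstract, so the test, the function name `differential_correlation`, and the reweighting of a general network by the absolute z-score are assumptions made for illustration.

```python
import numpy as np
from math import erfc, sqrt

def differential_correlation(X_case, X_ctrl):
    """Fisher z-test for the difference in pairwise Pearson correlations.

    X_case, X_ctrl: (samples, metabolites) matrices for the two groups.
    Returns the z-score matrix and two-sided p-values.
    """
    r_case = np.corrcoef(X_case, rowvar=False)
    r_ctrl = np.corrcoef(X_ctrl, rowvar=False)
    n1, n2 = X_case.shape[0], X_ctrl.shape[0]

    # Fisher transform; clip to avoid infinities at |r| = 1.
    lim = 1.0 - 1e-12
    z_case = np.arctanh(np.clip(r_case, -lim, lim))
    z_ctrl = np.arctanh(np.clip(r_ctrl, -lim, lim))

    z = (z_case - z_ctrl) / np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    np.fill_diagonal(z, 0.0)

    # Two-sided p-value under the standard normal distribution.
    pvals = np.vectorize(erfc)(np.abs(z) / sqrt(2.0))
    return z, pvals

# Toy data (hypothetical): metabolites 0 and 1 are tightly coupled in
# controls, and the coupling is lost in cases.
rng = np.random.default_rng(1)
ctrl = rng.normal(size=(100, 3))
ctrl[:, 1] = ctrl[:, 0] + 0.3 * rng.normal(size=100)
case = rng.normal(size=(100, 3))
z, pvals = differential_correlation(case, ctrl)

# Hypothetical phenotype-specific network: scale the edges of a general
# (here fully connected) network by the strength of differential correlation.
general_adj = np.ones((3, 3)) - np.eye(3)
weights = general_adj * np.abs(z)
```

A weighted adjacency matrix of this kind could then replace the general network in an RW-based propagation step, so that the walk favors edges whose correlation changes with the phenotype.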