To date, no genome of any of the species from the genus Spiroplasma has been completely sequenced. Long repetitive sequences similar to mobile units present a major obstacle for current genome sequencing technologies. Here, we report the assembly of the Spiroplasma melliferum KC3 genome into 4 contigs, followed by proteogenomic annotation and metabolic reconstruction based on the discovery of 521 expressed proteins and comprehensive metabolomic profiling. A systems approach allowed us to elucidate putative pathogenicity mechanisms and to discover major virulence factors, such as Chitinase utilization enzymes and toxins never before reported for insect pathogenic spiroplasmas.
In the rapidly growing economies of Asia and Oceania, food security has become a primary concern. With the rising population, growing more food at affordable prices is becoming even more important. In addition, the predicted climate change will lead to drastic changes in global surface temperature and changes in rainfall patterns that in turn will pose a serious threat to plant vegetation worldwide. As a result, understanding how plants will survive in a changing climate will be increasingly important. Such challenges require integrated approaches to increase agricultural production and cope with environmental threats. Proteomics can play a role in unraveling the underlying mechanisms for food production to address the growing demand for food. In this review, the current status of food crop proteomics is discussed, especially in regard to the Asia and Oceania regions. Furthermore, the future perspective in relation to proteomic techniques for the important food crops is highlighted.
Tumorigenesis involves a complex interplay between genetically modified cancer cells and their adjacent normal tissue, the stroma. We used an established breast cancer mouse model to investigate this inter-relationship. Conditional activation of Rho-associated protein kinase (ROCK) in a model of mammary tumorigenesis enhances tumor growth and progression by educating the stroma and enhancing the production and remodeling of the extracellular matrix. We used peptide matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) to quantify the proteomic changes occurring within tumors and their stroma in their regular spatial context. Peptides were ranked according to their ability to discriminate between the two groups, using a receiver operating characteristic tool. Peptides were identified by liquid chromatography tandem mass spectrometry, and protein expression was validated by quantitative immunofluorescence using an independent set of tumor samples. We have identified and validated four key proteins upregulated in ROCK-activated mammary tumors relative to those expressing kinase-dead ROCK, namely, collagen I, α-SMA, Rab14, and tubulin-β4. Rab14 and tubulin-β4 are expressed within tumor cells, whereas collagen I is localized within the stroma. α-SMA is predominantly localized within the stroma but is also expressed at higher levels in the epithelia of ROCK-activated tumors. High expression of COL1A, the gene encoding the pro-α 1 chain of collagen, correlates with cancer progression in two human breast cancer genomic data sets, and high expression of COL1A and ACTA2 (the gene encoding α-SMA) is associated with a low survival probability (COL1A, p = 0.00013; ACTA2, p = 0.0076) in estrogen receptor-negative breast cancer patients.
To investigate whether ROCK-activated tumor cells cause stromal cancer-associated fibroblasts (CAFs) to upregulate expression of collagen I and α-SMA, we treated CAFs with medium conditioned by primary mammary tumor cells in which ROCK had been activated. This led to abundant production of both proteins in CAFs, clearly highlighting the inter-relationship between tumor cells and CAFs and identifying CAFs as the potential source of high levels of collagen I and α-SMA and associated enhancement of tissue stiffness. Our research emphasizes the capacity of MALDI-MSI to quantitatively assess tumor-stroma inter-relationships and to identify potential prognostic factors for cancer progression in human patients, using sophisticated mouse cancer models.
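The ranking of peptides by their ability to discriminate between two groups can be illustrated with a minimal sketch. It assumes that discrimination is scored by the area under the receiver operating characteristic curve (AUC), computed here in its rank-based form, and that peptides are ordered by how far the AUC lies from 0.5. The function names and the intensity values are hypothetical and are not taken from the study or its software.

```python
from typing import Dict, List, Tuple


def auc(group_a: List[float], group_b: List[float]) -> float:
    """Rank-based AUC: probability that a value drawn from group_a
    exceeds a value drawn from group_b (ties count as 0.5)."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def rank_peptides(
    intensities: Dict[str, Tuple[List[float], List[float]]]
) -> List[Tuple[str, float]]:
    """Sort peptides so that the most discriminating come first.
    An AUC near 1 or near 0 both indicate good separation."""
    scored = [(name, auc(a, b)) for name, (a, b) in intensities.items()]
    return sorted(scored, key=lambda item: abs(item[1] - 0.5), reverse=True)


# Hypothetical intensities (arbitrary units), not data from the study.
example = {
    "peptide_1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "peptide_2": ([2.0, 4.0, 6.0], [3.0, 4.0, 5.0]),
    "peptide_3": ([1.0, 2.0, 2.5], [3.0, 2.0, 4.0]),
}
ranking = rank_peptides(example)
for name, score in ranking:
    print(f"{name}: AUC = {score:.3f}")
```

Scoring by distance from 0.5 treats peptides that are higher in either group as equally informative, which is one reasonable choice when both upregulated and downregulated signals are of interest.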
Identification of phosphorylation sites is an important step in the function study and drug design of proteins. In recent years, there have been increasing applications of the computational method in the identification of phosphorylation sites because of its low cost and high speed. Most of the currently available methods focus on using local information around potential phosphorylation sites for prediction and do not take the global information of the protein sequence into consideration. Here, we demonstrated that the global information of protein sequences may also be critical for phosphorylation site prediction. In this paper, a new deep neural network model, called DeepPSP, was proposed for the prediction of protein phosphorylation sites. In the DeepPSP model, two parallel modules were introduced to extract both local and global features from protein sequences. Two squeeze-and-excitation blocks and one bidirectional long short-term memory block were introduced into each module to capture effective representations of the sequences. Comparative studies were carried out to evaluate the performance of DeepPSP and four other prediction methods using public data sets. The F1-score, area under receiver operating characteristic curves (AUROC), and area under precision-recall curves (AUPRC) of DeepPSP were found to be 0.4819, 0.82, and 0.50, respectively, for S/T general site prediction and 0.4206, 0.73, and 0.39, respectively, for Y general site prediction. Compared with the MusiteDeep method, the F1-score, AUROC, and AUPRC of DeepPSP were found to increase by 8.6, 2.5, and 8.7%, respectively, for S/T general site prediction and by 20.6, 5.8, and 18.2%, respectively, for Y general site prediction. Among the tested methods, the developed DeepPSP method was also found to produce the best results for different kinase-specific site predictions including CDK, mitogen-activated protein kinase, CAMK, AGC, and CMGC.
Taken together, the developed DeepPSP method may offer a more accurate phosphorylation site prediction by including global information. It may serve as an alternative model with better performance and interpretability for protein phosphorylation site prediction.
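The distinction between local and global information can be made concrete with a small sketch of how inputs for two parallel modules might be prepared. This is only an illustration under stated assumptions: the window size, the padding symbol, and all function names are hypothetical, and the code does not reproduce the DeepPSP network or its actual preprocessing.

```python
from typing import List, Tuple

PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def candidate_sites(sequence: str, residues: str = "ST") -> List[int]:
    """Return 0-based positions of residues that could be phosphorylated."""
    return [i for i, aa in enumerate(sequence) if aa in residues]


def local_window(sequence: str, site: int, flank: int) -> str:
    """Fixed-length window of `flank` residues on each side of the site,
    padded where the window runs past either end of the protein."""
    left = sequence[max(0, site - flank):site]
    right = sequence[site + 1:site + 1 + flank]
    return (PAD * (flank - len(left)) + left + sequence[site]
            + right + PAD * (flank - len(right)))


def model_inputs(sequence: str, residues: str = "ST",
                 flank: int = 3) -> List[Tuple[int, str, str]]:
    """For each candidate site, pair the local window with the full
    sequence, which stands in for the global input."""
    return [(site, local_window(sequence, site, flank), sequence)
            for site in candidate_sites(sequence, residues)]


# Hypothetical toy sequence, far shorter than a real protein.
inputs = model_inputs("MSKTAYG", residues="ST", flank=3)
for site, window, full in inputs:
    print(site, window, full)
```

A local-only predictor would see just the window, whereas a model of the kind described above would also receive a representation of the whole sequence.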
In metabolomics, identification of metabolic pathways altered by disease, genetics, or environmental perturbations is crucial to uncover the underlying biological mechanisms. A number of pathway analysis methods are currently available, which are generally based on equal-probability, topological-centrality, or model-separability methods. In brief, prior identification of significant metabolites is needed for the first two types of methods, while each pathway is modeled separately in the model-separability-based methods. In these methods, interactions between metabolic pathways are not taken into consideration. The current study aims to develop a novel metabolic pathway identification method based on multi-block partial least squares (MB-PLS) analysis by including all pathways into a global model to facilitate biological interpretation. The detected metabolites are first assigned to pathway blocks based on their roles in metabolism as defined by the KEGG pathway database. The metabolite intensity or concentration data matrix is then reconstructed as data blocks according to the metabolite subsets. Then, a MB-PLS model is built on these data blocks. A new metric, named the pathway importance in projection (PIP), is proposed for evaluation of the significance of each metabolic pathway for group separation. A simulated dataset was generated by imposing artificial perturbation on four pre-defined pathways of the healthy control group of a colorectal cancer study. Performance of the proposed method was evaluated and compared with seven other commonly used methods using both an actual metabolomics dataset and the simulated dataset. For the real metabolomics dataset, most of the significant pathways identified by the proposed method were found to be consistent with the published literature. For the simulated dataset, the significant pathways identified by the proposed method are highly consistent with the pre-defined pathways. 
The experimental results demonstrate that the proposed method is effective for identification of significant metabolic pathways, which may facilitate biological interpretation of metabolomics data.
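The first two steps described above, assigning detected metabolites to pathway blocks and reconstructing the data matrix as data blocks, can be sketched as follows. The pathway membership table and metabolite names are hypothetical placeholders rather than KEGG content, and the sketch stops before model fitting: it does not implement MB-PLS or the PIP metric.

```python
from typing import Dict, List


def build_blocks(
    data: List[List[float]],
    metabolites: List[str],
    pathways: Dict[str, List[str]],
) -> Dict[str, List[List[float]]]:
    """Split a samples-by-metabolites matrix into one block per pathway.

    A metabolite that belongs to several pathways is copied into each
    of their blocks; pathways with no detected metabolite are skipped.
    """
    column = {name: j for j, name in enumerate(metabolites)}
    blocks: Dict[str, List[List[float]]] = {}
    for pathway, members in pathways.items():
        cols = [column[m] for m in members if m in column]
        if cols:
            blocks[pathway] = [[row[j] for j in cols] for row in data]
    return blocks


# Hypothetical example: 2 samples, 4 detected metabolites.
metabolites = ["m1", "m2", "m3", "m4"]
data = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
pathways = {
    "pathway_A": ["m1", "m2", "m3"],
    "pathway_B": ["m2", "m4", "m9"],  # m9 was not detected
    "pathway_C": ["m9"],              # no detected members
}
blocks = build_blocks(data, metabolites, pathways)
print(blocks)
```

Each resulting block keeps the same sample order, so the blocks can be passed together to a multi-block model.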
Metabolite set enrichment analysis (MSEA) has gained increasing research interest for identification of perturbed metabolic pathways in metabolomics. The method incorporates predefined metabolic pathway information in the analysis, where metabolite sets are typically assumed to be mutually exclusive. However, metabolic pathways are known to contain common metabolites and intermediates. This situation, along with limitations in metabolite detection or coverage, leads to overlapping, incomplete metabolite sets in pathway analysis. For overlapping metabolite sets, MSEA tends to result in high false positives due to improper weights allocated to the overlapping metabolites. Here, we proposed an extended partial least squares (PLS) model with a new sparse scheme for overlapping metabolite set enrichment analysis, named overlapping group PLS (ogPLS) analysis. The weight vector of the ogPLS model was decomposed into pathway-specific subvectors, and then a group lasso penalty was imposed on these subvectors to achieve a proper weight allocation for the overlapping metabolites. Two strategies were adopted in the proposed ogPLS model to identify the perturbed metabolic pathways. The first strategy involves debiasing regularization, which was used to reduce inequalities amongst the predefined metabolic pathways. The second strategy is stable selection, which was used to rank pathways while avoiding the nuisance problems of model parameter optimization. Both simulated and real-world metabolomic datasets were used to evaluate the proposed method and compare it with two other MSEA methods, namely the Global-test and the multiblock PLS (MB-PLS)-based pathway importance in projection (PIP) methods. Using a simulated dataset with known perturbed pathways, the average true discovery rate for the ogPLS method was found to be higher than those of the Global-test and the MB-PLS-based PIP methods.
Analysis with a real-world metabolomics dataset also indicated that the developed method was less prone to select pathways with highly overlapped detected metabolite sets. Compared with the two other methods, the proposed method offers higher accuracy and a lower false-positive rate and is more robust when applied to overlapping metabolite set analysis. The developed ogPLS method may serve as an alternative MSEA method to facilitate biological interpretation of metabolomics data for overlapping metabolite sets.
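The idea of decomposing a weight vector into pathway-specific subvectors and penalizing each subvector as a group can be illustrated with a short sketch. It assumes the common overlapping group lasso form, in which the overall weight of a metabolite is the sum of its entries across subvectors and the penalty is the sum of the Euclidean norms of the subvectors. The optional scaling by the square root of the group size is a frequently used convention, not something stated in the abstract, and all names and values are hypothetical. The sketch evaluates the penalty only and does not fit an ogPLS model.

```python
import math
from typing import Dict

# pathway name -> {metabolite name -> weight in that pathway's subvector}
Subvectors = Dict[str, Dict[str, float]]


def combined_weights(subvectors: Subvectors) -> Dict[str, float]:
    """Overall weight of each metabolite: the sum of its entries over
    all pathway-specific subvectors that contain it."""
    total: Dict[str, float] = {}
    for weights in subvectors.values():
        for metabolite, value in weights.items():
            total[metabolite] = total.get(metabolite, 0.0) + value
    return total


def group_lasso_penalty(subvectors: Subvectors, lam: float = 1.0,
                        size_weighted: bool = True) -> float:
    """lam * sum over pathways of (scale * L2 norm of the subvector).
    With size_weighted=True, scale = sqrt(group size) (an assumption)."""
    penalty = 0.0
    for weights in subvectors.values():
        norm = math.sqrt(sum(v * v for v in weights.values()))
        scale = math.sqrt(len(weights)) if size_weighted else 1.0
        penalty += scale * norm
    return lam * penalty


# Hypothetical case: metabolite m2 is shared by both pathways, but its
# weight is carried entirely by pathway_A, so pathway_B is switched off.
subvectors = {
    "pathway_A": {"m1": 3.0, "m2": 4.0},
    "pathway_B": {"m2": 0.0, "m3": 0.0},
}
print(combined_weights(subvectors))
print(group_lasso_penalty(subvectors, size_weighted=False))
```

Because the penalty acts on whole subvectors, an entire pathway can be driven to zero while a shared metabolite keeps a nonzero weight through another pathway, which is the weight-allocation behavior the abstract describes.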
Hepatocellular carcinoma (HCC) is a leading cause of cancer death worldwide. Because of its high recurrence rate and heterogeneity, effective treatment for advanced-stage HCC is currently lacking. Accumulating evidence shows the therapeutic potential of pharmacologic vitamin C (VC) in HCC. However, the metabolic basis underlying the anticancer property of VC remains to be elucidated. In this study, we used a high-resolution proton nuclear magnetic resonance-based metabolomics technique to assess the global metabolic changes in HCC cells following VC treatment. In addition, the HCC cells were treated with oxaliplatin (OXA) to explore the potential synergistic effect induced by the combined VC and OXA treatment. The current metabolomics data suggested different mechanisms of OXA and VC in modulating cell growth and metabolism. In general, VC treatment led to inhibition of energy metabolism via NAD+ depletion and amino acid deprivation. On the other hand, OXA caused significant perturbation in phospholipid biosynthesis and phosphatidylcholine biosynthesis pathways. The current results suggested that glutathione metabolism and pathways related to succinate and choline may play central roles in conferring the combined effect of OXA and VC. Taken together, this study provided metabolic evidence of VC and OXA in treating HCC and may contribute toward the potential application of combined VC and OXA as complementary HCC therapies.
Although acetylation is regarded as a common protein modification, a detailed proteome-wide profile of this post-translational modification may reveal important biological insight regarding differential acetylation of individual proteins. Here we optimized a novel peptide IEF fractionation method for use prior to LC-MS/MS analysis to obtain more in-depth coverage of N-terminally acetylated proteins from complex samples. Application of the method to the analysis of the serous ovarian cancer cell line OVCAR-5 identified 344 N-terminally acetylated proteins, 12 of which are previously unreported. The protein peptidyl-prolyl cis-trans isomerase A (PPIA) was detected in both the N-terminally acetylated and unmodified forms and was further analyzed by data-independent acquisition in carboplatin-responsive parental OVCAR-5 cells and carboplatin-resistant OVCAR-5 cells. This revealed a higher ratio of unacetylated to acetylated N-terminal PPIA in the parental compared with the carboplatin-resistant OVCAR-5 cells and a 4.1-fold increase in PPIA abundance overall in the parental cells relative to carboplatin-resistant OVCAR-5 cells (P = 0.015). In summary, the novel IEF peptide fractionation method presented here is robust and reproducible and can be applied to the profiling of N-terminally acetylated proteins. All mass spectrometry data are available via the ProteomeXchange repository (PXD003547).
The proteogenomic search pipeline developed in this work has been applied for reanalysis of 40 publicly available shotgun proteomic datasets from various human tissues comprising more than 8000 individual LC-MS/MS runs, of which 5442 .raw data files were processed in total. This reanalysis was focused on searching for ADAR-mediated RNA editing events, their clustering across samples of different origins, and classification. In total, 33 recoded protein sites were identified in 21 datasets. Of those, 18 sites were detected in at least two datasets, representing the core human protein editome. In agreement with prior works, neural and cancer tissues were found to be enriched with recoded proteins. Quantitative analysis indicated that the recoding rate of specific sites did not directly depend on the levels of ADAR enzymes or targeted proteins themselves; rather, it was governed by differential and yet undescribed regulation of interaction of enzymes with mRNA. Nine recoding sites conserved between humans and rodents were validated by targeted proteomics using stable isotope standards in the murine brain cortex and cerebellum, and an additional one was validated in human cerebrospinal fluid. In addition to previous data of the same type from cancer proteomes, we provide a comprehensive catalog of recoding events caused by ADAR RNA editing in the human proteome.
We performed quantitative metabolic phenotyping of blood plasma in parallel with cytokine/chemokine analysis from participants who were either SARS-CoV-2 (+) (n = 10) or SARS-CoV-2 (-) (n = 49). SARS-CoV-2 positivity was associated with a unique metabolic phenotype and demonstrated a complex systemic response to infection, including severe perturbations in amino acid and kynurenine metabolic pathways. Nine metabolites were elevated in plasma and strongly associated with infection (quinolinic acid, glutamic acid, nicotinic acid, aspartic acid, neopterin, kynurenine, phenylalanine, 3-hydroxykynurenine, and taurine; p < 0.05), while four metabolites were lower in infection (tryptophan, histidine, indole-3-acetic acid, and citrulline; p < 0.05). This signature supports a systemic metabolic phenoconversion following infection, indicating possible neurotoxicity and neurological disruption (elevations of 3-hydroxykynurenine and quinolinic acid) and liver dysfunction (reduction in Fischer's ratio and elevation of taurine). Finally, we report correlations between the key metabolite changes observed in the disease with concentrations of proinflammatory cytokines and chemokines showing strong immunometabolic disorder in response to SARS-CoV-2 infection.
We present a multivariate metabotyping approach to assess the functional recovery of nonhospitalized COVID-19 patients and the possible biochemical sequelae of "Post-Acute COVID-19 Syndrome", colloquially known as long-COVID. Blood samples were taken from patients ca. 3 months after acute COVID-19 infection with further assessment of symptoms at 6 months. Some 57% of the patients had one or more persistent symptoms including respiratory-related symptoms like cough, dyspnea, and rhinorrhea or other nonrespiratory symptoms including chronic fatigue, anosmia, myalgia, or joint pain. Plasma samples were quantitatively analyzed for lipoproteins, glycoproteins, amino acids, biogenic amines, and tryptophan pathway intermediates using Nuclear Magnetic Resonance (NMR) spectroscopy and mass spectrometry. Metabolic data for the follow-up patients (n = 27) were compared with controls (n = 41) and hospitalized severe acute respiratory syndrome SARS-CoV-2 positive patients (n = 18, with multiple time-points). Univariate and multivariate statistics revealed variable patterns of functional recovery with many patients exhibiting residual COVID-19 biomarker signatures. Several parameters were persistently perturbed, e.g., elevated taurine (p = 3.6 × 10⁻³ versus controls) and reduced glutamine/glutamate ratio (p = 6.95 × 10⁻⁸ versus controls), indicative of possible liver and muscle damage and a high energy demand linked to more generalized tissue repair or immune function. Some parameters showed near-complete normalization, e.g., the plasma apolipoprotein B100/A1 ratio was similar to that of healthy controls but significantly lower (p = 4.2 × 10⁻³) than post-acute COVID-19 patients, reflecting partial reversion of the metabolic phenotype (phenoreversion) toward the healthy metabolic state. Plasma neopterin was normalized in all follow-up patients, indicative of a reduction in the adaptive immune activity that has been previously detected in active SARS-CoV-2 infection.
Other systemic inflammatory biomarkers such as GlycA and the kynurenine/tryptophan ratio remained elevated in some, but not all, patients. Correlation analysis, principal component analysis (PCA), and orthogonal-partial least-squares discriminant analysis (O-PLS-DA) showed that the follow-up patients were, as a group, metabolically distinct from controls and partially comapped with the acute-phase patients. Significant systematic metabolic differences between asymptomatic and symptomatic follow-up patients were also observed for multiple metabolites. The overall metabolic variance of the symptomatic patients was significantly greater than that of nonsymptomatic patients for multiple parameters (χ² p = 0.014). Thus, asymptomatic follow-up patients including those with post-acute COVID-19 Syndrome displayed a spectrum of multiple persistent biochemical pathophysiology, suggesting that the metabolic phenotyping approach may be deployed for multisystem functional assessment of individual post-acute COVID-19 patients.
Metabolomics is now widely used to characterize metabolic phenotypes associated with lifestyle risk factors such as obesity. The objective of the present study was to explore the associations of body mass index (BMI) with 145 metabolites measured in blood samples in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Metabolites were measured in blood from 392 men from the Oxford (UK) cohort (EPIC-Oxford) and in 327 control subjects who were part of a nested case-control study on hepatobiliary carcinomas (EPIC-Hepatobiliary). Measured metabolites included amino acids, acylcarnitines, hexoses, biogenic amines, phosphatidylcholines, and sphingomyelins. Linear regression models controlled for potential confounders and multiple testing were run to evaluate the associations of metabolite concentrations with BMI. In total, 40 and 45 individual metabolites showed significant differences according to BMI variations in the EPIC-Oxford and EPIC-Hepatobiliary subcohorts, respectively. Twenty-two individual metabolites (kynurenine, one sphingomyelin, glutamate, and 19 phosphatidylcholines) were associated with BMI in both subcohorts. The present findings provide additional knowledge on blood metabolic signatures of BMI in European adults, which may help identify mechanisms mediating the relationship of BMI with obesity-related diseases.
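When one regression model is run per metabolite, the resulting p-values are usually adjusted for the number of tests. The abstract does not name the correction that was used, so the sketch below shows the Benjamini-Hochberg false discovery rate adjustment purely as an assumed, commonly used example. The function name and p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per metabolite-BMI regression.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

With 145 metabolites tested, an adjustment of this kind limits the expected share of false discoveries among the metabolites reported as associated with BMI.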
MD2 pineapple (Ananas comosus) is the second most important tropical crop that preserves crassulacean acid metabolism (CAM), which has high water-use efficiency and is fast becoming the most consumed fresh fruit worldwide. Despite the significance of environmental efficiency and popularity, until very recently, its genome sequence has not been determined and a high-quality annotated proteome has not been available. Here, we have undertaken a pilot proteogenomic study, analyzing the proteome of MD2 pineapple leaves using liquid chromatography-mass spectrometry (LC-MS/MS), which validates 1781 predicted proteins in the annotated F153 (V3) genome. In addition, a further 603 peptide identifications are found that map exclusively to an independent MD2 transcriptome-derived database but are not found in the standard F153 (V3) annotated proteome. Peptide identifications derived from these MD2 transcripts are also cross-referenced to a more recent and complete MD2 genome annotation, resulting in 402 nonoverlapping peptides, which in turn support 30 high-quality gene candidates novel to both pineapple genomes. Many of the validated F153 (V3) genes are also supported by an independent proteomics data set collected for an ornamental pineapple variety. The contigs and peptides have been mapped to the current F153 genome build and are available as bed files to display a custom gene track on the Ensembl Plants region viewer. These analyses add to the knowledge of experimentally validated pineapple genes and demonstrate the utility of transcript-derived proteomics to discover both novel genes and genetic structure in a plant genome, adding value to its annotation.
Leptospirosis, a notifiable endemic disease in Malaysia, has higher mortality rates than regional dengue fever. Diverse clinical symptoms and limited diagnostic methods complicate leptospirosis diagnosis. The demand for accurate biomarker-based diagnostics is increasing. This study investigated the plasma proteome of leptospirosis patients with leptospiraemia and seroconversion compared with dengue patients and healthy subjects using isobaric tags for relative and absolute quantitation (iTRAQ)-mass spectrometry (MS). The iTRAQ analysis identified a total of 450 proteins, which were refined to a list of 290 proteins through a series of exclusion criteria. Differential expression in the plasma proteome of leptospirosis patients compared to the control groups identified 11 proteins, which are apolipoprotein A-II (APOA2), C-reactive protein (CRP), fermitin family homolog 3 (FERMT3), leucine-rich alpha-2-glycoprotein 1 (LRG1), lipopolysaccharide-binding protein (LBP), myosin-9 (MYH9), platelet basic protein (PPBP), platelet factor 4 (PF4), profilin-1 (PFN1), serum amyloid A-1 protein (SAA1), and thrombospondin-1 (THBS1). Following a study on a verification cohort, a panel of eight plasma protein biomarkers was identified for potential leptospirosis diagnosis: CRP, LRG1, LBP, MYH9, PPBP, PF4, SAA1, and THBS1. In conclusion, a panel of eight protein biomarkers offers a promising approach for leptospirosis diagnosis, addressing the limitations of the "one disease, one biomarker" concept.
Crossbreeding of zebu cattle (Bos indicus) with European breeds (Bos taurus) to produce crossbred cattle was performed to overcome the low growth rates and milk production of indigenous tropical cattle breeds. However, the fertility of zebu cattle is higher than that of crossbred cattle and European breeds under warm conditions. A combined proteomics and metabolomics study of the Malaysian indigenous breed Kedah × Kelantan-KK (B. indicus) and the crossbreed Mafriwal-M (B. taurus × B. indicus) was conducted to understand the physiological reasons for higher thermotolerance and fertility in zebu cattle sperm. A total of 161 regulated metabolites and 96 regulated proteins in KK and M (p < 0.05) showed more efficient carbohydrate and energy metabolism, higher integrity of the DNA and plasma membrane, a lower level of reactive oxygen species, and higher levels of phospholipids, which confirmed higher sperm plasma membrane integrity in KK. A stronger antioxidant system and lower polyunsaturated fatty acids help KK sperm cope with oxidative stress under warm conditions. The higher abundance of flagella structural proteins in KK provides a stronger structure that supports sperm motility. Abnormality of flagella, plasma membrane disruption, and DNA fragmentation were higher in M. These findings provide selective molecular markers for developing high-producing and more thermotolerant cattle breeds in tropical areas.