MyMedR

Displaying all 4 publications

AnkPlex: algorithmic structure for refinement of near-native ankyrin-protein docking

Wisitponchai T, Shoombuatong W, Lee VS, Kitidee K, Tayapiwatana C

BMC Bioinformatics, 2017 Apr 19;18(1):220.
PMID: 28424069 DOI: 10.1186/s12859-017-1628-6

BACKGROUND: Computational analysis of protein-protein interactions provides crucial information for increasing binding affinity without changing the basic conformation. Several docking programs have been used to predict the near-native poses of protein-protein complexes within the 10 top-ranked poses. Universal criteria for discriminating the near-native pose are not available, since there are several classes of recognition proteins. Explicit criteria for identifying the near-native pose of ankyrin-protein complexes (APKs) have not yet been reported.
RESULTS: In this study, we established an ensemble computational model, named "AnkPlex", for discriminating the near-native docking poses of APKs. A dataset of APKs was generated from seven X-ray APKs, which consisted of 3 internal domains, using the reliable docking tool ZDOCK. The dataset was composed of 669 near-native and 44,334 non-near-native poses, and it was used to generate eleven informative features. Subsequently, a re-scoring rank was generated by AnkPlex using a combination of a decision tree algorithm and logistic regression. AnkPlex achieved superior efficiency, placing ≥1 near-native complex in the 10 top-ranked poses for nine X-ray complexes, whereas ZDOCK did so for only six. In addition, feature analysis indicated that the van der Waals feature was the dominant feature for distinguishing near-native poses from the other potential ankyrin-protein docking poses.
CONCLUSION: The AnkPlex model succeeded at predicting near-native docking poses and led to the discovery of informative characteristics that could further improve our understanding of the ankyrin-protein complex. Our computational study could be useful for predicting the near-native poses of binding proteins and desired targets, especially for ankyrin-protein complexes. The AnkPlex web server is freely accessible at http://ankplex.ams.cmu.ac.th .
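
The success criterion quoted in the abstract (at least one near-native pose among the 10 top-ranked poses of a complex) can be illustrated with a minimal sketch. It covers only that evaluation step, not the decision tree and logistic regression re-scoring itself. The function names, the (score, is_near_native) tuple layout, and the assumption that a higher score means a better rank are hypothetical and not taken from AnkPlex or ZDOCK.

```python
def has_near_native_in_top_n(poses, n=10):
    """Return True if any of the n best-scored poses is near-native.

    poses: list of (score, is_near_native) tuples for one complex.
    Assumption: a higher score means a better rank.
    """
    ranked = sorted(poses, key=lambda pose: pose[0], reverse=True)
    return any(is_near_native for _, is_near_native in ranked[:n])


def success_count(complexes, n=10):
    """Count the complexes with at least one near-native pose in the top n.

    complexes: dict mapping a complex identifier to its list of poses.
    """
    return sum(has_near_native_in_top_n(poses, n) for poses in complexes.values())
```

Under these assumptions, running success_count once on the original docking scores and once on the re-scored poses gives the kind of per-method tally that the abstract reports.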

A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides

Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W

Sci Rep, 2021 Dec 10;11(1):23782.
PMID: 34893688 DOI: 10.1038/s41598-021-03293-w

Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.
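
To make the idea of scoring a sequence with dipeptide propensity scores concrete, here is a minimal sketch. It assumes the commonly described form of the scoring card method, in which a sequence's score is the composition-weighted sum of its dipeptide propensity scores and is compared with a threshold. The propensity values, the threshold, and all function names are hypothetical placeholders and are not the values or code of SCMTPP.

```python
from collections import Counter


def g_gap_dipeptide_composition(seq, g=0):
    """Relative frequency of residue pairs separated by g positions.

    g=0 gives ordinary (adjacent) dipeptides.
    """
    pairs = [seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: count / len(pairs) for pair, count in counts.items()}


def scm_score(seq, propensity, g=0):
    """Composition-weighted sum of propensity scores.

    Dipeptides missing from the propensity table contribute 0 (an assumption).
    """
    composition = g_gap_dipeptide_composition(seq, g)
    return sum(freq * propensity.get(pair, 0.0) for pair, freq in composition.items())


def predict_positive(seq, propensity, threshold, g=0):
    """Classify as positive when the score exceeds the (hypothetical) threshold."""
    return scm_score(seq, propensity, g) > threshold
```

Because the score is a plain weighted sum, each dipeptide's contribution can be read directly from the propensity table, which is the kind of interpretability the abstract emphasizes.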

Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features

Mahmud SMH, Goh KOM, Hosen MF, Nandi D, Shoombuatong W

Sci Rep, 2024 Feb 05;14(1):2961.
PMID: 38316843 DOI: 10.1038/s41598-024-52653-9

DNA-binding proteins (DBPs) play a significant role in all phases of genetic processes, including DNA recombination, repair, and modification. They are often utilized in drug discovery as fundamental elements of steroids, antibiotics, and anticancer drugs. Predicting them is one of the most challenging tasks in proteomics research. Conventional experimental methods for DBP identification are costly and sometimes biased toward prediction. Therefore, developing powerful computational methods that can accurately and rapidly identify DBPs from sequence information is an urgent need. In this study, we propose a novel deep learning-based method called Deep-WET to accurately identify DBPs from primary sequence information. In Deep-WET, we employed three powerful feature encoding schemes, namely Global Vectors, Word2Vec, and fastText, to encode the protein sequence. Subsequently, these three features were sequentially combined and weighted using the weights obtained from the elements learned through the differential evolution (DE) algorithm. To enhance the predictive performance of Deep-WET, we applied the SHapley Additive exPlanations approach to remove irrelevant features. Finally, the optimal feature subset was input into convolutional neural networks to construct the Deep-WET predictor. Both cross-validation and independent tests indicated that Deep-WET achieved superior predictive performance compared to conventional machine learning classifiers. In addition, in an extensive independent test, Deep-WET was effective and outperformed several state-of-the-art methods for DBP prediction, with accuracy of 78.08%, MCC of 0.559, and AUC of 0.805. This superior performance shows that Deep-WET has a strong capacity to predict DBPs. The web server of Deep-WET and the curated datasets used in this study are available at https://deepwet-dna.monarcatechnical.com/ . The proposed Deep-WET is anticipated to serve the community-wide effort for large-scale identification of potential DBPs.
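
The step of combining and weighting the three embedding-based feature sets could look roughly like the sketch below. It assumes one scalar weight per feature block and simple concatenation. The abstract does not specify the exact weighting scheme, so the function name, the per-block weights, and the example values are hypothetical, and the differential evolution search that would produce the weights is not shown.

```python
def weighted_concat(blocks, weights):
    """Scale each feature block by its weight, then concatenate the blocks.

    blocks:  list of feature vectors (e.g. one per embedding scheme).
    weights: one scalar per block (assumed to come from an optimizer such as DE).
    """
    if len(blocks) != len(weights):
        raise ValueError("need exactly one weight per feature block")
    combined = []
    for block, weight in zip(blocks, weights):
        combined.extend(weight * value for value in block)
    return combined
```

For example, three blocks of lengths 2, 1, and 2 produce a single vector of length 5, with each segment scaled by its own weight before feature selection and classification.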

Improved prediction and characterization of anticancer activities of peptides using a novel flexible scoring card method

Charoenkwan P, Chiangjong W, Lee VS, Nantasenamat C, Hasan MM, Shoombuatong W

Sci Rep, 2021 Feb 04;11(1):3017.
PMID: 33542286 DOI: 10.1038/s41598-021-82513-9

As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several approaches based on machine learning have been proposed for ACP identification. Although existing methods have afforded high prediction accuracies, such models use a large number of descriptors together with complex ensemble approaches, which leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification that also provides the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) that makes use of propensity scores of local and global sequential information for the development of a sequence-based ACP predictor (named iACP-FSCM) with improved prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM represents the first sequence-based ACP predictor for rationalizing an in-depth understanding of the molecular basis for the enhancement of anticancer activities of peptides via the use of FSCM-derived propensity scores. The independent testing results showed that iACP-FSCM provided accuracies of 0.825 and 0.910 as evaluated on the main and alternative datasets, respectively. Results from comparative benchmarking demonstrated that iACP-FSCM could outperform seven other existing ACP predictors with marked improvements of 7% and 17% for accuracy and MCC, respectively, on the main dataset. Furthermore, iACP-FSCM (0.910) achieved results very comparable to those of the state-of-the-art ensemble model AntiCP2.0 (0.920) as evaluated on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability. It is highly anticipated that iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.
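
The abstracts in this listing report accuracy and the Matthews correlation coefficient (MCC). As a reminder of how these two metrics are computed from a binary confusion matrix, here is a small sketch using the standard definitions. The function names are illustrative, and returning 0.0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes are imbalanced.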