Displaying publications 1 - 20 of 49 in total

  1. Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, et al.
    Nucleic Acids Res, 2018 01 04;46(D1):D435-D439.
    PMID: 29112716 DOI: 10.1093/nar/gkx1069
    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.
    Matched MeSH terms: Databases, Protein
  2. Mohamed R, Degac J, Helms V
    PLoS One, 2015;10(10):e0140965.
    PMID: 26517868 DOI: 10.1371/journal.pone.0140965
    Protein-protein interactions (PPIs) play a major role in many biological processes and they represent an important class of targets for therapeutic intervention. However, targeting PPIs is challenging because often no convenient natural substrates are available as starting point for small-molecule design. Here, we explored the characteristics of protein interfaces in five non-redundant datasets of 174 protein-protein (PP) complexes, and 161 protein-ligand (PL) complexes from the ABC database, 436 PP complexes, and 196 PL complexes from the PIBASE database and a dataset of 89 PL complexes from the Timbal database. In all cases, the small molecule ligands must bind at the respective PP interface. We observed similar amino acid frequencies in all three datasets. Remarkably, also the characteristics of PP contacts and overlapping PL contacts are highly similar.
    Matched MeSH terms: Databases, Protein
  3. Bordin N, Sillitoe I, Nallapareddy V, Rauer C, Lam SD, Waman VP, et al.
    Commun Biol, 2023 Feb 08;6(1):160.
    PMID: 36755055 DOI: 10.1038/s42003-023-04488-9
    Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveal extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increases the number of unique 'global' folds by 36% and will provide valuable insights on structure function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
    Matched MeSH terms: Databases, Protein
  4. Klausen MS, Jespersen MC, Nielsen H, Jensen KK, Jurtz VI, Sønderby CK, et al.
    Proteins, 2019 06;87(6):520-527.
    PMID: 30785653 DOI: 10.1002/prot.25674
    The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
    Matched MeSH terms: Databases, Protein*
  5. Afiqah-Aleng N, Mohamed-Hussein ZA
    Methods Mol Biol, 2021;2189:119-132.
    PMID: 33180298 DOI: 10.1007/978-1-0716-0822-7_10
    In this post-genomic era, protein network can be used as a complementary way to shed light on the growing amount of data generated from current high-throughput technologies. Protein network is a powerful approach to describe the molecular mechanisms of the biological events through protein-protein interactions. Here, we describe the computational methods used to construct the protein network using expression data. We provide a list of available tools and databases that can be used in constructing the network.
    Matched MeSH terms: Databases, Protein*
  6. Patro CP, Khan AM, Tan TW, Fu XY
    PLoS One, 2014;9(8):e104597.
    PMID: 25157689 DOI: 10.1371/journal.pone.0104597
    Signal transducers and activators of transcription (STAT) proteins are key signalling molecules in metazoans, implicated in various cellular processes. Increased research in the field has resulted in the accumulation of STAT sequence and structure data, which are scattered across various public databases, missing extensive functional annotations, and prone to effort redundancy because of the dearth of community sharing. Therefore, there is a need to integrate the existing sequence, structure and functional data into a central repository, one that is enriched with annotations and provides a platform for community contributions. Herein, we present STATdb (publicly available at http://statdb.bic.nus.edu.sg/), the first integrated resource for STAT sequences comprising 1540 records representing the known STATome, enriched with existing structural and functional information from various databases and literature and including manual annotations. STATdb provides advanced features for data visualization, analysis and prediction, and community contributions. A key feature is a meta-predictor to characterise STAT sequences based on a novel classification that integrates STAT domain architecture, lineage and function. A curation policy workflow has been devised for regulated and structured community contributions, with an update policy for the seamless integration of new data and annotations.
    Matched MeSH terms: Databases, Protein*
  7. Matra DD, Ritonga AW, Natawijaya A, Poerwanto R, Sobir, Widodo WD, et al.
    Data Brief, 2019 Feb;22:332-335.
    PMID: 30596128 DOI: 10.1016/j.dib.2018.12.031
    Baccaurea motleyana Müll. Arg. (rambai) is an underutilized fruit native to Indonesia, Thailand, and the Malay Peninsula, and is mostly cultivated on the island of Java (Lim, 2012) [1]. The edible parts of the fruit are the white and reddish arillodes, which taste sweet to acid-sweet. However, nucleotide and transcriptome information for this species is still scarce, and no information has been deposited in GenBank. In this data article, we performed, for the first time, a de novo transcriptome assembly using paired-end Illumina technology. Contigs were assembled using Trinity; filtering and clustering produced 37,077 contigs. Contig lengths ranged from 201 to 4972 bp, with an N50 of 696 bp. The contigs were annotated against several databases, including SwissProt, TrEMBL, and the NCBI nr and nt databases. The raw reads were deposited in DDBJ under DRA accession number DRA007358. The assembled transcriptome contigs are deposited in the DDBJ TSA under accession numbers IADP01000001-IADP01037077 and can also be accessed at http://rujakbase.id.
    Matched MeSH terms: Databases, Protein
  8. Sablok G, Pérez-Pulido AJ, Do T, Seong TY, Casimiro-Soriguer CS, La Porta N, et al.
    Front Plant Sci, 2016;7:878.
    PMID: 27446111 DOI: 10.3389/fpls.2016.00878
    Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimation of inter- and intra-generic differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated as powerful genetic markers for species and varieties discrimination. We present PlantFuncSSRs platform having more than 364 plant species with more than 2 million functional SSRs. They are provided with detailed annotations for easy functional browsing of SSRs and with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify functional-based genic variability among the species of interest, which might be of particular interest in developing functional markers in plants. This comprehensive on-line portal unifies mining of SSRs from first and next generation sequencing datasets, corresponding primer pairs and associated in-depth functional annotation such as gene ontology annotation, gene interactions and its identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr.
    Matched MeSH terms: Databases, Protein
  9. Amir SH, Yuswan MH, Aizat WM, Mansor MK, Desa MNM, Yusof YA, et al.
    J Proteomics, 2021 06 15;241:104240.
    PMID: 33894373 DOI: 10.1016/j.jprot.2021.104240
    Mass spectrometry-based proteomics relies on dedicated software for peptide and protein identification. These software include open-source or commercial-based search engines; wherein, they employ different algorithms to establish their scoring and identified proteins. Although previous comparative studies have differentiated the proteomics results from different software, there are still yet studies specifically been conducted to compare and evaluate the search engine in the field of halal analysis. This is important because the halal analysis is often using commercial meat samples that have been subjected to various processing, further complicating its analysis. Thus, this study aimed to assess three open-source search engines (Comet, X! Tandem, and ProteinProspector) and a commercial-based search engine (ProteinPilot™) against 135 raw tandem mass spectrometry data files from 15 types of pork-based food products for halal analysis. Each database search engine contained high false-discovery rate (FDR); however, a post-searching algorithm called PeptideProphet managed to reduce the FDR, except for ProteinProspector and ProteinPilot™. From this study, the combined database search engine (executed by iProphet) reveals a thorough protein list for pork-based food products; wherein the most abundant proteins are myofibrillar proteins. Thus, this proteomics study will aid the identification of potential peptide and protein biomarkers for future precision halal analysis.

    SIGNIFICANCE: A critical challenge of halal proteomics is the availability of a database to confirm the inferential peptides as well as proteins. Currently, the established database such as UniProtKB is related to animal proteome; however, the halal proteomics is related to the highly processed meat-based food products. This study highlights the use of different database search engines (Comet, X! Tandem, ProteinProspector, and ProteinPilot™) and their respective algorithms to analyse 135 raw tandem mass spectrometry data files from 15 types of pork-based food products. This is the first attempt that has compared different database search engines in the context of halal proteomics to ensure the effectiveness of controlling the FDR. Previous studies were just focused on the advantages of a certain algorithm over another. Moreover, other previous studies also have mainly reported the use of mass spectrometry-based shotgun proteomics for meat authentication (the most similar field to halal analysis), but none of the studies were reported on halal aspects that used samples originated from highly processed food products. Hence, a systematic comparative study is duly needed for a more comprehensive and thorough proteomics analysis for such samples. In this study, our combinatorial approach for halal proteomics results from the different search engines used (Comet, X! Tandem, and ProteinProspector) has successfully generated a comprehensive spectral library for the pork-based meat products. This combined spectral library is freely available at https://data.mendeley.com/datasets/6dmm8659rm/3. Thus far, this is the first and new attempt at establishing a spectral library for halal proteomics. We also believe this study is a pioneer for halal proteomics that aimed at non-conventional and non-model organism proteomics, protein analytics, protein bioinformatics, and potential biomarker discovery.
    Matched MeSH terms: Databases, Protein
  10. Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Finn RD, Gough J, et al.
    Nucleic Acids Res, 2020 Jan 08;48(D1):D314-D319.
    PMID: 31733063 DOI: 10.1093/nar/gkz967
    Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits including: providing instant validation of data and avoiding the requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes have also been added of particular interest to the wider scientific community: cow, pig, wheat and mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, has ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
    Matched MeSH terms: Databases, Protein
  11. Nadzirin N, Willett P, Artymiuk PJ, Firdaus-Raih M
    Nucleic Acids Res, 2013 Jul;41(Web Server issue):W432-40.
    PMID: 23716645 DOI: 10.1093/nar/gkt431
    We describe a server that allows the interrogation of the Protein Data Bank for hypothetical 3D side chain patterns that are not limited to known patterns from existing 3D structures. A minimal side chain description allows a variety of side chain orientations to exist within the pattern, and generic side chain types such as acid, base and hydroxyl-containing can be additionally deployed in the search query. Moreover, only a subset of distances between the side chains need be specified. We illustrate these capabilities in case studies involving arginine stacks, serine-acid group arrangements and multiple catalytic triad-like configurations. The IMAAAGINE server can be accessed at http://mfrlab.org/grafss/imaaagine/.
    Matched MeSH terms: Databases, Protein
  12. Nadzirin N, Firdaus-Raih M
    Int J Mol Sci, 2012;13(10):12761-72.
    PMID: 23202924 DOI: 10.3390/ijms131012761
    Proteins of uncharacterized functions form a large part of many of the currently available biological databases and this situation exists even in the Protein Data Bank (PDB). Our analysis of recent PDB data revealed that only 42.53% of PDB entries (1084 coordinate files) that were categorized under "unknown function" are true examples of proteins of unknown function at this point in time. The remainder 1465 entries also annotated as such appear to be able to have their annotations re-assessed, based on the availability of direct functional characterization experiments for the protein itself, or for homologous sequences or structures thus enabling computational function inference.
    Matched MeSH terms: Databases, Protein
  13. Afiqah-Aleng N, Altaf-Ul-Amin M, Kanaya S, Mohamed-Hussein ZA
    Reprod Biomed Online, 2020 Feb;40(2):319-330.
    PMID: 32001161 DOI: 10.1016/j.rbmo.2019.11.012
    RESEARCH QUESTION: Polycystic ovary syndrome (PCOS) is a complex endocrine disorder with diverse clinical implications, such as infertility, metabolic disorders, cardiovascular diseases and psychological problems among others. The heterogeneity of conditions found in PCOS contribute to its various phenotypes, leading to difficulties in identifying proteins involved in this abnormality. Several studies, however, have shown the feasibility in identifying molecular evidence underlying other diseases using graph cluster analysis. Therefore, is it possible to identify proteins and pathways related to PCOS using the same approach?

    METHODS: Known PCOS-related proteins (PCOSrp) from PCOSBase and DisGeNET were integrated with protein-protein interactions (PPI) information from Human Integrated Protein-Protein Interaction reference to construct a PCOS PPI network. The network was clustered with DPClusO algorithm to generate clusters, which were evaluated using Fisher's exact test. Pathway enrichment analysis using gProfileR was conducted to identify significant pathways.

    RESULTS: The statistical significance of the identified clusters has successfully predicted 138 novel PCOSrp with 61.5% reliability and, based on Cronbach's alpha, this prediction is acceptable. Androgen signalling pathway and leptin signalling pathway were among the significant PCOS-related pathways corroborating the information obtained from the clinical observation, where androgen signalling pathway is responsible in producing male hormones in women with PCOS, whereas leptin signalling pathway is involved in insulin sensitivity.

    CONCLUSIONS: These results show that graph cluster analysis can provide additional insight into the pathobiology of PCOS, as the pathways identified as statistically significant correspond to earlier biological studies. Therefore, integrative analysis can reveal unknown mechanisms, which may enable the development of accurate diagnosis and effective treatment in PCOS.

    Matched MeSH terms: Databases, Protein
  14. Ab Ghani NS, Emrizal R, Makmur H, Firdaus-Raih M
    Comput Struct Biotechnol J, 2020;18:2931-2944.
    PMID: 33101604 DOI: 10.1016/j.csbj.2020.10.013
    Structures of protein-drug-complexes provide an atomic level profile of drug-target interactions. In this work, the three-dimensional arrangements of amino acid side chains in known drug binding sites (substructures) were used to search for similarly arranged sites in SARS-CoV-2 protein structures in the Protein Data Bank for the potential repositioning of approved compounds. We were able to identify 22 target sites for the repositioning of 16 approved drug compounds as potential therapeutics for COVID-19. Using the same approach, we were also able to investigate the potentially promiscuous binding of the 16 compounds to off-target sites that could be implicated in toxicity and side effects that had not been provided by any previous studies. The investigations of binding properties in disease-related proteins derived from the comparison of amino acid substructure arrangements allows for effective mechanism driven decision making to rank and select only the compounds with the highest potential for success and safety to be prioritized for clinical trials or treatments. The intention of this work is not to explicitly identify candidate compounds but to present how an integrated drug repositioning and potential toxicity pipeline using side chain similarity searching algorithms are of great utility in epidemic scenarios involving novel pathogens. In the case of the COVID-19 pandemic caused by the SARS-CoV-2 virus, we demonstrate that the pipeline can identify candidate compounds quickly and sustainably in combination with associated risk factors derived from the analysis of potential off-target site binding by the compounds to be repurposed.
    Matched MeSH terms: Databases, Protein
  15. Akbar R, Jusoh SA, Amaro RE, Helms V
    Chem Biol Drug Des, 2017 May;89(5):762-771.
    PMID: 27995760 DOI: 10.1111/cbdd.12900
    Finding pharmaceutically relevant target conformations from an arbitrary set of protein conformations remains a challenge in structure-based virtual screening (SBVS). The growth in the number of available conformations, either experimentally determined or computationally derived, obscures the situation further. While the inflated conformation space potentially contains viable druggable targets, the increase of conformational complexity, as a consequence, poses a selection problem. To address this challenge, we took advantage of machine learning methods, namely an over-sampling and a binary classification procedure, and present a novel method to select druggable receptor conformations. Specifically, we trained a binary classifier on a set of nuclear receptor conformations, wherein each conformation was labeled with an enrichment measure for a corresponding SBVS. The classifier enabled us to formulate suggestions and identify enriching SBVS targets for six of seven nuclear receptors. Further, the classifier can be extended to other proteins of interest simply by feeding new training data sets to the classifier. Our work, thus, provides a methodology to identify pharmaceutically interesting receptor conformations for nuclear receptors and other drug targets.
    Matched MeSH terms: Databases, Protein
  16. Fotoohifiroozabadi S, Mohamad MS, Deris S
    J Bioinform Comput Biol, 2017 Apr;15(2):1750004.
    PMID: 28274174 DOI: 10.1142/S0219720017500044
    Protein structure alignment and comparisons that are based on an alphabetical demonstration of protein structure are more simple to run with faster evaluation processes; thus, their accuracy is not as reliable as three-dimension (3D)-based tools. As a 1D method candidate, TS-AMIR used the alphabetic demonstration of secondary-structure elements (SSE) of proteins and compared the assigned letters to each SSE using the [Formula: see text]-gram method. Although the results were comparable to those obtained via geometrical methods, the SSE length and accuracy of adjacency between SSEs were not considered in the comparison process. Therefore, to obtain further information on accuracy of adjacency between SSE vectors, the new approach of assigning text to vectors was adopted according to the spherical coordinate system in the present study. Moreover, dynamic programming was applied in order to account for the length of SSE vectors. Five common datasets were selected for method evaluation. The first three datasets were small, but difficult to align, and the remaining two datasets were used to compare the capability of the proposed method with that of other methods on a large protein dataset. The results showed that the proposed method, as a text-based alignment approach, obtained results comparable to both 1D and 3D methods. It outperformed 1D methods in terms of accuracy and 3D methods in terms of runtime.
    Matched MeSH terms: Databases, Protein
  17. Abdul Rahman SN, Bakar MFA, Singham GV, Othman AS
    3 Biotech, 2019 Nov;9(11):388.
    PMID: 31656726 DOI: 10.1007/s13205-019-1921-3
    In this study, RNA sequencing of several Hevea brasiliensis clones grown in Malaysia with different annual rubber production yields and disease resistance was performed on the Illumina platform. A total of 29,862,548 reads were generated, resulting in 101,269 assembled transcripts that were used as the reference transcripts. A similarity search against the non-redundant (nr) protein databases presented 83,771 (83%) positive BLASTx hits. The transcriptome was annotated using gene ontology (GO), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Pfam database. A search for putative molecular markers was performed to identify single-nucleotide polymorphisms (SNPs). Overall, 3,210,629 SNPs were detected and a total of 1314 SNPs associated with the genes involved in MVA and MEP pathways were identified. A total of 176 SNP primer pairs were designed from sequences that were related to the MVA and MEP pathways. The transcriptome of RRIM 3001 and RRIM 712 were subjected to pairwise comparison and the results revealed that there were 1262 significantly differentially expressed genes unique to RRIM 3001, 1499 significantly differentially expressed genes unique to RRIM 712 and several genes related to the MVA and MEP pathways such as AACT, HMGS, PMK, MVD, DXS and HDS were included. The results will facilitate the characterization of H. brasiliensis transcriptomes and the development of a new set of molecular markers in the form of SNPs from transcriptome assembly for the genotype identification of various rubber varieties with superior traits in Malaysia.
    Matched MeSH terms: Databases, Protein
  18. Hussin NA, Najimudin N, Ab Majid AH
    Heliyon, 2019 Dec;5(12):e02969.
    PMID: 31872129 DOI: 10.1016/j.heliyon.2019.e02969
    The subterranean termite Globitermus sulphureus is an important Southeast Asian pest with limited genomic resources that causes damages to agriculture crops and building structures. Therefore, the main goal of this study was to survey the G. sulphureus transcriptome composition. Here, we performed de novo transcriptome for G. sulphureus workers' heads using Illumina HiSeq paired-end sequencing technology. A total of 88,639,408 clean reads were collected and assembled into 243,057 transcripts and 193,344 putative genes. The transcripts were annotated with the Trinotate pipeline. In total, 27,061 transcripts were successfully annotated using BLASTX against the SwissProt database and 17,816 genes were assigned to 47,598 GO terms. We classified 14,223 transcripts into COG classification, resulting in 25 groups of functional annotations. Next, a total of 12,194 genes were matched in the KEGG pathway and 392 metabolic pathways were predicted based on the annotation. Moreover, we detected two endogenous cellulases in the sequences. The RT-qPCR analysis showed that there were significant differences in the expression levels of two genes β-glucosidase and endo-β-1,4-glucanase between worker and soldier heads of G. sulphureus. This is the first study to characterize the complete head transcriptome of a higher termite G. sulphureus using a high-throughput sequencing. Our study may provide an overview and comprehensive molecular resource for comparative studies of the transcriptomics and genomics of termites.
    Matched MeSH terms: Databases, Protein
  19. Hanna GS, Choo YM, Harbit R, Paeth H, Wilde S, Mackle J, et al.
    J Nat Prod, 2021 Nov 26;84(11):3001-3007.
    PMID: 34677966 DOI: 10.1021/acs.jnatprod.1c00625
    The pressing need for SARS-CoV-2 controls has led to a reassessment of strategies to identify and develop natural product inhibitors of zoonotic, highly virulent, and rapidly emerging viruses. This review article addresses how contemporary approaches involving computational chemistry, natural product (NP) and protein databases, and mass spectrometry (MS) derived target-ligand interaction analysis can be utilized to expedite the interrogation of NP structures while minimizing the time and expense of extraction, purification, and screening in BioSafety Laboratories (BSL)3 laboratories. The unparalleled structural diversity and complexity of NPs is an extraordinary resource for the discovery and development of broad-spectrum inhibitors of viral genera, including Betacoronavirus, which contains MERS, SARS, SARS-CoV-2, and the common cold. There are two key technological advances that have created unique opportunities for the identification of NP prototypes with greater efficiency: (1) the application of structural databases for NPs and target proteins and (2) the application of modern MS techniques to assess protein-ligand interactions directly from NP extracts. These approaches, developed over years, now allow for the identification and isolation of unique antiviral ligands without the immediate need for BSL3 facilities. Overall, the goal is to improve the success rate of NP-based screening by focusing resources on source materials with a higher likelihood of success, while simultaneously providing opportunities for the discovery of novel ligands to selectively target proteins involved in viral infection.
    Matched MeSH terms: Databases, Protein
  20. Cao MY, Zainudin S, Daud KM
    BMC Genomics, 2024 May 13;25(1):466.
    PMID: 38741045 DOI: 10.1186/s12864-024-10361-8
    BACKGROUND: Protein-protein interactions (PPIs) hold significant importance in biology, with precise PPI prediction as a pivotal factor in comprehending cellular processes and facilitating drug design. However, experimental determination of PPIs is laborious, time-consuming, and often constrained by technical limitations.

    METHODS: We introduce a new node representation method based on initial information fusion, called FFANE, which amalgamates PPI networks and protein sequence data to enhance the precision of PPIs' prediction. A Gaussian kernel similarity matrix is initially established by leveraging protein structural resemblances. Concurrently, protein sequence similarities are gauged using the Levenshtein distance, enabling the capture of diverse protein attributes. Subsequently, to construct an initial information matrix, these two feature matrices are merged by employing weighted fusion to achieve an organic amalgamation of structural and sequence details. To gain a more profound understanding of the amalgamated features, a Stacked Autoencoder (SAE) is employed for encoding learning, thereby yielding more representative feature representations. Ultimately, classification models are trained to predict PPIs by using the well-learned fusion feature.

    RESULTS: When employing 5-fold cross-validation experiments on SVM, our proposed method achieved average accuracies of 94.28%, 97.69%, and 84.05% in terms of Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori datasets, respectively.

    CONCLUSION: Experimental findings across various authentic datasets validate the efficacy and superiority of this fusion feature representation approach, underscoring its potential value in bioinformatics.

    Matched MeSH terms: Databases, Protein