RESULTS: A first set of sORFs meeting the criterion of at most 80 residues was identified from existing annotations. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
CONCLUSIONS: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation to identify sORFs. Given the lack of, and limitations of, experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently free of gene prediction artefacts.
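The size-targeted search described above can be illustrated with a minimal sketch. It assumes a naive forward-strand scan for ATG-to-stop reading frames and is not the prediction pipeline used in the study; the function name find_sorfs and the min_codons parameter are hypothetical, and only the 80-codon cutoff comes from the text.

```python
# Illustrative sketch only: a naive forward-strand ORF scan, not the
# gene-prediction software or parameters used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 80  # size cutoff taken from the text


def find_sorfs(seq, max_codons=MAX_CODONS, min_codons=2):
    """Return (start, end, codon_count) for each ATG..stop ORF of at most max_codons.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    codon_count excludes the stop codon. min_codons is a hypothetical lower
    bound added here to skip trivial hits.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon from the start codon to the first in-frame stop.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop remains in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                hits.append((i, j + 3, n_codons))
            i = j + 3  # resume the scan after the stop codon
    return hits
```

A real reannotation would also scan the reverse strand and then filter candidates by conservation across genomes, which this sketch leaves out.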
Methods: Herein, we report a comprehensive study of the dynamics of H5N1 mutations by analysis of the aligned overlapping nonamer positions (1-9, 2-10, etc.) of more than 13,000 protein sequences of avian and human influenza A (H5N1) viruses, reported over at least 50 years. Entropy calculations were performed on 9,408 overlapping nonamer positions of the proteome to study the diversity in the context of the immune system. Nonamers represent the predominant length of the binding cores of peptides recognized by the cellular immune system. To further dissect the sequence diversity, each overlapping nonamer position was quantitatively analyzed for four sequence diversity motifs: index, major, minor and unique.
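The entropy calculation over overlapping nonamer positions can be sketched as follows. This is a minimal illustration that assumes plain Shannon entropy in bits over the distinct nonamers seen at each window of an alignment; the function name, the gap and "X" filtering, and the absence of any sample-size correction are assumptions and may differ from the study's procedure.

```python
import math
from collections import Counter


def nonamer_entropy(aligned_seqs, k=9):
    """Shannon entropy (bits) at each overlapping k-mer position of an alignment.

    aligned_seqs: equal-length aligned protein sequences (strings).
    Windows containing a gap ('-') or unknown residue ('X') are skipped,
    which is an assumption made for this sketch.
    """
    length = len(aligned_seqs[0])
    entropies = []
    for start in range(length - k + 1):  # positions 1-9, 2-10, ...
        windows = [s[start:start + k] for s in aligned_seqs]
        windows = [w for w in windows if "-" not in w and "X" not in w]
        counts = Counter(windows)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies
```

An entropy of zero marks a position where every sequence carries the same nonamer; higher values indicate more, and more evenly distributed, distinct nonamers.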
Results: Almost all of the aligned overlapping nonamer positions of each viral proteome exhibited variants (major, minor, and unique) of the predominant index sequence. Each variant motif displayed a characteristic pattern of incidence change in relation to increased total variants. The major variant exhibited a restrictive pyramidal incidence pattern, with peak incidence at 50% total variants. Beyond this peak, the minor variants became the predominant motif for the majority of the positions. Unique variants, each observed only once, were present at nearly all of the nonamer positions. The diversity motifs (index and variants) demonstrated complex inter-relationships, with motif switching being a common phenomenon. Additionally, 25 highly conserved sequences were identified as shared across viruses of both hosts, with half also conserved in several other influenza A subtypes.
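The four diversity motifs can be illustrated with a small sketch for a single nonamer position. The text defines the index as the predominant sequence and unique variants as those observed only once; the remaining rules used here (major is the most frequent non-index sequence, minor is any other sequence seen more than once, singletons always count as unique, ties resolved by first occurrence) are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def classify_motifs(nonamers):
    """Classify the nonamers observed at one aligned position into motifs.

    Assumed rules for this sketch:
      index  - the most frequent sequence
      unique - any other sequence observed exactly once
      major  - the most frequent remaining variant
      minor  - all other variants observed more than once
    """
    result = {"index": None, "major": None, "minor": [], "unique": [],
              "total_variants_pct": 0.0}
    ranked = Counter(nonamers).most_common()  # sorted by count, descending
    if not ranked:
        return result
    result["index"] = ranked[0][0]
    for seq, n in ranked[1:]:
        if n == 1:
            result["unique"].append(seq)
        elif result["major"] is None:
            result["major"] = seq
        else:
            result["minor"].append(seq)
    total = len(nonamers)
    # Total variants: percentage of sequences that differ from the index.
    result["total_variants_pct"] = 100.0 * (total - ranked[0][1]) / total
    return result
```

Tracking these labels position by position as total variants increase is what would expose the motif switching mentioned above.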
Discussion: The presence of distinct sequences (nonatypes) at nearly all nonamer positions represents a large repertoire of reported viral variants in the proteome, which influence the variability dynamics of the viral population. This work provides important insights into the components that make up the viral diversity, delineating inherent patterns in the organization of sequence changes that function in viral fitness-selection. Additionally, it provides a catalogue of all the mutational changes involved in the dynamics of H5N1 viral diversity for both avian and human host populations. These data are relevant for the design of prophylactics and therapeutics that overcome the diversity of the virus, and can aid in the surveillance of existing and future strains of influenza viruses.
RESULTS: Predictions in two- and three-state classification systems with several thresholds are provided. Our prediction method achieved an accuracy of up to 90% for the training and 88% for the test data sets. Three-state prediction gave a maximum accuracy of 65% for the training and 63% for the test data. The applicability of neural networks to ASA prediction has been confirmed with a larger data set and a wider range of state thresholds. Salient differences between a linear and an exponential network for ASA prediction have been analysed.
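How two- and three-state labels and a per-residue accuracy can be derived from thresholds is sketched below. The threshold values in the comments are placeholders rather than those used in the work, the function names are hypothetical, and treating a value equal to a threshold as the more buried state is an assumption.

```python
from bisect import bisect_left


def asa_state(relative_asa, thresholds):
    """Map a relative ASA value (%) to a state index, 0 being the most buried.

    thresholds must be sorted in ascending order: one value gives a two-state
    system, two values give a three-state system. A value equal to a threshold
    falls into the lower (more buried) state; this is an assumption.
    """
    return bisect_left(thresholds, relative_asa)


def state_accuracy(true_states, predicted_states):
    """Fraction of residues whose predicted state matches the observed state."""
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)


# Placeholder thresholds, chosen only for illustration:
# two-state:   asa_state(30.0, [25.0])        gives state 1 (exposed)
# three-state: asa_state(30.0, [10.0, 40.0])  gives state 1 (intermediate)
```

Because the class balance shifts with the threshold, accuracies obtained at different thresholds are not directly comparable.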
AVAILABILITY: Online predictions are freely available at: http://www.netasa.org. Linux ix86 binaries of the program written for this work may be obtained by email from the corresponding author.
METHODS AND RESULTS: The tropomyosin gene was cloned and expressed in an Escherichia coli system, followed by SDS-PAGE and immunoblotting to assess the allergenic potential of the recombinant protein. The 855-base-pair tropomyosin gene obtained was 99.18% homologous to that of Scylla serrata. Its 284 amino acids matched the tropomyosin of crustaceans, arachnids, insects, and Klebsiella pneumoniae, with similarity ranging from 79.03 to 95.77%. The tropomyosin contained 89.44% alpha-helix folding, with a tertiary structure of two-chain alpha-helical coiled coils comprising a homodimer heptad chain. The IPTG-induced, histidine-tagged recombinant tropomyosin was purified at a size of 42 kDa and confirmed as tropomyosin using anti-tropomyosin monoclonal antibodies. The recombinant tropomyosin showed IgE binding with 90.9% (20/22) of the sera from crab-allergic patients.
CONCLUSIONS: This study has successfully produced an allergenic recombinant tropomyosin from S. olivacea. This recombinant tropomyosin may be used as a specific allergen for the diagnosis of allergy.
METHODS: A total of 36 full-length pkmsp1p sequences along with the reference H-strain and 40 C-terminal pkmsp1p sequences from clinical isolates of Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotypes and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using a haplotype network tree in NETWORK software v5.0. The population genetic differentiation index (FST) and the population structure of the parasite were determined using Arlequin v3.5 and STRUCTURE v2.3.4 software.
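To make the diversity measures concrete, the sketch below computes nucleotide diversity as the average number of pairwise differences per site and counts distinct haplotypes. It illustrates the standard definitions only and is not the implementation in DnaSP or MEGA; it assumes gap-free aligned sequences of equal length, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise nucleotide differences per site (pi).

    Assumes all sequences are aligned, equal in length and free of gaps.
    """
    n = len(seqs)
    length = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


def haplotype_counts(seqs):
    """Count how often each distinct sequence (haplotype) occurs."""
    return Counter(seqs)
```

For example, the three sequences "ACGT", "ACGA" and "ACGT" differ at 2 sites summed over 3 pairs across 4 sites, giving pi = 2 / (3 * 4), and they form 2 haplotypes.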
RESULTS: Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). The nucleotide diversity across the full-length gene was low compared to its ortholog pvmsp1p. The nucleotide diversity was higher toward the N-terminal domains (pkmsp1p-83 and 30) compared to the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low polymorphism, with 16 polymorphisms leading to 18 haplotypes. In total there were 10 synonymous and 6 non-synonymous substitutions, and 12 cysteine residues were intact within the two EGF domains. Evidence of strong purifying selection was observed within the full-length sequences as well as in all the domains. Shared haplotypes of 40 pkmsp1p-19 were identified within Malaysian Borneo haplotypes.
CONCLUSIONS: This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection were observed in all the domains of pkmsp1p of P. knowlesi, indicating functional constraints. Shared haplotypes were identified within pkmsp1p-19, highlighting the need for further evaluation using a larger number of clinical samples from Malaysia.