RESULTS: A first set of sORFs was identified from existing annotations that met the criterion of at most 80 residues. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.
CONCLUSIONS: It is evident that numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation in identifying sORFs. Because experimental validation is scarce and subject to limitations, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently free of gene prediction artefacts.
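As an illustration of what a length-restricted ORF search can look like, the following is a minimal sketch and not the prediction pipeline used in the study. It scans both strands in all three frames for ATG-initiated ORFs that end at a standard stop codon and keeps those of at most 80 codons. The function name `find_sorfs`, the `min_codons` parameter, and the choice to count codons without the stop codon are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sorfs(seq, max_codons=80, min_codons=1):
    """List (strand, start, end, n_codons) for short ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand. n_codons counts the start codon but not the stop codon
    (an assumption of this sketch).
    """
    hits = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return hits


example = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_sorfs(example))
```

Candidates found this way would still need the kind of conservation filtering described in the study before being treated as credible sORFs.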
RESULTS: Here, we analyzed genetic data from 230 B. flabellifer accessions across Thailand using 17 EST-SSR and 12 gSSR polymorphic markers. Clustering analysis revealed that the population consisted of two genetic clusters (STRUCTURE K = 2). Cluster I is found mainly in southern Thailand, while Cluster II is found mainly in the northeast. Accessions from the central region are an extensive mixture of the two. The two clusters are moderately differentiated (FST = 0.066 and NM = 3.532) and have low genetic diversity (HO = 0.371 and 0.416; AR = 2.99 and 3.19 for Clusters I and II, respectively). The minimum number of founders for each genetic group ranges from 3 to 4 individuals, based on simulations using different allele frequency assumptions. These numbers are consistent with B. flabellifer being dioecious: several seeds had to be introduced simultaneously to obtain both male and female founders.
CONCLUSIONS: From these data and from geographical and historical evidence, we hypothesize that there were at least two different invasive events of B. flabellifer in Thailand. In one event, B. flabellifer was likely brought through the Straits of Malacca and propagated in southern Thailand before spreading to central Thailand. The second event likely occurred in the Khmer Empire, in what is now Cambodia, before spreading to northeastern Thailand.
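The reported fixation index (0.066) and gene-flow estimate (3.532) can be related through Wright's island-model approximation, Nm = (1/FST - 1)/4. The abstract does not state how the gene-flow value was derived, so the sketch below only assumes this common approximation; the function name `nm_from_fst` is hypothetical.

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model: (1/FST - 1) / 4.

    Assumes an idealized island model; this is a textbook approximation,
    not necessarily the estimator used in the study.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


print(round(nm_from_fst(0.066), 3))
```

With the rounded FST of 0.066 this gives roughly 3.54, close to the reported 3.532; a small gap of this size would be expected if the published estimate was computed from an unrounded FST.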
FINDINGS: We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb of raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other species of squid (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
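For readers unfamiliar with the scaffold N50 statistic quoted above, it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that standard definition follows; the function name `n50` and the toy lengths are illustrative and are not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the cumulative sum to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length


print(n50([2, 3, 4, 5, 6]))  # toy lengths, total 20
```

In the toy example the two longest scaffolds (6 and 5) already cover more than half of the total of 20, so the N50 is 5.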
CONCLUSIONS: This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.
RESULTS: We propose a succinct representation of the distance matrices that greatly reduces the space requirement. We give a complete solution, called SuperRec, for inferring chromosomal structures from Hi-C data by iteratively solving the large-scale weighted multidimensional scaling problem.
CONCLUSIONS: SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec .