RESULTS: Individuals from villages with a higher prevalence of helminth infections had more unmapped reads and greater microbial diversity. Microbial community diversity and composition were most strongly associated with village, and the effects of helminth infection status on the microbiome varied by village. Longitudinal changes in the microbiome in response to albendazole anthelmintic treatment were observed in both helminth-infected and uninfected individuals. Inference of bacterial population replication rates from origin-of-replication analysis identified specific replicating taxa associated with helminth infection.
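As an illustration of what "greater microbial diversity" can mean in practice, the sketch below computes the Shannon index from per-taxon read counts. The abstract does not state which diversity measure was used, so the choice of Shannon and the example counts are assumptions made purely for illustration.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` is a list of per-taxon read counts for one sample
    (hypothetical input; not data from the study).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Two made-up samples with the same number of taxa but different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

A community dominated by one taxon scores lower than an evenly distributed one, even when the number of taxa is the same.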
CONCLUSIONS: Our results indicate that helminth effects on the microbiota were highly dependent on context, and effects of albendazole on the microbiota can be confounding for the interpretation of deworming studies. Furthermore, a substantial quantity of the microbiome remains unannotated, and this large dataset from an indigenous population associated with helminth infections is a valuable resource for future studies.
RESULTS: We developed stand-alone software that implements the FineMAV statistic. To allow graphical visualisation of the FineMAV scores, it outputs the statistics as bigWig files, a common file format supported by many genome browsers. It is available with both a command-line and a graphical user interface. The software was tested by replicating the FineMAV scores obtained using the 1000 Genomes Project African, European, East Asian and South Asian populations, and was subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population-specific variants that can then be modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav .
CONCLUSIONS: The software tool described here determines genome-wide FineMAV scores from low- or high-coverage whole-genome sequencing datasets. These scores can be used to prioritize a list of population-specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays the scores in human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.
METHODS: A total of 322 samples of mainly human origin were analysed using eight protocols, applying a wide variety of laboratory components. Several samples (60% of human specimens) were processed using different protocols. In total, 712 sequencing libraries were investigated for viral sequence contamination.
RESULTS: Among sequences showing similarity to viruses, 493 were significantly associated with the use of laboratory components. Each of these viral sequences appeared sporadically, being identified in only a subset of the samples treated with the linked laboratory component, and some were not identified in the non-template control samples. Remarkably, more than 65% of all viral sequences identified were within viral clusters linked to the use of laboratory components.
CONCLUSIONS: We show that a high prevalence of contaminating viral sequences can be expected in HTS-based virome data and provide an extensive list of novel contaminating viral sequences that can be used for evaluation of viral findings in future virome and metagenome studies. Moreover, we show that detection can be problematic due to stochastic appearance and limited non-template controls. Although the exact origin of these viral sequences requires further research, our results support laboratory-component-linked viral sequence contamination of both biological and synthetic origin.
FINDINGS: Our high-throughput workflow minimizes these risks via a 4-step strategy: (i) technical replication with 2 PCR replicates and 2 extraction replicates; (ii) use of multiple markers (12S, 16S, CytB); (iii) a "twin-tagging," 2-step PCR protocol; and (iv) use of the probabilistic taxonomic assignment method PROTAX, which can account for incomplete reference databases. Because annotation errors in the reference sequences can result in taxonomic misassignment, we supply a protocol for curating sequence datasets. For some taxonomic groups and some markers, curation resulted in >50% of sequences being deleted from public reference databases, owing to (i) limited overlap between our target amplicon and reference sequences, (ii) mislabelling of reference sequences, and (iii) redundancy. Finally, we provide a bioinformatic pipeline to process amplicons and conduct PROTAX assignment, and we tested it on an invertebrate-derived DNA dataset from 1,532 leeches from Sabah, Malaysia. Twin-tagging allowed us to detect and exclude sequences with non-matching tags. The smallest DNA fragment (16S) amplified most frequently for all samples but was less powerful for discriminating at species rank. Using stringent and lax acceptance criteria, we found 162 (stringent) and 190 (lax) vertebrate detections in 95 (stringent) and 109 (lax) leech samples.
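The exclusion of sequences with non-matching tags can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: it assumes that in a twin-tagged library the forward and reverse primers of a sample carry the same tag, so a read whose two tags differ is treated as a likely tag jump. The function name, the read representation, and the tag strings are all hypothetical.

```python
from collections import Counter

def filter_twin_tagged(reads, expected_tags):
    """Keep reads whose forward and reverse tags are identical and known.

    reads: iterable of (forward_tag, reverse_tag, sequence) tuples
           (hypothetical representation of demultiplexing input).
    expected_tags: set of tags actually used in the library.
    Returns (kept, rejected) where kept is a list of (tag, sequence)
    and rejected is a Counter of rejection reasons.
    """
    kept = []
    rejected = Counter()
    for fwd, rev, seq in reads:
        if fwd != rev:
            # Tags disagree: assumed to be a tag jump or chimera.
            rejected["non_matching_tags"] += 1
        elif fwd not in expected_tags:
            # Tags agree but were never used in this library.
            rejected["unknown_tag"] += 1
        else:
            kept.append((fwd, seq))
    return kept, rejected

# Made-up example reads.
example_reads = [
    ("ACGT", "ACGT", "TTGACC"),  # matching, known tag: kept
    ("ACGT", "TGCA", "TTGACC"),  # non-matching tags: rejected
    ("GGGG", "GGGG", "CCATGA"),  # matching but unknown tag: rejected
    ("TGCA", "TGCA", "CCATGA"),  # matching, known tag: kept
]
kept, rejected = filter_twin_tagged(example_reads, {"ACGT", "TGCA"})
print(kept)
print(dict(rejected))
```

Counting rejections by reason, rather than silently dropping reads, makes it possible to report how much of a library was affected by tag mismatches.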
CONCLUSIONS: Our metabarcoding workflow should help research groups increase the robustness of their results and therefore facilitate wider use of environmental and invertebrate-derived DNA, which is turning into a valuable source of ecological and conservation information on tetrapods.
RESULTS: Sequence data were obtained for both A. dorsata and H. itama. The raw sequence data for A. dorsata amounted to 5 Mb and were assembled into 5 contigs with a size of 6,098,728 bp, an N50 of 15,534 bp, and an average GC content of 57.42%. Similarly, the raw sequence data for H. itama amounted to 6.3 Mb and were assembled into 11 contigs with a size of 7,642,048 bp, an N50 of 17,180 bp, and an average GC content of 55.38%. In the honey sample of A. dorsata, we identified five different plant/pollen species, with only one of the five species exhibiting a relative abundance of less than 1%. For H. itama, we identified seven different plant/pollen species, with only three of the species exhibiting a relative abundance of less than 1%. All of the identified plant species were native to Peninsular Malaysia, especially the East Coast area of Terengganu.
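For readers unfamiliar with the assembly statistics reported above, the following sketch shows how N50 and average GC content are conventionally computed from a set of contigs. It uses the standard definitions and made-up contigs; it is not the tool used in the study, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

def gc_percent(sequences):
    """Return the percentage of G and C among all A/C/G/T bases.

    Ambiguous bases such as N are ignored in both numerator and denominator.
    """
    gc = 0
    acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0

# Made-up contigs for illustration only.
contigs = ["GGCCGGCCAT", "ATATGC", "GCGC"]
print(n50([len(c) for c in contigs]))
print(gc_percent(contigs))
```

Because N50 is a contig length, it can never exceed the longest contig or the total assembly size, which makes it a quick sanity check on reported assembly summaries.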
DATA DESCRIPTION: Our data offer valuable insights into honey's geographical and botanical origin and authenticity. Metagenomic studies could help identify the plant species on which honeybees forage and provide preliminary data for researchers studying the biological development of A. dorsata and H. itama. The identification, from the eDNA of honey, of various flowers known for their medicinal properties could aid in labeling regional honey with an accurate product origin, which is crucial for guaranteeing product authenticity to consumers.
RESULTS: One of the samples was successfully sequenced with enough sequencing yield for further analysis. After depletion of the reads that mapped to host DNA, the remaining reads were shown to map to Theileria orientalis using BLAST and OneCodex. Although reads also mapped to Clostridium botulinum, those were found to be artifacts derived from the cow genome. A consensus sequence was successfully constructed using a reference-based approach with Pomoxis. Hence, we concluded that the asymptomatic cow might be infected with T. orientalis and showed the usefulness of sequencing technology, specifically the MinION platform, in a developing country.