Plasmodium knowlesi is a newly described zoonotic parasite that causes human malaria, which can be severe and fatal. The study of P. knowlesi parasites from human clinical isolates is relatively new and, to obtain maximum information from patient sample collections, we explored the possibility of generating P. knowlesi genome sequences from archived clinical isolates. Our patient sample collection consisted of frozen whole blood samples that contained excessive human DNA contamination and, in that form, were not suitable for parasite genome sequencing. We developed a method to reduce the amount of human DNA in the thawed blood samples in preparation for high-throughput parasite genome sequencing on the Illumina HiSeq and MiSeq platforms. Seven of the fifteen samples processed had sufficiently pure P. knowlesi DNA for whole genome sequencing. The reads were mapped to the P. knowlesi H strain reference genome, with an average mapping rate of 90%. Genes with low coverage were removed, leaving 4623 genes for subsequent analyses. We had previously identified a DNA sequence dimorphism on a small fragment of the P. knowlesi normocyte binding protein xa gene (Pknbpxa) on chromosome 14. We used the genome data to assemble full-length Pknbpxa sequences and discovered that the dimorphism extended along the gene. An in-house algorithm was developed to detect SNP sites co-associating with the dimorphism. More than half of the P. knowlesi genome was dimorphic, involving genes on all chromosomes and suggesting that two distinct types of P. knowlesi infect the human population in Sarawak, Malaysian Borneo. We use P. knowlesi clinical samples to demonstrate that Plasmodium DNA from archived patient samples can produce high quality genome data. We show that analyses of even small numbers of difficult clinical malaria isolates can generate comprehensive genomic information that will improve our understanding of malaria parasite diversity and pathobiology.
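The abstract above does not describe the in-house algorithm, so the following is only a minimal sketch of one way "co-association" with a dimorphism could be tested. It assumes that each site is biallelic, that calls are given as one allele character per sample in a fixed sample order, and that a site co-associates when it splits the samples into exactly the same two groups as a reference dimorphic site. The function names and the example calls are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def co_associates(reference: str, candidate: str) -> bool:
    """Return True if two biallelic sites split the samples identically.

    Each string holds one allele character per sample, in the same
    sample order. This is an illustrative criterion (perfect
    association), not the published method.
    """
    if len(reference) != len(candidate):
        raise ValueError("both sites must cover the same samples")
    # Only sites with exactly two alleles can mirror a dimorphism.
    if len(set(reference)) != 2 or len(set(candidate)) != 2:
        return False
    # Record which candidate alleles occur alongside each reference allele.
    pairing: Dict[str, set] = {}
    for ref_allele, cand_allele in zip(reference, candidate):
        pairing.setdefault(ref_allele, set()).add(cand_allele)
    # Perfect association: each reference allele pairs with one candidate allele.
    return all(len(alleles) == 1 for alleles in pairing.values())


def co_associated_sites(reference: str, sites: Dict[str, str]) -> List[str]:
    """List the names of sites that co-associate with the reference site."""
    return [name for name, calls in sites.items() if co_associates(reference, calls)]


# Hypothetical calls for seven samples (made-up data for illustration only).
reference_site = "AAAGGGG"
candidate_sites = {
    "site_1": "TTTCCCC",  # same 3/4 split as the reference
    "site_2": "TTCCCCC",  # different split
    "site_3": "TTTTTTT",  # invariant
}
matching = co_associated_sites(reference_site, candidate_sites)
```

A real analysis would also need to handle missing calls and mixed infections, which this sketch ignores.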
The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.
This report describes the MalariaGEN Pv4 dataset, a new release of curated genome variation data on 1,895 samples of Plasmodium vivax collected at 88 worldwide locations between 2001 and 2017. It includes 1,370 new samples contributed by MalariaGEN and VivaxGEN partner studies in addition to previously published samples from these and other sources. We provide genotype calls at over 4.5 million variable positions including over 3 million single nucleotide polymorphisms (SNPs), as well as short indels and tandem duplications. This enlarged dataset highlights major compartments of parasite population structure, with clear differentiation between Africa, Latin America, Oceania, Western Asia and different parts of Southeast Asia. Each sample has been classified for drug resistance to sulfadoxine, pyrimethamine and mefloquine based on known markers at the dhfr, dhps and mdr1 loci. The prevalence of all of these resistance markers was much higher in Southeast Asia and Oceania than elsewhere. This open resource of analysis-ready genome variation data from the MalariaGEN and VivaxGEN networks is driven by our collective goal to advance research into the complex biology of P. vivax and to accelerate genomic surveillance for malaria control and elimination.
Traditionally, patient travel history has been used to distinguish imported from autochthonous malaria cases, but the dormant liver stages of Plasmodium vivax confound this approach. Molecular tools offer an alternative method to identify and map imported cases. Using machine learning approaches incorporating hierarchical fixation index and decision tree analyses applied to 799 P. vivax genomes from 21 countries, we identified 33-SNP, 50-SNP and 55-SNP barcodes (GEO33, GEO50 and GEO55) with high capacity to predict an infection's country of origin. The Matthews correlation coefficient (MCC) for an existing, commonly applied 38-SNP barcode (BR38) exceeded 0.80 in 62% of countries. The GEO panels outperformed BR38, with median MCCs > 0.80 in 90% of countries for GEO33 and in 95% of countries for GEO50 and GEO55. An online, open-access, likelihood-based classifier framework was established to support data analysis (vivaxGEN-geo). The SNP selection and classifier methods can be readily amended for other use cases to support malaria control programs.
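To make the MCC figures above concrete, here is a minimal sketch of how a per-country Matthews correlation coefficient can be computed from predicted and true countries of origin. It assumes a one-vs-rest treatment, in which each country is scored against all others combined. The abstract does not state how per-country MCCs were derived, and the function name and country labels are hypothetical.

```python
import math
from typing import Sequence


def mcc_one_vs_rest(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    country: str,
) -> float:
    """Matthews correlation coefficient for one country versus all others."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == country and guess == country:
            tp += 1  # correctly assigned to this country
        elif truth != country and guess == country:
            fp += 1  # wrongly assigned to this country
        elif truth == country and guess != country:
            fn += 1  # missed a sample from this country
        else:
            tn += 1  # correctly assigned elsewhere
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up labels for four samples from two hypothetical countries.
truth = ["PE", "PE", "BR", "BR"]
perfect = mcc_one_vs_rest(truth, ["PE", "PE", "BR", "BR"], "PE")
partial = mcc_one_vs_rest(truth, ["PE", "BR", "BR", "BR"], "PE")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less easily inflated than accuracy when a country contributes only a few samples.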