Affiliations 

  • 1 School of Physics, Universiti Sains Malaysia, 11800 Penang, Malaysia. Electronic address: [email protected]
  • 2 School of Physics, Universiti Sains Malaysia, 11800 Penang, Malaysia; School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
  • 3 Material Characterization Team, PerkinElmer, Inc. Petaling Jaya, Malaysia
  • 4 School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing 102488, China
  • 5 Research Center for Medicinal Plant, Institute of Agricultural Bio-resource, Fujian Academy of Agricultural Sciences, Fuzhou 350003, Fujian, China
  • 6 School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia; College of Pharmacy, Fujian University of Traditional Chinese Medicine, Fuzhou 350122, Fujian, China. Electronic address: [email protected]
PMID: 34627017 DOI: 10.1016/j.saa.2021.120440

Abstract

A proof-of-concept scheme for identifying medicinal herbs with machine learning classifiers is proposed in the form of an automated computational package. The scheme takes as digital input two-dimensional correlation Fourier transform infrared (FTIR) fingerprint maps derived from raw herb FTIR spectra. The prototype package admits a collection of 11 machine learning classifiers to form a voting pool. A common oversampled dataset containing 5 different herbal classes is used to train the pool of classifiers in a one-versus-others manner. The resulting collection of trained models, dubbed the voting classifiers, is deployed collectively: each model casts a vote for or against the proposition that a given inference fingerprint belongs to a particular class. From the votes cast by all voting classifiers, a logically designed scoring system selects the most probable identity of the inference fingerprint. The same scoring system can also label as 'others' an inference fingerprint that does not belong to any of the classes the voting classifiers were trained on. The proposed classification scheme is stress-tested to evaluate its performance and expected consistency. Our experimental runs show that, by and large, the classification scheme achieves a satisfactory performance of up to 90% accuracy, demonstrating as a proof of concept that the proposed scheme is a feasible, practical, and convenient tool for herbal classification. The scheme is implemented as a packaged Python code, dubbed the "Collective Voting" (CV) package, which is easy to scale, maintain, and use in practice.
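The vote-collection step can be illustrated with a minimal sketch. The abstract does not specify the scoring rule, so everything below is an assumption made for illustration, not the CV package's actual API or logic: the function name `score_votes`, the `min_support` threshold, the tie-handling rule, and the class labels `herb_A` to `herb_E` are all hypothetical. Each class is assumed to receive one yes/no ballot from each of the 11 one-versus-others voting classifiers.

```python
from typing import Dict, List


def score_votes(votes: Dict[str, List[bool]], min_support: float = 0.5) -> str:
    """Pick the best-supported class, or 'others' if no class is convincing.

    `votes` maps a class label to the ballots cast by its pool of
    one-versus-others classifiers (True = "the fingerprint belongs to
    this class"). The majority threshold and the tie rule are assumed
    here; the published scoring system may differ.
    """
    if not votes or any(len(ballots) == 0 for ballots in votes.values()):
        raise ValueError("every class needs at least one ballot")

    # Fraction of "yes" ballots received by each class.
    support = {label: sum(ballots) / len(ballots) for label, ballots in votes.items()}

    best_label = max(support, key=support.get)
    best_score = support[best_label]
    tied = [label for label, score in support.items() if score == best_score]

    # Reject the fingerprint when the top class lacks a clear majority,
    # or when two or more classes share the top score.
    if best_score <= min_support or len(tied) > 1:
        return "others"
    return best_label


# Hypothetical ballots: 5 classes, 11 voting classifiers each.
known = {
    "herb_A": [True] * 9 + [False] * 2,
    "herb_B": [True] * 2 + [False] * 9,
    "herb_C": [True] * 1 + [False] * 10,
    "herb_D": [False] * 11,
    "herb_E": [True] * 3 + [False] * 8,
}
# A fingerprint that no class pool supports strongly.
unknown = {label: [True] * 3 + [False] * 8 for label in known}

print(score_votes(known))
print(score_votes(unknown))
```

In this sketch the first fingerprint is assigned to `herb_A`, which receives 9 of 11 yes votes, while the second falls back to 'others' because no class reaches a majority. A real scoring system might instead weight classifiers by validation accuracy or use predicted probabilities rather than hard votes.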
