Affiliations 

  • 1 Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow, Uttar Pradesh, India.
  • 2 Computer and Information Sciences Department, Universiti Teknologi Petronas, 32610, Seri Iskandar, Perak, Malaysia
  • 3 Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
  • 4 Department of Bioengineering, Integral University, Dasauli, P.O. Basha, Kursi Road, Lucknow, Uttar Pradesh, India
  • 5 West China School of Nursing / Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, Sichuan, China
  • 6 Pre-Clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia.
  • 7 Pre-Clinical Research Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
  • 8 Department of Pharmacy, Southeast University, Dhaka, Bangladesh.
Environ Sci Pollut Res Int, 2021 Sep;28(34):47641-47650.
PMID: 33895950 DOI: 10.1007/s11356-021-14028-9

Abstract

We are exposed almost every day to various chemical compounds present in the environment, cosmetics, and drugs. Mutagenicity is an important property that plays a significant role in establishing a chemical compound's safety. Exposure to and handling of mutagenic chemicals in the environment pose a high health risk; therefore, identification and screening of these chemicals are essential. Considering the time constraints and the pressure to avoid the use of laboratory animals, a shift to alternative methodologies that can provide rapid and cost-effective detection without undue over-conservatism seems critical. In this regard, computational detection and identification of mutagens in environmental samples such as drugs, pesticides, dyes, reagents, wastewater, cosmetics, and other substances is vital. Over the last two decades, there have been numerous efforts to develop prediction models for mutagenicity, and machine learning methods have so far demonstrated noteworthy performance and reliability. However, the accuracy of such prediction models has always been one of the major concerns for researchers working in this area. The mutagenicity prediction models were developed using a deep neural network (DNN), support vector machine, k-nearest neighbor, and random forest. The classifiers were trained on 3039 compounds and validated on 1014 compounds, each encoded as a vector of 1597 molecular features. The DNN-based prediction model yielded the highest prediction accuracy: 92.95% on the training data and 83.81% on the test data. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.894 and 0.838, respectively. The DNN-based classifier not only fits the data better than the traditional machine learning algorithms, viz., support vector machine, k-nearest neighbor, and random forest (with and without feature reduction), but also yields better performance metrics.
In the current work, we propose a DNN-based model to predict the mutagenicity of compounds.
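
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the real compound descriptors, and substitutes a small `MLPClassifier` for the DNN, whose architecture the abstract does not give. The function name `compare_classifiers`, the layer sizes, and all hyperparameters are hypothetical. The 75/25 split roughly mirrors the 3039/1014 division of compounds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(n_samples=400, n_features=50, seed=0):
    """Train four classifiers on synthetic data and report the abstract's metrics."""
    # Assumption: synthetic binary data stands in for mutagen / non-mutagen labels
    # and molecular features (the study used 1597 features per compound).
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=10, random_state=seed
    )
    # Stratified 75/25 split, close to the 3039 / 1014 split in the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    models = {
        # Hypothetical small network used as a stand-in for the paper's DNN.
        "dnn": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=seed),
        ),
        "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "rf": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Probability of the positive class, needed for the curve-based metrics.
        scores = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "train_acc": accuracy_score(y_train, model.predict(X_train)),
            "test_acc": accuracy_score(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, scores),
            # Average precision is a common summary of the precision-recall curve.
            "pr_auc": average_precision_score(y_test, scores),
        }
    return results


if __name__ == "__main__":
    for name, metrics in compare_classifiers().items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

A gap between training and test accuracy, such as the 92.95% versus 83.81% reported for the DNN, shows up in this sketch as the difference between `train_acc` and `test_acc`. The numbers printed for synthetic data say nothing about the study's results.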
