Affiliations 

  • 1 Institute of Medical Science Technology, Universiti Kuala Lumpur, Kajang, Selangor, Malaysia
  • 2 Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Hofuf, Kingdom of Saudi Arabia
  • 3 Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Kuala Lumpur, Malaysia
  • 4 Occupational and Environmental Health Unit, Kedah State Health Department, Alor Setar, Kedah, Malaysia
PeerJ Comput Sci, 2024;10:e1985.
PMID: 38660193 DOI: 10.7717/peerj-cs.1985

Abstract

BACKGROUND: This study introduced a novel approach for predicting occupational injury severity by leveraging deep learning-based text classification techniques to analyze unstructured narratives. Unlike conventional methods that rely on structured data, our approach recognizes the richness of information within injury narrative descriptions with the aim of extracting valuable insights for improved occupational injury severity assessment.

METHODS: Natural language processing (NLP) techniques were used to preprocess occupational injury narratives obtained from the US Occupational Safety and Health Administration (OSHA), covering January 2015 to June 2023. The methodology involved preprocessing the textual narratives to standardize text and eliminate noise, followed by the integration of Term Frequency-Inverse Document Frequency (TF-IDF) and Global Vectors (GloVe) word embeddings for text representation. The proposed predictive model adopts a Bidirectional Long Short-Term Memory (Bi-LSTM) architecture and is further refined through model optimization, including random search hyperparameter tuning and in-depth feature importance analysis. The optimized Bi-LSTM model was compared and validated against other machine learning classifiers, namely naïve Bayes, support vector machine, random forest, decision tree, and K-nearest neighbor.
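
To make the TF-IDF weighting step concrete, the following is a minimal sketch in plain Python. It assumes a simple unsmoothed variant (term frequency times the log of inverse document frequency) and a basic lowercase tokenizer; neither is confirmed as the authors' exact formulation. The three narratives and the function names `tokenize` and `tf_idf` are made up for illustration and are not drawn from the OSHA data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only; a simple stand-in for the
    # preprocessing described in the abstract (assumption, not the paper's code).
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(narratives):
    """Return one {term: weight} dict per narrative (unsmoothed TF-IDF)."""
    docs = [tokenize(t) for t in narratives]
    n_docs = len(docs)
    # Document frequency: number of narratives that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency scaled by log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical narratives, invented for this example.
narratives = [
    "Employee's finger was caught in a press and amputated.",
    "Employee fell from a ladder and was hospitalized.",
    "Employee's hand was caught in a conveyor and hospitalized.",
]
weights = tf_idf(narratives)
```

Under this variant, a word that appears in every narrative (such as "employee" here) receives a weight of zero, while a word confined to one narrative (such as "amputated") is weighted most heavily. This is the sense in which TF-IDF can surface severity-related keywords.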

RESULTS: The proposed optimized Bi-LSTM model showed superior predictive performance, achieving an accuracy of 0.95 for hospitalization and 0.98 for amputation cases with faster model processing times. The feature importance analysis also revealed predictive keywords related to the causal factors of occupational injuries, thereby providing valuable insights to enhance model interpretability.

CONCLUSION: Our proposed optimized Bi-LSTM model offers safety and health practitioners an effective tool to support proactive workplace safety measures, thereby contributing to business productivity and sustainability. This study lays the foundation for further exploration of predictive analytics in the occupational safety and health domain.
