Displaying publications 61 - 80 of 88 in total

  1. Muhammed Yusuf N, Abu Bakar K, Isyaku B, Abdelmaboud A, Nagmeldin W
    PeerJ Comput Sci, 2023;9:e1698.
    PMID: 38192471 DOI: 10.7717/peerj-cs.1698
    Software-defined networking (SDN) is a networking architecture that improves efficiency by centralizing networking decisions, moving them from the data plane to the control plane. In a traditional SDN, a single controller is typically used. However, the complexity of modern networks, owing to their size and high traffic volume with varied quality-of-service requirements, has introduced high control-message communication overhead on the controller. In turn, the common remedy of deploying multiple distributed controllers brings forth the controller placement problem (CPP). Existing CPP techniques have not adequately considered switch roles when modelling the CPP and partitioning the network for controller placement. This article proposes the controller placement algorithm with network partition based on critical switch awareness (CPCSA). CPCSA identifies critical switches in the software-defined wide area network (SDWAN) and then partitions the network based on their criticality. Subsequently, a controller is assigned to each partition to improve control-message overhead, loss, throughput, and flow-setup delay. CPCSA was evaluated on real network topologies obtained from the Internet Topology Zoo. Results show that CPCSA achieves an aggregate reduction in controller overhead of 73%, loss of 51%, and latency of 16%, while improving throughput by 16% compared with the benchmark algorithms.
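
    A minimal sketch of the partition-then-place idea behind CPCSA, using networkx. The paper's switch-criticality measure and partitioning rule are not given here, so betweenness centrality, a modularity-based split, and the karate-club graph (as a stand-in topology) are illustrative assumptions only.

        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        def place_controllers(topology, n_partitions=2):
            # Assumption: switch criticality is ranked by betweenness centrality;
            # the paper's actual criticality measure may differ.
            criticality = nx.betweenness_centrality(topology)
            critical_switch = max(criticality, key=criticality.get)

            # Assumption: the network is partitioned with a modularity-based split.
            partitions = greedy_modularity_communities(topology, best_n=n_partitions)

            placements = []
            for part in partitions:
                sub = topology.subgraph(part)
                # Put the controller at the most central switch of its partition
                # to keep flow-setup latency low.
                closeness = nx.closeness_centrality(sub)
                placements.append(max(closeness, key=closeness.get))
            return critical_switch, placements

        if __name__ == "__main__":
            g = nx.karate_club_graph()          # stand-in for a Topology Zoo topology
            print(place_controllers(g, n_partitions=3))
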
  2. Karam SN, Bilal K, Khan AN, Shuja J, Abdulkadir SJ
    PeerJ Comput Sci, 2024;10:e1908.
    PMID: 38435610 DOI: 10.7717/peerj-cs.1908
    The oil and gas industries (OGI) are the primary global energy source, with pipelines as vital components for OGI transportation. However, pipeline leaks pose significant risks, including fires, injuries, environmental harm, and property damage. Therefore, maintaining an effective pipeline maintenance system is critical for ensuring a safe and sustainable energy supply. The Internet of Things (IoT) has emerged as a cutting-edge technology for efficient OGI pipeline leak detection. However, deploying IoT in OGI monitoring faces significant challenges due to hazardous environments and limited communication infrastructure. Energy efficiency and fault tolerance, typical IoT concerns, gain heightened importance in the OGI context. In OGI monitoring, IoT devices are deployed linearly, with no alternative communication mechanism available along the pipelines; the failure of this single communication route can therefore disrupt crucial data transmission. Ensuring energy-efficient and fault-tolerant communication for OGI data is thus paramount. Critical data needs to reach the control center on time so that faster action can be taken to avoid losses. Low-latency communication for critical data is another challenge of the OGI monitoring environment. Moreover, IoT devices gather a plethora of OGI parameter data, including redundant values that hold no relevance for transmission to the control center. Optimizing data transmission is therefore essential to conserve energy in OGI monitoring. This article presents the Priority-Based, Energy-Efficient, and Optimal Data Routing Protocol (PO-IMRP) to tackle these challenges. Its energy model and congestion-control mechanism optimize data packets for an energy-efficient and congestion-free network. In PO-IMRP, nodes are aware of their energy status and communicate their depletion status in a timely manner for network robustness. Priority-based routing selects low-latency routes for critical data to avoid OGI losses. Comparative analysis against linear LEACH highlights PO-IMRP's superior performance in total packet transmission, completing fewer rounds with more packet transmissions thanks to the packet-optimization technique implemented at each hop, which helps mitigate network congestion. MATLAB simulations affirm the protocol's effectiveness in terms of energy efficiency, fault tolerance, and low-latency communication.
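
    A minimal sketch of the priority idea in PO-IMRP: critical pipeline readings jump ahead of routine telemetry so they reach the control center with low latency, while routine data can be deferred and aggregated. The threshold and readings are illustrative assumptions; the protocol's energy model and fault-tolerance logic are omitted.

        import heapq

        def schedule(readings, critical_threshold=80.0):
            """Order pipeline readings for transmission: critical first (priority 0),
            routine data deferred (priority 1) so it can be aggregated per hop."""
            queue = []
            for seq, value in enumerate(readings):
                priority = 0 if value >= critical_threshold else 1   # illustrative threshold
                heapq.heappush(queue, (priority, seq, value))
            while queue:
                priority, seq, value = heapq.heappop(queue)
                yield ("CRITICAL" if priority == 0 else "routine", seq, value)

        # Critical pressure spikes are transmitted before routine telemetry.
        print(list(schedule([42.0, 95.0, 40.5, 88.3, 41.2])))
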
  3. Tahir YS, Rosdi BA
    PeerJ Comput Sci, 2024;10:e1837.
    PMID: 38435623 DOI: 10.7717/peerj-cs.1837
    Several deep neural networks have been introduced for finger vein recognition over time, and these networks have demonstrated high levels of performance. However, most current state-of-the-art deep learning systems use networks with increasing layers and parameters, resulting in greater computational costs and complexity. This can make them impractical for real-time implementation, particularly on embedded hardware. To address these challenges, this article concentrates on developing a lightweight convolutional neural network (CNN) named FV-EffResNet for finger vein recognition, aiming to find a balance between network size, speed, and accuracy. The key improvement lies in the utilization of the proposed novel convolution block named the Efficient Residual (EffRes) block, crafted to facilitate efficient feature extraction while minimizing the parameter count. The block decomposes the convolution process, employing pointwise and depthwise convolutions with a specific rectangular dimension realized in two layers (n × 1) and (1 × m) for enhanced handling of finger vein data. The approach achieves computational efficiency through a combination of squeeze units, depthwise convolution, and a pooling strategy. The hidden layers of the network use the Swish activation function, which has been shown to enhance performance compared to conventional functions like ReLU or Leaky ReLU. Furthermore, the article adopts cyclical learning rate techniques to expedite the training process of the proposed network. The effectiveness of the proposed pipeline is demonstrated through comprehensive experiments conducted on four benchmark databases, namely FV-USM, SDUMLA, MMCBNU_600, and NUPT-FV. The experimental results reveal that the EffRes block has a remarkable impact on finger vein recognition. The proposed FV-EffResNet achieves state-of-the-art performance in both identification and verification settings, leveraging the benefits of being lightweight and incurring low computational costs.
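
    A minimal PyTorch sketch of the block structure described above: a pointwise (squeeze) convolution followed by rectangular (n x 1) and (1 x m) depthwise convolutions, Swish activation, and a residual connection. Channel counts, kernel sizes, and the input shape are illustrative assumptions, not the published FV-EffResNet configuration.

        import torch
        import torch.nn as nn

        class EffResBlock(nn.Module):
            """Illustrative EffRes-style block: pointwise + rectangular depthwise convs."""
            def __init__(self, in_ch, out_ch, n=3, m=3):
                super().__init__()
                self.squeeze = nn.Conv2d(in_ch, out_ch, kernel_size=1)        # pointwise
                self.dw_vert = nn.Conv2d(out_ch, out_ch, kernel_size=(n, 1),
                                         padding=(n // 2, 0), groups=out_ch)  # depthwise (n x 1)
                self.dw_horz = nn.Conv2d(out_ch, out_ch, kernel_size=(1, m),
                                         padding=(0, m // 2), groups=out_ch)  # depthwise (1 x m)
                self.bn = nn.BatchNorm2d(out_ch)
                self.act = nn.SiLU()                                          # Swish activation
                self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

            def forward(self, x):
                y = self.act(self.squeeze(x))
                y = self.act(self.dw_horz(self.dw_vert(y)))
                return self.act(self.bn(y) + self.skip(x))                    # residual add

        x = torch.randn(1, 1, 64, 128)        # e.g. a single-channel finger-vein image
        print(EffResBlock(1, 16)(x).shape)    # torch.Size([1, 16, 64, 128])
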
  4. Santhiran R, Varathan KD, Chiam YK
    PeerJ Comput Sci, 2024;10:e1821.
    PMID: 38435547 DOI: 10.7717/peerj-cs.1821
    Opinion mining is gaining significant research interest, as it directly and indirectly provides a better avenue for understanding customers, their sentiments toward a service or product, and their purchasing decisions. However, extracting every opinion feature from unstructured customer review documents is challenging, especially since these reviews are often written in native languages and contain grammatical and spelling errors. Moreover, existing pattern rules frequently exclude features and opinion words that are not strictly nouns or adjectives. Thus, selecting suitable features when analyzing customer reviews is the key to uncovering their actual expectations. This study aims to enhance the performance of explicit feature extraction from product review documents. To achieve this, an approach that employs sequential pattern rules is proposed to identify and extract features with associated opinions. The improved pattern rules total 41, including 16 new rules introduced in this study and 25 existing pattern rules from previous research. An average calculated from the testing results of five datasets showed that the incorporation of this study's 16 new rules significantly improved feature extraction precision by 6%, recall by 6% and F-measure value by 5% compared to the contemporary approach. The new set of rules has proven to be effective in extracting features that were previously overlooked, thus achieving its objective of addressing gaps in existing rules. Therefore, this study has successfully enhanced feature extraction results, yielding an average precision of 0.91, an average recall value of 0.88, and an average F-measure of 0.89.
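
    A minimal sketch of how a sequential POS-pattern rule can pull a (feature, opinion) pair out of a tagged review sentence. The single rule shown here (a noun followed within a short window by an adjective) is only illustrative; it is not one of the 41 rules used in the study, and the toy review stands in for real tagged data.

        def extract_pairs(tagged_tokens, window=3):
            """tagged_tokens: list of (word, POS) tuples, e.g. from any POS tagger."""
            pairs = []
            for i, (word, pos) in enumerate(tagged_tokens):
                if pos.startswith("NN"):                       # candidate feature (noun)
                    for other, other_pos in tagged_tokens[i + 1 : i + 1 + window]:
                        if other_pos.startswith("JJ"):         # associated opinion (adjective)
                            pairs.append((word, other))
                            break
            return pairs

        review = [("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("terrible", "JJ"),
                  ("but", "CC"), ("screen", "NN"), ("looks", "VBZ"), ("great", "JJ")]
        print(extract_pairs(review))   # [('battery', 'terrible'), ('life', 'terrible'), ('screen', 'great')]
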
  5. Li F, Majid NA, Ding S
    PeerJ Comput Sci, 2024;10:e1875.
    PMID: 38435555 DOI: 10.7717/peerj-cs.1875
    This article aims to address the challenge of predicting the salaries of college graduates, a subject of significant practical value in the fields of human resources and career planning. Traditional prediction models often overlook diverse influencing factors and complex data distributions, limiting the accuracy and reliability of their predictions. Against this backdrop, we propose a novel prediction model that integrates maximum likelihood estimation (MLE), Jeffreys priors, Kullback-Leibler risk function, and Gaussian mixture models to optimize LSTM models in deep learning. Compared to existing research, our approach has multiple innovations: First, we successfully improve the model's predictive accuracy through the use of MLE. Second, we reduce the model's complexity and enhance its interpretability by applying Jeffreys priors. Lastly, we employ the Kullback-Leibler risk function for model selection and optimization, while the Gaussian mixture models further refine the capture of complex characteristics of salary distribution. To validate the effectiveness and robustness of our model, we conducted experiments on two different datasets. The results show significant improvements in prediction accuracy, model complexity, and risk performance. This study not only provides an efficient and reliable tool for predicting the salaries of college graduates but also offers robust theoretical and empirical foundations for future research in this field.
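
    A minimal sketch of two of the ingredients named above: fitting a Gaussian mixture to a salary distribution by maximum likelihood (EM) and scoring candidate component counts, with BIC standing in here for the paper's Kullback-Leibler-based selection. The synthetic salaries are illustrative, and the LSTM and Jeffreys-prior stages are omitted.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        # Synthetic, bimodal "graduate salary" sample (illustrative only).
        salaries = np.concatenate([rng.normal(3500, 400, 300),
                                   rng.normal(6500, 900, 120)]).reshape(-1, 1)

        best = None
        for k in (1, 2, 3, 4):
            gmm = GaussianMixture(n_components=k, random_state=0).fit(salaries)   # EM = MLE
            score = gmm.bic(salaries)          # stand-in for the KL-based risk criterion
            if best is None or score < best[0]:
                best = (score, k, gmm)

        _, k, gmm = best
        print(f"selected {k} components, means = {gmm.means_.ravel().round(0)}")
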
  6. Cherukuru P, Mustafa MB
    PeerJ Comput Sci, 2024;10:e1901.
    PMID: 38435554 DOI: 10.7717/peerj-cs.1901
    Systems that apply speech enhancement algorithms at multiple levels to improve the quality of speech signals in noisy environments are known as multi-channel speech enhancement (MCSE) systems. Numerous existing algorithms are used to filter noise in speech enhancement systems, typically as a pre-processor to reduce noise and improve speech quality; however, they may perform poorly under low signal-to-noise ratio (SNR) conditions. Speech devices are exposed to all kinds of environmental noise, including high-frequency noise. The objective of this research is to conduct a noise-reduction experiment for an MCSE system in stationary and non-stationary noisy environments with varying speech-signal SNR levels. The experiments examined the performance of the existing and the proposed MCSE systems in filtering environmental noise at low to high SNRs (-10 dB to 20 dB). The experiments were conducted using the AURORA and LibriSpeech datasets, which contain different types of environmental noise. The existing MCSE (BAV-MCSE) makes use of beamforming, adaptive noise reduction, and voice activity detection algorithms (BAV) to filter noise from speech signals. The proposed MCSE (DWT-CNN-MCSE) system was developed based on discrete wavelet transform (DWT) preprocessing and a convolutional neural network (CNN) for denoising the input noisy speech signals to improve performance accuracy. The performance of the existing BAV-MCSE and the proposed DWT-CNN-MCSE was measured using spectrogram analysis and word recognition rate (WRR). The existing BAV-MCSE reported the highest WRR of 93.77% at a high SNR (20 dB) and 5.64% on average at a low SNR (-10 dB) across different noises. The proposed DWT-CNN-MCSE system proved to perform well at a low SNR, with a WRR of 70.55% and the highest improvement (64.91% WRR) at -10 dB SNR.
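
    A minimal sketch of the DWT preprocessing step: a noisy frame is decomposed with a discrete wavelet transform and its detail coefficients are soft-thresholded before any network sees it. The wavelet, level, and threshold rule are illustrative assumptions; the CNN denoiser itself is not reproduced here.

        import numpy as np
        import pywt

        def dwt_denoise(frame, wavelet="db4", level=3):
            """Soft-threshold the detail coefficients of a 1-D speech frame."""
            coeffs = pywt.wavedec(frame, wavelet, level=level)
            sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise-level estimate
            thresh = sigma * np.sqrt(2 * np.log(len(frame)))
            coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
            return pywt.waverec(coeffs, wavelet)[: len(frame)]

        t = np.linspace(0, 1, 1600)
        clean = np.sin(2 * np.pi * 200 * t)
        noisy = clean + 0.4 * np.random.randn(t.size)               # roughly low-SNR input
        print(np.round(np.std(noisy - clean), 3), np.round(np.std(dwt_denoise(noisy) - clean), 3))
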
  7. Shen L, Jiang L
    PeerJ Comput Sci, 2024;10:e1858.
    PMID: 38435553 DOI: 10.7717/peerj-cs.1858
    Managing user bias in large-scale user review data is a significant challenge in optimizing children's book recommendation systems. To tackle this issue, this study introduces a novel hybrid model that combines graph convolutional networks (GCN) based on bipartite graphs and neural matrix factorization (NMF). This model aims to enhance the precision and efficiency of children's book recommendations by accurately capturing user biases. In this model, the complex interactions between users and books are modeled as a bipartite graph, with the users' book ratings serving as the weights of the edges. Through GCN and NMF, we can delve into the structure of the graph and the behavioral patterns of users, more accurately identify and address user biases, and predict their future behaviors. Compared to traditional recommendation systems, our hybrid model excels in handling large-scale user review data. Experimental results confirm that our model has significantly improved in terms of recommendation accuracy and scalability, positively contributing to the advancement of children's book recommendation systems.
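
    A minimal sketch of the matrix-factorization half of the hybrid model: user-book ratings arranged as a matrix and factored with NMF so that unrated books can be scored from the reconstruction. The graph-convolution half and the bias handling are omitted, and the tiny rating matrix is illustrative only.

        import numpy as np
        from sklearn.decomposition import NMF

        # Rows = users, columns = books; 0 marks "not yet rated" (illustrative data).
        ratings = np.array([[5, 4, 0, 1],
                            [4, 0, 0, 1],
                            [1, 1, 0, 5],
                            [0, 1, 5, 4]], dtype=float)

        model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
        user_factors = model.fit_transform(ratings)        # latent user preferences
        book_factors = model.components_                    # latent book profiles
        predicted = user_factors @ book_factors

        # Predicted score of user 1 for the unrated book 2.
        print(round(predicted[1, 2], 2))
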
  8. Hidayat T, Ahmad A, Ngo HC
    PeerJ Comput Sci, 2024;10:e1806.
    PMID: 38435549 DOI: 10.7717/peerj-cs.1806
    An implicational base is knowledge extracted from a formal context. The implicational base of a formal context consists of attribute implications that are sound, complete, and non-redundant with respect to the formal context. Non-redundant means that no attribute implication in the base can be inferred from the others. However, sometimes some attribute implications in the base can be inferred from the others together with prior knowledge. From a knowledge-discovery standpoint, such attribute implications should not be considered new knowledge and should be omitted from the implicational base; in other words, they are redundant with respect to the prior knowledge. One sort of prior knowledge is a set of constraints that restricts some attributes in the data; in a formal context, constraints restrict some attributes of its objects. This article proposes a method to generate a non-redundant implicational base of a formal context subject to constraints that restrict it. Here, non-redundant means that the implicational base contains no attribute implication that can be inferred from the others together with the constraint information. The article also proposes a formulation for checking whether an attribute implication is redundant and encodes the problem as a Boolean satisfiability (SAT) problem, so that it can be solved by a SAT solver. An experiment with the implementation shows that the proposed method is able to detect redundant attribute implications and generate a non-redundant implicational base of a formal context with constraints.
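
    A minimal sketch of the redundancy check described above, done directly with attribute closures rather than the article's SAT encoding: an implication A -> B is redundant when B is already contained in the closure of A under the remaining implications plus the constraints. The toy implications and empty constraint set are illustrative.

        def closure(attrs, implications):
            """Close a set of attributes under a list of (premise, conclusion) implications."""
            closed = set(attrs)
            changed = True
            while changed:
                changed = False
                for premise, conclusion in implications:
                    if premise <= closed and not conclusion <= closed:
                        closed |= conclusion
                        changed = True
            return closed

        def is_redundant(candidate, others, constraints):
            premise, conclusion = candidate
            # Redundant if the other implications together with the constraints
            # already entail the candidate's conclusion from its premise.
            return conclusion <= closure(premise, others + constraints)

        implications = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"a"}, {"c"})]
        constraints = []                      # prior knowledge would be listed here
        candidate = implications[2]           # a -> c
        others = implications[:2]
        print(is_redundant(candidate, others, constraints))   # True: a -> b -> c
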
  9. Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA
    PeerJ Comput Sci, 2024;10:e1845.
    PMID: 38440047 DOI: 10.7717/peerj-cs.1845
    Generative artificial intelligence has created a moment in history in which human beings have begun to interact closely with artificial intelligence (AI) tools, putting policymakers in a position to restrict or legislate such tools. One particular example is ChatGPT, the first and the world's most popular multipurpose generative AI tool. This study aims to put forward a policy-making framework for generative artificial intelligence based on the risk, reward, and resilience framework. A systematic search was conducted using carefully chosen keywords, excluding non-English content, conference articles, book chapters, and editorials. Published research was filtered based on its relevance to ChatGPT ethics, yielding a total of 41 articles. Key elements surrounding ChatGPT concerns and motivations were systematically deduced and classified under the risk, reward, and resilience categories to serve as ingredients for the proposed decision-making framework. The decision-making process and rules were developed as a primer to help policymakers navigate decision-making conundrums. The framework was then tailored to some of the concerns surrounding ChatGPT in the context of higher education. Regarding the interconnection between risk and reward, the findings show that providing students with access to ChatGPT presents an opportunity for increased efficiency in tasks such as text summarization and workload reduction, but exposes them to risks such as plagiarism and cheating. Similarly, pursuing certain opportunities, such as accessing vast amounts of information, can lead to rewards but also introduces risks like misinformation and copyright issues. Likewise, focusing on specific capabilities of ChatGPT, such as developing tools to detect plagiarism and misinformation, may enhance resilience in some areas (e.g., academic integrity) while creating vulnerabilities in others, such as the digital divide, educational equity, and job losses. Furthermore, the findings indicate second-order effects of legislation regarding ChatGPT, with both positive and negative implications. One potential effect is a decrease in rewards due to the limitations imposed by the legislation, which may hinder individuals from fully capitalizing on the opportunities provided by ChatGPT. Hence, the risk, reward, and resilience framework provides a comprehensive and flexible decision-making model that allows policymakers, and in this use case higher education institutions, to navigate the complexities and trade-offs associated with ChatGPT, with both theoretical and practical implications for the future.
  10. Idris NF, Ismail MA
    PeerJ Comput Sci, 2021;7:e427.
    PMID: 34013024 DOI: 10.7717/peerj-cs.427
    Breast cancer has become the second major cause of death among female cancer patients worldwide. Based on research conducted in 2019, approximately 250,000 women across the United States are diagnosed with invasive breast cancer each year. Preventing breast cancer remains a challenge, as the growth of breast cancer cells is a multistep process involving multiple cell types. Early diagnosis and detection of breast cancer are among the most effective approaches to preventing the cancer from spreading and to increasing the survival rate. For more accurate and faster detection, automatic diagnostic methods are applied to breast cancer diagnosis. This paper proposes the fuzzy-ID3 (FID3) algorithm, a fuzzy decision tree, as the classification method for breast cancer detection. The study aims to resolve a limitation of the existing ID3 algorithm, which is unable to classify continuous-valued data, and to increase the classification accuracy of the decision tree. The FID3 algorithm combines fuzzy-system and decision-tree techniques, with ID3 as the decision tree learner. FUZZYDBD, an automatic fuzzy database definition method, is used to design the fuzzy database for the fuzzification of data in FID3; it generates a predefined fuzzy database before the fuzzy rule base is generated. The fuzzified dataset is then processed by FID3, the fuzzy version of the ID3 algorithm. The inference system of FID3 is simple, extracting rules directly from the generated tree to determine the classes of new input instances. This study analysed the results using three breast cancer datasets: WBCD (Original), WDBC (Diagnostic), and Coimbra. Furthermore, FID3 was compared with existing methods to verify the proposed method's capability and performance. The study found that the combination of the FID3 algorithm with the FUZZYDBD method is reliable and robust and performs well in breast cancer classification.
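
    A minimal sketch of the fuzzification step that FUZZYDBD automates: a continuous attribute value is mapped to degrees of membership in overlapping triangular fuzzy sets before the tree is built. The three sets and their break-points are illustrative assumptions, not the database generated by the method.

        def triangular(x, a, b, c):
            """Membership degree of x in a triangular fuzzy set with peak b."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def fuzzify(value, lo, hi):
            """Map a continuous attribute onto low / medium / high fuzzy sets."""
            mid = (lo + hi) / 2
            return {
                "low": triangular(value, lo - (mid - lo), lo, mid),
                "medium": triangular(value, lo, mid, hi),
                "high": triangular(value, mid, hi, hi + (hi - mid)),
            }

        # e.g. a "mean cell radius" style feature ranging from 6 to 28
        print(fuzzify(10.0, 6.0, 28.0))
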
  11. Chiroma H, Ezugwu AE, Jauro F, Al-Garadi MA, Abdullahi IN, Shuib L
    PeerJ Comput Sci, 2020;6:e313.
    PMID: 33816964 DOI: 10.7717/peerj-cs.313
    Background and Objective: The COVID-19 pandemic has caused severe mortality across the globe, with the USA as the current epicenter of the epidemic even though the initial outbreak was in Wuhan, China. Many studies have successfully applied machine learning to fight the COVID-19 pandemic from different perspectives. To the best of the authors' knowledge, no comprehensive survey with bibliometric analysis has yet been conducted on the adoption of machine learning to fight COVID-19. Therefore, the main goal of this study is to bridge this gap by carrying out an in-depth survey with bibliometric analysis on the adoption of machine learning-based technologies to fight the COVID-19 pandemic, including an extensive systematic literature review and bibliometric analysis.

    Methods: We applied a literature survey methodology to retrieve data from academic databases and subsequently employed a bibliometric technique to analyze the accessed records. In addition, a concise summary, sources of COVID-19 datasets, a taxonomy, and a synthesis and analysis are presented in this study. It was found that the convolutional neural network (CNN) is mainly utilized in developing COVID-19 diagnosis and prognosis tools, mostly from chest X-ray and chest CT scan images. Similarly, we performed a bibliometric analysis of machine learning-based COVID-19-related publications in the Scopus and Web of Science citation indexes. Finally, we propose a new perspective for solving the identified challenges as directions for future research. We believe the survey with bibliometric analysis can help researchers easily detect areas that require further development and identify potential collaborators.

    Results: The findings of the analysis presented in this article reveal that machine learning-based COVID-19 diagnosis tools received the most considerable attention from researchers. Specifically, the analyses show that energy and resources are dispensed more towards automated COVID-19 diagnosis tools, while COVID-19 drug and vaccine development remains grossly underexploited. Moreover, the machine learning algorithm predominantly utilized by researchers in developing diagnostic tools is the CNN, mainly applied to X-ray and CT scan images.

    Conclusions: The challenges hindering practical work on the application of machine learning-based technologies to fight COVID-19, and new perspectives for solving the identified problems, are presented in this article. Furthermore, we believe that the presented survey with bibliometric analysis can make it easier for researchers to identify areas that need further development and to identify potential collaborators at the author, country, and institutional levels, with the overall aim of furthering research on the application of machine learning to disease control.

  12. Hameed SS, Hassan WH, Abdul Latiff L, Ghabban F
    PeerJ Comput Sci, 2021;7:e414.
    PMID: 33834100 DOI: 10.7717/peerj-cs.414
    Background: The Internet of Medical Things (IoMT) is gradually replacing the traditional healthcare system. However, little attention has been paid to security requirements in the development of IoMT devices and systems. One of the main reasons is the difficulty of tuning conventional security solutions to IoMT systems. Machine learning (ML) has been successfully employed in attack detection and mitigation, and advanced ML techniques can be a promising approach to addressing existing and anticipated IoMT security and privacy issues. However, because of the challenges inherent to IoMT systems, it is imperative to know how these techniques can be effectively utilized to meet security and privacy requirements without affecting the quality and services of IoMT systems or device lifespan.

    Methodology: This article performs a systematic literature review (SLR) on the security and privacy issues of IoMT and their solutions based on ML techniques. Recent research papers disseminated between 2010 and 2020 were selected from multiple databases, and a standardized SLR method was followed. A total of 153 papers were reviewed and critically analysed. Furthermore, this review attempts to highlight the limitations of current methods and to find possible solutions to them. Thus, a detailed analysis was carried out on the selected papers, focusing on their methods, advantages, limitations, tools, and data.

    Results: It was observed that ML techniques have been deployed significantly for device- and network-layer security. Most of the current studies improved traditional metrics while ignoring performance-complexity metrics in their evaluations. Moreover, their study environments and data barely represent real IoMT systems. Therefore, conventional ML techniques may fail if metrics such as resource complexity and power usage are not considered.

  13. Hu X, Xie Y, Zhao H, Sheng G, Lai KW, Zhang Y
    PeerJ Comput Sci, 2024;10:e1874.
    PMID: 38481705 DOI: 10.7717/peerj-cs.1874
    Epilepsy is a chronic, non-communicable disease caused by paroxysmal, abnormally synchronized electrical activity of brain neurons, and is one of the most common neurological diseases worldwide. Electroencephalography (EEG) is currently a crucial tool for epilepsy diagnosis. With the development of artificial intelligence, multi-view learning-based EEG analysis has become an important method for automatic epilepsy recognition, because EEG contains several distinct types of features, such as time-frequency, frequency-domain, and time-domain features. However, current multi-view learning still faces challenges, such as the difference between samples of the same class from different views being greater than the difference between samples of different classes from the same view. To address this, we propose a shared hidden space-driven multi-view learning algorithm. The algorithm uses kernel density estimation to construct a shared hidden space and combines it with the original space to obtain an expanded space for multi-view learning. By constructing the expanded space and exploiting the information of both the shared hidden space and the original space, the relevant information of samples within and across views can be fully utilized. Experimental results on an epilepsy dataset provided by the University of Bonn show that the proposed algorithm has promising performance, with an average classification accuracy of 0.9787, an improvement of at least 4% over single-view methods.
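
    A minimal sketch of the space-expansion idea: a kernel density estimate fitted on samples pooled from several views supplies an extra shared-space coordinate that is concatenated onto each view's original features. The bandwidth, synthetic data, and downstream use are illustrative assumptions rather than the proposed algorithm itself.

        import numpy as np
        from sklearn.neighbors import KernelDensity

        rng = np.random.default_rng(0)
        # Two "views" of the same EEG segments (illustrative 2-D features per view).
        view_time = rng.normal(size=(200, 2))
        view_freq = view_time @ np.array([[0.8, 0.1], [0.0, 1.2]]) + rng.normal(scale=0.3, size=(200, 2))

        # Shared hidden space: density estimated on both views pooled together.
        kde = KernelDensity(bandwidth=0.5).fit(np.vstack([view_time, view_freq]))

        def expand(view):
            shared = kde.score_samples(view).reshape(-1, 1)    # log-density coordinate
            return np.hstack([view, shared])                   # original + shared space

        print(expand(view_time).shape, expand(view_freq).shape)   # (200, 3) (200, 3)
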
  14. Rajabpour L, Selamat H, Barzegar A, Fadzli Haniff M
    PeerJ Comput Sci, 2021;7:e756.
    PMID: 34805509 DOI: 10.7717/peerj-cs.756
    Undesirable vibrations resulting from the use of vibrating hand-held tools decrease tool performance and user productivity. In addition, prolonged exposure to the vibration can cause ergonomic injuries known as hand-arm vibration syndrome (HAVS). It is therefore very important to design a vibration suppression mechanism that can isolate or suppress the transmission of vibration to the users' hands to protect them from HAVS. While viscoelastic materials in anti-vibration gloves are used as a passive control approach, active vibration control has been shown to be more effective but requires sensors, actuators, and controllers. In this paper, the design of a controller for an anti-vibration glove is presented. The aim is to keep the level of vibration transferred from the tool to the hands within a healthy zone. The paper also describes the formulation of the hand-glove system's mathematical model and the design of a fuzzy parallel distributed compensation (PDC) controller that can cater for different hand masses. The performance of the proposed controller is evaluated through simulations, and the results are benchmarked against two other active vibration control techniques: a proportional-integral-derivative (PID) controller and an active force controller (AFC). The simulation results show the superior performance of the proposed controller over the benchmark controllers: the designed PDC controller is able to suppress the vibration transferred to the user's hand 93% and 85% better than the PID controller and the AFC, respectively.
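
    A minimal sketch of the parallel distributed compensation rule itself: local state-feedback gains designed for light and heavy hand masses are blended by the fuzzy membership of the measured mass. The gains, membership break-points, and state vector are illustrative assumptions, not the controller designed in the paper.

        import numpy as np

        # Local feedback gains for the two extreme hand-mass models (illustrative values).
        K_LIGHT = np.array([[12.0, 3.0]])
        K_HEAVY = np.array([[20.0, 5.0]])
        MASS_LIGHT, MASS_HEAVY = 0.4, 1.2          # kg, assumed membership break-points

        def membership(mass):
            """How 'light' vs 'heavy' the current hand is (weights sum to 1)."""
            w_heavy = np.clip((mass - MASS_LIGHT) / (MASS_HEAVY - MASS_LIGHT), 0.0, 1.0)
            return 1.0 - w_heavy, w_heavy

        def pdc_control(state, mass):
            """u = -(h1*K1 + h2*K2) x : fuzzy blend of the local controllers."""
            h_light, h_heavy = membership(mass)
            K = h_light * K_LIGHT + h_heavy * K_HEAVY
            return float(-(K @ state))

        x = np.array([0.002, 0.05])                # [displacement (m), velocity (m/s)]
        print(pdc_control(x, mass=0.8))            # blended control force
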
  15. Hossain T, Shamrat FMJM, Zhou X, Mahmud I, Mazumder MSA, Sharmin S, et al.
    PeerJ Comput Sci, 2024;10:e1950.
    PMID: 38660192 DOI: 10.7717/peerj-cs.1950
    Gastrointestinal (GI) diseases are prevalent medical conditions that require accurate and timely diagnosis for effective treatment. To address this, we developed the Multi-Fusion Convolutional Neural Network (MF-CNN), a deep learning framework that strategically integrates and adapts elements from six deep learning models, enhancing feature extraction and classification of GI diseases from endoscopic images. The MF-CNN architecture leverages truncated and partially frozen layers from existing models, augmented with novel components such as Auxiliary Fusing Layers (AuxFL), Fusion Residual Block (FuRB), and Alpha Dropouts (αDO) to improve precision and robustness. This design facilitates the precise identification of conditions such as ulcerative colitis, polyps, esophagitis, and healthy colons. Our methodology involved preprocessing endoscopic images sourced from open databases, including KVASIR and ETIS-Larib Polyp DB, using adaptive histogram equalization (AHE) to enhance their quality. The MF-CNN framework supports detailed feature mapping for improved interpretability of the model's internal workings. An ablation study was conducted to validate the contribution of each component, demonstrating that the integration of AuxFL, αDO, and FuRB played a crucial part in reducing overfitting and efficiency saturation and enhancing overall model performance. The MF-CNN demonstrated outstanding performance in terms of efficacy, achieving an accuracy rate of 99.25%. It also excelled in other key performance metrics with a precision of 99.27%, a recall of 99.25%, and an F1-score of 99.25%. These metrics confirmed the model's proficiency in accurate classification and its capability to minimize false positives and negatives across all tested GI disease categories. Furthermore, the AUC values were exceptional, averaging 1.00 for both test and validation sets, indicating perfect discriminative ability. The findings of the P-R curve analysis and confusion matrix further confirmed the robust classification performance of the MF-CNN. This research introduces a technique for medical imaging that can potentially transform diagnostics in gastrointestinal healthcare facilities worldwide.
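
    A minimal sketch of the adaptive-histogram-equalization preprocessing step applied to the endoscopic images, using OpenCV's CLAHE on the luminance channel of a synthetic frame. The clip limit, tile size, and random frame are illustrative assumptions; the MF-CNN itself is not reproduced here.

        import cv2
        import numpy as np

        def enhance(frame_bgr, clip_limit=2.0, tile=(8, 8)):
            """Equalize local contrast on the luminance channel only."""
            lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
            l, a, b = cv2.split(lab)
            clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
            lab = cv2.merge((clahe.apply(l), a, b))
            return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

        # Synthetic low-contrast frame standing in for a KVASIR endoscopic image.
        frame = (np.random.rand(256, 256, 3) * 60 + 90).astype(np.uint8)
        print(enhance(frame).shape, enhance(frame).dtype)   # (256, 256, 3) uint8
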
  16. Khairuddin MZF, Sankaranarayanan S, Hasikin K, Abd Razak NA, Omar R
    PeerJ Comput Sci, 2024;10:e1985.
    PMID: 38660193 DOI: 10.7717/peerj-cs.1985
    BACKGROUND: This study introduced a novel approach for predicting occupational injury severity by leveraging deep learning-based text classification techniques to analyze unstructured narratives. Unlike conventional methods that rely on structured data, our approach recognizes the richness of information within injury narrative descriptions with the aim of extracting valuable insights for improved occupational injury severity assessment.

    METHODS: Natural language processing (NLP) techniques were harnessed to preprocess the occupational injury narratives obtained from the US Occupational Safety and Health Administration (OSHA) from January 2015 to June 2023. The methodology involved meticulous preprocessing of the textual narratives to standardize the text and eliminate noise, followed by the innovative integration of Term Frequency-Inverse Document Frequency (TF-IDF) and Global Vector (GloVe) word embeddings for effective text representation. The proposed predictive model adopts a novel Bidirectional Long Short-Term Memory (Bi-LSTM) architecture and is further refined through model optimization, including random-search hyperparameter tuning and in-depth feature-importance analysis. The optimized Bi-LSTM model was compared and validated against other machine learning classifiers, namely naïve Bayes, support vector machine, random forest, decision trees, and K-nearest neighbor.

    RESULTS: The proposed optimized Bi-LSTM model showed superior predictability, boasting an accuracy of 0.95 for hospitalization and 0.98 for amputation cases, with faster model processing times. Interestingly, the feature-importance analysis revealed predictive keywords related to the causal factors of occupational injuries, thereby providing valuable insights to enhance model interpretability.

    CONCLUSION: Our proposed optimized Bi-LSTM model offers safety and health practitioners an effective tool to empower workplace safety proactive measures, thereby contributing to business productivity and sustainability. This study lays the foundation for further exploration of predictive analytics in the occupational safety and health domain.
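
    A minimal Keras sketch of the model family described above: tokenized injury narratives passed through an embedding layer and a bidirectional LSTM for binary severity classification. The vocabulary size, sequence length, unit counts, and random stand-in data are illustrative assumptions; the TF-IDF/GloVe embedding initialization and the hyperparameter search are omitted.

        import numpy as np
        import tensorflow as tf
        from tensorflow.keras import layers

        VOCAB, MAXLEN = 5000, 120          # illustrative vocabulary and narrative length

        model = tf.keras.Sequential([
            layers.Embedding(VOCAB, 100),                         # could be seeded with GloVe vectors
            layers.Bidirectional(layers.LSTM(64)),
            layers.Dropout(0.3),
            layers.Dense(1, activation="sigmoid"),                # e.g. hospitalization yes/no
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

        # Random token ids and labels standing in for preprocessed OSHA narratives.
        x = np.random.randint(1, VOCAB, size=(32, MAXLEN))
        y = np.random.randint(0, 2, size=(32, 1))
        model.fit(x, y, epochs=1, verbose=0)
        print(model.predict(x[:2], verbose=0).round(2))
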

  17. Alhazmi A, Mahmud R, Idris N, Mohamed Abo ME, Eke C
    PeerJ Comput Sci, 2024;10:e1966.
    PMID: 38660217 DOI: 10.7717/peerj-cs.1966
    Automatic hate speech identification in Arabic tweets has generated substantial attention among academics in the fields of text mining and natural language processing (NLP), and the number of studies on this subject has grown significantly. This study provides an overview of the field by conducting a systematic review of the literature on automatic hate speech identification, particularly in the Arabic language. The goal is to examine research trends in Arabic hate speech identification and to offer guidance to researchers by highlighting the most significant studies published between 2018 and 2023. The systematic review addresses specific research questions concerning the types of Arabic language used, hate speech categories, classification techniques, feature engineering techniques, performance metrics, validation methods, the challenges currently faced by researchers, and potential future research directions. Through a comprehensive search across nine academic databases, 24 studies that met the predefined inclusion criteria and quality assessment were identified. The review findings revealed the existence of many Arabic linguistic varieties used in hate speech on Twitter, with modern standard Arabic (MSA) being the most prominent. Regarding identification techniques, machine learning approaches are the most widely used for Arabic hate speech identification. The results also show the different feature engineering techniques used and indicate that N-grams and CBOW are the most common; F1-score, precision, recall, and accuracy were identified as the most used performance metrics, and the train/test split is the most used validation method. The findings of this study can therefore serve as valuable guidance for researchers seeking to enhance the efficacy of their models in future investigations. In addition, algorithm development, policy regulation, community management, and legal and ethical considerations are other real-world applications that can benefit from this research.
  18. Hajim WI, Zainudin S, Mohd Daud K, Alheeti K
    PeerJ Comput Sci, 2024;10:e1903.
    PMID: 38660174 DOI: 10.7717/peerj-cs.1903
    Recent advancements in deep learning (DL) have played a crucial role in helping experts develop personalized healthcare services, particularly in drug response prediction (DRP) for cancer patients. The contribution of DL techniques to this field is significant, and they have proven indispensable in medicine. This review analyzes the effectiveness of various DL models in making these predictions, drawing on research published from 2017 to 2023. We utilized the VOSviewer 1.6.18 software to create a word cloud from the titles and abstracts of the selected studies, offering insights into the focus areas of DL models used for drug response. The word cloud revealed a strong link between certain keywords and grouped themes, highlighting terms such as deep learning, machine learning, precision medicine, precision oncology, drug response prediction, and personalized medicine. To advance DRP using DL, researchers need to work on enhancing the models' generalizability and interoperability. It is also crucial to develop models that not only accurately represent various architectures but also simplify them, balancing complexity with predictive capability. In the future, researchers should try to combine methods that make DL models easier to understand; this will make DRP research more transparent and help doctors trust the decisions made by DL models in cancer DRP.
  19. Humayun MA, Shuja J, Abas PE
    PeerJ Comput Sci, 2024;10:e1984.
    PMID: 38660189 DOI: 10.7717/peerj-cs.1984
    Social background profiling of speakers is heavily used in areas such as speech forensics and in tuning speech recognition systems for improved accuracy. This article surveys recent research on speaker background profiling in terms of accent classification and analyses the datasets, speech features, and classification models used for the classification tasks. The aim is to provide a comprehensive overview of recent research on speaker background profiling and to present a comparative analysis of the achieved performance measures. Comprehensive descriptions of the datasets, speech features, and classification models used in recent accent-classification research are presented, together with a comparative analysis of the performance of the different methods. This analysis provides insights into the strengths and weaknesses of the different methods for accent classification. Research gaps are then identified, serving as a useful resource for researchers looking to advance the field.
  20. Abd Wahab NH, Hasikin K, Wee Lai K, Xia K, Bei L, Huang K, et al.
    PeerJ Comput Sci, 2024;10:e1943.
    PMID: 38686003 DOI: 10.7717/peerj-cs.1943
    BACKGROUND: Maintaining machines effectively continues to be a challenge for industrial organisations, which frequently employ reactive or premeditated methods. Recent research has begun to shift its attention towards the application of Predictive Maintenance (PdM) and Digital Twins (DT) principles in order to improve maintenance processes. PdM technologies have the capacity to significantly improve profitability, safety, and sustainability in various industries. Significantly, precise equipment estimation, enabled by robust supervised learning techniques, is critical to the efficacy of PdM in conjunction with DT development. This study underscores the application of PdM and DT, exploring its transformative potential across domains demanding real-time monitoring. Specifically, it delves into emerging fields in healthcare, utilities (smart water management), and agriculture (smart farm), aligning with the latest research frontiers in these areas.

    METHODOLOGY: Employing the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) criteria, this study highlights diverse modeling techniques shaping asset lifetime evaluation within the PdM context from 34 scholarly articles.

    RESULTS: The study revealed four important findings: various PdM and DT modelling techniques, their diverse approaches, predictive outcomes, and implementation of maintenance management. These findings align with the ongoing exploration of emerging applications in healthcare, utilities (smart water management), and agriculture (smart farm). In addition, it sheds light on the critical functions of PdM and DT, emphasising their extraordinary ability to drive revolutionary change in dynamic industrial challenges. The results highlight these methodologies' flexibility and application across many industries, providing vital insights into their potential to revolutionise asset management and maintenance practice for real-time monitoring.

    CONCLUSIONS: Therefore, this systematic review provides a current and essential resource for academics, practitioners, and policymakers to refine PdM strategies and expand the applicability of DT in diverse industrial sectors.
