MyMedR

Displaying all 10 publications

Identification of significant climatic risk factors and machine learning models in dengue outbreak prediction

Yavari Nejad F, Varathan KD

BMC Med Inform Decis Mak, 2021 04 30;21(1):141.
PMID: 33931058 DOI: 10.1186/s12911-021-01493-y

BACKGROUND: Dengue fever is a widespread viral disease and one of the world's major pandemic vector-borne infections, causing serious hazard to humanity. The World Health Organisation (WHO) reported that the incidence of dengue fever has increased dramatically across the world in recent decades. WHO currently estimates an annual incidence of 50-100 million dengue infections worldwide. To date, no tested vaccine or treatment is available to stop or prevent dengue fever. Thus, the importance of predicting dengue outbreaks is significant. The current issue that should be addressed in dengue outbreak prediction is accuracy. A limited number of studies have conducted an in-depth analysis of climate factors in dengue outbreak prediction.
METHODS: The most important climatic factors that contribute to dengue outbreaks were identified in the current work. Correlation analyses were performed to determine these factors, which were then used as input parameters for machine learning models. The top five machine learning classification models (Bayes network (BN) models, support vector machine (SVM), RBF tree, decision table and naive Bayes) were chosen based on past research. The models were then tested and evaluated on 4 years of data (January 2010 to December 2013) collected in Malaysia.
RESULTS: This research has two major contributions. A new risk factor, called the TempeRain factor (TRF), was identified and used as an input parameter for the model of dengue outbreak prediction. Moreover, TRF was applied to demonstrate its strong impact on dengue outbreaks. Experimental results showed that the Bayes Network model with the new meteorological risk factor identified in this study increased accuracy to 92.35% for predicting dengue outbreaks.
CONCLUSIONS: This research explored the factors used in dengue outbreak prediction systems. The major contribution of this study is identifying new significant factors that contribute to dengue outbreak prediction. From the evaluation result, we obtained a significant improvement in the accuracy of a machine learning model for dengue outbreak prediction.

Predicting judging-perceiving of Myers-Briggs Type Indicator (MBTI) in online social forum

Choong EJ, Varathan KD

PeerJ, 2021;9:e11382.
PMID: 34221705 DOI: 10.7717/peerj.11382

The Myers-Briggs Type Indicator (MBTI) is a well-known personality test that assigns a personality type to a user by using four trait dichotomies. For many years, people have used MBTI as an instrument to develop self-awareness and to guide their personal decisions. Previous research has had good success in predicting the Extraversion-Introversion (E/I), Sensing-Intuition (S/N) and Thinking-Feeling (T/F) dichotomies from textual data but has struggled to do so with the Judging-Perceiving (J/P) dichotomy. The J/P dichotomy is an inseparable part of MBTI that has a significant influence on human behavior, perception and decisions toward one's surroundings. It is an assessment of how someone interacts with the world when making decisions. This research set out to evaluate the performance of individual features and classifiers for the J/P dichotomy in personality computing. In the end, data leakage was found in the dataset originating from the Personality Forum Café, which was used in recent research. The results that previous research obtained on this dataset therefore appear overly optimistic. Using the same settings, this research managed to outperform previous research. Five machine learning algorithms were compared, and the LightGBM model was recommended for the task of predicting the J/P dichotomy in MBTI personality computing.

Feature extraction from customer reviews using enhanced rules

Santhiran R, Varathan KD, Chiam YK

PeerJ Comput Sci, 2024;10:e1821.
PMID: 38435547 DOI: 10.7717/peerj-cs.1821

Opinion mining is gaining significant research interest, as it directly and indirectly provides a better avenue for understanding customers, their sentiments toward a service or product, and their purchasing decisions. However, extracting every opinion feature from unstructured customer review documents is challenging, especially since these reviews are often written in native languages and contain grammatical and spelling errors. Moreover, existing pattern rules frequently exclude features and opinion words that are not strictly nouns or adjectives. Thus, selecting suitable features when analyzing customer reviews is the key to uncovering their actual expectations. This study aims to enhance the performance of explicit feature extraction from product review documents. To achieve this, an approach that employs sequential pattern rules is proposed to identify and extract features with associated opinions. The improved pattern rules total 41, including 16 new rules introduced in this study and 25 existing pattern rules from previous research. An average calculated from the testing results of five datasets showed that the incorporation of this study's 16 new rules significantly improved feature extraction precision by 6%, recall by 6% and F-measure value by 5% compared to the contemporary approach. The new set of rules has proven to be effective in extracting features that were previously overlooked, thus achieving its objective of addressing gaps in existing rules. Therefore, this study has successfully enhanced feature extraction results, yielding an average precision of 0.91, an average recall value of 0.88, and an average F-measure of 0.89.
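
The precision, recall and F-measure figures in the abstract above can be related through the standard balanced F-measure (the harmonic mean of precision and recall). The following is a minimal sketch, assuming the study uses that standard definition; the function name `f_measure` is hypothetical and is not taken from the paper. Note that an average of per-dataset F-measures need not equal the F-measure of the averaged precision and recall, so this is only a rough consistency check.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the abstract.
avg_precision = 0.91
avg_recall = 0.88

# Rounded to two decimals, this matches the reported average F-measure.
print(round(f_measure(avg_precision, avg_recall), 2))  # prints 0.89
```

Here the harmonic mean of the two reported averages rounds to the reported average F-measure of 0.89.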

Predicting Return to Work after Cardiac Rehabilitation using Machine Learning Models

Yuan CJ, Varathan KD, Suhaimi A, Ling LW

J Rehabil Med, 2023 Jan 09;55:jrm00348.
PMID: 36306152 DOI: 10.2340/jrm.v54.2432

OBJECTIVE: To explore machine learning models for predicting return to work after cardiac rehabilitation.
SUBJECTS: Patients who were admitted to the University of Malaya Medical Centre due to cardiac events.
METHODS: Eight different machine learning models were evaluated. The models included 3 different sets of features: full features; significant features from multiple logistic regression; and features selected from recursive feature extraction technique. The performance of the prediction models with each set of features was compared.
RESULTS: The AdaBoost model with the top 20 features obtained the highest performance score of 92.4% (area under the curve; AUC) compared with other prediction models.
CONCLUSION: The findings showed the potential of using machine learning models to predict return to work after cardiac rehabilitation.

Using online social networks to track a pandemic: A systematic review

Al-Garadi MA, Khan MS, Varathan KD, Mujtaba G, Al-Kabsi AM

J Biomed Inform, 2016 08;62:1-11.
PMID: 27224846 DOI: 10.1016/j.jbi.2016.05.005

BACKGROUND: The popularity and proliferation of online social networks (OSNs) have created massive social interaction among users, generating an extensive amount of data. OSNs offer a unique opportunity for studying and understanding social interaction and communication among far larger populations than ever before. Recently, OSNs have received considerable attention as a possible tool to track a pandemic because they can provide an almost real-time surveillance system at a lower cost than traditional surveillance systems.
METHODS: A systematic literature search was conducted for studies whose primary aim was to use OSNs to detect and track a pandemic. We searched PubMed, IEEE Xplore, ACM Digital Library, Google Scholar, and Web of Science for eligible English-language articles published between 2004 and 2015. First, the articles were screened on the basis of titles and abstracts. Second, the full texts were reviewed. All included studies were subjected to quality assessment.
RESULT: OSNs have rich information that can be utilized to develop an almost real-time pandemic surveillance system. The outcomes of OSN surveillance systems have demonstrated high correlations with the findings of official surveillance systems. However, the limitation in using OSNs to track a pandemic lies in collecting representative data with sufficient population coverage. This challenge is related to the characteristics of OSN data. The data are dynamic, large-sized, and unstructured, thus requiring advanced algorithms and computational linguistics.
CONCLUSIONS: OSN data contain significant information that can be used to track a pandemic. Unlike traditional surveys and clinical reports, for which data collection is time-consuming and costly, OSN data can be collected almost in real time at a lower cost. Additionally, the geographical and temporal information can support exploratory analysis of the spatiotemporal dynamics of infectious disease spread. However, an OSN-based surveillance system requires comprehensive adoption, an enhanced geographical identification system, and advanced algorithms and computational linguistics to overcome its limitations and challenges. OSNs will probably never replace traditional surveillance, but they can offer complementary data that work best when integrated with traditional data.

A novel approach for heart disease prediction using strength scores with significant predictors

Yazdani A, Varathan KD, Chiam YK, Malik AW, Wan Ahmad WA

BMC Med Inform Decis Mak, 2021 06 21;21(1):194.
PMID: 34154576 DOI: 10.1186/s12911-021-01527-5

BACKGROUND: Cardiovascular disease is the leading cause of death in many countries. Physicians often diagnose cardiovascular disease based on current clinical tests and previous experience of diagnosing patients with similar symptoms. Patients who suffer from heart disease require quick diagnosis, early treatment and constant observation. To address their needs, many data mining approaches have been used in the past in diagnosing and predicting heart diseases. Previous research also focused on identifying the significant contributing features for heart disease prediction; however, less importance was given to identifying the strength of these features.
METHOD: Motivated by this gap in the literature, this paper proposes an algorithm that measures the strength of the significant features that contribute to heart disease prediction. The study aims to predict heart disease based on the scores of significant features using Weighted Associative Rule Mining.
RESULTS: A set of important feature scores and rules was identified for diagnosing heart disease, and cardiologists were consulted to confirm the validity of these rules. The experiments performed on the UCI open dataset, widely used for heart disease research, yielded a highest confidence score of 98% in predicting heart disease.
CONCLUSION: This study contributes a method for computing the strength scores of significant predictors in heart disease prediction. From the evaluation results, we obtained important rules and achieved the highest confidence score by applying the computed strength scores of significant predictors in Weighted Associative Rule Mining to predict heart disease.

Correction to: Feature selection and risk prediction for patients with coronary artery disease using data mining

Md Idris N, Chiam YK, Varathan KD, Wan Ahmad WA, Chee KH, Liew YM

Med Biol Eng Comput, 2022 Mar;60(3):887.
PMID: 35048276 DOI: 10.1007/s11517-022-02506-2

Feature selection and risk prediction for patients with coronary artery disease using data mining

Md Idris N, Chiam YK, Varathan KD, Wan Ahmad WA, Chee KH, Liew YM

Med Biol Eng Comput, 2020 Dec;58(12):3123-3140.
PMID: 33155096 DOI: 10.1007/s11517-020-02268-9

Coronary artery disease (CAD) is an important cause of mortality across the globe. Early risk prediction of CAD could reduce the death rate by allowing early and targeted treatments. In healthcare, some studies have applied data mining techniques and machine learning algorithms to the risk prediction of CAD using patient data collected by hospitals and medical centers. However, most of these studies used all the attributes in the datasets, which might reduce the performance of prediction models due to data redundancy. The objective of this research is to identify significant features to build models for predicting the risk level of patients with CAD. In this research, significant features were selected using three methods (i.e., Chi-squared test, recursive feature elimination, and Embedded Decision Tree). The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address the imbalanced dataset issue. The prediction models were built based on the identified significant features and eight machine learning algorithms, utilizing Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The prediction models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC of more than 90%.

Patients' Technology Readiness and eHealth Literacy: Implications for Adoption and Deployment of eHealth in the COVID-19 Era and Beyond

Lee WL, Lim ZJ, Tang LY, Yahya NA, Varathan KD, Ludin SM

Comput Inform Nurs, 2021 Nov 02;40(4):244-250.
PMID: 34740221 DOI: 10.1097/CIN.0000000000000854

The COVID-19 pandemic has rerouted the healthcare ecosystem by accelerating digital health, and rapid adoption of eHealth is partly influenced by eHealth literacy (eHL). This study aims to examine patients' eHL in relation to their "technology readiness," an innate attitude that is underexplored in clinical research. A total of 276 adult inpatients with hypertension, diabetes mellitus, and coronary heart disease were surveyed cross-sectionally in 2019 using self-reported questionnaires: eHealth Literacy Scale and Technology Readiness Index (2.0). The study found moderate eHL (mean, 27.38) and moderate technology readiness (mean, 3.03) among patients. The hierarchical regression model shows that lower eHL scores were associated with patients of minor ethnicity (Malaysian Chinese), with an unemployed status, and having >1 cardiovascular risk (β = -0.136 to -0.215, R² = 0.283, Ps < .005). Technology readiness is a strong determinant of eHL (ΔR² = 0.295, P < .001) with its subdomains (optimism, innovativeness, and discomfort) significantly influencing eHL (|β| = 0.28-0.40, Ps < .001), except for the insecurity subdomain. Deployment of eHealth interventions that incorporate assessment of patients' eHL and technology readiness will enable targeted strategies, especially in resource-limited settings hit hard by the pandemic crisis.

Investigating transportation research based on social media analysis: a systematic mapping review

Zayet TMA, Ismail MA, Varathan KD, Noor RMD, Chua HN, Lee A, et al.

Scientometrics, 2021;126(8):6383-6421.
PMID: 34188335 DOI: 10.1007/s11192-021-04046-2

Social media is a pool of users' thoughts, opinions, surrounding environment, situations and more. This pool can be used as a real-time feedback data source for many domains such as transportation. It can be used to get instant feedback from commuters, including their opinions of the transportation network and their complaints, in addition to the traffic situation, road conditions, event detection and much else. The problem lies in how to utilize social media data to achieve one or more of these targets. A systematic review was conducted in the field of transportation-related research based on social media analysis (TRR-SMA) covering the years 2008 to 2018; 74 papers were identified from an initial set of 703 papers extracted from 4 digital libraries. This review structures the field and gives an overview based on the following grounds: activity, keywords, approaches, social media data and platforms, and focus of the research. It shows the trend in research subjects by country, in addition to activity trends, platform usage trends and others. Further analysis of the most employed approach (lexicons) and data type (text) is also presented. Finally, challenges are identified and future work is proposed.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11192-021-04046-2.
