Comparison of Artificial Neural Network and Decision Tree to Identify and Predict Factors Associated with Type 2 Diabetes

Document Type: Original Article

Authors

1 Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran.

2 Student Research Committee, Department of Medical Informatics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.

3 Department of Medical Informatics, Faculty of Medicine, University of Medical Sciences, Mashhad, Iran.

4 Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran.

5 Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran.

Abstract

Purpose: 
One of the goals of medical research is to determine the factors associated with diseases and their prognosis. Diabetes is one of the most common metabolic diseases in Iran. The aim of this study was to identify the factors related to, and predictive of, type 2 diabetes using artificial neural network and decision tree algorithms, and to compare the performance of these models.
Methods: 
In this study, data from 901 individuals referred to health centers in Mashhad were used. The data were first analyzed using descriptive and analytical statistics. Then, 70% of the records were randomly selected to construct the artificial neural network and decision tree models, and the remaining 30% were used to compare the performance of the models. Finally, the models were compared using ROC curves.
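The random 70/30 partition described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper name `random_split` is hypothetical, and the fixed seed and rounding of the training-set size to the nearest whole record are assumptions, since the abstract does not say how the split was implemented. Under these assumptions, 901 records yield 631 training and 270 test records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(records: Sequence[T], train_fraction: float = 0.7,
                 seed: int = 2024) -> Tuple[List[T], List[T]]:
    """Shuffle record indices and cut them into training and test portions.

    Hypothetical helper. The seed only makes the sketch reproducible;
    its value is arbitrary and not taken from the study.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # private RNG, global state untouched
    n_train = round(len(records) * train_fraction)  # assumed rounding: 901 * 0.7 -> 631
    train = [records[i] for i in indices[:n_train]]
    test = [records[i] for i in indices[n_train:]]
    return train, test


if __name__ == "__main__":
    # Placeholder IDs standing in for the 901 participants' records.
    participants = list(range(901))
    train, test = random_split(participants)
    print(len(train), len(test))  # 631 270
```

Because every record lands in exactly one of the two lists, the test portion stays unseen during model construction, which is what allows it to serve as a fair basis for comparing the two models.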
Results:
Two predictive models were developed using 13 input (independent) variables and 1 output (dependent) variable. The two models were evaluated in terms of area under the ROC curve, sensitivity, specificity and accuracy. For the artificial neural network model, the area under the ROC curve, sensitivity, specificity and accuracy were 69.1, 74.2, 56.03 and 61.3, respectively. For the CART decision tree algorithm, the area under the ROC curve, sensitivity, specificity and accuracy were 68.9, 64.77, 63.47 and 65.3, respectively. In both models, family history of diabetes, triglycerides, body mass index, low density lipoprotein, and systolic and diastolic blood pressure were the most important factors associated with type 2 diabetes.
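The four evaluation measures reported above have standard definitions, sketched below in plain Python as an illustration rather than the study's software. Sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and accuracy is (TP + TN) / N, assuming diabetes is the positive class coded as 1. The area under the ROC curve is computed through its rank interpretation: the probability that a randomly chosen diabetic case receives a higher model score than a randomly chosen non-diabetic case, with ties counted as one half. The function names, the 0.5 cut-off, and the toy labels and scores are hypothetical; the abstract appears to report each measure multiplied by 100.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives; label 1 = diabetic (assumed coding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from hard 0/1 predictions."""
    c = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),  # share of diabetic cases detected
        "specificity": c["tn"] / (c["tn"] + c["fp"]),  # share of non-diabetic cases ruled out
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),  # share of all cases classified correctly
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: true labels and model scores for 8 people.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
    # A 0.5 cut-off (an assumption) turns scores into hard predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))  # all three equal 0.75
    print(roc_auc(y_true, scores))  # 0.9375
```

Sensitivity, specificity and accuracy depend on the chosen cut-off, while the AUC summarizes ranking quality over all cut-offs. This is why a model can have the higher AUC yet the lower accuracy, as with the two models above. The pairwise AUC loop costs O(P × N) comparisons, which is adequate for a test set of about 270 records.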
Conclusion:
The results showed that the multilayer perceptron neural network model performed slightly better than the CART decision tree in terms of area under the ROC curve (69.1 versus 68.9) for predicting type 2 diabetes. In addition, low density lipoprotein was identified as the most important factor related to type 2 diabetes. The study suggests that modern data mining techniques such as artificial neural networks and decision trees can be used to identify factors associated with disease.

