Machine Learning Approach for Early Diagnosis of Dyslexia Among Primary School Children: A Scoping Review and Model Development

Zaqi Kurniawan; Rizka Tiaharyadini

doi:10.24014/ijaidm.v7i2.30614

Abstract

Dyslexia, a prevalent learning disorder among primary school children, often goes undetected until later stages, hindering academic progress and socio-emotional development. Early diagnosis is crucial for effective intervention, and Machine Learning (ML) offers promise for developing accurate diagnostic tools. However, comprehensive reviews of ML approaches to dyslexia diagnosis in this demographic are scarce. In this scoping review, we consolidate the existing literature and present the development of a novel ML model customized for early dyslexia diagnosis, comparing five classifiers: Decision Tree, K-Nearest Neighbors (KNN), Logistic Regression, Naive Bayes, and Random Forest. The comparative analysis of these methods for dyslexia detection in elementary school children reveals distinct strengths across three classes: dyslexia-prone, diagnosed dyslexia, and no dyslexia detected. Decision Tree achieves precision of 92.31% for dyslexia-prone, 90.62% for diagnosed dyslexia, and 86.67% for no dyslexia detected, with corresponding recall of 90.57%, 87.88%, and 100%. KNN reaches an overall accuracy of 94.00% and perfect precision (100%) for no dyslexia detected, with high precision and recall for the dyslexia-prone and diagnosed dyslexia classes. Logistic Regression highlights significant predictors and achieves precision of 95.38% for dyslexia-prone and 88.24% for diagnosed dyslexia, with recall of 93.34% and 90.91%, respectively. Naive Bayes achieves perfect precision (100%) for the no dyslexia detected and dyslexia-prone classes and lower precision for diagnosed dyslexia (82.5%), with perfect recall for the no dyslexia detected and diagnosed dyslexia classes. Random Forest shows balanced performance, with precision ranging from 91.18% to 94.23% and recall from 92.31% to 93.94%, and an overall accuracy of 93.00%. These results underscore the potential of ML to enable early dyslexia detection and timely interventions that improve outcomes for affected children.
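
To make the reported metrics concrete, the following minimal sketch shows how per-class precision, recall, and overall accuracy are computed for a three-class problem like the one described above. It is an illustration only, not the authors' implementation: the short label names (`none`, `prone`, `diagnosed`), the helper functions, and the ten toy predictions are all assumptions invented for this example and do not come from the study's data.

```python
from typing import Dict, List

# Hypothetical short names for the three classes named in the abstract.
LABELS = ["none", "prone", "diagnosed"]


def per_class_metrics(
    y_true: List[str], y_pred: List[str], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Return one-vs-rest precision and recall for each class label."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Precision: of the cases predicted as this class, the share that is correct.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        # Recall: of the true cases of this class, the share that was found.
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = {"precision": precision, "recall": recall}
    return metrics


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Return the fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy labels (NOT from the study): 2 none, 4 prone, 4 diagnosed.
y_true = ["none", "none", "prone", "prone", "prone", "prone",
          "diagnosed", "diagnosed", "diagnosed", "diagnosed"]
y_pred = ["none", "none", "prone", "prone", "prone", "diagnosed",
          "diagnosed", "diagnosed", "diagnosed", "prone"]

if __name__ == "__main__":
    for label, m in per_class_metrics(y_true, y_pred, LABELS).items():
        print(f"{label}: precision={m['precision']:.2%}, recall={m['recall']:.2%}")
    print(f"overall accuracy={accuracy(y_true, y_pred):.2%}")
```

Because precision and recall are computed per class, a model can show perfect precision on one class while its overall accuracy stays below 100%, which is the pattern several of the classifiers above exhibit.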

Keywords

Dyslexia; Machine Learning; Early Diagnosis; Primary School Children; Scoping Review

Office and Secretariat:

Big Data Research Centre
Puzzle Research Data Technology (Predatech)
Laboratory Building 1st Floor of Faculty of Science and Technology
UIN Sultan Syarif Kasim Riau

Jl. HR. Soebrantas KM. 18.5 No. 155 Pekanbaru Riau – 28293
Website: http://predatech.uin-suska.ac.id/ijaidm
Email: ijaidm@uin-suska.ac.id
e-Journal: http://ejournal.uin-suska.ac.id/index.php/ijaidm
Phone: 085275359942