ABSTRACT
Data miners are increasingly interested in applying analytical algorithms to students’ performance data with a view to enhancing students’ knowledge. This trend is driven by the ample multimedia data that educational institutions generate through e-learning tools, online course platforms, and other digital technologies. Educators can apply data mining techniques to these data to examine and understand students’ learning behaviors and, among other things, to forecast student achievement. The difficult task in building data mining models is selecting strategies that yield satisfactory forecast accuracy. To that end, this article uses six hybrid feature selection (FS) algorithms (hybrid PSO, hybrid GA, filter-wrapper, wrapper-embedded, ensemble-based, and filter-embedded) and five classification algorithms, namely Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), to forecast student performance and to compare the five classifiers under each of the six FS models. Based on the analysis of the given data set, the performance of the classification models improved when the 20 most important features were used in place of all 30 attributes. Reducing the number of attributes further did not yield any further improvement in predicting students’ success or failure.
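As an illustration of the filter-wrapper idea named above, the following minimal sketch ranks 30 synthetic features with a filter score, keeps the top 20, and then runs a greedy wrapper search over those candidates. Everything in it is an assumption made for illustration and not the article's method or data: the synthetic pass/fail data set, the Pearson-correlation filter, the nearest-centroid classifier standing in for the five classifiers, and all function names.

```python
import random


def make_data(n_samples=200, n_features=30, n_informative=5, seed=0):
    """Synthetic pass/fail data (assumption): only the first few features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(1.0 if (label and j < n_informative) else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def filter_stage(X, y, k):
    """Filter step: keep the k features with the largest |correlation| to the label."""
    scores = [(abs(pearson([row[j] for row in X], y)), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid_accuracy(X_tr, y_tr, X_va, y_va, feats):
    """Validation accuracy of a nearest-centroid classifier restricted to feats."""
    centroids = {}
    for c in (0, 1):
        rows = [r for r, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    correct = 0
    for r, t in zip(X_va, y_va):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in centroids}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_va)


def wrapper_stage(X_tr, y_tr, X_va, y_va, candidates):
    """Wrapper step: greedy forward selection that stops when accuracy stops improving."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved, pick = False, None
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X_tr, y_tr, X_va, y_va, selected + [j])
            if acc > best:
                best, pick, improved = acc, j, True
        if improved:
            selected.append(pick)
    return selected, best


X, y = make_data()
X_tr, y_tr, X_va, y_va = X[:140], y[:140], X[140:], y[140:]
candidates = filter_stage(X_tr, y_tr, k=20)            # 30 features -> 20 candidates
selected, acc = wrapper_stage(X_tr, y_tr, X_va, y_va, candidates)
baseline = centroid_accuracy(X_tr, y_tr, X_va, y_va, list(range(30)))
print("selected features:", selected)
print("accuracy with selected features:", acc, "| with all 30 features:", baseline)
```

Because the wrapper step selects features on the same validation split it reports, the printed accuracy is optimistic; a fair comparison would score the final subset on a separate held-out split or with cross-validation.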
Role of Hybrid Feature Selection Algorithms in Foreseeing Student Performance