Analysing Student Performance using Sparse Data of Core Bachelor Courses



Published Feb 24, 2015
Mirka Saarela, Tommi Kärkkäinen


Curricula for Computer Science degrees are characterized by the strong occupational orientation of the discipline. In the BSc degree structure, with clearly separated CS core studies, learning skills between these and other required studies may vary a lot, which shows in students' overall performance. To analyze such a situation, we apply nonstandard educational data mining techniques to a preprocessed log file of the passed courses. The joint variation of the course grades is studied through correlation analysis, while the intrinsic groups of students are created and analyzed using a special clustering technique. Since not all students have attended all the courses, there is a nonstructured sparsity pattern to cope with. Finally, a multilayer perceptron neural network with cross-validation-based generalization assurance is trained and analyzed using analytic mean sensitivity to explain the nonlinear regression model constructed. Local (within-methods) and global (between-methods) triangulation of different analysis methods is argued to improve the technical soundness of the presented approaches, giving more confidence in our final conclusion that general learning capabilities predict the success of students better than specific IT skills obtained as part of the core studies.
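To make the sparsity issue concrete, the following minimal sketch computes a correlation between two course-grade vectors using only the students who have a grade in both courses (an available-case strategy). This is an illustrative assumption, not necessarily the estimator used in the paper; the function name `pairwise_correlation`, the use of Pearson correlation, and the example grades are all hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence

# None marks a course the student has not passed (a missing value).
Grade = Optional[float]


def pairwise_correlation(x: Sequence[Grade], y: Sequence[Grade]) -> Optional[float]:
    """Pearson correlation over students with a grade in both courses.

    Returns None when fewer than two students share both courses or when
    one of the courses has no grade variation among the shared students.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical grades (scale 1-5) of five students in two courses.
programming = [5, 4, None, 2, 1]
algorithms = [4, None, 3, 2, 1]
r = pairwise_correlation(programming, algorithms)
```

Only three of the five students contribute to `r` here, which illustrates why correlations estimated from sparse course data rest on varying sample sizes and should be interpreted with care.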

How to Cite

Saarela, M., & Kärkkäinen, T. (2015). Analysing Student Performance using Sparse Data of Core Bachelor Courses. JEDM | Journal of Educational Data Mining, 7(1), 3-32. Retrieved from

