Analysing Student Performance using Sparse Data of Core Bachelor Courses



Published Feb 24, 2015
Mirka Saarela, Tommi Kärkkäinen


Curricula for Computer Science (CS) degrees are characterized by the strong occupational orientation of the discipline. In the BSc degree structure, with clearly separate CS core studies, the learning skills needed for these and other required courses may vary considerably, and this variation shows in students’ overall performance. To analyze this situation, we apply nonstandard educational data mining techniques to a preprocessed log file of the passed courses. The joint variation in the course grades is studied through correlation analysis, while intrinsic groups of students are created and analyzed using a robust clustering technique. Since not all students attended all courses, there is a nonstructured sparsity pattern to cope with. Finally, a multilayer perceptron neural network with cross-validation-based generalization assurance is trained, and the resulting nonlinear regression model is explained using analytic mean sensitivity. Local (within-methods) and global (between-methods) triangulation of different analysis methods is argued to improve the technical soundness of the presented approaches, giving more confidence to our final conclusion that general learning capabilities predict the students’ success better than specific IT skills learned as part of the core studies.
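As a rough illustration of what correlation analysis on sparsely observed grades can look like, the sketch below computes Pearson correlations between courses using only the students who have a grade in both courses (pairwise-available data). This is an assumption made for illustration and not necessarily the procedure used in the article; the function name `pairwise_correlation`, the minimum-overlap threshold, and the toy grade matrix are all hypothetical.

```python
import numpy as np


def pairwise_correlation(grades, min_overlap=3):
    """Pearson correlation between course columns, using for each pair
    only the rows (students) where both grades are observed (not NaN).

    Pairs with fewer than `min_overlap` shared students, or with zero
    variance on the shared rows, are left as NaN.
    """
    grades = np.asarray(grades, dtype=float)
    n_courses = grades.shape[1]
    corr = np.full((n_courses, n_courses), np.nan)
    for i in range(n_courses):
        for j in range(i, n_courses):
            both = ~np.isnan(grades[:, i]) & ~np.isnan(grades[:, j])
            if both.sum() < min_overlap:
                continue
            x, y = grades[both, i], grades[both, j]
            if x.std() == 0 or y.std() == 0:
                continue
            r = np.corrcoef(x, y)[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


# Hypothetical toy data: rows are students, columns are courses,
# NaN marks a course the student did not take.
grades = np.array([
    [5, 4, np.nan],
    [3, 3, 2],
    [4, np.nan, 3],
    [2, 1, 1],
    [np.nan, 5, 4],
    [1, 2, np.nan],
])
corr = pairwise_correlation(grades)
print(np.round(corr, 3))
```

Because each entry of the matrix is estimated from a different subset of students, entries based on few shared students are less reliable, which is one reason a minimum-overlap threshold is included in this sketch.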

How to Cite

Saarela, M., & Kärkkäinen, T. (2015). Analysing Student Performance using Sparse Data of Core Bachelor Courses. Journal of Educational Data Mining, 7(1), 3–32.



Keywords: sparse educational data, triangulation, curricula refinement, correlation analysis, robust clustering, multilayer perceptron
