Analysis of Student Pair Teamwork Using GitHub Activities

Niki Gitinabard; Zhikai Gao; Sarah Heckman; Tiffany Barnes; Collin F. Lynch

doi:10.5281/zenodo.7646845

Published Mar 15, 2023

Niki Gitinabard

North Carolina State University

https://orcid.org/0000-0002-9975-1677

Zhikai Gao

North Carolina State University

Sarah Heckman

North Carolina State University

https://orcid.org/0000-0003-4351-8611

Tiffany Barnes

North Carolina State University

Collin F. Lynch

North Carolina State University

https://orcid.org/0000-0001-6958-9368

Abstract

Few studies have analyzed students’ teamwork (pairwork) habits in programming projects, owing to the difficulty and high cost of analyzing complex, long-term collaborative processes. In this work, we analyze student teamwork data collected from GitHub with the goal of identifying specific pair teamwork styles. The analysis builds on an initial corpus of commit message data that was manually labeled by subject matter experts. We extend this annotation using self-supervised, semi-supervised learning to develop a large-scale annotated dataset covering multiple offerings of a second-semester (CS2) course. We then develop a series of predictive models to identify student teamwork styles automatically. Finally, we compare trends in students’ performance and team selection across teamwork styles to see whether any style is associated with better student outcomes or different help-seeking trends. Our analysis showed that self-supervised, semi-supervised methods allow us to label larger subsets of the data automatically while maintaining, and sometimes improving on, the performance of fully supervised models on a held-out validation set. It also showed that members of teams in which all members contribute significantly tend to perform better in class, but their help-seeking behaviors are not significantly different.

How to Cite

Gitinabard, N., Gao, Z., Heckman, S., Barnes, T., & Lynch, C. F. (2023). Analysis of Student Pair Teamwork Using GitHub Activities. Journal of Educational Data Mining, 15(1), 32–62. https://doi.org/10.5281/zenodo.7646845

Keywords

teamwork, GitHub, undergraduate, self-supervised learning, help-seeking

Issue

Vol. 15 No. 1 (2023): JEDM Special Issue on Computer Science Education and Educational Data Mining (CSEDM)

Section

Special Issue on CSEDM: Educational Data Mining for Computing Education

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
