Providing real-time feedback to novice programmers is critical to their ability to learn to program. Higher enrollment in introductory computer science courses reduces the time available for individual student-instructor interaction, which in turn reduces both the time for and the amount of instructor feedback. Building on our work involving manual classification and analysis of student source code comments, in this full paper we explore how machine learning techniques can be leveraged to provide real-time automated feedback to students with regard to their computational thinking processes. This paper discusses the initial classification of student source code comments using supervised machine learning methods. In this phase of classification, we focus on whether a comment is sufficient or insufficient. The classification process is broken down into three steps: text processing, data exploration, and comment classification using a Multinomial Naive Bayes classifier and a Random Forest classifier. We detail the text processing requirements, including how to prepare the raw student data using natural language processing techniques such as stop word filtration, tokenization, and lemmatization. We also show how the data preparation process can affect the final classification outcome. Using Multinomial Naive Bayes, we achieved a precision of 82%. Using a Random Forest classifier with lemmatization, we achieved a precision of 90%. We conclude with a description of how the current classification results can be used to provide real-time feedback to students while they are learning to program. Toward our ultimate goal of providing comprehensive real-time feedback to students, we describe future research plans, which include using unsupervised machine learning techniques to move beyond basic binary classification.
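The text processing and classification steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the stop-word list, the toy comments, and the names `preprocess`, `MultinomialNB`, and `precision` are all hypothetical, chosen only for illustration. The sketch uses the standard library alone, so it covers tokenization, stop word filtration, and a Multinomial Naive Bayes classifier with add-one smoothing; lemmatization and the Random Forest classifier are omitted because they would normally rely on an external NLP or machine learning library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in", "this", "it"}


def preprocess(comment):
    """Lowercase, tokenize into alphabetic words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision(y_true, y_pred, positive):
    """Fraction of predictions labeled `positive` that are truly positive."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


# Invented toy comments, for illustration only (not the study's data).
TRAIN = [
    ("loop over the list and sum each value to compute the total", "sufficient"),
    ("check if the input is negative and return an error message", "sufficient"),
    ("read each line of the file and count the words", "sufficient"),
    ("loop", "insufficient"),
    ("my code", "insufficient"),
    ("stuff here", "insufficient"),
]

model = MultinomialNB().fit(
    [preprocess(text) for text, _ in TRAIN],
    [label for _, label in TRAIN],
)
```

A new comment is classified with `model.predict(preprocess(comment))`. In a real-time setting, an "insufficient" prediction could trigger a prompt asking the student to expand the comment. Changing `preprocess`, for example by adding a lemmatization step, changes the token counts the classifier sees, which is one way the data preparation process can affect the final classification outcome.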