# 2019 ASEE Annual Conference & Exposition

## Using a Data Science Pipeline for Course Data: A Case Study Analyzing Heterogeneous Student Data in Two Flipped Classes

#### Presented at Curriculum and Assessment I

The landscape of student data in individual classes is changing rapidly. Traditionally, student data in an individual class consisted of homework assignment scores, exam and quiz scores, and project/lab scores, usually entered manually in a gradebook. With course materials and assignments moving online and new educational technology tools being released frequently, an increasing amount of data is recorded for each student in a class. That data can and does support formative assessment and evaluation; however, there might be other information “hidden” in it. This study presents an applied data science methodology to explore student data from an engineering mathematics course. The exploratory analysis serves two purposes: 1) it supports the faculty member's desire to gain insights into the use of flipped classroom instruction, and 2) it serves as a case study for a proposed data science pipeline for educational data. The instructor used Learning Catalytics, a classroom response system, on a daily basis in two sections of an engineering mathematics course. Each day’s scores were automatically recorded in the system. This data was combined with traditional homework and exam data and student demographic data. A combination of data mining and classical statistical techniques was used to reveal trends and peculiarities in the data, without a specific question or topic to investigate. The data science pipeline that we present has four major stages: data preprocessing, exploratory factor analysis, visualization, and feature engineering. Analysis results show the differences and similarities within the course units and help reveal learner behaviors. Significant differences related to gender were found, but prior experience in a course taught using the flipped classroom model did not show a significant difference.
Exploratory factor analysis identified two factors in the full data set: class activities and exams (factor 1) and homework and lesson assignments (factor 2). Within each factor, the course units clustered into two groups, Units 1 to 7 and Units 8 to 13, with the dividing point at the withdrawal date. Results also show that female students attended lessons more than male students and were more engaged learners. The methodology is based on data mining methods such as factor analysis and visualization methods such as heat maps. Based on the exploratory data analysis, this paper proposes a data science pipeline methodology for analyzing and visualizing raw student data from multiple sources. We observed some trends and clusters within and across course units. Future work will include collecting more data and generating hypotheses.
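As a rough illustration of the preprocessing stage described above, the following minimal sketch joins scores from several sources on a student ID, standardizes them, and computes the correlation matrix that a heat map would display and that a factor analysis would typically start from. It is not the authors' implementation: the student IDs, scores, source names, and helper functions (`merge_sources`, `zscores`, `correlation_matrix`) are all hypothetical, and the sketch assumes every feature has nonzero variance.

```python
from statistics import mean, pstdev

# Hypothetical records from three sources, keyed by an anonymized student ID.
clicker = {"s1": 0.90, "s2": 0.60, "s3": 0.75, "s4": 0.40}
homework = {"s1": 0.95, "s2": 0.70, "s3": 0.80, "s4": 0.55}
exams = {"s1": 0.88, "s2": 0.65, "s3": 0.70}  # s4 has no exam record


def merge_sources(sources):
    """Inner-join several {student_id: score} dicts on student ID."""
    common = set.intersection(*(set(s) for s in sources.values()))
    return {sid: {name: src[sid] for name, src in sources.items()}
            for sid in sorted(common)}


def zscores(values):
    """Standardize scores to mean 0 and unit population variance.

    Assumes the scores are not all identical (nonzero standard deviation).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def correlation_matrix(table, features):
    """Pearson correlations between features, computed from z-scores."""
    cols = {f: zscores([row[f] for row in table.values()]) for f in features}
    n = len(table)
    return {a: {b: sum(x * y for x, y in zip(cols[a], cols[b])) / n
                for b in features} for a in features}


sources = {"clicker": clicker, "homework": homework, "exams": exams}
merged = merge_sources(sources)          # keeps only students present in all sources
corr = correlation_matrix(merged, list(sources))
for name, row in corr.items():
    print(name, [round(v, 2) for v in row.values()])
```

The inner join drops students who are missing from any source; a real analysis might instead impute or flag missing records, which is a design choice this sketch does not address.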

Authors
1. Asuman Cagla Acun Sener, University of Louisville
2. Dr. Jeffrey Lloyd Hieb, University of Louisville
3. Prof. Olfa Nasraoui, University of Louisville