We describe two primary components of an analytics system for STEM education research developed for a physics education research portal. The purpose of this data exploration system is to allow instructors to comparatively assess student performance in intraclass, longitudinal, and interinstitutional contexts. The interface allows instructors to upload course data, including student demographics, exams, and grading rubrics, to a secure site and then retrieve descriptive statistics and detailed visualizations of these data.
The first component is a rule-based pattern analysis system that infers multiple common assessment formats without metadata and, in some cases, without headers. This paper describes the incremental development of a priority-based inference mechanism with matching heuristics, built from real and synthetic sample data, and discusses how machine learning and data mining algorithms can be applied to adapt probabilistic pattern analyzers. Early results indicate potential for user modeling and for adaptive, personalized recognition of document types and abstract type definitions.
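The inference rules themselves are not given here. As a minimal sketch, assuming CSV input whose first column holds a student identifier, a priority-ordered rule list can be tried from highest priority down, testing each rule first on all rows (no header) and then on all rows but the first (first row treated as a header). The format labels, rules, and names (`Rule`, `infer_format`) are hypothetical illustrations, not the portal's actual rule set.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Rows = List[List[str]]

@dataclass
class Rule:
    name: str                        # hypothetical format label
    priority: int                    # higher value = tried earlier
    matches: Callable[[Rows], bool]  # predicate over data rows

MC_ANSWER = re.compile(r"[A-Ea-e]")    # single answer letter
NUMBER = re.compile(r"-?\d+(\.\d+)?")  # integer or decimal score

def _data_cells(rows: Rows) -> List[str]:
    # Assumption: column 0 holds a student identifier, so it is skipped.
    return [cell.strip() for row in rows for cell in row[1:]]

def looks_like_mc_responses(rows: Rows) -> bool:
    cells = _data_cells(rows)
    # Blank cells are allowed (unanswered items).
    return bool(cells) and all(c == "" or MC_ANSWER.fullmatch(c) for c in cells)

def looks_like_numeric_scores(rows: Rows) -> bool:
    cells = _data_cells(rows)
    return bool(cells) and all(NUMBER.fullmatch(c) for c in cells)

RULES = [
    Rule("multiple_choice_responses", 30, looks_like_mc_responses),
    Rule("numeric_scores", 20, looks_like_numeric_scores),
    Rule("generic_table", 0, lambda rows: len(rows) > 0),  # catch-all fallback
]

def infer_format(text: str, rules: List[Rule] = RULES) -> Tuple[Optional[str], bool]:
    """Return (format label, header_detected) for the highest-priority matching rule."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.matches(rows):  # every row looks like data: no header
            return rule.name, False
        if len(rows) > 1 and rule.matches(rows[1:]):  # only the first row fails: header
            return rule.name, True
    return None, False
```

For example, `infer_format("id,Q1,Q2\ns1,A,B\ns2,C,D")` reports multiple-choice responses with a detected header. Priority order matters because looser rules (here, the catch-all) would also accept tables that a more specific rule recognizes.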
The second component is an information retrieval and information visualization module for comparative evaluation of uploaded and preprocessed data. Views are provided for inspecting aggregate statistics on student scores, comparing results over time within one course, and comparing results across multiple years. The design of this system includes a search facility for retrieving anonymized data from classes similar to the uploader's own. The visualizations track student performance on a range of standardized assessments, including Halloun et al.'s Force Concept Inventory (FCI, 1995), Thornton and Sokoloff's Force and Motion Conceptual Evaluation (FMCE, 1998), and Chabay and Sherwood's Brief Electricity and Magnetism Assessment (BEMA, 2006). Assessments can be viewed as pre- and post-tests with comparative statistics (e.g., normalized gain), decomposed by answer choice for multiple-choice questions, and manipulated using prespecified data transformations such as aggregation and refinement (roll-up and drill-down). Furthermore, the system is designed to incorporate a scalable framework for machine learning-based analytics, including clustering and similarity-based retrieval, time series prediction, and probabilistic reasoning.
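As a minimal sketch of these comparative statistics, the class-average normalized gain commonly used in physics education research, g = (post% - pre%) / (100 - pre%), can be computed per class and rolled up by section, and a multiple-choice question can be decomposed by answer choice. The record layout (hypothetical `(section, pre_score, post_score)` tuples for students with both scores, and response dicts keyed by question) and the function names are illustrative assumptions, not the portal's actual interface.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Record = Tuple[str, float, float]  # hypothetical: (section, pre_score, post_score)

def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Normalized gain on percentages: (post - pre) / (100 - pre)."""
    if pre_pct >= 100:
        raise ValueError("a pre-test average of 100% leaves no room for gain")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

def class_gain(records: List[Record], max_score: float) -> float:
    """Gain computed from class-average pre and post percentages."""
    pre_pct = mean(r[1] for r in records) * 100.0 / max_score
    post_pct = mean(r[2] for r in records) * 100.0 / max_score
    return normalized_gain(pre_pct, post_pct)

def roll_up_by_section(records: List[Record], max_score: float) -> Dict[str, float]:
    """Aggregate (roll up) student records into one class-average gain per section."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return {section: class_gain(recs, max_score) for section, recs in groups.items()}

def answer_distribution(responses: List[Dict[str, str]], question: str) -> Counter:
    """Decompose one multiple-choice question by the answer each student chose."""
    return Counter(r[question] for r in responses)
```

Computing the gain from class averages follows the class-level definition; averaging per-student gains is a common alternative and can give a different value, particularly when some students score near 100% on the pre-test. Drilling down reverses the roll-up by returning to the per-student records of one section.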
Both informal assessment of the system and intensive user testing of a pre-release version have yielded positive feedback. This feedback is guiding feature revisions, both to improve system functionality and to plan how the two data exploration components can be adapted to other STEM disciplines, such as computer science and mathematics. Lessons learned from visualization design and user experience feedback are reported in the context of usability criteria, such as the desired functionality of the pattern inference system.
The paper concludes with a discussion of the system as an emerging technology, the schedule for its deployment and continued augmentation, and the design rationale for user-centered intelligent systems components. Future work focuses on facilitating meaningful interactive exploration of the data by the multiple types of stakeholders identified for this type of education research portal. This is being pursued through a synthesis of data-driven approaches to information extraction, retrieval, transformation, and visualization.