Using element-level website usage data to improve online learning materials and predict learning outcomes

Research output: Thesis › Master's thesis › Theses

Abstract

In this study we use element-level usage data, collected from the online learning material of a university-level introductory programming course, to identify areas of interest in the course material and to predict student learning outcomes. The data was collected in situ using a JavaScript component embedded in the online learning material, which recorded which HTML elements were visible on the user's screen after each interaction (movement and click) and if the user's screen had been still for at least 2500 milliseconds. A visual analysis indicates that students spend large amounts of time on material sections that discuss special syntactic structures that they are unable to infer from previous experience. Overall, the analysis was able to identify areas of the online learning material that seem to be too long and in-depth for the concepts they discuss, given what the students have previously learned. This high-level analysis also revealed that the time the students spent viewing an assignment's prompt was statistically significantly correlated with the perceived workload, difficulty and educational value of that same assignment. However, when partial correlations are considered and multiple comparisons are corrected for, the time spent with an assignment's prompt on the screen is no longer statistically significantly correlated with any of the three variables. The same usage data was used to investigate whether material usage statistics can predict learning outcomes or identify strong and at-risk students. The results indicate that, based on just three to four weeks of data, it is possible to identify strong and at-risk students with some accuracy. Furthermore, it seems possible to predict student programming assignment scores and total course scores with somewhat high accuracy. Models based on material usage statistics also displayed slight predictive power for student exam scores. It was also shown that the predictive power of these models is not based solely on student effort or time-on-task. All told, this thesis demonstrates that fine-grained usage data from online learning material is feasible to collect and useful in understanding both the students and the learning material. The results suggest that very simple and almost entirely domain-independent data sources can be used to predict student performance to a relatively large degree, suggesting that a combination of such simple domain-independent metrics could match highly domain-dependent and more complex metrics in predictive power, giving rise to more widely usable educational analytics tools.
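The collection rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis's JavaScript component: it is written in Python so that it runs outside a browser, and it assumes one possible reading of the abstract, namely that a snapshot of visible elements is taken once the screen has been still for 2500 milliseconds after an interaction. The names `Box`, `visible_elements` and `snapshot_times`, and the pixel layout, are hypothetical.

```python
from dataclasses import dataclass

STILL_MS = 2500  # stillness threshold quoted in the abstract


@dataclass(frozen=True)
class Box:
    """Vertical extent of a page element in page pixels (hypothetical layout)."""
    element_id: str
    top: int
    height: int


def visible_elements(boxes, viewport_top, viewport_height):
    """Return ids of elements whose vertical extent overlaps the viewport."""
    viewport_bottom = viewport_top + viewport_height
    return [
        b.element_id
        for b in boxes
        if b.top < viewport_bottom and b.top + b.height > viewport_top
    ]


def snapshot_times(event_times_ms, still_ms=STILL_MS):
    """Return the times at which a snapshot would be taken.

    Assumed rule: a snapshot fires still_ms after an interaction,
    unless another interaction arrives before that.
    """
    times = sorted(event_times_ms)
    fired = []
    for i, t in enumerate(times):
        is_last = i == len(times) - 1
        if is_last or times[i + 1] - t >= still_ms:
            fired.append(t + still_ms)
    return fired


# Hypothetical page: two paragraphs around a code listing.
page = [Box("p1", 0, 300), Box("code1", 300, 400), Box("p2", 700, 200)]
print(visible_elements(page, viewport_top=250, viewport_height=400))
print(snapshot_times([0, 1000, 4000, 4500, 9000]))
```

Summing the intervals between consecutive snapshots per element id would give an element-level time-on-screen estimate of the kind the abstract analyses, though how the thesis aggregates its data is not stated here.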
Original language: English
Awarding Institution
  • Department of Computer Science
Supervisors/Advisors
  • Vihavainen, Arto, Supervisor
  • Ihantola, Petri, Supervisor
Award date: 13 Jun 2017
Publisher
  • University of Helsinki
Publication status: Published - Jun 2017
MoE publication type: G2 Master's thesis, polytechnic Master's thesis

Fields of Science

  • 113 Computer and information sciences

Cite this

@mastersthesis{ea971586943542708151ac721e26b0c7,
title = "Using element-level website usage data to improve online learning materials and predict learning outcomes",
keywords = "113 Computer and information sciences",
author = "Leo Lepp{\"a}nen",
year = "2017",
month = "6",
language = "English",
publisher = "University of Helsinki",
address = "Finland",
school = "Department of Computer Science",

}

Using element-level website usage data to improve online learning materials and predict learning outcomes. / Leppänen, Leo.

University of Helsinki, 2017. 79 p.

TY - THES

T1 - Using element-level website usage data to improve online learning materials and predict learning outcomes

AU - Leppänen, Leo

PY - 2017/6

Y1 - 2017/6

KW - 113 Computer and information sciences

M3 - Master's thesis

PB - University of Helsinki

ER -