henrishi edited this page Aug 23, 2021 · 3 revisions

Research question

The rise of popular mobile education applications has produced data in which a large number of students each answer a small subset of questions from a large question bank. Traditional approaches from the education measurement literature face important limitations in this setting, where data are large but sparse.

We propose models based on latent factorization and Bayesian variational inference to address these challenges. In simulations, our models recover true parameters with greater fidelity than traditional models, and they scale computationally to industrial-size datasets. Compared to traditional specifications, latent factorization models generally make more accurate predictions on the held-out test set. Using more latent factors and adding hierarchical dependence on question attributes both improve predictive performance in lower-frequency content areas. We conclude by describing a real-world application of our models to personalizing homework assignments. In a future study, we plan to run experiments with this application to quantify the impact of personalization.
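To make the latent factorization idea concrete, the following is a minimal sketch of how such a model scores a student-question pair. All dimensions and variable names here are illustrative assumptions, not the paper's actual specification: each student and question gets a K-dimensional latent vector, and each question gets a bias (easiness) term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the real model sizes depend on the dataset.
n_students, n_questions, n_factors = 100, 50, 3

theta = rng.normal(size=(n_students, n_factors))  # student ability factors
beta = rng.normal(size=(n_questions, n_factors))  # question discrimination factors
b = rng.normal(size=n_questions)                  # question easiness (bias)

def p_correct(student: int, question: int) -> float:
    """Probability the student answers the question correctly:
    sigmoid of the latent-factor inner product plus the question bias."""
    logit = theta[student] @ beta[question] + b[question]
    return 1.0 / (1.0 + np.exp(-logit))

p = p_correct(0, 7)
assert 0.0 < p < 1.0
```

Because each student attempts only a small subset of questions, the model is fit only on observed (student, question) pairs, which is what lets it scale to large, sparse datasets.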

Data

We apply our models to data generated on our partner company's platform. The data come from homework assignments and exams in three subject areas--English, math, and Chinese. Records are logged at the student-question level.

In addition to homework data, our partner company also collects exam data. Unlike homework data, exam data are tagged with question attributes. These attributes include the appropriate grade level of the question, as well as two types of domain knowledge tags: one system maps questions to competencies, and the other maps questions to skills. The tags are manually labeled by content specialists.
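As a rough illustration of the two record types described above, the sketch below shows hypothetical record layouts. All field names and values are assumptions for illustration, not the platform's actual schema; the key point is that exam records carry the extra manually labeled attributes that homework records lack.

```python
# Hypothetical homework record: one row per student-question attempt.
homework_record = {
    "student_id": "s_001",
    "question_id": "q_123",
    "subject": "math",  # English, math, or Chinese
    "correct": 1,       # 1 if answered correctly, 0 otherwise
}

# Hypothetical exam record: same core fields plus question attributes.
exam_record = {
    **homework_record,
    "grade_level": 7,                          # appropriate grade level
    "competency_tags": ["ratio reasoning"],    # competency tag system
    "skill_tags": ["cross multiplication"],    # skill tag system
}

# Every homework field also appears in the exam record.
assert set(homework_record) <= set(exam_record)
```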

Paper draft
