This repo includes code for the paper:
- Weijie Jiang and Zachary A. Pardos, 2019. Time Slice Imputation for Personalized Goal-based Recommendation in Higher Education. In Proceedings of the Thirteenth ACM Conference on Recommender Systems (RecSys'19). ACM, pp 506-510,
which extended the RNN-based goal-based course recommendation algorithm in Section 7 of the following paper:
- Weijie Jiang, Zachary A. Pardos, Qiang Wei, 2019. Goal-based Course Recommendation. In Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19). ACM, Pages 36-45,
by enhancing the learning process the personalized prerequisite courses for any target course, and also applying the goal-based recommendation framework to the MOOCs context, as illustrated here:
Due to FERPA privacy protection, we cannot publish the original student enrollment dataset and the MOOCs dataset. However, we provide code to run the proposed method on a synthetic student enrollment dataset that we published in this repo with data descriptions.
- python3
- pytorch
- install other dependencies by: pip3 install -r requirements.txt
-- command:
- Download dataset from here to a local directory */synthetic_data_samples
- Set up global parameters in data_preprocess/utils.py
python data_preprocess/preprocess.py
This command hard codes the locations of the expected data files to be in the synthetic data folder. This path can be changed in utils.py.
Then the following intermediate files will be generated for model training:
- course dictionaries (course_id.pkl): a pair of python dictionaries mapping courses to their preprocessed ID and vice versa.
- grade dictionary (grade_id.pkl): a pair of python dictionaries mapping all types of grades to their preprocessed ID and vice versa.
- semester dictionary (semester_id.pkl): a pair of python dictionaries mapping semesters to their preprocessed ID vice versa. For example, the earliest semester in the dataset, 2014 Fall, will be set 0 as its ID.
- condensed student enrollments and grades (stu_sem_grade_condense.pkl): a 2D python list with dimention
n×m
, wheren
is the number of students andm
is the number of semesters covered in the dataset: , where , and denotes the preprocessed enrollment histories of the i-th student in your data (multiple semesters) and represents the specific enrollment histories of the i-th student in the k-th semester. Note that the k-th semester of all the students refers to the same semester, for example, m=3, which means there are 3 semesters covered in your data: Fall 2019, Spring 2020, Summer 2020, then will contain enrollment histories of Spring 2020 for all students in your data. The format of is a python list: , where refers to the course ID of the p-th course the i-th student enrolled and the grade ID received for that course in the k-th semester. (empty) if the i-th student did not enroll in any course in semester k.
We selected the RNN-based grade prediction model without taking student major as the input (Model 2) in paper Goal-based Course Recommendation as the base model to surface personalized prerequisite courses in this work because it demonstrated the best grade prediction performance.
-- command
cd grade_prediction
- Set up arguments and hyperparameters for training in grade_prediction/utils.py (optional)
- training:
python train.py
- The best model(.pkl) and the log file that records the training loss and validation loss will be saved in grade_prediction/models.
- Set up evaluated_model_path and evaluated_semester in grade_prediction/utils.py, which corresponds to the model and semester you aim to evaluate (optional).
- evaluation:
python evaluate.py
.-
Evaluation results will be printed out based on these metrics:
- grade prediction accuracy on enrollments with letter grades
- grade prediction accuracy on enrollments with non-letter grades
- overall grade prediction accuracy
- true positive rate, true negative rate, false negative rate, and false positive rate on letter grade prediction and non-letter grade prediction
- F-score on letter grade prediction and non-letter grade prediction, overall F-score
-
-- command
cd student_evaluation
- Set up arguments in goal_based_new/utils.py (optional)
- Generate filters:
python generate_filters.py
- Filter files (.pkl) will be saved in the current directory.
- Learn to generate goal-based recommendations:
python train.py --target_course xxx
, wherexxx
is the name of a course (e.g., Subject_33 101) that you intend to set as a goal(target) course.- This will print out (1) the number of well-performing students and under-performing students in this course in the evaluated semester, (2) the recommendation accuracy for the two groups of students.
- This will also save the enrollment histories and the recommended courses in the evaluated semester of these students to a csv file in goal_based_new/results.
Please do not hesitate to contact us (jiangwj[at]berkeley[dot]edu, pardos[at]berkeley[dot]edu) if you have any questions or encounter any problems in running the code. We appreciate your support and citation if you find this work useful.
@inproceedings{jiang2019time,
title={Time slice imputation for personalized goal-based recommendation in higher education},
author={Jiang, Weijie and Pardos, Zachary A},
booktitle={Proceedings of the 13th ACM Conference on Recommender Systems},
pages={506--510},
year={2019}
}