Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add User Lifetime Statistics #309

Merged
merged 7 commits into from
Apr 14, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion backend/Pipfile
Original file line number Diff line number Diff line change
Expand Up @@ -45,4 +45,4 @@ typing-extensions = "*"
drf-excel = "*"

[requires]
python_version = "3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure why the Pipfile and Pipfile.lock are changed. In general, if there are no dependency changes, we shouldn't touch these cause they're read directly from our production machines to install. Would suggest following this article to remove these changes from the PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix this with the stackoverflow article but it kind of does not look like it did anything per the last commit- I think I could use some help with this at GBM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

python_version = "3"
2,352 changes: 1,227 additions & 1,125 deletions backend/Pipfile.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions backend/ohq/admin.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
Question,
Queue,
QueueStatistic,
UserStatistic,
Semester,
Tag,
)
Expand All @@ -26,3 +27,4 @@
admin.site.register(QueueStatistic)
admin.site.register(Announcement)
admin.site.register(Tag)
admin.site.register(UserStatistic)
65 changes: 65 additions & 0 deletions backend/ohq/management/commands/user_stat.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
from django.core.management.base import BaseCommand
from django.utils import timezone
from datetime import timedelta

from ohq.models import Profile, Course, Question
from ohq.statistics import (
user_calculate_questions_asked,
user_calculate_questions_answered,
user_calculate_time_helped,
user_calculate_time_helping,
user_calculate_students_helped,
)
from django.db.models import Q




class Command(BaseCommand):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about this file on Friday!

def add_arguments(self, parser):
parser.add_argument("--hist", action="store_true", help="Calculate all historic statistics")

def calculate_statistics(self, profiles, courses, earliest_date):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earliest_date parameter here is used to compute a date field, which I don't think is quite necessary. I think we can directly just use the earliest_date field wherever date is used.

yesterday = timezone.datetime.today().date() - timezone.timedelta(days=1)
active_users_today = set()

for course in courses:
if not earliest_date:
course_questions = Question.objects.filter(queue__course=course)
earliest_date = (
timezone.template_localtime(
course_questions.earliest("time_asked").time_asked
).date()
if course_questions
else yesterday
)

questions_queryset = Question.objects.filter(queue__course=course, time_asked__gte=earliest_date)
users_union = (
Profile.objects.filter(
Q(id__in=questions_queryset.values_list("asked_by", flat=True)) |
Q(id__in=questions_queryset.values_list("responded_to_by", flat=True))
)
)

for user in users_union:
user_calculate_questions_asked(user.user)
user_calculate_questions_answered(user.user)
user_calculate_time_helped(user.user)
user_calculate_time_helping(user.user)
user_calculate_students_helped(user.user)


def handle(self, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a lot cleaner, good work!

if kwargs["hist"]:
courses = Course.objects.all()
profiles = Profile.objects.all()
earliest_date = None
else:
courses = Course.objects.filter(archived=False)
profiles = Profile.objects.all()
earliest_date = timezone.now().date() - timedelta(days=1)

self.calculate_statistics(profiles, courses, earliest_date)


29 changes: 29 additions & 0 deletions backend/ohq/migrations/0020_auto_20240326_0226.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Generated by Django 3.1.7 on 2024-03-26 02:26

from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

dependencies = [
migrations.swappable_dependency(settings.AUTH_USER_MODEL),
('ohq', '0019_auto_20211114_1800'),
]

operations = [
migrations.CreateModel(
name='UserStatistic',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('metric', models.CharField(choices=[('TOTAL_QUESTIONS_ASKED', 'Total questions asked'), ('TOTAL_QUESTIONS_ANSWERED', 'Total questions answered'), ('TOTAL_TIME_BEING_HELPED', 'Total time being helped'), ('TOTAL_TIME_HELPING', 'Total time helping'), ('TOTAL_STUDENTS_HELPED', 'Total students helped')], max_length=256)),
('value', models.DecimalField(decimal_places=8, max_digits=16)),
('user', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to=settings.AUTH_USER_MODEL)),
],
),
migrations.AddConstraint(
model_name='userstatistic',
constraint=models.UniqueConstraint(fields=('user', 'metric'), name='unique_user_statistic'),
),
]
35 changes: 35 additions & 0 deletions backend/ohq/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,3 +411,38 @@ class Announcement(models.Model):
author = models.ForeignKey(User, related_name="announcements", on_delete=models.CASCADE)
time_updated = models.DateTimeField(auto_now=True)
course = models.ForeignKey(Course, related_name="announcements", on_delete=models.CASCADE)


class UserStatistic(models.Model):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

"""
Statistics related to a user (student or TA) across many courses
"""

METRIC_TOTAL_QUESTIONS_ASKED = "TOTAL_QUESTIONS_ASKED"
METRIC_TOTAL_QUESTIONS_ANSWERED = "TOTAL_QUESTIONS_ANSWERED"
METRIC_TOTAL_TIME_BEING_HELPED = "TOTAL_TIME_BEING_HELPED"
METRIC_TOTAL_TIME_HELPING = "TOTAL_TIME_HELPING"
METRIC_TOTAL_STUDENTS_HELPED = "TOTAL_STUDENTS_HELPED"

METRIC_CHOICES = [
(METRIC_TOTAL_QUESTIONS_ASKED, "Total questions asked"),
(METRIC_TOTAL_QUESTIONS_ANSWERED, "Total questions answered"),
(METRIC_TOTAL_TIME_BEING_HELPED, "Total time being helped"),
(METRIC_TOTAL_TIME_HELPING, "Total time helping"),
(METRIC_TOTAL_STUDENTS_HELPED, "Total students helped"),
]

user = models.ForeignKey(User, on_delete=models.CASCADE)
metric = models.CharField(max_length=256, choices=METRIC_CHOICES)
value = models.DecimalField(max_digits=16, decimal_places=8)

class Meta:
constraints = [
models.UniqueConstraint(fields=["user", "metric"], name="unique_user_statistic")
]

def __str__(self):
return f"{self.user}: {self.metric}"



84 changes: 83 additions & 1 deletion backend/ohq/statistics.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,10 @@
from django.db.models import Avg, Case, Count, F, Sum, When
from django.db.models.functions import TruncDate
from django.utils import timezone
from decimal import Decimal, ROUND_HALF_UP
import math

from ohq.models import CourseStatistic, Question, QueueStatistic
from ohq.models import CourseStatistic, Question, QueueStatistic, UserStatistic


User = get_user_model()
Expand Down Expand Up @@ -232,3 +234,83 @@ def queue_calculate_questions_per_ta_heatmap(queue, weekday, hour):
hour=hour,
defaults={"value": statistic if statistic else 0},
)


def user_calculate_questions_asked(user):
num_questions = Question.objects.filter(asked_by=user).count()

if num_questions and not num_questions == 0:
Copy link
Member

@judtinzhang judtinzhang Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can just write if num_questions:. If num_questions is ever 0, the if statement would fail (0 is considered False)

UserStatistic.objects.update_or_create(
user=user,
metric=UserStatistic.METRIC_TOTAL_QUESTIONS_ASKED,
defaults={"value": num_questions},
)


def user_calculate_questions_answered(user):
num_questions = (Question.objects.filter(
responded_to_by=user,
status=Question.STATUS_ANSWERED
).count())

if num_questions and not num_questions == 0:
UserStatistic.objects.update_or_create(
user=user,
metric=UserStatistic.METRIC_TOTAL_QUESTIONS_ANSWERED,
defaults={"value": num_questions},
)



def user_calculate_time_helped(user):
user_time_helped = (
Question.objects.filter(
asked_by=user,
status=Question.STATUS_ANSWERED
).aggregate(time_helped=Sum(F("time_responded_to") - F("time_response_started")))
)
time = user_time_helped["time_helped"]

if time and not math.isclose(time.total_seconds(), 0, abs_tol=0.001):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is there a reason you use time.total_seconds() here, but a few lines down you use time.seconds? Is there a difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol nope I don't think there's a difference I was just copying off of the documentation- I'll change it to .seconds for consistency

UserStatistic.objects.update_or_create(
user=user,
metric=UserStatistic.METRIC_TOTAL_TIME_BEING_HELPED,
defaults={"value": time.seconds},
)


def user_calculate_time_helping(user):
user_time_helping = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for comments regarding user_calculate_time_helped

Question.objects.filter(
responded_to_by=user,
status=Question.STATUS_ANSWERED
).aggregate(time_answering=Sum(F("time_responded_to") - F("time_response_started")))
)
time = user_time_helping["time_answering"]

if time and not math.isclose(time.total_seconds(), 0, abs_tol=0.001):
UserStatistic.objects.update_or_create(
user=user,
metric=UserStatistic.METRIC_TOTAL_TIME_HELPING,
defaults={"value": time.seconds},
)


def user_calculate_students_helped(user):
num_students = Decimal(
Question.objects.filter(
status=Question.STATUS_ANSWERED,
responded_to_by=user
)
.distinct("asked_by")
.count()
)
if num_students and not num_students == 0:
UserStatistic.objects.update_or_create(
user=user,
metric=UserStatistic.METRIC_TOTAL_STUDENTS_HELPED,
defaults={"value": num_students},
)



Loading
Loading