Fixes #1601 Add initial support for Unizin Synthetic data #1600

jxiao21 · 2024-07-31T15:12:29Z

Draft, as there are a few things to fix up:

Re-add the metadata query to the run, though it might not work with synthetic data. If it doesn't, only run it when DATASET_PROJECT_ID is not set.
Verify that the code works when DATASET_PROJECT_ID is not set, if that check is used for the item above. It might not work with the regular data unless we remove the connection_property.
Possibly rename DEFAULT_PROJECT_ID or move it to the environment, since GOOGLE_CLOUD_PROJECT is a variable that is expected to be defined there.
Test with a service account.
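
The first two items could be handled with a guard along these lines. This is only a sketch: `run_metadata_query` and its `execute` argument are hypothetical stand-ins for whatever the cron job actually calls, and treating a set DATASET_PROJECT_ID as "synthetic data" is an assumption.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def run_metadata_query(
    execute: Callable[[], dict],
    dataset_project_id: Optional[str],
) -> Optional[dict]:
    """Run the metadata query unless a dataset project override is set.

    `execute` is a hypothetical callable wrapping the real query.
    Returns the query result, or None if the query was skipped or failed.
    """
    if dataset_project_id:
        # Assumption: an override means synthetic data, where the
        # metadata query may not work, so skip it.
        logger.info("DATASET_PROJECT_ID is set; skipping metadata query")
        return None
    try:
        return execute()
    except Exception:
        # Log and continue so the rest of the cron run is not blocked.
        logger.exception("Metadata query failed")
        return None
```

Returning None in both the skipped and failed cases keeps the caller simple, at the cost of not distinguishing the two.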

This is initial support.

Ideas for future support include:

Choosing on a per-course basis whether to pull from synthetic data or actual data. This could allow courses to use the synthetic data as a preview. However, there's more to this:
To be able to preview synthetic data, we'll need to time-shift the events somehow so they fall closer to the present. This might mean looking at the term data for the synthetic course and adding the difference in days to all other dates present in the data. This feels like a semi-complex transformation, but some views won't work well unless we do it.
Having a separate synthetic data increment value (or getting rid of this entirely and using short ids?). I think that to hold data from multiple institutions we might still need long ids, unless we also change the data model to link up to the LTI deployment id.
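
A rough sketch of the time-shift idea, assuming rows are plain dicts and that the synthetic term start and the desired term start are both known. The function names, field names, and dates here are made up for illustration and are not part of this PR.

```python
from datetime import date, timedelta


def term_offset(synthetic_term_start: date, target_term_start: date) -> timedelta:
    """Difference between the synthetic term start and the desired one."""
    return target_term_start - synthetic_term_start


def shift_row(row: dict, offset: timedelta) -> dict:
    """Return a copy of `row` with every date/datetime value moved by `offset`."""
    # datetime is a subclass of date, so one isinstance check covers both.
    return {
        key: value + offset if isinstance(value, date) else value
        for key, value in row.items()
    }


# Hypothetical example: move a 2021 synthetic term to start on 2024-08-26.
offset = term_offset(date(2021, 1, 11), date(2024, 8, 26))
shifted = shift_row({"course_id": 42, "due_date": date(2021, 2, 1)}, offset)
```

The real transformation would probably have to happen in the queries rather than in Python, and may need to keep weekdays aligned, which this sketch ignores.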

config/env_sample.hjson

zqian · 2024-07-31T19:44:41Z

config/env_sample.hjson

+    # Change the default Bigquery Project ID
+    "DEFAULT_PROJECT_ID": "udp-umich-prod",
+    # Change the dataset project ID where queries are run against
+    "DATASET_PROJECT_ID": "unizin-shared"


I would leave this line commented out by default, so a developer needs to enable DATASET_PROJECT_ID on purpose to connect to the Unizin synthetic data.

Yeah, I don't think it would be needed otherwise. Though there could be a case where a default project is used but the datasets are in another project, like when/if we get true shared repositories.

Anyway, this is a temporary solution and I believe we'll need a future issue to set this value on a per course level in the admin.

dashboard/cron.py

zqian · 2024-07-31T19:46:47Z

dashboard/settings.py

+DEFAULT_PROJECT_ID = ENV.get("DEFAULT_PROJECT_ID", None)
+
+# Override the default project ID for BigQuery if needed, like to unizin-shared
+DATASET_PROJECT_ID = ENV.get("DATASET_PROJECT_ID", None)


Should the default value be DEFAULT_PROJECT_ID, instead of None?

Maybe, but then we might need a new variable to indicate whether this is running on synthetic data.

Maybe it's better to make that explicit. I'm not sure yet; this still needs more work.
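
To make the trade-off concrete: if DATASET_PROJECT_ID falls back to DEFAULT_PROJECT_ID, whether the override was actually set has to be tracked separately. A minimal sketch, where `resolve_projects` is a hypothetical helper and not something in settings.py:

```python
from typing import Mapping, Optional, Tuple


def resolve_projects(
    env: Mapping[str, str],
) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (default_project, dataset_project, uses_override)."""
    default_project = env.get("DEFAULT_PROJECT_ID")
    override = env.get("DATASET_PROJECT_ID")
    # Fall back to the default project when no override is configured.
    dataset_project = override or default_project
    # Explicit flag, so callers don't have to compare the two IDs.
    return default_project, dataset_project, bool(override)
```

Whether "override is set" should also mean "running on synthetic data" is an open question; a separate setting might be clearer.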

zqian · 2024-07-31T19:47:46Z

config/env_sample.hjson

+    "CRON_QUERY_FILE": "config/cron_udp.hjson",
+
+    # Change the default Bigquery Project ID
+    "DEFAULT_PROJECT_ID": "udp-umich-prod",


I would use a placeholder value here, instead of "udp-umich-prod", like

"DEFAULT_PROJECT_ID": "<UDP_institution_id>",

I think this was just empty before. This should all work with the service account and the regular data if it's not set, though I don't think it will work yet.

zqian · 2024-07-31T19:50:06Z

It may be worthwhile to add a section to loading_data.md with directions for using the Unizin synthetic dataset.

jxiao21 requested review from jonespm, zqian and jaydonkrooss July 31, 2024 15:12

jxiao21 linked an issue Jul 31, 2024 that may be closed by this pull request

Add a maximum limit for how far BQ data is retrieved (event_time) #1574

Open

jxiao21 removed a link to an issue Jul 31, 2024

Add a maximum limit for how far BQ data is retrieved (event_time) #1574

Open

jonespm changed the title ~~Limit BQ Query~~ Fixes #1601 Add initial support for Unizin Synthetic data Jul 31, 2024

jonespm linked an issue Jul 31, 2024 that may be closed by this pull request

Add initial support for Unizin Synthetic data #1601

Open

zqian reviewed Jul 31, 2024

jxiao21 and others added 7 commits August 28, 2024 16:33

initial commit with current changes

a8621b8

env_sample update

a93b584

cron works

495f00a

Removing the hardcoded project ids and making it configurable.

5d9a157

Use the value in settings as the default project

7976fb9

added comment explaining why value 1000000000000 is used

64d5b7b

Added back in metadata, added try/except if it fails

8d229a6

jonespm force-pushed the bq_query branch from 2bc7975 to 8d229a6 Compare August 28, 2024 20:34

Merge branch 'tl-its-umich-edu:master' into bq_query

7a2bf56


jxiao21 commented Jul 31, 2024 (edited by jonespm)
