Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exporter DSL #402

Merged
merged 1 commit into from
Sep 4, 2018
Merged

Add exporter DSL #402

merged 1 commit into from
Sep 4, 2018

Conversation

HassanJaveed84
Copy link
Contributor

TODO:

  • DRY.

Copy link

@azafty468 azafty468 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this pending whatever additional changes we need to get the job to run during testing.

Copy link
Contributor

@brianhw brianhw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few naming nits.

}

publishers {
textFinder("\\[WARNING\\]", '', true, false, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this configuration is self-documenting, but here I've no idea what the true, false, true refer to. Would it be reasonable to indicate in a comment what this is doing? (I assume it's something like mark the build unstable if it finds a warning.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment.

--external-prefix=databases/${DATE:-$TODAY} \
--django-admin=${PLATFORM_VENV}/bin/django-admin.py \
--django-pythonpath=${PLATFORM_VENV}/edx-platform \
--gpg-keys=${GPG_KEYS_PATH} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I wonder if this will be actually needed. One of the ways that the course exporter differs from the regular exporter is that the output is not encrypted. That is useful for RDX, where we need to actually read and transform the dumps. And useful for general testing of exporter changes, for example. (Like the addition we're supposed to do for cohorts, IIRC.)

Oh, wait. My mistake. This is actually called by the exporter-slave job, not by the course exporter. I was misled by the name of this file -- this script is actually exporting an org, not a list of courses. The output does produce files at the level of individual courses, but the presence of job that takes courses as input made this confusing to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed.

# Install requirements into this (exporter) virtual environment
pushd analytics-exporter/
pip install -r github_requirements.txt
pip install --allow-external mysql-connector-python -e .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think you and Andrew already noted, the --allow-external option is no longer used. We pulled it from pipeline makefile in Jan. 2017: edx/edx-analytics-pipeline@fcebbb7#diff-b67911656ef5d18c4ae36cb6741b7965.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

steps {
shell(readFileFromWorkspace("dataeng/resources/course-exporter-worker.sh"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should analytics-exporter-slave call org-exporter-worker.sh, instead of course-exporter-worker.sh, or something like that? The name course-exporter-worker.sh seems a better fit to be called by analytics-exporter-course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the name of script.

virtualenv {
nature("shell")
command(
readFileFromWorkspace("dataeng/resources/setup-course-exporter.sh")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this sets up the course-exporter, it also runs it. I would suggest calling it run-course-exporter, or something like that.

DAYOFWEEK=$(date +%u ${DATE_MODIFIER})
if [ $DAYOFWEEK != 7 ]; then
exit 1
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this about? The job can only use data that was generated on Sundays? I hope that job never breaks....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we have it, the CWSM dumps are only generated on sundays ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote that because the job will run but fail to upload to S3 unless it's run on a date that's a sunday.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CWSM dumps are scheduled on Sundays, but if they fail on Sunday and are run on a Monday, then this prevents us from being able to change the date modifier to some other date. I think that some new users (maybe Emilio?) when first running this specified today's date instead of the Sunday date, and I think Andrew put that in originally to deal with that. I'm fine with it present or absent, until it becomes a problem one way or the other.

@HassanJaveed84 HassanJaveed84 force-pushed the hassan/analytics-exporter branch 2 times, most recently from 5bca0f2 to 4b606cd Compare August 15, 2018 17:49
@HassanJaveed84 HassanJaveed84 changed the title WIP: exporter DSL Add exporter DSL Aug 15, 2018
@HassanJaveed84
Copy link
Contributor Author

This is ready for a final review.

@azafty468
Copy link

This looks good but I want to make sure the job history section is fixed before we merge this code into production (unless that investigation takes too long).

Copy link

@azafty468 azafty468 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've fixed the job config history plugin via the ansible code. I'm 👍 on this PR.

Copy link
Contributor

@brianhw brianhw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@@ -0,0 +1,30 @@
import static analytics.AnalyticsEmailOptin.job as AnalyticsEmailOptinJob
import static analytics.AnalyticsExporter.job as AnalyticsExporterJob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why does github highlight the characters on either side of " as " on these two lines? If they were just spaces I wouldn't expect a red square.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's going on, files created using vim and github show same behavior. I found this: textmate/groovy.tmbundle#17 which made me think it was textmate related(I use textmate). But doesn't look like that it is. olivex internal dsl repo show the same red blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has something to do with asciidoc ?

DAYOFWEEK=$(date +%u ${DATE_MODIFIER})
if [ $DAYOFWEEK != 7 ]; then
exit 1
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CWSM dumps are scheduled on Sundays, but if they fail on Sunday and are run on a Monday, then this prevents us from being able to change the date modifier to some other date. I think that some new users (maybe Emilio?) when first running this specified today's date instead of the Sunday date, and I think Andrew put that in originally to deal with that. I'm fine with it present or absent, until it becomes a problem one way or the other.

@@ -0,0 +1 @@
import static analytics.AnalyticsEmailOptin.job as TestJob
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's going on. I created this file using vim and the same red blocks appear.

@@ -0,0 +1 @@
import static analytics.AnalyticsEmailOptin.job as SomeJob
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was created using github itself.

@HassanJaveed84 HassanJaveed84 force-pushed the hassan/analytics-exporter branch from 7a6c14d to ae2eb1f Compare September 4, 2018 10:06
@HassanJaveed84 HassanJaveed84 merged commit 162c02e into master Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants