-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time Series QA: Make Dask notebook self-contained and testable #352
Conversation
dba15fe
to
2b593ad
Compare
f893a59
to
7057558
Compare
7057558
to
0438785
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We worked in parallel on a similar approach :) Good to have the kaggle import in CrateDB Toolkit now!
A few suggestions:
- I think we should explain how to obtain the kaggle.json file (API key) and where to place it.
- The schema should remain the default "doc", you switched to "notebook" - can lead to confusion, as we are working in doc in other notebooks as well.
Great work, thank you!
"CREATE TABLE IF NOT EXISTS \"doc\".\"weather_data\" (\n", | ||
"CREATE TABLE IF NOT EXISTS \"notebook\".\"weather_data\" (\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema should remain the default "doc", you switched to "notebook" - can lead to confusion, as we are working in doc in other notebooks as well.
Maybe let's remove this explicit reference about the schema, so that software tests can use a different schema. The other notebooks also don't include it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erledigt.
9ae49a8
to
cd09099
Compare
665d389
to
fb04467
Compare
Include data acquisition from Kaggle.
fb04467
to
3445b75
Compare
# Skip Kaggle tests when having no authentication information. | ||
kaggle_auth_exists = Path("~/.kaggle/kaggle.json").exists() or ( | ||
"KAGGLE_USERNAME" in os.environ and "KAGGLE_KEY" in os.environ | ||
) | ||
if not kaggle_auth_exists: | ||
raise pytest.skip(f"Kaggle dataset can not be tested " | ||
f"without authentication: {notebook.name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should explain how to obtain the kaggle.json file (API key) and where to place it.
Right, this information should also go to the docs / the notebook itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erledigt. See #352 (comment).
"CREATE TABLE IF NOT EXISTS \"doc\".\"weather_data\" (\n", | ||
"CREATE TABLE IF NOT EXISTS \"notebook\".\"weather_data\" (\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erledigt.
"The subsequent code cell acquires the dataset directly from kaggle.com.\n", | ||
"To properly configure the notebook to use corresponding credentials\n", | ||
"after signing up on Kaggle, define the `KAGGLE_USERNAME` and\n", | ||
"`KAGGLE_KEY` environment variables. Alternatively, put them into the\n", | ||
"file `~/.kaggle/kaggle.json` in your home folder, like this:\n", | ||
"```json\n", | ||
"{\n", | ||
" \"username\": \"acme\",\n", | ||
" \"key\": \"2b1dac2af55caaf1f34df76236fada4a\"\n", | ||
"}\n", | ||
"```\n", | ||
"Another variant is to acquire the dataset files manually, and extract\n", | ||
"them into a folder called `DOWNLOAD`. In this case, you can deactivate\n", | ||
"those two lines of code, in order to skip automatic dataset acquisition." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the relevant documentation paragraph, now within the notebook itself, which educates readers about Kaggle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you very much!
What's Inside
Make it more convenient to go data shopping using the CrateDB/Dask notebook.
Remarks
There are concerns about downloading and processing 200 MB worth of data again and again, so this patch may need a resource saver.
References