custom notebooks are added to the image #3

Merged on May 11, 2021 (14 commits)
8 changes: 8 additions & 0 deletions README.md
@@ -13,3 +13,11 @@ These images were first initialized with the packages listed on this
The Docker image builds can be found on Docker Hub :
* [eo](https://hub.docker.com/repository/docker/pavics/crim-jupyter-eo)
* [nlp](https://hub.docker.com/repository/docker/pavics/crim-jupyter-nlp)

The notebooks associated with each image are found in this repo, in the image's corresponding notebooks subfolder.

Also, a yaml configuration file can be found for each image, containing a list of parameters used
by the [deploy_data_specific_image script](https://github.com/bird-house/pavics-jupyter-base/blob/master/scheduler-jobs/deploy_data_specific_image)
on the [bird-house/pavics-jupyter-base repo](https://github.com/bird-house/pavics-jupyter-base).
This script is used to download and update the image's associated notebooks that should be available on
the JupyterLab environment for DACCS.
12 changes: 12 additions & 0 deletions eo/CHANGELOG.md
@@ -12,6 +12,18 @@ Fixes:
------
- ...

0.2.0 (2021-05-05)
===================

Changes:
--------
- Custom notebooks specific to the environment can now be added to the docker image
- New packages added to environment (rasterio, intake-stac, sat-search)

Fixes:
------
- na

0.1.0 (2021-02-22)
===================

2 changes: 2 additions & 0 deletions eo/Dockerfile
@@ -15,5 +15,7 @@ RUN umask 0000 && conda env update -f /environment.yml
# (https://github.com/conda-forge/gdal-feedstock/issues/83#issue-162911573)
ENV CPL_ZIP_ENCODING=UTF-8

COPY notebook_config.yml /notebook_config.yml

# specify user because of problem running start-notebook.sh when being root
USER jenkins
3 changes: 3 additions & 0 deletions eo/environment.yml
@@ -6,7 +6,10 @@ dependencies:
- affine
- gdal
- geojson
- intake-stac
- pyproj
- rasterio
- sat-search
- shapely

# TODO: These next packages could possibly be added to a more generic 'vision' image, from which 'eo' would be built
15 changes: 15 additions & 0 deletions eo/notebook_config.yml
@@ -0,0 +1,15 @@
# Config file containing the list of notebooks directories to download for a specific image.
#
# Used in the deploy-data-specific-image script from pavics-jupyter-base.
# More details on this config can be found on this script file :
# https://github.com/bird-house/pavics-jupyter-base/blob/master/scheduler-jobs/deploy_data_specific_image

- repo_url: https://github.com/crim-ca/pavics-jupyter-images
  branch: master
  source_dir: eo/notebooks
  dest_sub_dir: eo

- repo_url: https://github.com/Ouranosinc/pavics-sdi
  branch: master
  source_dir: docs/source/notebooks
  dest_sub_dir: common
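
As a rough illustration of how a consumer of this config might sanity-check it, here is a minimal Python sketch. It assumes the YAML has already been parsed into a list of dicts, and it assumes all four keys shown above are required; the helper `validate_entries` and the duplicate `dest_sub_dir` check are hypothetical and are not taken from the deploy_data_specific_image script.

```python
# Keys that appear in every entry of the notebook_config.yml files in this PR.
# Treating all four as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = ("repo_url", "branch", "source_dir", "dest_sub_dir")

# eo/notebook_config.yml as it would look after being parsed by a YAML loader.
EO_CONFIG = [
    {
        "repo_url": "https://github.com/crim-ca/pavics-jupyter-images",
        "branch": "master",
        "source_dir": "eo/notebooks",
        "dest_sub_dir": "eo",
    },
    {
        "repo_url": "https://github.com/Ouranosinc/pavics-sdi",
        "branch": "master",
        "source_dir": "docs/source/notebooks",
        "dest_sub_dir": "common",
    },
]


def validate_entries(entries):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    seen_dests = set()
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
            continue
        # Hypothetical extra check: two entries writing to the same
        # destination sub-directory would overwrite each other.
        if entry["dest_sub_dir"] in seen_dests:
            problems.append(
                f"entry {index}: duplicate dest_sub_dir '{entry['dest_sub_dir']}'"
            )
        seen_dests.add(entry["dest_sub_dir"])
    return problems


print(validate_entries(EO_CONFIG))
```

A check like this could run before anything is downloaded, so a malformed entry is reported instead of producing a partial notebook folder.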
44 changes: 44 additions & 0 deletions eo/notebooks/eo_example.ipynb
@@ -0,0 +1,44 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "framed-intranet",
   "metadata": {},
   "source": [
    "## Example notebook\n",
    "\n",
    "This is an example notebook just for the image's first version. \n",
    "It is used for now to test the integration of the tutorial notebooks in JupyterHub, depending on which environment is selected. Additional notebooks should be added in the future."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "imposed-serve",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
11 changes: 11 additions & 0 deletions nlp/CHANGELOG.md
@@ -12,6 +12,17 @@ Fixes:
------
- ...

0.2.0 (2021-05-05)
===================

Changes:
--------
- Custom notebooks specific to the environment can now be added to the docker image

Fixes:
------
- na

0.1.0 (2021-02-22)
===================

3 changes: 3 additions & 0 deletions nlp/Dockerfile
@@ -9,5 +9,8 @@ COPY environment.yml /environment.yml
# use umask 0000 so that package files for the updated environment are usable by the user for the jupyter-conda-extension
RUN umask 0000 && conda env update -f /environment.yml

COPY notebook_config.yml /notebook_config.yml

# specify user because of problem running start-notebook.sh when being root
USER jenkins

15 changes: 15 additions & 0 deletions nlp/notebook_config.yml
@@ -0,0 +1,15 @@
# Config file containing the list of notebooks directories to download for a specific image.
#
# Used in the deploy-data-specific-image script from pavics-jupyter-base.
# More details on this config can be found on this script file :
# https://github.com/bird-house/pavics-jupyter-base/blob/master/scheduler-jobs/deploy_data_specific_image

- repo_url: https://github.com/crim-ca/pavics-jupyter-images
  branch: master
  source_dir: nlp/notebooks
  dest_sub_dir: nlp

- repo_url: https://github.com/Ouranosinc/pavics-sdi
Comment (Collaborator):

Ouch, double download of the pavics-sdi notebooks. At least the config is clearer.

We could have one config.yml that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.

Don't have a quick solution at the moment.

Reply:

@tlvu, at least it's one copy per image and not per user. Hopefully, the common notebooks folder will stay small.

> We could have one config.yml that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.

That's the whole point of this PR! Having a single config.yml is about the same as what we have right now.

The download script could be optimized (not for this PR) to keep a cache of each repo independently of the current image, so that while looping over the images, a repo that has already been updated is served from the cache.

But again, I'm not sure what your concern is: the download or the disk space? In both cases it looks like trying to save pennies.
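
The caching idea above could look roughly like the following Python sketch. Everything in it is hypothetical: `plan_downloads`, the `fetch` callback, and the `/tmp/cache` paths are illustrative names, not part of the existing download script. The only thing taken from this PR is the content of the two notebook_config.yml files.

```python
def plan_downloads(image_configs, fetch):
    """Fetch each (repo_url, branch) pair at most once across all images.

    image_configs maps an image name to its list of config entries.
    fetch is any callable(repo_url, branch) returning a local checkout path.
    Returns (image, checkout_path, source_dir, dest_sub_dir) tuples.
    """
    cache = {}
    plan = []
    for image, entries in image_configs.items():
        for entry in entries:
            key = (entry["repo_url"], entry["branch"])
            if key not in cache:
                # First time this repo/branch is seen: actually fetch it.
                cache[key] = fetch(entry["repo_url"], entry["branch"])
            plan.append(
                (image, cache[key], entry["source_dir"], entry["dest_sub_dir"])
            )
    return plan


IMAGES_REPO = "https://github.com/crim-ca/pavics-jupyter-images"
SDI_REPO = "https://github.com/Ouranosinc/pavics-sdi"

# The eo and nlp configs from this PR, written out as parsed data.
CONFIGS = {
    "eo": [
        {"repo_url": IMAGES_REPO, "branch": "master",
         "source_dir": "eo/notebooks", "dest_sub_dir": "eo"},
        {"repo_url": SDI_REPO, "branch": "master",
         "source_dir": "docs/source/notebooks", "dest_sub_dir": "common"},
    ],
    "nlp": [
        {"repo_url": IMAGES_REPO, "branch": "master",
         "source_dir": "nlp/notebooks", "dest_sub_dir": "nlp"},
        {"repo_url": SDI_REPO, "branch": "master",
         "source_dir": "docs/source/notebooks", "dest_sub_dir": "common"},
    ],
}

fetch_calls = []


def fake_fetch(repo_url, branch):
    """Stand-in for a real download; only records that it was called."""
    fetch_calls.append((repo_url, branch))
    return f"/tmp/cache/{repo_url.rsplit('/', 1)[-1]}@{branch}"


plan = plan_downloads(CONFIGS, fake_fetch)
print(len(plan), "copy operations,", len(fetch_calls), "fetches")
```

With the two configs from this PR, four copy operations are planned from only two fetches, since both images point at the same two repositories on the same branch.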

Reply (Collaborator):

Oops, sorry, sending the wrong impression again. I wasn't saying this is a deal breaker, else I would have changed the approval status.

Probably the layout on disk could be

/data/jupyterhub_user_data/tutorial-notebooks/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp

Instead of

/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/nlp
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/common

But that also assumes the same definition of common for those images, which might make sense only if they are on the same deployment node. Like you said, it's pennies at this point, so let's do this once we get there.
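
To make the trade-off between the two layouts concrete, here is a small Python sketch that generates both sets of paths. The root path and the image names come from the listings above; the helper names `per_image_layout` and `shared_layout` are hypothetical, and the shared variant bakes in the assumption just mentioned, namely that every image agrees on what "common" contains.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/data/jupyterhub_user_data/tutorial-notebooks")

# dest_sub_dir values per image, as in the two notebook_config.yml files.
IMAGE_SUB_DIRS = {
    "crim-eo": ["eo", "common"],
    "crim-nlp": ["nlp", "common"],
}


def per_image_layout(image_sub_dirs):
    """Layout produced by this PR: every image gets its own copy of each sub-dir."""
    return sorted(
        str(ROOT / image / sub)
        for image, subs in image_sub_dirs.items()
        for sub in subs
    )


def shared_layout(image_sub_dirs, shared="common"):
    """Alternative layout: one shared folder plus one folder per image.

    Only valid if all images use the same content for the shared folder.
    """
    paths = {str(ROOT / shared)}
    for image, subs in image_sub_dirs.items():
        if any(sub != shared for sub in subs):
            paths.add(str(ROOT / image))
    return sorted(paths)


for path in per_image_layout(IMAGE_SUB_DIRS):
    print("per-image:", path)
for path in shared_layout(IMAGE_SUB_DIRS):
    print("shared:   ", path)
```

The per-image variant yields four directories with `common` stored twice, while the shared variant yields three.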

Reply:

Don't worry, I'm not offended, and sorry if that is the impression I gave.

Reply (Collaborator):

Just to be clear, I like all trade-offs/limitations to be clearly documented in a PR. That doesn't mean we need to address all of them immediately. I much prefer an iterative approach where we improve gradually over one where we keep adding features to grow the PR and never release it!

  branch: master
  source_dir: docs/source/notebooks
  dest_sub_dir: common
44 changes: 44 additions & 0 deletions nlp/notebooks/nlp_example.ipynb
@@ -0,0 +1,44 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "framed-intranet",
   "metadata": {},
   "source": [
    "## Example notebook\n",
    "\n",
    "- This is an example notebook just for the image's first version. \n",
    "It is used for now to test the integration of the tutorial notebooks in JupyterHub, depending on which environment is selected. Additional notebooks should be added in the future."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "imposed-serve",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}