Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

custom notebooks are added to the image #3

Merged
merged 14 commits into from
May 11, 2021
Merged

Conversation

ChaamC
Copy link
Collaborator

@ChaamC ChaamC commented Mar 30, 2021

This PR adds the feature for the customization of included tutorial-notebooks.

  • The notebooks for the specific images are included when building the images. All notebooks are saved directly in this repo, in the folder related to their image.

This PR relates to the PR in pavics-jupyter-images : bird-house/pavics-jupyter-base#2

Copy link

@dbyrns dbyrns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but we need to take care of bird-house/pavics-jupyter-base#2 before merging this one.

Copy link
Collaborator

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised but this is CRIM image. I leave it to you guys.

eo/Dockerfile Outdated

# Add a copy of the eo notebooks to the docker image
# All tutorial-notebooks are set with jenkins as owner, so they can be manipulated by jenkins in the docker entrypoint
COPY --chown=jenkins notebooks/ /notebook_dir/tutorial-notebooks
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised notebooks are copied directly into the image. The the other associated PR bird-house/pavics-jupyter-base#2 you just wrote "avoid having to rebuild the image just to update a notebook".

I thought I would see this child image leverage the new feature in the base image.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to what I said here : bird-house/pavics-jupyter-base#2 (comment), we looked for an easy way to add notebooks to the ones provide in the base image. We agree that having a custom image with static notebooks is the bare minimum, but it fits our needs right now. We do not expect a lot of modifications in a near future and until we build new images weekly, having another repository just for some notebooks, seems overkill. Adding a wget in this image is in the plan given that the notebooks start to be updated more frequently.

@ChaamC
Copy link
Collaborator Author

ChaamC commented Apr 19, 2021

PR has been updated for a new solution. Please refer to the other PR for more details.

@matprov @dbyrns @tlvu

@ChaamC ChaamC requested review from dbyrns, matprov and tlvu April 19, 2021 19:50
Copy link

@dbyrns dbyrns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some requests based on the main PR comments.

eo/Dockerfile Outdated
@@ -15,5 +15,7 @@ RUN umask 0000 && conda env update -f /environment.yml
# (https://github.com/conda-forge/gdal-feedstock/issues/83#issue-162911573)
ENV CPL_ZIP_ENCODING=UTF-8

COPY notebook_config.yml.template /notebook_config.yml.template
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok like I said, I can now really see that we use the template as is, so it's not a template. One cannot expect some variables to be replaced like in the birdhouse-deploy config folder.


- repo_url: https://github.com/crim-ca/pavics-jupyter-images
branch: master
source_path: eo/notebooks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very neat. I hadn't thought to reuse the same repo both for the image and notebooks.

- repo_url: https://github.com/crim-ca/pavics-jupyter-images
branch: master
source_path: eo/notebooks
dest_dir: ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/eo-crim
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You see here, what could prevent an image maintainer to cause trouble and set the dest_dir to ${JUPYTERHUB_USER_DATA_DIR}? (See : bird-house/birdhouse-deploy#150 (comment))

nlp/Dockerfile Outdated
@@ -9,5 +9,8 @@ COPY environment.yml /environment.yml
# use umask 0000 so that package files for the updated environment are usable by the user for the jupyter-conda-extension
RUN umask 0000 && conda env update -f /environment.yml

COPY notebook_config.yml.template /notebook_config.yml.template
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem

- repo_url: https://github.com/crim-ca/pavics-jupyter-images
branch: master
source_path: nlp/notebooks
dest_dir: ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/nlp-crim
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem

@ChaamC
Copy link
Collaborator Author

ChaamC commented Apr 23, 2021

Updated repo : removed dest_dir parameter from the script config

@ChaamC ChaamC requested a review from dbyrns April 23, 2021 19:16
Copy link
Collaborator

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for me here, just minor comment.

@ChaamC
Copy link
Collaborator Author

ChaamC commented May 4, 2021

Updated PR :

  • added a dest_sub_dir parameter to specify a subdirectory destination where to download the data. Parameter is optional

@ChaamC ChaamC requested a review from tlvu May 4, 2021 17:13
Copy link

@dbyrns dbyrns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last little thing : I would like to see the common repository included here for both eo/nlp and a confirmation that both repos are keep synchronized for each images.

Copy link
Collaborator

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A remark about dest_sub_dir but I am not sure.

eo/CHANGELOG.md Outdated
@@ -12,6 +12,18 @@ Fixes:
------
- ...

0.2.0 (2021-03-30)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finalize the date I guess.

- repo_url: https://github.com/crim-ca/pavics-jupyter-images
branch: master
source_path: eo/notebooks
dest_sub_dir:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect dest_sub_dir: crim-eo? But I am not sure.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlvu see my comment here : bird-house/pavics-jupyter-base#3 (comment)
Like this, the notebooks would be checkout to : /data/jupyterhub_user_data/tutorial-notebooks/crim-eo
What I asked to @ChaamC would make it clearer to show how to achieve the following strucutre:
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/common

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the suggested structure.
But what would be the repo_url/branch/source_path for the common folder?
Should I try to find all the notebooks that are found in the tutorial-notebooks folder (when spawning the pavics-workflow image for example)?
Or maybe I can just add this folder for now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the current tutorial-notebooks folder was filled using only that folder. But looking at the diff, this is enough for our common folder. Maybe we could call it "core"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are more in the current tutorial-notebooks folder, so more sources are necessary if we want them all. But I don't think all those notebooks will work anyway. The eo image for example has only the base packages and the specific packages for eo. The packages necessary for the current tutorial-notebooks folder are not all on the eo image.

Even with the linked folder in my previous comment, I am not sure which notebooks will work or not from that list.

So, do I just take the linked folder for now, and store it in a 'core' folder, for both eo/nlp?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes go for it. At least for people using those images, they will see what PAVICS has to offer and the mecanism will be demonstrated. If some libraries are missing we will fix that later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI info if you guys want to replicate exactly what in the current tutorial-notebooks folder into your common folder https://github.com/bird-house/birdhouse-deploy/blob/20d440fd293b01d4cf3e1dccf2933877a651d571/birdhouse/deployment/trigger-deploy-notebook#L60-L71

However, keep in mind that bringing in notebook that your image can not run (missing packages) is no cool ! I'd suggest keeping only the notebooks relevant to your image runtime capabilities.

With that in mind, pavics-sdi is a good start (probably most notebooks will run without missing packages). Finch and Raven notebooks will requires more packages to be installed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, keep in mind that bringing in notebook that your image can not run (missing packages) is no cool !

I agree, but it's more as a place holder or demonstration capabilities right now. I think that once this setup is ready we will be able to better partitionize the notebooks for all our (Ouranos and CRIM) images.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common folder has been added to configs :)

source_dir: nlp/notebooks
dest_sub_dir: nlp

- repo_url: https://github.com/Ouranosinc/pavics-sdi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch double download of the pavics-sdi notebooks. At least the config is clearer.

We could have one config.yml that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.

Don't have a quick solution at the moment.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlvu, at least it's one copy per image and not per user. Hopefully, the common notebooks folder will stay small.

We could have one config.yml that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.

That's the whole point of this PR! Having a single config.yml is about the same as what we have right now.

The download script could be optimized (not for this PR) and keep a cache of each repo independently of the current image. So that while looping over the images, if a repo has already been updated, it uses the cache.

But again I'm note sure of your concern, is it the download or the disk space? In both case it looks like trying to save pennies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opps sorry, sending wrong impression again. Wasn't saying this is a deal breaker, else I would change the approval status.

Probably the layout on disk could be

/data/jupyterhub_user_data/tutorial-notebooks/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp

Instead of

/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/nlp
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/common

But that also assume the same definition of common for those images, which might make sense only if on the same deployment node. Like you said, it's pennies at this point so let's do this once we get there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry I'm not offended and sorry if this is the impression I leave.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I like all trade-offs/limitations to be clearly documented in a PR. Doesn't mean we need to address all of them immediately. I much prefer iterative approach where we improve gradually than keep adding feature to grow the PR and never release it!

@ChaamC ChaamC merged commit f865380 into master May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants