-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
custom notebooks are added to the image #3
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but we need to take care of bird-house/pavics-jupyter-base#2 before merging this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised but this is CRIM image. I leave it to you guys.
eo/Dockerfile
Outdated
|
||
# Add a copy of the eo notebooks to the docker image | ||
# All tutorial-notebooks are set with jenkins as owner, so they can be manipulated by jenkins in the docker entrypoint | ||
COPY --chown=jenkins notebooks/ /notebook_dir/tutorial-notebooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised notebooks are copied directly into the image. The the other associated PR bird-house/pavics-jupyter-base#2 you just wrote "avoid having to rebuild the image just to update a notebook".
I thought I would see this child image leverage the new feature in the base image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to what I said here : bird-house/pavics-jupyter-base#2 (comment), we looked for an easy way to add notebooks to the ones provide in the base image. We agree that having a custom image with static notebooks is the bare minimum, but it fits our needs right now. We do not expect a lot of modifications in a near future and until we build new images weekly, having another repository just for some notebooks, seems overkill. Adding a wget in this image is in the plan given that the notebooks start to be updated more frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some requests based on the main PR comments.
eo/Dockerfile
Outdated
@@ -15,5 +15,7 @@ RUN umask 0000 && conda env update -f /environment.yml | |||
# (https://github.com/conda-forge/gdal-feedstock/issues/83#issue-162911573) | |||
ENV CPL_ZIP_ENCODING=UTF-8 | |||
|
|||
COPY notebook_config.yml.template /notebook_config.yml.template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok like I said, I can now really see that we use the template as is, so it's not a template. One cannot expect some variables to be replaced like in the birdhouse-deploy config folder.
eo/notebook_config.yml.template
Outdated
|
||
- repo_url: https://github.com/crim-ca/pavics-jupyter-images | ||
branch: master | ||
source_path: eo/notebooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very neat. I hadn't thought to reuse the same repo both for the image and notebooks.
eo/notebook_config.yml.template
Outdated
- repo_url: https://github.com/crim-ca/pavics-jupyter-images | ||
branch: master | ||
source_path: eo/notebooks | ||
dest_dir: ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/eo-crim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You see here, what could prevent an image maintainer to cause trouble and set the dest_dir to ${JUPYTERHUB_USER_DATA_DIR}? (See : bird-house/birdhouse-deploy#150 (comment))
nlp/Dockerfile
Outdated
@@ -9,5 +9,8 @@ COPY environment.yml /environment.yml | |||
# use umask 0000 so that package files for the updated environment are usable by the user for the jupyter-conda-extension | |||
RUN umask 0000 && conda env update -f /environment.yml | |||
|
|||
COPY notebook_config.yml.template /notebook_config.yml.template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem
nlp/notebook_config.yml.template
Outdated
- repo_url: https://github.com/crim-ca/pavics-jupyter-images | ||
branch: master | ||
source_path: nlp/notebooks | ||
dest_dir: ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/nlp-crim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem
Updated repo : removed dest_dir parameter from the script config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for me here, just minor comment.
Updated PR :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last little thing : I would like to see the common repository included here for both eo/nlp and a confirmation that both repos are keep synchronized for each images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A remark about dest_sub_dir
but I am not sure.
eo/CHANGELOG.md
Outdated
@@ -12,6 +12,18 @@ Fixes: | |||
------ | |||
- ... | |||
|
|||
0.2.0 (2021-03-30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalize the date I guess.
eo/notebook_config.yml
Outdated
- repo_url: https://github.com/crim-ca/pavics-jupyter-images | ||
branch: master | ||
source_path: eo/notebooks | ||
dest_sub_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect dest_sub_dir: crim-eo
? But I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlvu see my comment here : bird-house/pavics-jupyter-base#3 (comment)
Like this, the notebooks would be checkout to : /data/jupyterhub_user_data/tutorial-notebooks/crim-eo
What I asked to @ChaamC would make it clearer to show how to achieve the following strucutre:
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add the suggested structure.
But what would be the repo_url/branch/source_path for the common folder?
Should I try to find all the notebooks that are found in the tutorial-notebooks folder (when spawning the pavics-workflow image for example)?
Or maybe I can just add this folder for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that the current tutorial-notebooks folder was filled using only that folder. But looking at the diff, this is enough for our common folder. Maybe we could call it "core"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are more in the current tutorial-notebooks folder, so more sources are necessary if we want them all. But I don't think all those notebooks will work anyway. The eo image for example has only the base packages and the specific packages for eo. The packages necessary for the current tutorial-notebooks folder are not all on the eo image.
Even with the linked folder in my previous comment, I am not sure which notebooks will work or not from that list.
So, do I just take the linked folder for now, and store it in a 'core' folder, for both eo/nlp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes go for it. At least for people using those images, they will see what PAVICS has to offer and the mecanism will be demonstrated. If some libraries are missing we will fix that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI info if you guys want to replicate exactly what in the current tutorial-notebooks
folder into your common
folder https://github.com/bird-house/birdhouse-deploy/blob/20d440fd293b01d4cf3e1dccf2933877a651d571/birdhouse/deployment/trigger-deploy-notebook#L60-L71
However, keep in mind that bringing in notebook that your image can not run (missing packages) is no cool ! I'd suggest keeping only the notebooks relevant to your image runtime capabilities.
With that in mind, pavics-sdi is a good start (probably most notebooks will run without missing packages). Finch and Raven notebooks will requires more packages to be installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, keep in mind that bringing in notebook that your image can not run (missing packages) is no cool !
I agree, but it's more as a place holder or demonstration capabilities right now. I think that once this setup is ready we will be able to better partitionize the notebooks for all our (Ouranos and CRIM) images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common folder has been added to configs :)
source_dir: nlp/notebooks | ||
dest_sub_dir: nlp | ||
|
||
- repo_url: https://github.com/Ouranosinc/pavics-sdi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch double download of the pavics-sdi notebooks. At least the config is clearer.
We could have one config.yml
that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.
Don't have a quick solution at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlvu, at least it's one copy per image and not per user. Hopefully, the common notebooks folder will stay small.
We could have one config.yml that will handle all the notebooks repos (eo, nlp, common) but then each image won't be "independent" anymore.
That's the whole point of this PR! Having a single config.yml is about the same as what we have right now.
The download script could be optimized (not for this PR) and keep a cache of each repo independently of the current image. So that while looping over the images, if a repo has already been updated, it uses the cache.
But again I'm note sure of your concern, is it the download or the disk space? In both case it looks like trying to save pennies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opps sorry, sending wrong impression again. Wasn't saying this is a deal breaker, else I would change the approval status.
Probably the layout on disk could be
/data/jupyterhub_user_data/tutorial-notebooks/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp
Instead of
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/eo
/data/jupyterhub_user_data/tutorial-notebooks/crim-eo/common
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/nlp
/data/jupyterhub_user_data/tutorial-notebooks/crim-nlp/common
But that also assume the same definition of common
for those images, which might make sense only if on the same deployment node. Like you said, it's pennies at this point so let's do this once we get there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry I'm not offended and sorry if this is the impression I leave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, I like all trade-offs/limitations to be clearly documented in a PR. Doesn't mean we need to address all of them immediately. I much prefer iterative approach where we improve gradually than keep adding feature to grow the PR and never release it!
This PR adds the feature for the customization of included tutorial-notebooks.
This PR relates to the PR in pavics-jupyter-images : bird-house/pavics-jupyter-base#2