Is it possible to have template notebooks for reusable logic? #25

pacospace · 2020-10-23T10:07:08Z

MichaelClifford · 2020-10-23T13:21:11Z

@pacospace 👍 What types of reusable logic should we include in the notebook(s)?

Interacting with Ceph
Connecting to Spark
Connecting to Prometheus/Thanos
Using GPUs on JupyterHub
Plotting style and best practices
Managing environments on JupyterHub
?

Let's make a comprehensive list here, and we can start to add what we want to the template.

We did start this repo with example notebooks a while back, but it hasn't seen much use. It's probably better to fold these "example" notebooks into the template, as you've suggested.

pacospace · 2020-10-26T08:38:34Z

> @pacospace What types of reusable logic should we include in the notebook(s)?
>
> Interacting with Ceph
> Connecting to Spark
> Connecting to Prometheus/Thanos
> Using GPUs on JupyterHub
> Plotting style and best practices
> Managing environments on JupyterHub
> ?
>
> Let's make a comprehensive list here, and we can start to add what we want to the template.
>
> We did start this repo with example notebooks a while back, but it hasn't seen much use. It's probably better to fold these "example" notebooks into the template, as you've suggested.

What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}? One of my concerns is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on. WDYT?

durandom · 2020-10-26T15:12:34Z

Naming conventions or annotations are great. Can we get those template notebooks published on https://github.com/operate-first/operate-first.github.io, please? Let's start with those that we have in the template repo.

And may I suggest creating new issues in the template repo for missing templates?

MichaelClifford · 2020-10-26T15:41:03Z

> What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}?

@pacospace I think defined naming conventions are great. But can this be enforced by GitHub in any way, or would it just exist through our own example notebooks using this convention?

Also, what do you mean by the {Hardware} label, like GPU or CPU? And would {version} be the current version of the notebook, or something like the version of a CUDA driver the notebook needs?

> One of my concerns is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on.

Can you clarify this point above? Are you suggesting we keep the notebooks separate? If so, in what way? Do you mean in separate repos? Or in separate directories with different pip files?

pacospace · 2020-10-26T16:08:01Z

> What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}?
>
> @pacospace I think defined naming conventions are great. But can this be enforced by GitHub in any way, or would it just exist through our own example notebooks using this convention?

My thoughts were related to the different images that would be created. Different steps in an AI pipeline require different images, so the idea could be to have, inside the notebooks repo, a separate context directory for each ML context (EDA, etc., as is already done in https://github.com/aicoe-aiops/data-science-workflow-examples/tree/master/notebooks; we can find more classes and subclasses to store notebooks). Each notebook would basically produce an image that you can use in one of the steps of your AI pipeline (thinking about Elyra on ODH). Each context directory would have its own Pipfile and Pipfile.lock, so that small services are created.

We cannot have a single Pipfile and Pipfile.lock that covers the requirements of all notebooks, and it is not good to have an image with a million dependencies. @goern @harshad16, tagging you as I think that is what is done for the different base notebooks for ODH: https://github.com/thoth-station/jupyter-notebooks

Moreover, it would be possible to use the nbrequirements extension (https://github.com/thoth-station/jupyter-nbrequirements) for each notebook to manage requirements and store the Pipfile/Pipfile.lock in the context directory where the user is working with the notebook.
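
To illustrate the "one Pipfile and Pipfile.lock per context directory" layout described above, here is a minimal sketch of a check that could run locally or in CI. It assumes that any directory directly containing a `.ipynb` file counts as a context directory; the function names `find_context_dirs` and `missing_dependency_files` are hypothetical and are not part of Thoth, nbrequirements, or s2i.

```python
from pathlib import Path

# Files each context directory is expected to carry (per the discussion above).
REQUIRED_FILES = ("Pipfile", "Pipfile.lock")


def find_context_dirs(root: Path) -> list:
    """Return every directory under root that directly contains a notebook.

    Assumption: "contains a .ipynb file" is what makes a context directory.
    Jupyter checkpoint copies are ignored.
    """
    dirs = {
        nb.parent
        for nb in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in nb.parts
    }
    return sorted(dirs)


def missing_dependency_files(repo_root) -> dict:
    """Map each context directory (relative path) to the required files it lacks."""
    root = Path(repo_root)
    report = {}
    for ctx in find_context_dirs(root):
        missing = [name for name in REQUIRED_FILES if not (ctx / name).is_file()]
        if missing:
            report[ctx.relative_to(root).as_posix()] = missing
    return report
```

Calling `missing_dependency_files(".")` from the repository root returns an empty dictionary when every notebook directory has both files, so a CI job could simply fail on a non-empty result.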

> Also, what do you mean by the {Hardware} label, like GPU or CPU? And would {version} be the current version of the notebook, or something like the version of a CUDA driver the notebook needs?

A version of the notebook is actually not required if we create tags from each context directory; a separate version file would then live in the context directory. For hardware, I mean CPU and GPU, in case something different has to be stated to use specific hardware. All CUDA requirements would be handled by Thoth logic, and you can state them in the .thoth.yaml file.
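
On the earlier question of whether the naming convention could be enforced: GitHub does not check file names by itself, but a small script run in CI could. The sketch below parses `{MLstep}-{distributed-or-not}-{Hardware}-{version}` with the hardware label limited to CPU/GPU and the version made optional, as clarified above. The concrete vocabulary (`distributed`/`local`, lowercase names, a `v1.2`-style version) and the name `parse_notebook_name` are assumptions made for illustration, not something agreed in this thread.

```python
import re
from typing import NamedTuple, Optional

# Assumed vocabularies; the thread only fixes the hardware label to CPU/GPU.
DISTRIBUTION = ("distributed", "local")
HARDWARE = ("cpu", "gpu")


class NotebookName(NamedTuple):
    ml_step: str
    distribution: str
    hardware: str
    version: Optional[str]  # None when the optional version part is absent


# The ML step may use underscores but not hyphens, so that the hyphen
# stays an unambiguous field separator.
_PATTERN = re.compile(
    r"^(?P<ml_step>[a-z0-9_]+)"
    r"-(?P<distribution>" + "|".join(DISTRIBUTION) + r")"
    r"-(?P<hardware>" + "|".join(HARDWARE) + r")"
    r"(?:-(?P<version>v\d+(?:\.\d+)*))?"
    r"\.ipynb$"
)


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Return the parsed fields, or None if the name breaks the convention."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return NotebookName(**match.groupdict())
```

For example, `parse_notebook_name("training-distributed-gpu-v1.2.ipynb")` yields the fields `training`, `distributed`, `gpu`, and `v1.2`, while a name such as `training-gpu.ipynb` returns `None` because the distribution field is missing.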

> One of my concerns is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on.
>
> Can you clarify this point above? Are you suggesting we keep the notebooks separate? If so, in what way? Do you mean in separate repos? Or in separate directories with different pip files?

We can use context directories, which s2i builds can handle, for example as in https://github.com/thoth-station/jupyter-notebooks. (This would also work well with Elyra when selecting an image for a step in your AI pipeline.)

@sophwats @vpavlin @nakfour @goern @harshad16

pacospace added the enhancement label Oct 23, 2020

pacospace changed the title ~~Is it possible to have template notebooks for reusable logic~~ Is it possible to have template notebooks for reusable logic? Oct 23, 2020

MichaelClifford closed this as completed Oct 26, 2020

MichaelClifford reopened this Oct 26, 2020

This was referenced Nov 3, 2020:

Include notebooks templates into JupyterLab/Elyra on ODH opendatahub-io/s2i-lab-elyra#11 (Closed)

Add an ability to manage lock files in nested directories thoth-station/thamos#464 (Closed)

sesheta added the kind/feature label and removed the enhancement label Feb 13, 2021
