Is it possible to have template notebooks for reusable logic? #25

Open
pacospace opened this issue Oct 23, 2020 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@pacospace
Member

Related-To: AICoE/idh-manifests#9

pacospace changed the title from "Is it possible to have template notebooks for reusable logic" to "Is it possible to have template notebooks for reusable logic?" on Oct 23, 2020
@MichaelClifford
Member

@pacospace 👍 What types of reusable logic should we include in the notebook(s)?

  • Interacting with Ceph
  • Connecting to Spark
  • Connecting to Prometheus/Thanos
  • Using GPUs on JupyterHub
  • Plotting style and best practices
  • Managing environments on JupyterHub
  • ?

Let's make a comprehensive list here, and we can start to add what we want to the template.

We did start this repo with example notebooks a while back, but it hasn't seen much use. It is probably better to fold those "example" notebooks into the template, as you've suggested.

@pacospace
Member Author

pacospace commented Oct 26, 2020

What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}? One concern I have is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on. WDYT?

@durandom
Member

Naming conventions or annotations are great. Can we get those template notebooks published on https://github.com/operate-first/operate-first.github.io, please? Let's start with those that we have in the template repo.

May I also suggest creating new issues in the template repo for missing templates?

@MichaelClifford
Member

MichaelClifford commented Oct 26, 2020

What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}?

@pacospace I think defined naming conventions are great. But can this be enforced by GitHub in any way, or would it just exist through our own example notebooks using this convention?

Also, what do you mean by the {Hardware} label: GPU or CPU? And would {version} be the current version of the notebook, or the version of something like a CUDA driver that the notebook needs?
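
As an illustration of how such a convention might be checked outside of GitHub itself (for example from a CI job or a pre-commit hook), here is a minimal Python sketch. The allowed step names, the `distributed`/`local` pair standing in for {distributed-or-not}, the `cpu`/`gpu` hardware values, and the `v1.0`-style version suffix are all assumptions made for the example; the thread does not define them.

```python
import re

# Hypothetical value sets: the thread does not fix these.
ML_STEPS = {"eda", "preprocessing", "training", "inference"}
HARDWARE = {"cpu", "gpu"}

# {MLstep}-{distributed-or-not}-{Hardware}-{version}.ipynb
# "distributed" / "local" and the "v<digits>" version form are assumptions.
NAME_PATTERN = re.compile(
    r"^(?P<step>[a-z]+)"
    r"-(?P<mode>distributed|local)"
    r"-(?P<hardware>[a-z]+)"
    r"-v(?P<version>\d+(?:\.\d+)*)"
    r"\.ipynb$"
)


def check_notebook_name(filename):
    """Return a list of problems with a notebook filename (empty if it conforms)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return [f"{filename}: does not follow the naming pattern"]

    problems = []
    if match["step"] not in ML_STEPS:
        problems.append(f"{filename}: unknown ML step '{match['step']}'")
    if match["hardware"] not in HARDWARE:
        problems.append(f"{filename}: unknown hardware '{match['hardware']}'")
    return problems


if __name__ == "__main__":
    for name in ["training-distributed-gpu-v1.0.ipynb", "my notebook.ipynb"]:
        print(name, "->", check_notebook_name(name) or "ok")
```

A check like this only reports violations; whether it blocks a merge would depend on how the repository's CI is configured.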

One concern I have is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on.

Can you clarify the point above? Are you suggesting we keep the notebooks separate? If so, in what way? Do you mean in separate repos, or in separate directories with different pip files?

@pacospace
Member Author

pacospace commented Oct 26, 2020

What about a naming convention for notebooks, such as {MLstep}-{distributed-or-not}-{Hardware}-{version}?

@pacospace I think defined naming conventions are great. But can this be enforced by GitHub in any way, or would it just exist through our own example notebooks using this convention?

My thoughts were related to the different images that would be created. Different steps in an AI pipeline require different images. The idea could therefore be to have, inside the notebooks repo, a separate context directory for each ML context (EDA, etc., as is already done in https://github.com/aicoe-aiops/data-science-workflow-examples/tree/master/notebooks; we can find more classes and subclasses to store notebooks). Each notebook would basically produce an image that you can use in one of the steps of your AI pipeline (thinking about Elyra on ODH).

Each context directory would have its own Pipfile and Pipfile.lock, so that we basically create small services. We cannot have a single Pipfile and Pipfile.lock that covers the requirements of all notebooks, and it is also not good to have an image with a million dependencies. @goern @harshad16, tagging you as I think that is what is done for the different base notebooks for ODH: https://github.com/thoth-station/jupyter-notebooks

Moreover, it would be possible to use the nbrequirements extension (https://github.com/thoth-station/jupyter-nbrequirements) in each notebook to manage requirements and store the Pipfile/Pipfile.lock in the context directory where the user is working with the notebook.
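
To make the "one Pipfile and Pipfile.lock per context directory" layout concrete, the following Python sketch walks a notebooks directory and reports context directories that contain notebooks but lack either file. The function name and the assumption that context directories sit directly under one root are illustrative only and are not taken from the repositories linked above.

```python
from pathlib import Path

# Files each context directory is expected to carry, per the proposal above.
REQUIRED_FILES = ("Pipfile", "Pipfile.lock")


def missing_dependency_files(notebooks_root):
    """Map each context directory that holds notebooks to the required files it lacks.

    Assumes (hypothetically) that context directories are the immediate
    subdirectories of `notebooks_root`.
    """
    report = {}
    for context_dir in sorted(Path(notebooks_root).iterdir()):
        # Skip plain files and hidden directories such as .ipynb_checkpoints.
        if not context_dir.is_dir() or context_dir.name.startswith("."):
            continue
        # Only directories that actually contain notebooks are checked.
        if not any(context_dir.glob("*.ipynb")):
            continue
        missing = [
            name for name in REQUIRED_FILES
            if not (context_dir / name).is_file()
        ]
        if missing:
            report[context_dir.name] = missing
    return report


if __name__ == "__main__":
    for directory, missing in missing_dependency_files("notebooks").items():
        print(f"{directory}: missing {', '.join(missing)}")
```

Run from a repository root that has a `notebooks` directory, this prints one line per context directory that is missing its own dependency files.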

Also, what do you mean by the {Hardware} label: GPU or CPU? And would {version} be the current version of the notebook, or the version of something like a CUDA driver that the notebook needs?

A version in the notebook name is actually not required if we create tags from each context directory; each context directory would then contain its own version file. By hardware I mean CPU or GPU, in case something specific needs to be stated to use particular hardware. All CUDA requirements would be handled by Thoth logic, and you can state them in the .thoth.yaml file.

One concern I have is dependencies, because the software stack might become quite large. In an ML project, different steps have different requirements, and you might want to keep them separate because some steps may require different hardware or technology to run on.

Can you clarify the point above? Are you suggesting we keep the notebooks separate? If so, in what way? Do you mean in separate repos, or in separate directories with different pip files?

We can use context directories, which s2i builds can handle, as for example in https://github.com/thoth-station/jupyter-notebooks. (This would also work well with Elyra when selecting an image for a step in your AI pipeline.)

@sophwats @vpavlin @nakfour @goern @harshad16
