nginx + django + docker architecture to host notebooks embedded from open-edx hosted MOOCs

parmentelat/nbhosting

Foreword

Important Notices

  • Since release 0.24, nbhosting no longer relies on cookies to route traffic to the containers.
  • Since release 0.30, nbhosting relies on podman and no longer on docker to host containers.

Jupyter notebook hosting architecture

This git repo contains a collection of utilities that together make up the architecture behind nbhosting.inria.fr, which is designed as a notebook-serving infrastructure.

Use case : MOOCs

The first use case is hosting notebooks in the context of MOOCs, as done e.g. on fun-mooc.fr.

The m@agistere service also uses this same infrastructure to add notebooks to their Moodle-based LMS.

In the classroom

In addition to this "silent" mode, nbhosting can also be used in standalone mode in the classroom; to that end, it offers a few features that provide a thin navigation/structuring layer on top of notebook-oriented contents.


Open-edX teacher side

As far as the fun-mooc/edx mode is concerned, on the edx side, a teacher creates a block typed as ipython notebook. Note that the present repo does not include the code for the edx extension that supports this type of block (ref?); that extension is readily available at this point (jan. 2017) at fun-mooc.fr; see below for enabling it on a new course.


Open-edX student side

With these settings in place, here's what a student would see;


How does it work ?

In a nutshell:

  • the first time a student opens a notebook, nbhosting transparently creates an account for them, together with a container;
  • the first time a student opens a given notebook, this notebook is copied from the master course contents into their container; note that two different copying strategies exist, as explained below; in any case, from that point on, their work on that notebook is independent from the master course;
  • containers are automatically stopped (i.e. frozen) when the student is idle for some tunable amount of time, so as to preserve computing resources; as a consequence, a student may have to wait up to 10 seconds when they show up for the first time or after idle time (i.e. each time a container needs to be respawned).
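
The idle-stopping rule in the last item can be sketched as follows. This is a minimal illustration and not nbhosting's actual code: the `Container` record, its field names, the container names and the `containers_to_stop` helper are all hypothetical, and the timeout value is just an example of the tunable setting.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical record describing one student container."""
    name: str             # illustrative naming only
    last_activity: float  # timestamp of the last notebook activity, in seconds
    running: bool = True


def containers_to_stop(containers, now, idle_timeout):
    """Return the running containers idle for more than idle_timeout seconds."""
    return [
        c for c in containers
        if c.running and now - c.last_activity > idle_timeout
    ]


# Example with a 30-minute timeout (an assumed value, the real one is tunable).
fleet = [
    Container("python-mooc-alice", last_activity=1000.0),
    Container("python-mooc-bob", last_activity=4000.0),
    Container("python-mooc-carol", last_activity=500.0, running=False),
]
idle = containers_to_stop(fleet, now=5000.0, idle_timeout=1800)
```

Only running containers are considered, so a container that is already stopped is never selected again; the next visit from its owner is what triggers the respawn delay mentioned above.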

Two additional features allow a student to:

  • Reset to Original: copy the master course notebook again into their container; beware that they will then lose their work on that notebook.
  • Share Static Version: create a read-only snapshot of their notebook, which can then be used to share their work in the course's forum or on their favorite chat system.


Miscellaneous

Enabling New ipython notebook

Before you can, as a teacher, add your first notebook-backed content in your edx course, you need to enable that extension; to do that, go to Studio, then to your course's Settings > Advanced, and add ipython to the Advanced Module List setting, as illustrated below:

Workflow / how to publish

The workflow is entirely based on git: a course is defined from a git repo, typically remote (github, gitlab, ...) and public. In order to publish a new version of your notebooks, you need to push them to that reference repo, and then instruct nbhosting to pull the new version:

If you set a given course in autopull mode, nbhosting will perform this pull operation on its own every 5 minutes.

Container image

Each course is deployed based on a specific image; for customization, create a file named nbhosting/Dockerfile in your course repo. Note that some magic recipes need to be applied in your image for proper deployment, so you should start from either the nbhosting/minimal-notebook or nbhosting/scipy-notebook image; see the beginning of the code for our Python MOOC for an example.

That image can then be rebuilt from the website. The new image will be deployed incrementally, essentially as running containers get phased out when detected as inactive; this means it can take a day or two before all the students can see the upgrade.

Notebook metadata

Each notebook is displayed with a label and version number, as in the example above. To tweak that, use your notebook's metadata and set these two items:

Statistics

Some usage statistics are available, for visually inspecting data like:

  • how many different students have shown up and when,
  • which notebooks were opened and when,
  • computing resources like created/active containers, disk space, CPU load...

Staff

You can declare some people as being staff; nbhosting uses this only to discard accesses made by these people when putting stats together. A convenience button also allows trashing all the working files of people declared as staff, which comes in handy to make sure that staff always see the latest pushed version.

To declare somebody as staff, you need to locate that person's hash, as exposed by edx.

Jupytext

Text formats are much easier to manage under git than the historical ipynb format; for that reason, nbhosting provides full and transparent support for notebooks saved in a text format, at least for the formats known under jupytext as py:percent, py:light, markdown and md:myst.
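
To give an idea of why such formats are git-friendly, here is a deliberately simplified sketch of how a py:percent file maps to notebook cells. It is not jupytext's parser, which handles many more cases (headers, cell metadata, raw cells); the `split_percent_cells` helper is hypothetical and only relies on the `# %%` cell marker and the `[markdown]` tag used by that format.

```python
def split_percent_cells(text):
    """Split py:percent text into (kind, source) pairs.

    Simplified on purpose: lines before the first '# %%' marker are ignored,
    and markdown cells keep their leading '# ' comment prefix.
    """
    cells = []
    for line in text.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            cells.append((kind, []))
        elif cells:
            cells[-1][1].append(line)
    return [(kind, "\n".join(lines).strip()) for kind, lines in cells]


sample = """# %% [markdown]
# # Title

# %%
print("hello")
"""

cells = split_percent_cells(sample)
```

Because the whole notebook is plain Python text, with no outputs or execution counts stored, a change to one cell shows up as a small, readable diff.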


Dataflow - nbhosting side

Here's the general principle of how things work

silent mode (in an iframe, behind a MOOC system)

  • Open-edX forges a URL, like the one shown above, with student replaced with the hash of some student id
  • This is caught by nginx, which runs as the front end; the notebookLazyCopy/ prefix is routed to a django application, which primarily does the following:
    • create a linux user if needed
    • create a copy of that notebook for the student if needed
    • spawn a jupyter container for the (course, student) pair
    • redirect to a (plain https, on port 443) URL that contains the port number at which the container can be reached (on localhost via http)
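
The four steps above can be sketched as a single function. This is only an illustration under stated assumptions: the `Host` class is an in-memory stand-in for the real system (linux users, files, containers), and the hostname, the starting port and the layout of the redirect URL are all hypothetical rather than taken from nbhosting.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """In-memory stand-in for the real host; every field is hypothetical."""
    users: set = field(default_factory=set)
    notebooks: set = field(default_factory=set)  # (student, course, notebook)
    ports: dict = field(default_factory=dict)    # (course, student) -> port
    next_port: int = 8000                        # assumed port range


def open_notebook(host, course, student, notebook,
                  hostname="nbhosting.example.org"):
    """Mimic the steps taken when a notebookLazyCopy/ URL is visited."""
    # 1. create a user if needed (adding to a set is a no-op if present)
    host.users.add(student)
    # 2. record a copy of the notebook for the student if needed
    host.notebooks.add((student, course, notebook))
    # 3. allocate a container port for the (course, student) pair if needed
    key = (course, student)
    if key not in host.ports:
        host.ports[key] = host.next_port
        host.next_port += 1
    # 4. build the https URL to redirect to; it embeds the container's port
    #    (this URL layout is an assumption made for the example)
    return f"https://{hostname}/{host.ports[key]}/notebooks/{notebook}"


host = Host()
first = open_notebook(host, "python-mooc", "a1b2c3", "w1/intro.ipynb")
second = open_notebook(host, "python-mooc", "a1b2c3", "w1/types.ipynb")
```

Both calls reuse the same port, since there is one container per (course, student) pair and not one per notebook.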

Note that notebookLazyCopy used to be named ipythonExercice, which is still supported for backward compatibility.

classroom mode

The classroom mode uses a similar approach, but uses a URL that mentions notebookGitRepo/ instead of notebookLazyCopy/; the behaviour is mostly the same except for the policy used to create notebooks in the student space; when the visited notebook is missing there, notebookGitRepo triggers a git clone operation, instead of copying notebooks individually.
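
The difference between the two policies can be sketched as below. This is an assumption-laden illustration, not the actual implementation: the `ensure_notebook` helper, its parameters and the directory layout are hypothetical, and the git clone command is returned instead of being executed so that the sketch needs no network access.

```python
import shutil
from pathlib import Path


def ensure_notebook(mode, course_root, student_root, notebook, repo_url):
    """Make a notebook available in the student space.

    Returns the command that would have to be run (as a list), or an empty
    list when nothing more is needed.
    """
    target = Path(student_root) / notebook
    if target.exists():
        # the student already has this notebook: leave their work untouched
        return []
    if mode == "notebookLazyCopy":
        # copy this single notebook from the master course contents
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(Path(course_root) / notebook, target)
        return []
    if mode == "notebookGitRepo":
        # clone the whole course repo into the student space
        return ["git", "clone", repo_url, str(student_root)]
    raise ValueError(f"unknown mode: {mode}")
```

In both modes an existing student file is never overwritten, which matches the principle that student work is independent from the master course once the notebook has been opened.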

The advantage of this mode is that students can later on use the jupyterlab git extension to accurately manage their local repo, i.e. drop or commit local changes, pull any updates from the master repo, and so on.

An experimental feature called 'pull-students' deals with changes made in the master course: it automatically pulls these changes into the student's repo.

summary

As a summary:

TODO

See Issues on github for an up-to-date status.