About

This is material for a tutorial on the COSIMA Cookbook for a 2023 Intersect course on HPC and Data in Computational Fluid Dynamics. It borrows very heavily from Aidan Heerdegen's training workshop and the associated Hive post (thanks Aidan!).

Lecture slides from the course are here. Contact me if you'd like to see the video recording of my lecture.

This tutorial is aimed at people who have some familiarity with the Python programming language but have never used the COSIMA Recipes for accessing and analysing COSIMA model output.

The tutorial will use the NCI ARE (Australian Research Environment) JupyterLab App. This is a user-friendly way to run Jupyter notebooks on NCI systems, and it gives direct access to the data required for this tutorial.

The tutorial will provide hands-on experience in:

using Python in the JupyterLab App environment on NCI ARE
accessing and manipulating data with the Xarray package
using Dask to accelerate your analysis and handle larger datasets
using the COSIMA Cookbook, Data Explorer and COSIMA Recipes

Preparation Before Tutorial Day

There are three prerequisites: two to access NCI resources, and one to get the Jupyter notebooks.

All prerequisites must be completed BEFORE the tutorial. We only have 90 minutes, and there will not be time to sort out administration as well as follow the tutorial.

NCI Account

You must have an NCI account. If you do not have an NCI account, follow the NCI instructions to create one.

NCI Projects

This tutorial requires membership of particular projects at NCI. To participate, follow the instructions sent by Meiyun. Please don't request membership of these projects directly through Mancini. You will be sent three project membership invitations, which you need to accept WELL BEFORE the tutorial.

Get Notebooks

The tutorial uses some notebooks in a git repository. You will need to download these notebooks to an NCI filesystem. The recommended location is your home directory on gadi, as this is always accessible to an ARE JupyterLab session.

To do this, log in to gadi and copy and paste this command:

git clone https://github.com/aekiss/HPC-Data-CFD-2023.git

This creates a directory called HPC-Data-CFD-2023 in your gadi home directory.

Note: this repository is still being updated, so just before the tutorial you may need to clone a fresh copy (or run git pull inside HPC-Data-CFD-2023, if you are familiar with git).

Tutorial Session

Start ARE JupyterLab Session

This will be the first thing we do in the tutorial, but feel free to test this out before the tutorial day to check it is all working correctly for you.

The instructions from the ACCESS-NRI Intake catalogue docs are excellent and worth reading for background, and of course the NCI ARE Documentation is also available.

Steps:

Log in to ARE using your NCI username and password: https://are.nci.org.au

Select the JupyterLab app

The following ARE session settings have been tested with this tutorial and are recommended:


Walltime	2 hours
Queue	normalbw
Compute Size	Medium
Project	choose one, e.g. `nf47`
Storage	see below

You must choose one of the projects you are a member of, and it must be a project with compute allocation remaining (see NCI Projects above).

The storage section must contain all the /g/data projects you need to access. Use the following for this tutorial:

gdata/hh5+gdata/ik11+gdata/cj50
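The storage string above is just each /g/data project code prefixed with gdata/ and joined with +. As a minimal sketch (the helper names storage_flag and missing_gdata are hypothetical, not part of ARE or the COSIMA Cookbook), you could build that string in Python and, once your session is running, check from a notebook that each project's /g/data directory is visible and readable. If a project is missing, the usual cause is that it was left out of the storage field when the session was launched.

```python
import os


def storage_flag(projects):
    """Build a storage string such as 'gdata/hh5+gdata/ik11' (hypothetical helper)."""
    return "+".join(f"gdata/{p}" for p in projects)


def missing_gdata(projects, root="/g/data"):
    """Return projects whose directory under root is absent or not readable (hypothetical helper)."""
    missing = []
    for p in projects:
        path = os.path.join(root, p)
        # Needs to exist as a directory and be listable/enterable by you.
        if not (os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)):
            missing.append(p)
    return missing


tutorial_projects = ["hh5", "ik11", "cj50"]
print(storage_flag(tutorial_projects))  # prints gdata/hh5+gdata/ik11+gdata/cj50

# Run this inside the ARE JupyterLab session: an empty list means every
# project listed in the storage field is visible to your job.
print(missing_gdata(tutorial_projects))
```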

Click on "Advanced Settings" and set the following options:

Module directories

/g/data/hh5/public/modules

Modules

conda/analysis3
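Once your session is running, a quick way to confirm that the conda/analysis3 module was loaded is to check from a notebook that the packages this tutorial relies on can be imported. This is a minimal sketch using only the standard library's importlib; the package list is an assumption (cosima_cookbook being the import name of the COSIMA Cookbook), so adjust it to suit your own analysis.

```python
import importlib.util
import sys


def check_packages(names):
    """Map each top-level package name to True if it can be imported in this kernel."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages used in this tutorial (an assumed list; edit as needed).
required = ["xarray", "dask", "cosima_cookbook"]

# Expected to point somewhere inside the analysis3 conda environment.
print("Python executable:", sys.executable)

for name, ok in check_packages(required).items():
    print(f"{name:16s} {'found' if ok else 'MISSING'}")
```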

Push the Launch button. A window will appear showing your session, which starts out waiting in the queue.

You may need to wait a few minutes for your job to leave the queue and start running; your JupyterLab application runs inside this job.

When your JupyterLab session is ready, the screen will change to show an "Open JupyterLab" button. Push it and your JupyterLab session will open in a new window.

Find Tutorial Notebooks

In the JupyterLab window, use the file browser to navigate to the directory where you downloaded the Jupyter notebook files in the preparation section above: double-click on the home directory in the file browser, and then double-click on the HPC-Data-CFD-2023 directory.

Familiarisation with the JupyterLab interface

If you've never used Jupyter or JupyterLab before, you might like to look at this overview of the interface and how to use notebooks.

Creating a new notebook

Click the big blue + button at the top of the file browser, then click Python [conda env:analysis3-23.04] * in the Notebook category. This opens a new untitled Python notebook running in that specific package environment. You can rename the notebook by right-clicking its tab or its entry in the file browser.
There will be a code cell at the top. Type 1+1 in it, hold down shift and press return. Shift-return will evaluate a cell (rather than creating a new line in the cell), print the result, and give you a new code cell in which you can try out other Python commands. You can go back and change cells and re-evaluate them with shift-return. You can hide input or output cells by clicking in the blue bar that appears in the left margin. Cells can also be rearranged, duplicated, deleted and more.
You can change a cell type from Code to Markdown using the dropdown menu. Markdown cells can be used for comments and explanations of your code and analysis, including URLs, images and tables, as well as mathematical markup in LaTeX. Try out some of the examples here.

Simple Xarray demo

Double-click xarray_demo.ipynb in the file browser in the left column and work through the notebook.
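For a first taste before (or alongside) the notebook, the sketch below shows the core Xarray ideas: dimensions and coordinates carry names, so you can select by coordinate label and reduce by dimension name instead of juggling axis numbers. It uses small synthetic data (not model output) and is independent of what xarray_demo.ipynb actually covers; the variable names are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic "temperature" field on a small (time, lat, lon) grid.
time = np.arange(4)
lat = np.array([-60.0, -30.0, 0.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
data = np.arange(time.size * lat.size * lon.size, dtype=float).reshape(4, 3, 4)

temp = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="temp",
    attrs={"units": "degC"},
)

# Select by coordinate value; a scalar selection drops the lat dimension.
equator = temp.sel(lat=0.0)

# Reduce over named dimensions rather than axis numbers.
time_mean = temp.mean(dim="time")
zonal_mean = temp.mean(dim="lon")

print(equator.dims)     # ('time', 'lon')
print(time_mean.shape)  # (3, 4)
print(zonal_mean.isel(time=0).values)
```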

Dask, Xarray and COSIMA Cookbook demo

Xarray will use Dask arrays if you set the chunks parameter (e.g. chunks='auto') when opening NetCDF files with commands such as xr.open_dataset and xr.open_mfdataset. Subsequent calculations are then automatically parallelised with Dask, without any code changes. The COSIMA Cookbook's cc.querying.getvar does this for you, but you can override its default chunking scheme if you like. For optimum performance, you may need to choose a chunking scheme that suits your calculation.
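The same mechanism can be seen without any NetCDF files. Calling .chunk() on an in-memory DataArray converts it to a Dask-backed array, much as passing chunks to xr.open_dataset would; operations then build a lazy task graph that Dask only evaluates when you call .compute() (or .load(), or plot). This is a minimal sketch with synthetic data, assuming dask is installed alongside xarray; the chunk size of 12 time steps is arbitrary, and it does not show the arguments of cc.querying.getvar.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic "monthly" series: 120 time steps on a 50 x 60 grid.
data = np.random.default_rng(0).standard_normal((120, 50, 60))
sst = xr.DataArray(data, dims=("time", "y", "x"), name="sst")

# Similar in effect to xr.open_dataset(path, chunks={"time": 12}):
# the data become a Dask array split into blocks of 12 time steps.
sst_chunked = sst.chunk({"time": 12})
print(isinstance(sst_chunked.data, da.Array))  # True
print(sst_chunked.chunks)  # block sizes along each dimension

# This only builds a lazy task graph; nothing is computed yet.
anomaly_std = (sst_chunked - sst_chunked.mean("time")).std("time")

# .compute() triggers parallel evaluation and returns NumPy-backed data.
result = anomaly_std.compute()
print(result.shape)  # (50, 60)
```

Smaller chunks expose more parallelism but add scheduling overhead per task, which is why the chunking scheme that works best depends on the calculation.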

To see examples of Xarray calculations done in parallel with Dask using the COSIMA Cookbook, double-click Sea_level.ipynb in the file browser and work through the notebook.

COSIMA Cookbook data explorer demo

Double-click Explorer_demo.ipynb in the file browser and play around with it. There's more information in Finding_COSIMA_data.ipynb if you're interested.

For further interest

try out the other notebooks in this repository
check out the COSIMA Recipes

Clean-up

When you have finished, save your work, close the tab and then Delete your running JupyterLab app in ARE. Otherwise it will keep consuming compute resources until it reaches the walltime limit and stops.
