Suggested NCCS Resources
The NASA Center for Climate Simulation (NCCS) has a computing infrastructure that allows users to run applications (using multiple cores and GPUs), perform visualization, and store data. The main NCCS platform that new GMAO staff need to be familiar with is discover. This document briefly describes how to access discover, the initial setup procedures, and how to use the system to compile and run your application.
A description of discover is available at:
If you do not already have an account, please send an email to NCCS Support: support at nccs dot nasa dot gov
Per that page:
The Discover cluster is the main compute cluster for processing batch jobs requiring significant compute resources. It consists of several scalable compute units (SCUs) that offer a variety of processor types. There are a variety of nodes dedicated to batch computing and interactive data analysis.
As soon as you receive your credentials (USERNAME, password, token) to use discover, you can access the platform from your workstation by issuing the command:
ssh -XY <USERNAME>@login.nccs.nasa.gov
Once you are connected, you will be asked to authenticate your access using RSA SecurID authentication:
PASSCODE: Enter your hardware or software token code here
host: discover
password: YOUR_NCCS_PASSWORD
Below are the recommended settings for .ssh/config on the system you use to access discover (i.e., the system from which you ran ssh login.nccs.nasa.gov above).
Edit your local .ssh/config (or create the file if you don't have one) so that it contains the following, substituting your NCCS AUID for <USERNAME>:
Host github.com
    ForwardX11 no
Host *
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardX11Timeout 500h
    ServerAliveInterval 30
Host login.nccs.nasa.gov
    User <USERNAME>
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardX11Timeout 500h
    ServerAliveInterval 30
    PKCS11Provider /usr/lib/ssh-keychain.dylib
Host discover discover?? discover-mil discover.nccs.nasa.gov dirac dirac.nccs.nasa.gov dataportal.nccs.nasa.gov adapt.nccs.nasa.gov
    User <USERNAME>
    LogLevel Quiet
    ProxyCommand ssh -l <USERNAME> login.nccs.nasa.gov direct %h
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardX11Timeout 500h
    Protocol 2
    ServerAliveInterval 30
This config is equivalent to the "PIV SSH" style SSH access to NCCS discussed here. If all works, you should no longer need to log in to login.nccs.nasa.gov first, but can connect directly with ssh discover.
NOTE: If you do not have a PIV card and only have an RSA token, REMOVE the PKCS11Provider line above, as it is only for PIV access.
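For RSA-token users, the removal described in the note can of course be done by hand in an editor. As a minimal sketch, the filter below prints a copy of a config file with every PKCS11Provider line dropped. The function name strip_piv_lines is hypothetical and is not an NCCS tool.

```shell
# Hypothetical helper (not an NCCS tool): print an ssh config with every
# PKCS11Provider line removed, leaving all other lines untouched.
# The match is case-insensitive because ssh config keywords are.
strip_piv_lines() {
    # $1: path to an ssh config file
    grep -v -i '^[[:space:]]*PKCS11Provider' "$1"
}

# Example usage (review the result before replacing your real config):
#   strip_piv_lines ~/.ssh/config > ~/.ssh/config.no-piv
```

Writing to a separate file first, rather than editing in place, keeps the original config available if something goes wrong.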
The webpage Logging-In & Passwords gives more details on the steps presented here.
Once you are connected to discover, you may want to select your default shell; bash is the default.
To switch to a different default shell (csh, tcsh, ksh), contact support at nccs dot nasa dot gov.
We recommend that users configure their shell startup files as shown below. The first block is for bash; the second is the equivalent for csh/tcsh.
umask 0022
ulimit -s unlimited
# Look for the OS version and set the module path accordingly
OS_VERSION=$(grep VERSION_ID /etc/os-release | cut -d= -f2 | cut -d. -f1 | sed 's/"//g')
# Run things in this if-block only if we're in an interactive shell
if [[ $- == *i* ]]
then
   # Only put module use or other module commands here
   # and in the correct OS version
   if [[ "$OS_VERSION" == "15" ]]
   then
      export LMOD_SYSTEM_NAME=SLES15
      module purge
      module unuse -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
      module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
      module load GEOSenv
   else
      export LMOD_SYSTEM_NAME=SLES12
      module purge
      module unuse -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
      module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
      module load GEOSenv
   fi
   # Add any other things you want with interactive shells here
fi
umask 0022
limit stacksize unlimited
# Look for the OS version and set the module path accordingly
set OS_VERSION=`grep VERSION_ID /etc/os-release | cut -d= -f2 | cut -d. -f1 | sed 's/"//g'`
# Run things in this if-block only if we are in an interactive shell
if ($?prompt) then
   # Only put module use or other module commands here
   # and in the correct OS version
   if ($OS_VERSION == 15) then
      setenv LMOD_SYSTEM_NAME SLES15
      module purge
      module unuse -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
      module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
      module load GEOSenv
   else
      setenv LMOD_SYSTEM_NAME SLES12
      module purge
      module unuse -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
      module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
      module load GEOSenv
   endif
   # Add any other things you want with interactive shells here
endif
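Both startup files rely on the same pipeline to extract the major OS version from /etc/os-release. To check that step in isolation, a minimal sketch (written for sh/bash) is shown below. The function names os_major_version and sles_tree_for are hypothetical. The sketch also anchors the grep pattern to the start of the line, which is slightly stricter than the pipeline in the startup files.

```shell
# Hypothetical helper: print the major OS version from an os-release
# style file (defaults to /etc/os-release), e.g. VERSION_ID="15.4" -> 15.
os_major_version() {
    grep '^VERSION_ID=' "${1:-/etc/os-release}" \
        | cut -d= -f2 | cut -d. -f1 | sed 's/"//g'
}

# Hypothetical helper: map the major version to the module tree suffix
# used in the startup files (15 -> SLES15, anything else -> SLES12).
sles_tree_for() {
    if [ "$1" = "15" ]; then
        echo SLES15
    else
        echo SLES12
    fi
}

# Example usage on a node:
#   echo "modulefiles-$(sles_tree_for "$(os_major_version)")"
```

Running the example on a login or compute node shows which module tree the startup file would select there, without touching your loaded modules.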
Users can ssh or scp within the NCCS systems without typing their NCCS passwords by setting up authorization keys. This step is required to run applications.
From your home directory on discover, create a new key pair by typing:
mkdir -p $HOME/.ssh
chmod 0700 $HOME/.ssh
cd $HOME/.ssh
ssh-keygen
Press the Enter/Return key at each prompt to accept the defaults. This will create a pair of private and public identity files, id_rsa and id_rsa.pub, under the .ssh directory.
Append the contents of id_rsa.pub to authorized_keys in the same directory:
cat id_rsa.pub >> authorized_keys
NOTE: If you have an id_ed25519.pub key instead, use that file in place of id_rsa.pub. Both will work.
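Running the cat command more than once appends the same key repeatedly. As a minimal sketch, the helper below appends a public key only if that exact line is not already present. The function name add_key_once is hypothetical, and the chmod 600 at the end is a common precaution for authorized_keys rather than a documented NCCS requirement.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already there, then restrict permissions.
# Usage: add_key_once PUBKEY_FILE AUTHORIZED_KEYS_FILE
add_key_once() {
    pubkey_file=$1
    auth_file=$2
    # Make sure the target exists so grep has something to read
    touch "$auth_file"
    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # Assumption: owner-only access, a common setting for this file
    chmod 600 "$auth_file"
}

# Example usage:
#   add_key_once "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```

The same call works with id_ed25519.pub if that is the key you have.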
NOTE: This section is only needed if you have an account on dirac. If you do not, you can skip it.
Copy the contents of the id_rsa.pub file from discover to dirac:
ssh <USERNAME>@dirac.nccs.nasa.gov 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'
scp $HOME/.ssh/id_rsa.pub <USERNAME>@dirac.nccs.nasa.gov:~/.ssh/id_rsa.pub.discover
Log in to dirac:
ssh dirac
and from there, type:
cat $HOME/.ssh/id_rsa.pub.discover >> $HOME/.ssh/authorized_keys
exit
Below are the recommended settings for .ssh/config on discover:
Host github.com
    ForwardX11 no
Host *
    ForwardX11 yes
    ForwardX11Trusted yes
    ForwardX11Timeout 500h
    ServerAliveInterval 30
You now have the initial settings needed to properly use discover to run your GEOS-related applications.
NCCS provides SchedMD's Slurm resource manager for users to control their applications on discover.
The Slurm tools allow users to schedule their jobs and request the computing resources (such as CPU time, memory, etc.) they need to execute their applications. Please refer to the documentation below for more information:
To submit jobs using Slurm, see the webpage Running Jobs on Discover using Slurm, which explains how to use the queueing system or an interactive session (for better productivity and for quick access to the processor resources you need).
We recommend using only the Milan and Cascade Lake nodes at NCCS for doing work. You can get these nodes interactively with these commands:
- Cascade Lake (SLES12)
salloc --x11 --constraint=cas --nodes=N --job-name=Interactive --time=HH:MM:SS --account=ACCOUNT
- Milan (SLES15)
salloc --x11 --constraint=mil --nodes=N --job-name=Interactive --time=HH:MM:SS --account=ACCOUNT
You will need to fill in the actual number of nodes (N in --nodes=N), the time (HH:MM:SS), and the account to run under (ACCOUNT). For example, 4 Milan nodes for 3 hours using account t1234 would be:
salloc --x11 --constraint=mil --nodes=4 --job-name=Interactive --time=03:00:00 --account=t1234
If you need a node quickly, you can often use the debug QOS, which has a higher priority, by adding:
--qos=debug
but you are then limited to one job of at most 1 hour.
If you have access to other partitions and QOSs, you can specify them with --partition=PART --qos=QOS.
Note that the use of --x11 above will forward X11 to your local machine. If you do not want this, remove the --x11 option.
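To avoid retyping the long option list, the salloc invocations above can be wrapped in a small shell function. The sketch below only prints the command so that it can be inspected before it is run. The function name interactive_cmd and its argument order are hypothetical, and it uses only the salloc options already shown in this section.

```shell
# Hypothetical helper: print (not run) an salloc command for an
# interactive session, using only the options shown in this section.
# Usage: interactive_cmd CONSTRAINT NODES HH:MM:SS ACCOUNT [x11]
interactive_cmd() {
    cmd="salloc"
    # Forward X11 only when the optional fifth argument is "x11"
    if [ "${5:-}" = "x11" ]; then
        cmd="$cmd --x11"
    fi
    echo "$cmd --constraint=$1 --nodes=$2 --job-name=Interactive --time=$3 --account=$4"
}

# Example: 4 Milan nodes for 3 hours under account t1234, with X11
#   interactive_cmd mil 4 03:00:00 t1234 x11
# To actually request the nodes, run the printed command yourself.
```

Printing rather than executing keeps the helper safe to experiment with on any machine, including one without Slurm installed.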
The home directory on any NCCS platform is quite small and is regularly backed up.
We recommend that users keep only source code files in their home directories and avoid storing any files or data there that take up significant disk space. NCCS has a file storage system that provides options to store files for short-term and/or long-term periods. On discover, we recommend using the NOBACKUP file system to compile and run your application.
Disk resources are not unlimited. It is important to be aware of the limits of any file system you are using: the maximum number of files you can have and the maximum amount of disk space you can use. The page Show Quota shows how to determine the quota in each file system.
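The NCCS quota tool is described on the Show Quota page. Independently of it, a rough picture of how many files and how much space a directory tree uses can be obtained with standard tools. The sketch below is generic and not NCCS-specific; the function names count_files and usage_kb are hypothetical.

```shell
# Hypothetical helper: count regular files under a directory tree.
# (File names containing newlines would be counted more than once.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: total disk usage of a directory tree in kilobytes.
usage_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep the first field
    du -sk "$1" | cut -f1
}

# Example usage (replace the path with one of your own directories):
#   echo "files: $(count_files /path/to/dir)  KB: $(usage_kb /path/to/dir)"
```

On very large directory trees these commands can take a while and add load to the file system, so run them sparingly.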
Running applications on NCCS platforms requires input data files and generates output files, which are stored at different locations. Depending on the need, you may have to transfer files from one storage location to another. The File transfer webpage lists all the available options and describes how to transfer files.
For more information, you can contact either the NCCS Support at support_AT_nccs.nasa.gov or the SI Team at siteam_AT_gmao.gsfc.nasa.gov