Python and Jupyter

JupyterHub was the topic of the HPC Cafe in October 2020. The JupyterHub instance at https://jupyterhub.rrze.uni-erlangen.de/ is an experimental service.

This page addresses some common pitfalls when working with Python and related tools on a shared system like a cluster.

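The following topics are discussed in detail on this page: available Python versions, installing packages (with pip or Conda environments), Jupyter notebook security, and using a remote notebook.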

Available Python versions

All Unix systems come with a system-wide Python installation. On the cluster, however, it is highly recommended to use one of the Anaconda installations provided as modules.

# reminder
module avail python
module load python/XY

These modules come with a wide range of preinstalled packages.
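After loading a module, it can be worth confirming that the interpreter in use is the one from the module and not the system-wide one. The following is a minimal sketch using only the standard library; the function name describe_interpreter is made up for this illustration.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,          # full path of the python binary in use
        "version": platform.python_version(),  # version string of the form major.minor.patch
        "prefix": sys.prefix,                  # root of the installation (or environment)
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:>10}: {value}")
```

If the reported executable points to a system location rather than the module's installation directory, the module is most likely not loaded in the current shell.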

Installing packages

There are different ways of managing Python packages on the cluster. This list is not complete; however, it highlights methods which are known to work well with the local software stack.

As a general note:
It is recommended to build packages in an interactive job on the target cluster to make sure all hardware can be used properly.
Make sure to load the modules your Python code might need (e.g. CUDA for GPU support).
If external repositories are needed, set the proxy:
export http_proxy=http://proxy:80
export https_proxy=http://proxy:80
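To check that the proxy variables are actually visible to Python-based tools, the standard library can report what it picks up from the environment. This is a minimal sketch; proxy_report is a hypothetical helper name, and the check only covers urllib's view of the environment, not every tool's behaviour.

```python
import os
import urllib.request


def proxy_report():
    """Show, per scheme, the raw environment variable and the proxy urllib would use."""
    proxies = urllib.request.getproxies()  # reads the *_proxy environment variables
    report = {}
    for scheme in ("http", "https"):
        report[scheme] = {
            "variable": os.environ.get(f"{scheme}_proxy"),
            "used_by_urllib": proxies.get(scheme),
        }
    return report


if __name__ == "__main__":
    for scheme, details in proxy_report().items():
        print(scheme, details)
```

A value of None for a scheme suggests that the corresponding export has not been run in the current shell.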

Using pip

Pip is a package manager for Python. It can be used to easily install packages and manage their versions.
By default pip will try to install packages system-wide, which is not possible on the cluster due to missing permissions.
This behaviour can be changed by adding --user to the call:
pip install --user package-name
or, from within a Jupyter notebook:
%pip install --user --proxy http://proxy:80 package-name
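After a --user installation it can be useful to see where Python actually imports a package from. The sketch below uses only the standard library; package_location and is_in_user_site are names invented for this example, and the check only handles top-level module names.

```python
import importlib.util
import site


def package_location(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_in_user_site(name):
    """True if the module resolves to a path below the per-user site-packages directory."""
    location = package_location(name)
    return location is not None and location.startswith(site.getusersitepackages())


if __name__ == "__main__":
    # "json" is only a stand-in; replace it with the package you installed.
    print(package_location("json"), is_in_user_site("json"))
```

A package installed with --user should report True for is_in_user_site, while packages shipped with the loaded module should report False.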

By defining the variable PYTHONUSERBASE (best done in your ~/.bashrc or ~/.bash_profile), the installation location changes from ~/.local to a path of your choice. This keeps your home folder from filling up with data that does not need a backup and from hitting the quota.
export PYTHONUSERBASE=$WOODYHOME/software/privat
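The effect of PYTHONUSERBASE can be verified without installing anything by asking a fresh interpreter where user installs would go. This is a sketch under the assumption that the current interpreter is the one you intend to use; user_site_for is a hypothetical helper and the path in the example is a placeholder.

```python
import os
import subprocess
import sys


def user_site_for(userbase):
    """Ask a fresh interpreter for its user base and user site-packages under a given PYTHONUSERBASE."""
    env = dict(os.environ, PYTHONUSERBASE=userbase)
    code = "import site; print(site.getuserbase()); print(site.getusersitepackages())"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    base, site_packages = result.stdout.splitlines()
    return base, site_packages


if __name__ == "__main__":
    # Placeholder path; use the value you export in your shell startup file.
    print(user_site_for("/tmp/example-userbase"))
```

The second value is the directory that pip install --user would fill with packages once the variable is set.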
If you intend to share the package with your coworkers, consider wrapping the Python package inside a module.
For information on the module system see the HPC Cafe from March 2020.

Set up and define the target folder with PYTHONUSERBASE.
Install the package as above.
Your module file needs to add the site-packages folder to PYTHONPATH
and, if the package comes with binaries, the bin folder to PATH.

For an example see the module quantumtools on woody.
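The two directories a module file has to reference can be derived from the chosen PYTHONUSERBASE. The sketch below assumes a POSIX system and uses the posix_user installation scheme from the standard library's sysconfig; module_paths is a name made up for this example and the path is a placeholder.

```python
import sysconfig


def module_paths(userbase):
    """Return the directories a module file would add to PYTHONPATH and PATH."""
    variables = {"userbase": userbase}  # overrides the default user base (~/.local)
    return {
        "PYTHONPATH": sysconfig.get_path("purelib", scheme="posix_user", vars=variables),
        "PATH": sysconfig.get_path("scripts", scheme="posix_user", vars=variables),
    }


if __name__ == "__main__":
    # Placeholder path; use the folder you installed the shared package into.
    for variable, path in module_paths("/opt/demo").items():
        print(f"{variable}: {path}")
```

Note that the site-packages path contains the Python version, so the module file should be used together with the same python module that was loaded during installation.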

Conda environment

In order to use Conda environments on the HPC cluster some preparation has to be done.
Remember that a Python module needs to be loaded at all times (see module avail python).

Run
conda init bash # if you use a different shell, replace bash with the shell of your choice
source ~/.bashrc # if you use a different shell, replace .bashrc with its startup file

The process was successful if your prompt starts with (base).

Create a ~/.profile with the content
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi
Some scientific software comes in the form of a Conda environment (e.g. https://docs.gammapy.org/0.17/install/index.html).
By default such an environment is installed to ~/.conda. However, its size can be several GB, so you should configure Conda to use a different path. This prevents your home folder from hitting the quota. It can be done by following these steps:

conda config # create ~/.condarc
Add the following lines to the file (replace the path if you prefer a different location)
pkgs_dirs:
- ${WOODYHOME}/software/privat/conda/pkgs
envs_dirs:
- ${WOODYHOME}/software/privat/conda/envs
You can check that this configuration file is read properly by inspecting the output of conda info.
For more options see https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html
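The check of the configuration can also be scripted. The sketch below parses the JSON output of conda info --json and assumes that this output contains the keys envs_dirs and pkgs_dirs as lists in priority order; the helper names and the example paths are made up for illustration.

```python
import json


def configured_dirs(conda_info_json):
    """Extract the highest-priority envs and pkgs directories from `conda info --json` output."""
    info = json.loads(conda_info_json)
    return {
        "envs_dir": info["envs_dirs"][0],
        "pkgs_dir": info["pkgs_dirs"][0],
    }


def all_under(dirs, expected_root):
    """True if every reported directory lies below expected_root."""
    root = expected_root.rstrip("/") + "/"
    return all(path.startswith(root) for path in dirs.values())


# Usage idea (requires conda on the PATH):
#   import subprocess
#   output = subprocess.run(["conda", "info", "--json"],
#                           capture_output=True, text=True, check=True).stdout
#   print(all_under(configured_dirs(output), "/path/you/configured"))
```

If all_under returns False for the location set in ~/.condarc, the file is probably not being read or the variable in the path is not set in the current shell.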
Conda environments can also be used for package management (and more).

Create your own environment with
conda create --name myenv python=3.9 # the python=3.9 version pin is optional
conda activate myenv
conda install package-name # or: pip install package-name
Packages end up inside the Conda environment, therefore no --user option is needed.
Conda environments also come with the benefit of easy use with jupyterhub.rrze.uni-erlangen.de: they show up as a kernel option when starting a notebook.
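From inside Python or a notebook, it is possible to check which Conda environment is active and whether the running interpreter belongs to it. This sketch relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate sets; the function names are invented for this example.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return (name, prefix) of the activated Conda environment, or None if none is active."""
    name = environ.get("CONDA_DEFAULT_ENV")
    prefix = environ.get("CONDA_PREFIX")
    if not name or not prefix:
        return None
    return name, prefix


def interpreter_matches_env(environ=os.environ, python_prefix=sys.prefix):
    """True if the given interpreter prefix is the prefix of the active Conda environment."""
    env = active_conda_env(environ)
    if env is None:
        return False
    return os.path.realpath(python_prefix) == os.path.realpath(env[1])


if __name__ == "__main__":
    print(active_conda_env(), interpreter_matches_env())
```

A mismatch usually means that the environment was activated but the python found first on PATH still belongs to a different installation.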

Jupyter notebook security

With their default configuration, Jupyter notebooks are protected only by a randomly generated token, which in some circumstances can cause security issues on a multi-user system like cshpc or the cluster frontends.
We can change this in a few configuration steps by adding password protection.

First generate a configuration file by executing
jupyter notebook --generate-config

Open a Python prompt and generate a password hash
from notebook.auth import passwd; passwd()

Add the password hash to your notebook config file

# The string should be of the form type:salt:hashed-password.
# Paste the hash generated in the previous step between the quotes.
c.NotebookApp.password = u''
c.NotebookApp.password_required = True

From now on your notebook will be password protected. This also has the benefit that you can use bash functions for more convenient use.

Quick reminder on how to use a remote notebook

# start the notebook on a frontend (e.g. woody)
jupyter notebook --no-browser --port=XXXX

# on your client
ssh -f user_name@remote_server -L YYYY:localhost:XXXX -N

Open the notebook in your local browser at http://localhost:YYYY (use https:// only if you have configured TLS for the notebook).
XXXX and YYYY are four-digit port numbers of your choice.
Don’t forget to stop the notebook once you are done. Otherwise you will block resources that could be used by others!
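The port forwarding call is easy to get wrong because the local and the remote port appear in a fixed order. The sketch below assembles the same ssh command from named arguments; tunnel_command is a hypothetical helper, the user and host names are placeholders, and the port range check reflects the fact that ports below 1024 are reserved for privileged processes.

```python
import shlex


def tunnel_command(user, host, remote_port, local_port):
    """Build the ssh call that forwards local_port on the client to remote_port on the frontend."""
    for port in (remote_port, local_port):
        if not 1024 <= port <= 65535:
            raise ValueError(f"port {port} is outside the unprivileged range 1024-65535")
    return [
        "ssh", "-f", f"{user}@{host}",
        "-L", f"{local_port}:localhost:{remote_port}",  # order is local:host:remote
        "-N",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your account, frontend and ports.
    print(shlex.join(tunnel_command("user_name", "remote_server", 8123, 9123)))
```

The printed line can be pasted into a terminal on the client; the first port in the -L argument is the one to use in the browser.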

Some useful functions/aliases for lazy people 😉

alias remote_notebook_stop='ssh username@remote_server_ip "pkill -u username jupyter"'
Be aware this will kill ALL jupyter processes that you own!

# usage: start_jp_woody PORT
start_jp_woody(){
    # print the URL first: the ssh call below blocks until the notebook is stopped
    echo ""; echo " the notebook can be opened in your browser at: http://localhost:$1/ "; echo ""
    nohup ssh -J username@cshpc.rrze.fau.de -L $1:localhost:$1 username@woody.rrze.fau.de "
    . /etc/bash.bashrc.local; module load python/3.7-anaconda; jupyter notebook --port=$1 --no-browser"
}

# usage: start_jp_emmy PORT
start_jp_emmy(){
    # print the URL first: the ssh call below blocks until the notebook is stopped
    echo ""; echo " the notebook can be opened in your browser at: http://localhost:$1/ "; echo ""
    nohup ssh -J username@cshpc.rrze.fau.de -L $1:localhost:$1 username@emmy.rrze.fau.de "
    . /etc/profile; module load python/3.7-anaconda; jupyter notebook --port=$1 --no-browser"
}

If you are using a C shell (csh/tcsh), remove . /etc/bash.bashrc.local and . /etc/profile from the functions.