Python and Jupyter

JupyterHub was the topic of the HPC Café in October 2020. Note that https://jupyterhub.rrze.uni-erlangen.de/ is an experimental service.

 

This page addresses some common pitfalls when working with Python and related tools on a shared system like a cluster.

The following topics will be discussed in detail on this page:

  • Available Python versions
  • Installing packages
  • Conda environment
  • Jupyter notebook security
  • Installation and usage of mpi4py under Conda

Available Python versions

All Unix systems come with a system-wide Python installation; however, on the clusters it is highly recommended to use one of the Anaconda installations provided as modules.

# reminder
module avail python     # list the available Python modules
module load python/XY   # load one of them

These modules come with a wide range of preinstalled packages.
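
As a quick check, you can load one of these modules and inspect what it ships (a sketch; the module version is a placeholder, pick one from module avail python):

module load python/3.9-anaconda
python --version
pip list                                              # list the preinstalled packages
python -c "import numpy; print(numpy.__version__)"    # spot-check one package (example)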

Installing packages

There are different ways of managing Python packages on the cluster. This list is not complete; however, it highlights methods that are known to work well with the local software stack.

As a general note, it is recommended to build packages in an interactive job on the target cluster to make sure all hardware can be used properly.
Make sure to load modules that might be needed by your Python code (e.g. CUDA for GPU support).
If access to external repositories is needed, set the proxy variables:
export http_proxy=http://proxy:80
export https_proxy=http://proxy:80

Using pip

Pip is a package manager for Python. It can be used to easily install packages and manage their versions.
By default, pip will try to install packages system-wide, which is not possible due to missing permissions.
The behavior can be changed by adding --user to the call:
pip install --user package-name
or, from within Jupyter notebooks, %pip install --user --proxy http://proxy:80 package-name

By defining the variable PYTHONUSERBASE (best done in your .bashrc/.bash_profile) you can change the installation location from ~/.local to a different path. Doing so prevents your home directory from being cluttered with data that does not need a backup and from hitting its quota.
export PYTHONUSERBASE=$WORK/software/privat
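After setting the variable, a user install might look like this (a sketch; the package name is a placeholder):
pip install --user package-name
ls $PYTHONUSERBASE/lib/python3.*/site-packages   # installed packages end up here
ls $PYTHONUSERBASE/bin                           # executables, if the package ships any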
If you intend to share the packages/environments with your coworkers, consider wrapping the Python package inside a module.
For information on the module system see the HPC Café from March 2020.

Set up and define the target folder with PYTHONUSERBASE.
Install the package as above.
Your module file then needs to add the site-packages folder to PYTHONPATH
and, if the package comes with binaries, the bin folder to PATH.

For an example see the module quantumtools on woody.
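
In plain shell terms, such a module file needs to have roughly this effect (a sketch; the python3.9 directory name depends on the Python module used for the install):

export PYTHONPATH=$WORK/software/privat/lib/python3.9/site-packages:$PYTHONPATH
export PATH=$WORK/software/privat/bin:$PATH   # only needed if the package ships binaries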

Conda environment

In order to use Conda environments on the HPC clusters, some preparation has to be done.
Remember that a Python module needs to be loaded at all times – see module avail python.

Run:
conda init bash    # if you use a different shell, replace bash by the shell of your choice
source ~/.bashrc   # if you use a different shell, source its configuration file instead

The process was successful if your prompt starts with (base).

Create a ~/.profile with the following content:
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi

For batch jobs it might be necessary to use source activate <myenv> instead of conda activate <myenv>.
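
A minimal batch-job sketch (the module version, environment name, script name, and Slurm options are placeholders):

#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

module load python/3.9-anaconda   # placeholder version
source activate myenv             # your Conda environment
python my_script.py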

Some scientific software comes in the form of a Conda environment (e.g. https://docs.gammapy.org/0.17/install/index.html).
By default such an environment will be installed to ~/.conda. However, its size can be several GB; therefore you should configure Conda to use a different path. This will prevent your home directory from hitting its quota. It can be done by following these steps:

Run conda config to create ~/.condarc.
Add the following lines to the file (replace the path if you prefer a different location):
pkgs_dirs:
  - ${WORK}/software/privat/conda/pkgs
envs_dirs:
  - ${WORK}/software/privat/conda/envs
You can check that this configuration file is properly read by inspecting the output of conda info.
For more options see https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html.
Conda environments can also be used for package management (and more).
You can share Conda environments with coworkers by having them add your environment path to their envs_dirs as well.
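For example, a coworker could add your environment location to their own ~/.condarc (a sketch; replace the path by the absolute path of your envs directory):

envs_dirs:
  - /path/to/your/work/software/privat/conda/envs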

Create your own environment with:
conda create --name myenv python=3.9   # the python=... specification is optional
conda activate myenv
conda install package-name             # or: pip install package-name
Packages will end up within the Conda environment; therefore no --user option is needed.
Conda environments come with the extra benefit of ease of use: on jupyterhub.rrze.uni-erlangen.de they show up as a kernel option when starting a notebook.

Jupyter notebook security

When using Jupyter notebooks with their default configuration, they are protected only by a random token, which in some circumstances can cause security issues on a multi-user system like cshpc or the cluster frontends. This can be changed with a few configuration steps by adding password protection.

First generate a configuration file by executing
jupyter notebook --generate-config

Open a Python terminal and generate a password hash:
from notebook.auth import passwd; passwd()

Add the password hash to your notebook configuration file (~/.jupyter/jupyter_notebook_config.py):

# The string should be of the form type:salt:hashed-password.

c.NotebookApp.password = u''
c.NotebookApp.password_required = True

From now on your notebook will be password protected. A convenient side effect is that you can now wrap the connection steps in bash functions and aliases, as shown below.

Quick reminder how to use the remote notebook:

#start notebook on a frontend (e.g. woody)
jupyter notebook --no-browser --port=XXXX

On your client, use:
ssh -f user_name@remote_server -L YYYY:localhost:XXXX -N

Open the notebook in your local browser at https://localhost:YYYY.
Here XXXX and YYYY are free (four-digit) port numbers.
Don’t forget to stop the notebook once you are done. Otherwise you will block resources that could be used by others!

Some useful functions/aliases for lazy people 😉

alias remote_notebook_stop='ssh username@remote_server_ip "pkill -u username jupyter"'
Be aware this will kill all jupyter processes that you own!

start_jp_woody(){
    nohup ssh -J username@cshpc.rrze.fau.de -L $1:localhost:$1 username@woody.nhr.fau.de \
        ". /etc/bash.bashrc.local; module load python/3.7-anaconda; jupyter notebook --port=$1 --no-browser" ;
    echo ""; echo " the notebook can be started in your browser at: https://localhost:$1/ "; echo ""
}

 

start_jp_meggie(){
    nohup ssh -J username@cshpc.rrze.fau.de -L $1:localhost:$1 username@meggie.rrze.fau.de \
        ". /etc/profile; module load python/3.7-anaconda; jupyter notebook --port=$1 --no-browser" ;
    echo ""; echo " the notebook can be started in your browser at: https://localhost:$1/ "; echo ""
}

 

If you are using a C shell (csh/tcsh), remove . /etc/bash.bashrc.local and . /etc/profile from the functions.

Installation and usage of mpi4py under Conda

Installing mpi4py via pip will install a generic MPI that will not work on our clusters. We recommend separately installing mpi4py for each cluster through the following steps:

  • If Conda is not already configured and initialized, follow the steps documented under Conda environment above.
  • For more details regarding the installation refer to the official documentation of mpi4py.

Note: Running MPI parallel Python scripts is only supported on the compute nodes and not on frontend nodes.

Installation

Installation must be performed on the cluster frontend node:

  • Load Anaconda module.
  • Load MPI module.
  • Install mpi4py and specify the path to the MPI compiler wrapper:
    MPICC=$(which mpicc) pip install --no-cache-dir mpi4py
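
Put together, the installation might look like this on a cluster frontend (a sketch; module names/versions and the environment name are placeholders, use what is available on your cluster):

module load python/3.9-anaconda   # placeholder Anaconda module
module load openmpi/4.1.2-gcc     # placeholder MPI module
conda activate myenv              # optional: install into a Conda environment instead of the user base
MPICC=$(which mpicc) pip install --no-cache-dir mpi4py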

Testing the installation must be performed inside an interactive job:

  • Load the Anaconda and MPI module versions mpi4py was built with.
  • Activate the environment.
  • Run an MPI parallel Python script:
    srun python -m mpi4py.bench helloworld

    This should print for each process a line in the form of:

    Hello, World! I am process  <rank> of <size> on <hostname>

    The number of processes to start is configured through the respective options of salloc.
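
For example, an interactive test with four processes might look like this (a sketch; the salloc options and module names are placeholders and depend on the cluster):

salloc --ntasks=4 --time=00:30:00
module load python/3.9-anaconda openmpi/4.1.2-gcc   # the modules mpi4py was built with (placeholders)
conda activate myenv
srun python -m mpi4py.bench helloworld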

Usage

MPI parallel python scripts with mpi4py only work inside a job on a compute node.

In an interactive job or inside a job script run the following steps:

  • Load the Anaconda and MPI module versions mpi4py was built with.
  • Initialize/activate the environment.
  • Run the MPI parallel Python script:
    srun python <script>

    The number of processes to start is configured through the respective options in the job script or of salloc.
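
A minimal job-script sketch (module versions, environment name, script name, and Slurm options are placeholders):

#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

module load python/3.9-anaconda openmpi/4.1.2-gcc   # the modules mpi4py was built with (placeholders)
conda activate myenv                                # or: source activate myenv (see the Conda section above)
srun python my_script.py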

For how to request an interactive job via salloc and how to write a job script see batch processing.
