Installation

TECA is designed to deliver the highest available performance on platforms ranging from Cray supercomputers to laptops. The installation procedure depends on the platform and desired use.

On a Cray Supercomputer

Tip

CASCADE team members who need to run the command line applications on NERSC Cori can use the m1517 group installs instead of manually installing TECA. See m1517 CASCADE installs for more information.

When installing TECA on a supercomputer one of the best options is the superbuild, a piece of CMake code that downloads and builds TECA and its many dependencies. The superbuild is located in a git repository here TECA_superbuild.

Installing TECA on the Cray requires pointing the superbuild to Cray’s MPI and is sensitive to the modules loaded at compile time. You want a clean environment with only GNU compilers and CMake modules loaded. The superbuild will take care of compiling and installing TECA and its dependencies. As occasionally occurs, something may go awry. If that happens it is best to start over from scratch by rm -rf * the contents of both build and install directories. This is because of CMake’s caching mechanism which will remember bad values and also because the superbuild makes use of what’s already been installed as it progresses and if you don’t clean it out you may end up referring to a broken library.

Overview

A quick overview of installing the latest stable version of TECA from the master branch is:

module swap PrgEnv-intel PrgEnv-gnu
module load cmake
git clone https://github.com/LBL-EESA/TECA_superbuild.git
cd TECA_superbuild
git checkout master
mkdir build && cd build
run config-teca-sb.sh ..

When using the resulting TECA install you’ll need to load the teca modulefile installed by the superbuild find this file in the modulefiles directory.

Versioning

Over time both TECA and its dependencies evolve such that the compatible versions of dependencies and build options change. This is handled by tagging releases in both TECA and the TECA_superbuild. Each such stable release of TECA is paired with a corresponding release of the TECA_superbuild. The superbuild must be explicitly checked out to the same TECA release one is compiling. Be sure to call git checkout X in the superbuild clone where X is either master, develop, or a release number found at the releases page of the TECA github repo.

Here we show how to install from the master branch which has the latest stable version of the code. One may also wish to install from the develop branch which gets one the most up to date experimental codes. No matter which branch or release is selected the process is similar, one will modify the examples by replacing master with develop or the desired release number.

Selecting the file system

Supercomputers often come with a number of file systems attached. The choice of which file system to install the software on can impact the performance of large scale runs as many processes simultaneously will load code from library files stored on disk. Often home directories are located on a low performance networked file system not designed for use at large parallel concurrencies making the home folder a poor choice for an install. Typically there is also a high throughput parallel file system, such as Lustre, available that is designed for massive concurrent access. Parallel file systems will deliver the best performance and can be a good choice for installs. Consult your HPC center’s documentation before deciding on a file system for the install. For more information on the file systems available at NERSC see File Systems.

Configure script

The configure script is a shell script that captures build settings specific to Cray environment (see step 7 above). You may wish to edit the TECA_SOURCE and TECA_INSTALL variables before running the script. Additional CMake arguments may be passed on the command line. You must pass the location of the superbuild sources(see step 3 above).

Here is the script config-teca-sb.sh in use at NERSC for Cori.

#!/bin/bash

# use the GNU compiler collection
module swap PrgEnv-intel PrgEnv-gnu

# mpich is not in the pkg-config path on Cori
export PKG_CONFIG_PATH=$CRAY_MPICH_DIR/lib/pkgconfig:$PKG_CONFIG_PATH

# set the banch of TECA and the path where TECA is installed to.
TECA_SOURCE=master
TECA_INSTALL=$SCRATCH/teca_installs/$TECA_SOURCE

echo TECA_SOURCE=${TECA_SOURCE}
echo TECA_INSTALL=${TECA_INSTALL}
rm -rfI ${TECA_INSTALL}

# Configure TECA superbuild
cmake \
  -DCMAKE_CXX_COMPILER=`which g++` \
  -DCMAKE_C_COMPILER=`which gcc` \
  -DCMAKE_BUILD_TYPE=Release \
  -DTECA_SOURCE=${TECA_SOURCE} \
  -DCMAKE_INSTALL_PREFIX=${TECA_INSTALL} \
  -DENABLE_MPICH=OFF \
  -DENABLE_OPENMPI=OFF \
  -DENABLE_CRAY_MPICH=ON \
  -DENABLE_TECA_TEST=OFF \
  -DENABLE_TECA_DATA=ON \
  $*

# build and install
make -j16 install

This script configures the superbuild such that the release or branch named in TECA_SOURCE variable is compiled and installed in the TECA_INSTALL directory. As discussed in Versioning, the superbuild should be checked out to the same branch or release number named in TECA_source. Note that $SCRATCH is a NERSC specific environment variable that points to a directory owned by the user on a Lustre based file system. On other HPC centers you will need to replace $SCRATCH with the path to the file system you wish to install TECA on. See Selecting the file system for more information.

Hint

When trouble shooting the superbuild it is necessary to rm -rf both the build and install prefix directories. Failing to do so will lead to confusing build failures.

Configuring the runtime environment

During the install an environment module is generated and installed to $TECA_INSTALL/modulefiles/. To use the new install of TECA you will need to use it to configure the run time environment.

module swap PrgEnv-intel PrgEnv-gnu
module use ${TECA_INSTALL}/modulefiles
module load teca

The teca module must be loaded each time you use TECA and is usually best done from within your batch script.

Debugging and development on a supercomputer

Modifying the source code directly in the superbuild is a cumbersome process. It is far easier to keep a separate build for development and debugging. In this case the superbuild is still useful for installing the dependencies. To setup for TECA development and debugging on a supercomputer run the superbuild with -DENABLE_TECA=OFF. This will build the dependencies but not TECA itself. Once the superbuild completes, load the installed module, and compile a separate clone of the TECA github repo (See Compiling TECA from sources). This enables one to make local modifications, quickly recompile, and run out of the build directory.

On a laptop or desktop

On a laptop or desktop system one may use local package managers to install third-party dependencies, and then proceed with compiling and installing TECA. A simple procedure exists for those wishing to use TECA for Python scripting. See section Python only. For those wishing access to TECA libraries, command line applications, and Python scripting, compiling from sources is the best option. See section Compiling TECA from sources.

Note, that as with any install, post install the environment will likely need to be set to pick up the install. Specifically, PATH, LD_LIBRARY_PATH (or DYLD_LIBRRAY_PATH on Mac), and PYTHONPATH need to be set correctly. See section Post Install.

Compiling TECA from sources

TECA depends on a number of third party libraries. Before attempting to compile TECA please install dependencies as described in section Installing dependencies and then set up the Python environment as described in section Python environment.

Once dependencies are installed, a typical install might proceed as follows.

git clone --recursive https://github.com/LBL-EESA/TECA.git
svn co svn://svn.code.sf.net/p/teca/TECA_data TECA_data
cd TECA
mkdir bin
cd bin
cmake ..
make -j
make -j install

If all goes well, at the end of this TECA will be installed. However, note that the install location should be added to various system paths, See Post Install for how to configure the run time environment.

When running CMake one can pass -DCMAKE_INSTALL_PREFIX=<some path> to control where the install lands, and -DBUILD_TESTING=ON to enable regression tests.

The most common problem is when CMake failed to locate a dependency. Usually the error message has information about correcting the situation. Usually the remedy is to explicitly pass the path where the dependency is installed directly to CMake on the command line. While not recommended, as a last resort one may disable a problematic dependency using -DREQUIRE_<X>=OFF where X is the dependency.

Installing dependencies

Most of the dependencies can be installed by the OS specific package manager. For Python package dependencies pip is used as described in Python environment.

It is recommended to have a parallel HDF5 based NetCDF install, on some systems (Ubuntu, Mac) this requires installing NetCDF from source as outlined in NetCDF w/ Parallel 4.

Apple Mac OS

brew update && brew upgrade
brew install open-mpi hdf5-mpi swig svn udunits openssl python3

Ubuntu 20.04

$ apt-get update
$ apt-get install -y gcc g++ gfortran cmake swig \
    libmpich-dev libhdf5-dev libnetcdf-dev \
    libboost-program-options-dev python3-dev python3-pip \
    libudunits2-0 libudunits2-dev zlib1g-dev libssl-dev

Fedora 32

$ dnf update
$ dnf install -qq -y environment-modules which git-all gcc-c++ gcc-gfortran \
    make cmake swig mpich-devel hdf5-mpich-devel netcdf-mpich-devel \
    boost-devel python3-devel python3-pip subversion udunits2 udunits2-devel \
    zlib-devel openssl-devel wget redhat-rpm-config

Some of these packages may need an environment module loaded, for instance MPI

$ module load mpi

Python environment

TECA’s Python dependencies can be easily installed via pip.

$ python3 -mpip install numpy mpi4py matplotlib torch

However, when building TECA from sources it can be useful to setup a virtual environment. Creating the virtual environment is something that you do once, typically in your home folder or the SCRATCH file system on the Cray. Once setup the venv will need to be activated each time you use TECA.

$ cd ~
$ python3 -mvenv teca-py3k
$ source teca-py3k/bin/activate
$ python3 -mpip install numpy matplotlib mpi4py torch

Before building TECA, and every time you use TECA be sure to activate the same venv.

$ source teca-py3k/bin/activate

Once the venv is installed and activated, see Compiling TECA from sources.

Note

As of 1/1/2020 TECA switched to Python 3. Python 2 may still work but is no longer maintained and should not be used.

NetCDF w/ Parallel 4

As of 7/31/2020 TECA relies on HDF5 NetCDF with MPI collective I/O. The NetCDF project calls this feature set “parallel 4”. At this time neither Mac OS homebrew nor Ubuntu 20.04 have a functional parallel NetCDF package. On those systems one should install NetCDF from sources.

On Ubuntu 20.04

$ cd ~
$ sudo apt-get remove libhdf5-dev
$ sudo apt-get install libmpich-dev libhdf5-mpich-dev
$ wget https://downloads.unidata.ucar.edu/netcdf-c/4.8.1/netcdf-c-4.8.1.tar.gz
$ tar -xvf netcdf-c-4.8.1.tar.gz
$ cd netcdf-c-4.8.1
$ ./configure CC=mpicc CFLAGS="-O3 -I/usr/include/hdf5/mpich"       \
      LDFLAGS="-L/usr/lib/x86_64-linux-gnu/hdf5/mpich/ -lhdf5"      \
      --prefix=`pwd`/../netcdf-c-4.8.1-install --enable-parallel4   \
      --disable-dap
$ make -j install

On Apple Mac OS

$ brew uninstall netcdf hdf5 mpich
$ brew install open-mpi hdf5-mpi
$ wget https://downloads.unidata.ucar.edu/netcdf-c/4.8.1/netcdf-c-4.8.1.tar.gz
$ tar -xvf netcdf-c-4.8.1.tar.gz
$ cd netcdf-c-4.8.1
$ ./configure CC=mpicc --enable-shared --enable-static          \
    --enable-fortran --disable-dap --enable-netcdf-4            \
    --enable-parallel4 --prefix=`pwd`/../netcdf-c-4.8.1-install
$ make -j install

When configuring the TECA build pass the location of your NetCDF install on the CMake command line -DNETCDF_DIR=<path to the install>.

Post Install

When installing after compiling from sources the user’s environment should be updated to use the install. One may use the following shell script as a template for this purpose by replacing @CMAKE_INSTALL_PREFIX@ and @PYTHON_VERSION@ with the value used during the install.

#!/bin/bash

export LD_LIBRARY_PATH=@CMAKE_INSTALL_PREFIX@/lib/:@CMAKE_INSTALL_PREFIX@/lib64/:$LD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=@CMAKE_INSTALL_PREFIX@/lib/:@CMAKE_INSTALL_PREFIX@/lib64/:$DYLD_LIBRARY_PATH
export PKG_CONFIG_PATH=@CMAKE_INSTALL_PREFIX@/lib/pkgconfig:@CMAKE_INSTALL_PREFIX@/lib64/pkgconfig:$PKG_CONFIG_PATH
export PYTHONPATH=@CMAKE_INSTALL_PREFIX@/lib:@CMAKE_INSTALL_PREFIX@/lib/python@PYTHON_VERSION@/site-packages/
export PATH=@CMAKE_INSTALL_PREFIX@/bin/:$PATH

# for server install without graphics capability
#export MPLBACKEND=Agg

With this shell script in hand one configures the environment for use by sourcing it.

When developing TECA it is common to skip the install step and run out of the build directory. When doing so one must also set LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, PYTHONPATH, and PATH to point to the build directory.

Python only

TECA’s C++ codes are wrapped in Python, and a number of pure Python implementations exist in the code base as well. This makes it possible to develop new TECA applications in Python using the teca Python package. Two installation methods have been documented here, pip and conda. Currently the conda method has some limitations. As a result pip is the recommended method.

with pip

The TECA Python package can be installed from PyPi using pip. This may be useful for developing new Python based applications and post processing codes. A virtual environment is recommended.

Before attempting to install TECA, install system library dependencies as shown in section Installing dependencies. Pure Python package dependencies may then be installed via pip.

python3 -m venv py3k-teca
source py3k-teca/bin/activate
pip3 install numpy matploptlib mpi4py torch
pip3 install teca

Note

When installing PyTorch, especially when using GPUs, follow the PyTorch_install instructions found on the PyTorch site.

The pip install teca command may take a few minutes as TECA compiles from sources. Errors are typically due to missing dependencies, from the corresponding CMake output it should be apparent which dependency was not found.

TECA makes heavy use of MPI and NetCDF parallel I/O. On some systems, notably Unbuntu and Mac OS the MPI enabled NetCDF libraries available from package managers are broken or missing. In this case one can install NetCDF with MPI features enabled (in NetCDF docs this is called “parallel 4”) and point the build to the local install by passing options on the pip command line.

pip install teca --global-option=build_ext \
    --global-option="--with-netcdf=/Users/bloring/netcdf-c-4.8.1-install/"

See section NetCDF w/ Parallel 4 for information on compiling NetCDF with MPI enabled.

with conda

The following is an experimental recipe for installing TECA into a conda environment.

conda create --yes -c conda-forge -n tecapy \
    python=3.9 numpy mpi4py netCDF4 boost openmpi \
    matplotlib python-dateutil cython swig pyparsing \
    cycler pytz torch
source activate tecapy
pip install teca --global-option=build_ext \
    --global-option="--without-netcdf-mpi"

This method does not support parallel I/O. As a result it is recommended to use with pip installation method.