Development

Online Source Code Documentation

TECA’s C++ sources are documented via Doxygen at the TECA Doxygen site.

Class Indices

Tip

The following tables contain a listing of some commonly used TECA classes. The TECA Doxygen site is a more complete reference.

Algorithms

TECA’s suite of algorithms that can be inserted in functional pipelines. (For more details, click on the class name)

Table 2 TECA Classes

Class

Description

teca_2d_component_area

An algorithm that computes the areas of labeled regions.

teca_apply_binary_mask

Applies a mask to a given list of variables.

teca_apply_tempest_remap

Moves data from one mesh to another using remapping weights generated by TempestRemap.

teca_bayesian_ar_detect

The TECA BARD atmospheric river detector.

teca_bayesian_ar_detect_parameters

An algorithm that constructs and serves up the parameter table needed to run the Bayesian AR detector.

teca_binary_segmentation

An algorithm that computes a binary segmentation.

teca_cartesian_mesh_coordinate_transform

teca_cartesian_mesh_regrid

Transfers data between spatially overlapping meshes of potentially different resolutions.

teca_cartesian_mesh_source

An algorithm that generates a teca_cartesian_mesh of the requested spatial and temporal dimensions with optional user defined fields.

teca_cartesian_mesh_subset

Applies a subset given in world coordinates to the upstream request.

teca_component_area_filter

An algorithm that applies a mask based on connected component area.

teca_component_statistics

Computes statistics about connected components.

teca_connected_components

An algorithm that computes connected component labeling.

teca_dataset_diff

Computes the element-wise difference between two datasets.

teca_derived_quantity

A programmable algorithm specialized for simple array-based computations.

teca_descriptive_statistics

Computes descriptive statistics over a set of arrays.

teca_elevation_mask

Generates a mask indicating where mesh points with a vertical pressure coordinate lie above the surface of the Earth. The mask is set to 1 where data is above the Earth’s surface and 0 otherwise.

teca_evaluate_expression

An algorithm that evaluates an expression and stores the result in a new variable.

teca_face_to_cell_centering

An algorithm that transforms from face to cell centering.

teca_indexed_dataset_cache

Caches N datasets such that repeated requests for the same dataset are served from the cache.

teca_integrated_vapor_transport

An algorithm that computes integrated vapor transport (IVT)

teca_integrated_water_vapor

An algorithm that computes integrated water vapor (IWV)

teca_l2_norm

An algorithm that computes L2 norm.

teca_laplacian

An algorithm that computes the Laplacian from a vector field.

teca_latitude_damper

Inverted Gaussian damper for scalar fields.

teca_mask

An algorithm that masks a range of values.

teca_normalize_coordinates

An algorithm to ensure that Cartesian mesh coordinates follow conventions.

teca_python_algorithm

teca_pytorch_algorithm

teca_rename_variables

An algorithm that renames variables.

teca_simple_moving_average

An algorithm that averages data in time.

teca_table_calendar

An algorithm that transforms a NetCDF CF2 time variable into an absolute date.

teca_table_reduce

A reduction on tabular data over time steps.

teca_table_region_mask

An algorithm that identifies rows in the table that are inside the list of regions provided.

teca_table_remove_rows

An algorithm that removes rows from a table where a given expression evaluates to true.

teca_table_sort

An algorithm that sorts a table in ascending order.

teca_table_to_stream

An algorithm that serializes a table to a C++ stream object.

teca_tc_candidates

GFDL tropical storms detection algorithm.

teca_tc_classify

An algorithm that classifies storms using the Saffir-Simpson scale.

teca_tc_trajectory

GFDL tropical storms trajectory tracking algorithm.

teca_tc_wind_radii

Computes wind radius at the specified coordinates.

teca_threaded_python_algorithm

teca_unpack_data

An algorithm that unpacks NetCDF packed values.

teca_valid_value_mask

An algorithm that computes a mask identifying valid values.

teca_vertical_coordinate_transform

An algorithm that transforms the vertical coordinates of a mesh.

teca_vertical_reduction

The base class for vertical reductions.

teca_vorticity

An algorithm that computes vorticity from a vector field.

I/O

TECA’s I/O components to read datasets efficiently. (For more details, click on the class name)

Table 3 TECA Classes

Class

Description

teca_array_collection_reader

A reader for collections of arrays stored in NetCDF format.

teca_cartesian_mesh_reader

A reader for data stored in binary cartesian_mesh format.

teca_cartesian_mesh_writer

An algorithm that writes Cartesian meshes in VTK format.

teca_cf_block_time_step_mapper

Maps time steps to files in fixed sized blocks.

teca_cf_interval_time_step_mapper

NetCDF CF2 files time step mapper.

teca_cf_layout_manager

Puts data on disk using NetCDF CF2 conventions.

teca_cf_reader

A reader for Cartesian mesh based data stored in NetCDF CF format.

teca_cf_time_axis_data

A dataset used to read NetCDF CF2 time and metadata in parallel.

teca_cf_time_axis_data_reduce

Gathers the time axis and metadata from a parallel read of a set of NetCDF CF2 files.

teca_cf_time_axis_reader

An algorithm to read time axis and its attributes in parallel.

teca_cf_time_step_mapper

Defines the interface for mapping time steps to files.

teca_cf_writer

A writer for Cartesian meshes in NetCDF CF2 format.

teca_multi_cf_reader

A reader for data stored in NetCDF CF format in multiple files.

teca_shape_file_mask

Generates a valid value mask defined by regions in the given ESRI shape file.

teca_table_reader

A reader for data stored in binary table format.

teca_table_writer

An algorithm that writes tabular data in a binary or CSV (comma separated value) format that is easily ingested by most spreadsheet apps. Each page of a database is written to a file.

teca_wrf_reader

A reader for data stored in WRF ARW format.

Core

TECA’s core components. (For more details, click on the class name)

Table 4 TECA Classes

Class

Description

object

teca_algorithm

The interface to TECA pipeline architecture.

teca_algorithm_executive

Base class and default implementation for executives.

teca_bad_cast

An exception that may be thrown when a conversion between two data types fails.

teca_binary_stream

Serialize objects into a binary stream.

teca_dataset

Interface for TECA datasets.

teca_dataset_capture

An algorithm that takes a reference to the dataset produced by the upstream algorithm it is connected to.

teca_dataset_source

An algorithm that serves up user provided data and metadata.

teca_index_executive

An executive that generates requests using an upstream or user defined index.

teca_index_reduce

Base class for MPI + threads map reduce reduction over an index.

teca_memory_profiler

MemoryProfiler - A sampling memory use profiler.

teca_metadata

A generic container for metadata in the form of name=value pairs.

teca_mpi_manager

A RAII class to ease MPI initialization and finalization.

teca_parallel_id

A helper class for debug and error messages.

teca_profiler

A class containing methods managing memory and time profiling.

teca_programmable_algorithm

An algorithm implemented with user provided callbacks.

teca_programmable_reduce

Callbacks implement a user defined reduction over time steps.

teca_thread_pool

A class to manage a fixed size pool of threads that dispatch I/O work.

teca_threaded_algorithm

This is the base class defining a threaded algorithm.

teca_threaded_programmable_algorithm

A threaded algorithm implemented with user provided callbacks.

teca_threadsafe_queue

A thread safe queue.

teca_time_event

A helper class that times its life.

teca_uuid

A universally unique identifier.

teca_variant_array

A type agnostic container for array based data.

teca_variant_array_impl

The concrete implementation of our type agnostic container for contiguous arrays.

Data

TECA’s data structures. (For more details, click on the class name)

Table 5 TECA Classes

Class

Description

teca_arakawa_c_grid

A representation of mesh based data on an Arakawa C Grid.

teca_array_collection

A collection of named arrays.

teca_cartesian_mesh

An object representing data on a stretched Cartesian mesh.

teca_curvilinear_mesh

Data on a physically uniform curvilinear mesh.

teca_database

A collection of named tables.

teca_mesh

A base class for geometric data.

teca_priority_queue

An indirect priority queue that supports random access modification of priority.

teca_table

A collection of columnar data with row based accessors and communication and I/O support.

teca_table_collection

A collection of named tables.

teca_uniform_cartesian_mesh

Data on a uniform Cartesian mesh.

Testing

TECA comes with an extensive regression test suite which can be used to validate your build. The tests can be executed from the build directory with the ctest command.

ctest --output-on-failure

Note that PYTHONPATH, LD_LIBRARY_PATH and DYLD_LIBRARY_PATH will need to be set to include the build’s lib directory and PATH will need to be set to include “.”.
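The environment setup described above can be sketched as follows. This is a minimal example, not a prescribed procedure: TECA_BUILD_DIR is a hypothetical variable introduced here to name the build directory (it defaults to the current directory), and the check for CTestTestfile.cmake is only a guard so that ctest runs solely from a configured build tree.

```shell
# Hypothetical variable naming the build directory; defaults to the current one.
TECA_BUILD_DIR="${TECA_BUILD_DIR:-$PWD}"

# Prepend the build's lib directory, keeping any entries that are already set.
export PYTHONPATH="$TECA_BUILD_DIR/lib${PYTHONPATH:+:$PYTHONPATH}"
export LD_LIBRARY_PATH="$TECA_BUILD_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export DYLD_LIBRARY_PATH="$TECA_BUILD_DIR/lib${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"

# The tests expect the current directory (".") to be on PATH.
export PATH=".:$PATH"

# Run the regression suite only when the directory is a configured build tree.
if [ -f "$TECA_BUILD_DIR/CTestTestfile.cmake" ]; then
    (cd "$TECA_BUILD_DIR" && ctest --output-on-failure)
fi
```

LD_LIBRARY_PATH is consulted on Linux and DYLD_LIBRARY_PATH on macOS; setting both keeps the snippet portable between the two.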

Timing and Profiling

TECA contains a built-in profiling mechanism that captures the run time of each stage of a pipeline’s execution, along with a sampling memory profiler.

The profiler records the times of user defined events and samples memory at a user specified interval. The resulting data is written in parallel to a CSV file in rank order. Times are stored in one file and memory use samples in another. Each memory use sample includes the time it was taken, so that memory use can be mapped back to the corresponding events.

Warning

In some cases TECA’s built in profiling can negatively impact run time performance as the number of threads is increased. For that reason one should not use it in performance studies. However, it is well suited to debugging and diagnosing scaling issues and understanding control flow.

Compilation

The profiler is not built by default and must be compiled in by adding -DTECA_ENABLE_PROFILER=ON to the CMake command line. Be sure to build in release mode with -DCMAKE_BUILD_TYPE=Release and also add -DNDEBUG to CMAKE_CXX_FLAGS_RELEASE. Once compiled, the built-in profiler may be enabled at run time via the environment variables described below or directly using its API.
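A configure step with these options might look like the sketch below. The function name configure_teca_with_profiler is hypothetical, and the -O3 flag is an assumption added here because setting CMAKE_CXX_FLAGS_RELEASE replaces CMake's default release flags; only -DNDEBUG comes from the text above.

```shell
# Hypothetical helper: configure a profiler-enabled release build.
# Any extra arguments (for example the path to the source tree) are
# forwarded to cmake unchanged.
configure_teca_with_profiler() {
    cmake \
        -DTECA_ENABLE_PROFILER=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG" \
        "$@"
}
```

From an empty build directory, the helper would be called with the location of the TECA sources as its argument, followed by the usual build step.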

Runtime controls

The profiler is activated by the following environment variables. The environment variables are parsed in teca_profiler::initialize. This should be automatic in most cases, as it is called from teca_mpi_manager, which is used by parallel TECA applications and tests.

Variable

Description

PROFILER_ENABLE

A bit mask that enables logging: 0x01 enables event profiling, 0x02 enables memory profiling.

PROFILER_LOG_FILE

Path to write the timer log to.

MEMPROF_LOG_FILE

Path to write the memory profiler log to.

MEMPROF_INTERVAL

Number of seconds, as a floating point value, between memory recordings.
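As an illustration, the variables in the table can be set for a single run with a small wrapper. This is a sketch under stated assumptions: the function name run_profiled, the log file names and the 0.5 second interval are hypothetical defaults, and the mask is passed as decimal 3 (0x01 | 0x02) on the assumption that a plain integer is accepted; check teca_profiler::initialize for the exact parsing.

```shell
# Hypothetical wrapper: run a command with event and memory profiling enabled.
# 3 = 0x01 (event profiling) | 0x02 (memory profiling).
# Log file names and the sampling interval are illustrative defaults that can
# be overridden by setting the variables before calling the wrapper.
run_profiled() {
    env \
        PROFILER_ENABLE=3 \
        PROFILER_LOG_FILE="${PROFILER_LOG_FILE:-teca_events.csv}" \
        MEMPROF_LOG_FILE="${MEMPROF_LOG_FILE:-teca_memory.csv}" \
        MEMPROF_INTERVAL="${MEMPROF_INTERVAL:-0.5}" \
        "$@"
}
```

The wrapper is invoked with the application and its arguments, for example run_profiled followed by the command line of a profiler-enabled TECA application or test. Using env keeps the settings scoped to that one process rather than the whole shell session.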

Visualization

The command line application teca_profile_explorer can be used to analyze the log files. The application requires that a timer profile file and a list of MPI ranks to analyze be passed on the command line. Optionally, a memory profile file can be passed as well. For instance, the following command was used to generate Fig. 16.

./bin/teca_profile_explorer -e bin/test/test_bayesian_ar_detect \
   -m bin/test/test_bayesian_ar_detect_mem -r 0

When run, teca_profile_explorer creates an interactive window displaying a Gantt chart for each MPI rank. The chart is organized with a row for each thread. Threads with more events are displayed higher up. For each thread, and every logged event, a colored rectangle is rendered. There can be tens to hundreds of unique events per thread, so it is impractical to display a legend. However, clicking on an event rectangle in the plot prints all the data associated with the event in the terminal. If a memory profile is passed on the command line, the memory profile is normalized to the height of the plot and shown on top of the event profile. The maximum memory use is added to the title of the plot. Example output is shown in Fig. 16.

_images/tpc_rank_profile_data_0.png

Fig. 16 Visualization of TECA’s run time profiler for the test_bayesian_ar_detect regression test, run with 1 MPI rank and 10 threads.

Creating PyPI Packages

The typical sequence for pushing a package to PyPI and testing it is as follows. Be sure to add an rc number to the version in setup.py when testing, since version numbers are unique and cannot be reused.

python3 setup.py build_ext
python3 setup.py install
python3 setup.py sdist
python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*
pip3 install --index-url https://test.pypi.org/simple/ teca