TECA
The Toolkit for Extreme Climate Analysis
teca_thread_util Namespace Reference

Codes for dealing with threading.

Functions

TECA_EXPORT int thread_parameters (MPI_Comm comm, int base_core_id, int n_requested, int n_threads_per_device, bool bind, bool verbose, int &n_threads, std::deque< int > &affinity, std::vector< int > &device_ids)
 

Detailed Description

Codes for dealing with threading.

Function Documentation

◆ thread_parameters()

TECA_EXPORT int teca_thread_util::thread_parameters ( MPI_Comm  comm,
int  base_core_id,
int  n_requested,
int  n_threads_per_device,
bool  bind,
bool  verbose,
int &  n_threads,
std::deque< int > &  affinity,
std::vector< int > &  device_ids 
)

Load balances threads across an MPI communication space such that, on each node, every physical core receives the same number of threads. This is an MPI collective call. Building the affinity map relies on features available only with _GNU_SOURCE. On systems where these features are unavailable and automatic detection of the number of threads is requested, the call fails and n_threads is set to 1.
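The dependence on _GNU_SOURCE can be illustrated with a guarded core query. The following is a minimal sketch, not TECA's implementation: current_core_or is a hypothetical helper, and the use of sched_getcpu (a GNU extension) to find the calling thread's core is an assumption about how such detection might be done.

```cpp
#include <cassert>
#if defined(_GNU_SOURCE)
#include <sched.h>   // sched_getcpu is a GNU extension
#endif

// Hypothetical helper, not part of TECA: report the core the calling
// thread is currently running on, or `fallback` when the GNU extension
// is unavailable or the query fails.
int current_core_or(int fallback)
{
    assert(fallback >= 0);
#if defined(_GNU_SOURCE)
    int core = sched_getcpu();
    if (core >= 0)
        return core;
#endif
    return fallback;
}
```

A caller could use the returned id where a core id is needed and treat the fallback value as "unknown, assume core 0".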

Parameters
[in] comm: an MPI communication space to load balance threads across. The communicator is used to coordinate affinity mapping such that each rank can allocate a number of threads bound to unique cores.
[in] base_core_id: identifies the core in use by this MPI rank's main thread. If -1 is passed, this is determined automatically.
[in] n_requested: the number of requested threads per rank. Passing -1 uses all the cores on the node such that each physical core is assigned exactly 1 thread. Note that for performance reasons hyperthreads are not used here. The suggested number of threads is returned in n_threads, and the returned affinity map specifies which core each thread should be bound to in order to achieve this. Passing n_requested >= 1 specifies a run time override. This indicates that the caller wants to use a specific number of threads, rather than one per physical core. In this case the affinity map is also constructed.
[in] bind: if true, extra work is done to determine an affinity map such that each thread can be bound to a unique core on the node.
[in] verbose: if true, prints a report describing the affinity map.
[in,out] n_threads: if n_requested is -1, this is set to the number of threads one can use such that there is one thread per physical core, taking into account all ranks running on the node. If n_requested is >= 1, n_threads is set to n_requested. This allows a run time override for cases when the caller knows how to schedule things. If an error occurs and n_requested is -1, this is set to 1.
[out] affinity: an affinity map describing, for each of n_threads, a core id that the thread can be bound to. If n_requested is -1, the map contains an entry for each of n_threads, where each thread is assigned a unique physical core. When n_requested is >= 1, the map contains an entry for each of the n_requested threads such that, when more threads are requested than cores, each core is assigned approximately the same number of threads.
Returns
0 on success
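
The relationship between n_requested, n_threads and affinity described above can be sketched with a small stand-alone model. This is an illustration only, not TECA's algorithm: plan_threads and rank_threads are hypothetical names, cores are assumed to be numbered 0..n_cores-1, ranks on the node 0..n_ranks-1, and each rank is assumed to own a contiguous block of cores. No MPI calls or thread binding are performed.

```cpp
#include <cassert>
#include <deque>

// Hypothetical result type, not part of TECA.
struct rank_threads
{
    int n_threads;            // number of threads this rank should use
    std::deque<int> affinity; // core id for each of the n_threads
};

// Hypothetical helper, not part of TECA: split a node's physical cores
// evenly among the ranks on the node and build this rank's affinity map.
// n_requested == -1 selects one thread per core in the rank's share;
// n_requested >= 1 overrides the thread count.
rank_threads plan_threads(int n_cores, int n_ranks, int rank, int n_requested)
{
    assert(n_cores > 0 && n_ranks > 0 && rank >= 0 && rank < n_ranks);
    assert(n_requested == -1 || n_requested >= 1);

    // number of cores in this rank's share (at least one)
    int n_share = n_cores / n_ranks;
    if (n_share < 1)
        n_share = 1;

    int n_threads = (n_requested == -1) ? n_share : n_requested;

    // first core of this rank's contiguous share
    int first = (rank * n_share) % n_cores;

    rank_threads out{n_threads, {}};
    for (int i = 0; i < n_threads; ++i)
    {
        // wrap within the share so that oversubscribed cores each get
        // approximately the same number of threads
        out.affinity.push_back(first + (i % n_share));
    }
    return out;
}
```

For example, under these assumptions, 8 cores shared by 2 ranks gives rank 1 four threads on cores 4 through 7 when n_requested is -1, while an override of 6 threads on rank 0 wraps around to cores 0, 1, 2, 3, 0, 1.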

Environment variables:

Variable                 Description
TECA_THREADS_PER_DEVICE  The number of threads that will service each GPU
TECA_RANKS_PER_DEVICE    The number of MPI ranks allowed to use each GPU
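
An application that wants to inspect these variables itself could read them as integers with a fallback. The sketch below is an assumption about how a caller might do this, not a description of how TECA parses them: env_int_or is a hypothetical helper and the fallback values are chosen by the caller.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical helper, not part of TECA: read an integer from the
// environment, returning `fallback` when the variable is unset, empty,
// not a whole decimal number, or out of range for int.
int env_int_or(const char *name, int fallback)
{
    assert(name != nullptr);

    const char *val = std::getenv(name);
    if (!val || !*val)
        return fallback;

    char *end = nullptr;
    long parsed = std::strtol(val, &end, 10);

    // reject trailing junk and values that do not fit in an int
    if (*end != '\0' || parsed < INT_MIN || parsed > INT_MAX)
        return fallback;

    return static_cast<int>(parsed);
}
```

A call such as env_int_or("TECA_THREADS_PER_DEVICE", 1) then yields the configured value when the variable is set to a valid integer and 1 otherwise.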