TECA
The Toolkit for Extreme Climate Analysis
teca_thread_util.h
#ifndef teca_thread_utils_h
#define teca_thread_utils_h

/// @file

#include "teca_config.h"
#include "teca_common.h"
#include "teca_mpi.h"

#include <deque>
#include <vector>

/// Codes for dealing with threading
namespace teca_thread_util
{
/** Load balances threads across an MPI communication space such that, on each
 * node, every physical core receives the same number of threads.
 * This is an MPI collective call. Building the affinity map relies on
 * features available only in _GNU_SOURCE. On systems where these features are
 * unavailable, when automated detection of the number of threads is requested,
 * the call will fail and n_threads will be set to 1.
 *
 * @param[in] comm an MPI communication space to load balance threads across.
 *                 The communicator is used to coordinate affinity mapping such
 *                 that each rank can allocate a number of threads bound to
 *                 unique cores.
 *
 * @param[in] base_core_id identifies the core in use by this MPI rank's main
 *                         thread. If -1 is passed this will be automatically
 *                         determined.
 *
 * @param[in] n_requested the number of requested threads per rank. Passing a
 *                        value of -1 results in use of all the cores on the
 *                        node such that each physical core is assigned exactly
 *                        1 thread. Note that for performance reasons
 *                        hyperthreads are not used here. The suggested number
 *                        of threads is returned in n_threads, and the returned
 *                        affinity map specifies which core each thread should
 *                        be bound to in order to achieve this. Passing
 *                        n_requested >= 1 specifies a run time override. This
 *                        indicates that the caller wants to use a specific
 *                        number of threads, rather than one per physical core.
 *                        In this case the affinity map is also constructed.
 *
 * @param[in] bind if true, extra work is done to determine an affinity map such
 *                 that each thread can be bound to a unique core on the node.
 *
 * @param[in] verbose if true, prints a report describing the affinity map.
 *
 * @param[in,out] n_threads if n_requested is -1, this will be set to the number
 *                          of threads one can use such that there is one
 *                          thread per physical core, taking into account all
 *                          ranks running on the node. If n_requested is >= 1,
 *                          n_threads will be set to n_requested. This allows a
 *                          run time override for cases when the caller knows
 *                          how the threads should be scheduled. If an error
 *                          occurs and n_requested is -1 this will be set to 1.
 *
 * @param[out] affinity an affinity map, describing for each of n_threads,
 *                      a core id that the thread can be bound to. If
 *                      n_requested is -1 then the map will contain an entry
 *                      for each of n_threads where each of the threads is
 *                      assigned a unique physical core. When n_requested is
 *                      >= 1 the map contains an entry for each of the
 *                      n_requested threads such that, when more threads are
 *                      requested than cores, each core is assigned
 *                      approximately the same number of threads.
 *
 * @returns 0 on success
 *
 * Environment variables:
 *
 * | Variable                | Description |
 * | ----------------------- | ----------- |
 * | TECA_THREADS_PER_DEVICE | The number of threads that will service each GPU |
 * | TECA_RANKS_PER_DEVICE   | The number of MPI ranks allowed to use each GPU |
 */
TECA_EXPORT
int thread_parameters(MPI_Comm comm, int base_core_id, int n_requested,
    int n_threads_per_device, bool bind, bool verbose, int &n_threads,
    std::deque<int> &affinity, std::vector<int> &device_ids);
}

#endif
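
The affinity map semantics documented for n_requested, n_threads and affinity can be illustrated with a small standalone sketch. This is not TECA's implementation: the helper name sketch_affinity_map and its core_ids argument are hypothetical, the list of usable physical cores is assumed to be already known, and the MPI coordination, hyperthread detection and GPU handling are all left out. As a simplification, the sketch sets n_threads to 1 on any failure.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper, NOT part of TECA. Builds an affinity map from the
// core ids assumed to be available to this rank.
//   n_requested == -1 : one thread per available core
//   n_requested >=  1 : exactly n_requested threads, cores reused round-robin
// Returns 0 on success. On invalid input returns -1 and sets n_threads to 1.
int sketch_affinity_map(const std::vector<int> &core_ids, int n_requested,
    int &n_threads, std::deque<int> &affinity)
{
    affinity.clear();

    // reject an empty core list and any request other than -1 or >= 1
    if (core_ids.empty() || (n_requested < 1 && n_requested != -1))
    {
        n_threads = 1;
        return -1;
    }

    // automatic detection uses every core once, otherwise honor the override
    n_threads = (n_requested == -1) ?
        static_cast<int>(core_ids.size()) : n_requested;

    // round-robin assignment: when more threads than cores are requested,
    // the per-core thread counts differ by at most one
    for (int i = 0; i < n_threads; ++i)
    {
        std::size_t j = static_cast<std::size_t>(i) % core_ids.size();
        affinity.push_back(core_ids[j]);
    }

    return 0;
}
```

With three usable cores, passing -1 yields one entry per core, while requesting five threads wraps around so that two cores are assigned two threads each and the third is assigned one.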