Using APEX with HPX
Specialized build instructions are available for HPX and the Octotiger application on NERSC Cori Phase II (KNL).
HPX and APEX each have a number of dependencies:
- Modern C++ compiler with C++11 support or better (e.g. GCC 7.3)
- MPI compiler for distributed support (optional, but recommended)
- Modern CMake (e.g. v3.15 or better)
- Boost (1.61 or better) https://www.boost.org
- Portable Hardware Locality (hwloc) https://www.open-mpi.org/projects/hwloc/ - included with the "Open MPI" MPI implementation
- gperftools or jemalloc (optional heap managers with thread support; either one significantly speeds up memory allocation compared with the system memory management)
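Before configuring, it can help to confirm that the command-line tools from the list above are on the PATH. The following is a minimal POSIX sh sketch, not part of HPX or APEX; the function name `missing_tools` is hypothetical, and the tool names passed to it are only examples of what you might check on your system.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of HPX or APEX.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools cmake git mpicxx` prints nothing when all three are available. Libraries such as Boost and hwloc are not covered by this check, because they are located by CMake rather than through the PATH.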
HPX will automatically download APEX as a dependency, so once the above dependencies are installed, download the HPX source code. HPX has many branches and a deep history, so to speed up the clone and save disk space, request only the stable branch with a shallow clone:
git clone --branch stable --depth 1 https://github.com/STEllAR-GROUP/hpx.git
Here's an example of how to build HPX without MPI and with APEX support on a macOS laptop, using Spack to manage dependencies:
#!/bin/zsh -e
. ${HOME}/spack/share/spack/setup-env.sh
spack load cmake
spack load boost
spack load gperftools
spack load hwloc@2.2.0%clang@11.0.3-apple~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared
spack load otf2@2.2%clang@11.0.3-apple
if [ -d build ] ; then
rm -rf build
fi
mkdir build
cd build
cwd=`pwd`
boost=`spack location -i boost`
gperftools=`spack location -i gperftools`
hwloc=`spack location -i hwloc@2.2.0%clang@11.0.3-apple~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared`
otf2=`spack location -i otf2@2.2%clang@11.0.3-apple`
# A comment after a trailing backslash would end the command early,
# so the APEX-related options are described here instead:
#   HPX_WITH_APEX           enables APEX support
#   HPX_WITH_APEX_TAG       optional, only for getting the latest code updates
#   APEX_WITH_ACTIVEHARMONY optional, used for executing policies for runtime adaptation
#   APEX_WITH_OTF2          optional, used for generating OTF2 traces read by Vampir/Traveler
#   OTF2_ROOT               optional, path to the OTF2 library installation
#   APEX_WITH_PAPI          optional, enables hardware counter support
cmake \
-DCMAKE_CXX_COMPILER=`which g++` \
-DCMAKE_C_COMPILER=`which gcc` \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DBOOST_ROOT=${boost} \
-DTCMALLOC_ROOT=${gperftools} \
-DHPX_WITH_MALLOC=tcmalloc \
-DHWLOC_ROOT=${hwloc} \
-DCMAKE_INSTALL_PREFIX=${cwd}/install \
-DHPX_WITH_THREAD_IDLE_RATES=ON \
-DHPX_WITH_PARCELPORT_MPI=OFF \
-DHPX_WITH_TOOLS=ON \
-DHPX_WITH_TESTS=ON \
-DHPX_WITH_EXAMPLES=ON \
-DHPX_WITH_APEX=TRUE \
-DHPX_WITH_APEX_TAG=develop \
-DAPEX_WITH_ACTIVEHARMONY=FALSE \
-DAPEX_WITH_OTF2=TRUE \
-DOTF2_ROOT=${otf2} \
-DAPEX_WITH_PAPI=FALSE \
..
make -j8 -l8 core tests.examples.quickstart
ctest -V -R tests.examples.quickstart
HPX has many test / example programs. For brevity, we'll use the fibonacci example. To run the fibonacci program from the build directory:
khuck@Kevins-MacBook-Air build % ./bin/fibonacci
fibonacci(10) == 55
elapsed time: 0.001749 [s]
To run and see an APEX summary of execution, set the APEX_SCREEN_OUTPUT environment variable (or export it in your environment):
khuck@Kevins-MacBook-Air build % APEX_SCREEN_OUTPUT=1 ./bin/fibonacci
fibonacci(10) == 55
elapsed time: 0.002364 [s]
Elapsed time: 0.0306946 seconds
Cores detected: 8
Worker Threads observed: 4
Available CPU time: 0.122778 seconds
Timer : #calls | mean | total | % total
------------------------------------------------------------------------------------------------
APEX MAIN : 1 0.031 0.031 100.000
apex::profiler_listener::process_profiles : 1 0.000 0.000 0.079
async : 2 0.000 0.000 0.003
async_launch_policy_dispatch : 5 0.000 0.000 0.239
broadcast_call_shutdown_functions_action : 2 0.000 0.000 0.065
call_shutdown_functions_action : 2 0.000 0.000 0.181
fibonacci_action : 174 0.000 0.008 6.767
load_components_action : 1 0.026 0.026 21.234
primary_namespace_colocate_action : 2 0.000 0.000 0.038
run_helper : 1 0.001 0.001 0.739
shutdown_all_action : 1 0.000 0.000 0.110
APEX Idle : 0.087 70.544
------------------------------------------------------------------------------------------------
Total timers : 191
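When the summary is long, it can be convenient to list only the most expensive timers. The following is a rough POSIX sh sketch that reads a saved copy of the screen report on standard input and prints the timers with the largest "% total" values. The function name `top_timers` is hypothetical, and the parsing assumes the column layout shown above (name, colon, then #calls, mean, total, % total); a different APEX version may format the report differently.

```shell
# Print the N timers with the largest "% total" from an APEX screen report
# read on stdin (default N = 5). top_timers is a hypothetical helper.
top_timers() {
    LC_ALL=C awk -F' +: +' '
        {
            n = split($2, col, " ")
            # Timer rows have four columns after the name:
            # calls, mean, total, % total. Other rows are skipped.
            if (NF == 2 && n == 4 && col[1] ~ /^[0-9]+$/) {
                name = $1
                sub(/^ +/, "", name)   # drop alignment padding
                print col[4], name
            }
        }' | LC_ALL=C sort -rn | head -n "${1:-5}"
}
```

A possible use, assuming the report was captured to a file: `APEX_SCREEN_OUTPUT=1 ./bin/fibonacci > report.txt` followed by `top_timers 3 < report.txt`. The "APEX Idle" row is skipped because it has no call count.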
The HPX runtime is instrumented with APEX callbacks, so any HPX task is automatically measured. Note that APEX does not reduce the data from other nodes (processes/localities) to node 0 before exit, so the screen report reflects node 0 data only.
APEX can generate task graphs from HPX. To see them, set the APEX_TASKGRAPH_OUTPUT environment variable when the application is executed. Then run dot (from graphviz) on the resulting task graph file:
khuck@Kevins-MacBook-Air build % APEX_TASKGRAPH_OUTPUT=1 ./bin/fibonacci
fibonacci(10) == 55
elapsed time: 0.002013 [s]
khuck@Kevins-MacBook-Air build % ls
CMakeCache.txt apex/ hpx/ scripts/
CMakeFiles/ arch.c init/ src/
CTestTestfile.cmake bin/ lib/ taskgraph.0.dot
DartConfiguration.tcl cmake_install.cmake libs/ tests/
Makefile components/ out.bmp tools/
Testing/ examples/ plugins/ wrap/
khuck@Kevins-MacBook-Air build % dot -Tpdf -O taskgraph.0.dot
khuck@Kevins-MacBook-Air build % ls
CMakeCache.txt arch.c lib/ taskgraph.0.dot.pdf
CMakeFiles/ bin/ libs/ tests/
CTestTestfile.cmake cmake_install.cmake out.bmp tools/
DartConfiguration.tcl components/ plugins/ wrap/
Makefile examples/ scripts/
Testing/ hpx/ src/
apex/ init/ taskgraph.0.dot
khuck@Kevins-MacBook-Air build % open taskgraph.0.dot.pdf
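If a run leaves more than one task graph file behind, the dot step above can be repeated in a loop. This is a minimal sketch for POSIX sh or bash, under the assumption that additional files follow the same `taskgraph.<N>.dot` naming as the `taskgraph.0.dot` file shown above. The function name `render_taskgraphs` and the `DOT_BIN` override are hypothetical and exist only for illustration.

```shell
# Render every taskgraph.*.dot file in a directory (default: current) to PDF.
# render_taskgraphs and DOT_BIN are hypothetical names, not part of APEX.
render_taskgraphs() {
    dir=${1:-.}
    for graph in "$dir"/taskgraph.*.dot; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$graph" ] || continue
        # Same dot invocation as in the transcript above.
        "${DOT_BIN:-dot}" -Tpdf -O "$graph"
    done
}
```

Calling `render_taskgraphs` from the build directory would write a `.dot.pdf` file next to each input. Note that zsh treats an unmatched glob as an error by default, so the existence check only takes effect in sh or bash.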
APEX can generate scatterplots from a sample (1 in 100) of the tasks that are executed. The x-axis is the time since the start of the program, and the y-axis is the duration of the task. To generate them, set the APEX_TASK_SCATTERPLOT environment variable, then run the APEX post-processing Python script on the output to produce the charts. For this example, we run with a larger fibonacci number to generate more samples, and we run the fibonacci_futures example, which tries different parallel implementations:
khuck@Kevins-MacBook-Air build % APEX_TASK_SCATTERPLOT=1 ./bin/fibonacci_futures --n-value=20
fibonacci_serial(20) == 6765,elapsed time:,45086,[s]
fibonacci_future_one(20) == 6765,elapsed time:,165061319,[s]
fibonacci(20) == 6765,elapsed time:,32537395,[s]
fibonacci_fork(20) == 6765,elapsed time:,20437048,[s]
fibonacci_future(20) == 6765,elapsed time:,65245878,[s]
fibonacci_future_fork(20) == 6765,elapsed time:,49399325,[s]
fibonacci_future_when_all(20) == 6765,elapsed time:,68501537,[s]
fibonacci_future_unwrapped_when_all(20) == 6765,elapsed time:,68637877,[s]
fibonacci_future_all(20) == 6765,elapsed time:,52566179,[s]
fibonacci_future_all_when_all(20) == 6765,elapsed time:,50315426,[s]
khuck@Kevins-MacBook-Air build % ../apex/src/scripts/task_scatterplot.py
Parsed 2467 samples
Plotting async_launch_policy_dispatch
Plotting async_launch_policy_dispatch::call
Plotting async
Rendering...
khuck@Kevins-MacBook-Air build % open image.png
APEX can generate an OTF2 trace suitable for visualization with Vampir (a commercial tool) or Traveler. To collect an OTF2 trace, use the APEX_OTF2 environment variable:
khuck@Kevins-MacBook-Air build % APEX_OTF2=1 ./bin/fibonacci
Rank 0 of 1.
fibonacci(10) == 55
elapsed time: 0.003572 [s]
Closing OTF2 event files...
Writing OTF2 definition files...
Writing OTF2 Global definition file...
Writing OTF2 Node information...
Writing OTF2 Communicators...
Closing the archive...
done.
To validate the trace, you can use the otf2-print utility that comes with the OTF2 library:
khuck@Kevins-MacBook-Air build % otf2-print -A ./OTF2_archive/APEX.otf2
=== OTF2-PRINT ===
Content of OTF2 anchor file:
Version 2.2.0
Chunk size events 1048576
Chunk size definitions 4194304
File substrate POSIX
Compression NONE
Number of locations 5
Number of global definitions 52
Machine name
Creator APEX version stable-6cbbe6b878-master
Built on: 09:47:32 Jul 17 2020
C++ Language Standard version : 201402
Clang Compiler version : 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.32.62)
Description
Number of properties 0
Trace identifier 9a80b630b08826d7
Number of snapshots: 0
Number of thumbnails: 0
=== Global Definitions =========================================================
Definition ID Attributes
--------------------------------------------------------------------------------
STRING 0 ""
STRING 1 "run_helper"
REGION 0 Name: "run_helper" <1> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 2 "load_components_action"
REGION 1 Name: "load_components_action" <2> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 3 "async_launch_policy_dispatch"
REGION 2 Name: "async_launch_policy_dispatch" <3> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 4 "fibonacci_action"
REGION 3 Name: "fibonacci_action" <4> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 5 "apex::profiler_listener::process_profiles"
REGION 4 Name: "apex::profiler_listener::process_profiles" <5> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: MEASUREMENT_SYSTEM, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 6 "apex::process_profiles"
REGION 5 Name: "apex::process_profiles" <6> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: MEASUREMENT_SYSTEM, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 7 "shutdown_all_action"
REGION 6 Name: "shutdown_all_action" <7> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 8 "primary_namespace_colocate_action"
REGION 7 Name: "primary_namespace_colocate_action" <8> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 9 "broadcast_call_shutdown_functions_action"
REGION 8 Name: "broadcast_call_shutdown_functions_action" <9> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 10 "call_shutdown_functions_action"
REGION 9 Name: "call_shutdown_functions_action" <10> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 11 "async"
REGION 10 Name: "async" <11> (Aka. "" <0>), Descr.: "" <0>, Role: TASK, Paradigm: USER, Flags: NONE, File: "" <0>, Begin: 0, End: 0
STRING 12 "GUID"
STRING 13 "Globaly unique identifier"
ATTRIBUTE 0 Name: "GUID" <12>, Description: "Globaly unique identifier" <13>, Type: UINT64
STRING 14 "Parent GUID"
STRING 15 "Globaly unique identifier of the parent task"
ATTRIBUTE 1 Name: "Parent GUID" <14>, Description: "Globaly unique identifier of the parent task" <15>, Type: UINT64
STRING 16 "count"
CLOCK_PROPERTIES Ticks per Seconds: 1000000000, Global Offset: 0, Length: 37693000
STRING 17 "node"
STRING 18 "Kevins-MacBook-Air.local"
SYSTEM_TREE_NODE 0 Name: "Kevins-MacBook-Air.local" <18>, Class: "node" <17>, Parent: UNDEFINED
STRING 19 "process 93544"
LOCATION_GROUP 0 Name: "process 93544" <19>, Type: PROCESS, Parent: "Kevins-MacBook-Air.local" <0>
STRING 20 "thread 00"
LOCATION 0 Name: "thread 00" <20>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING 21 "thread 01"
LOCATION 1 Name: "thread 01" <21>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING 22 "thread 02"
LOCATION 2 Name: "thread 02" <22>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING 23 "thread 03"
LOCATION 3 Name: "thread 03" <23>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING 24 "thread 04"
LOCATION 4 Name: "thread 04" <24>, Type: CPU_THREAD, # Events: 11, Group: "process 93544" <0>
STRING 25 "MPI_COMM_WORLD_LOCATIONS"
GROUP 0 Name: "MPI_COMM_WORLD_LOCATIONS" <25>, Type: COMM_LOCATIONS, Paradigm: MPI, Flags: NONE, 1 Member: "thread 00" <0>
STRING 26 "MPI_COMM_WORLD_GROUP"
GROUP 1 Name: "MPI_COMM_WORLD_GROUP" <26>, Type: COMM_GROUP, Paradigm: MPI, Flags: NONE, 1 Member: 0 ("thread 00" <0>)
STRING 27 "MPI_COMM_WORLD"
COMM 0 Name: "MPI_COMM_WORLD" <27>, Group: "MPI_COMM_WORLD_GROUP" <1>, Parent: UNDEFINED
=== Events =====================================================================
Event Location Timestamp Attributes
--------------------------------------------------------------------------------
ENTER 2 2500000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER 1 2630000 Region: "load_components_action" <1>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693954), ("Parent GUID" <1>; UINT64; 2)
LEAVE 2 2659000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER 2 29096000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE 1 29124000 Region: "load_components_action" <1>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693954), ("Parent GUID" <1>; UINT64; 2)
LEAVE 2 29655000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER 3 29666000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693956), ("Parent GUID" <1>; UINT64; 2)
ENTER 1 29759000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE 3 29765000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693956), ("Parent GUID" <1>; UINT64; 2)
LEAVE 1 29772000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
ENTER 2 29776000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775811), ("Parent GUID" <1>; UINT64; 2)
ENTER 3 29794000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE 2 29797000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775811), ("Parent GUID" <1>; UINT64; 2)
ENTER 1 29882000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387906), ("Parent GUID" <1>; UINT64; 2)
ENTER 4 29890000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
LEAVE 3 29894000 Region: "run_helper" <0>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2), ("Parent GUID" <1>; UINT64; 0)
LEAVE 4 29909000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER 2 29912000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER 4 29939000 Region: "fibonacci_action" <3>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693958), ("Parent GUID" <1>; UINT64; 4611686018427387908)
LEAVE 1 29948000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387906), ("Parent GUID" <1>; UINT64; 2)
LEAVE 2 29951000 Region: "async_launch_policy_dispatch" <2>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 4611686018427387908), ("Parent GUID" <1>; UINT64; 2)
ENTER 3 29953000 Region: "fibonacci_action" <3>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 9223372036854775813), ("Parent GUID" <1>; UINT64; 4611686018427387906)
ENTER 1 29965000 Region: "fibonacci_action" <3>
ADDITIONAL ATTRIBUTES: ("GUID" <0>; UINT64; 2305843009213693960), ("Parent GUID" <1>; UINT64; 4611686018427387908)
...
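The event listing gets long quickly, so a per-region count can be a faster sanity check than reading it line by line. The following POSIX sh sketch counts ENTER events per region from otf2-print output on standard input. The function name `count_enters` is hypothetical, and the parsing assumes event lines of the form shown above, with the region name as the first double-quoted string on each ENTER line.

```shell
# Count ENTER events per region in otf2-print output read on stdin.
# count_enters is a hypothetical helper, not part of OTF2 or APEX.
count_enters() {
    awk -F'"' '
        # Event lines look like: ENTER <location> <timestamp> Region: "<name>" <id>
        # Splitting on double quotes leaves the region name in field 2.
        $1 ~ /^ENTER[ \t]/ { count[$2]++ }
        END { for (region in count) print count[region], region }
    ' | LC_ALL=C sort -k1,1nr -k2
}
```

A possible use is `otf2-print -A ./OTF2_archive/APEX.otf2 | count_enters`. For a complete trace, the counts should be comparable to the #calls column of the APEX screen summary, although that correspondence is an expectation rather than something the source states.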
A view of the trace in Vampir:
A view of the trace in Traveler: