
Commit

Update oneDPL documentation for 2022.4.0 release
ValentinaKats committed Nov 21, 2023
1 parent 281dddf commit 4131ceb
Showing 45 changed files with 8,882 additions and 60 deletions.
4 changes: 2 additions & 2 deletions _sources/cmake_support.rst.txt
@@ -20,7 +20,7 @@ The minimal supported CMake version for |onedpl_short| is 3.11 on Linux and 3.20
The supported `CMake Generator <https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#ninja-generators>`_ for Linux is `Unix Makefiles <https://cmake.org/cmake/help/latest/generator/Unix%20Makefiles.html>`_ (default). In the Windows environment, the supported generator is `Ninja <https://cmake.org/cmake/help/latest/generator/Ninja.html>`_ as described in the `Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference <https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/current/use-cmake-with-the-compiler.html>`_ which may be specified via ``-GNinja``.

|onedpl_short| Backend Options
==============
==============================

Backend for Parallel Execution Policies (par and par_unseq)
-----------------------------------------------------------
@@ -88,7 +88,7 @@ Below is an example ``Linux`` CMake invocation which generates Unix makefiles fo
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=icpx -DCMAKE_BUILD_TYPE=release -DONEDPL_PAR_BACKEND=tbb ..
Below is an example ``Windows`` CMake invocation which generates ``Ninja`` build scripts (see :ref:`Requirements Section<Requirements>`) for the project in the parent directory with the ``icx`` compiler, ``OpenMP`` backend and ``debug`` build type:
Below is an example ``Windows`` CMake invocation which generates ``Ninja`` build scripts (see the Requirements Section) for the project in the parent directory with the ``icx`` compiler, ``OpenMP`` backend and ``debug`` build type:

.. code:: cpp
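
    # The Windows command itself is collapsed in this diff view. A plausible
    # reconstruction, based only on the surrounding description (Ninja
    # generator, icx compiler, OpenMP backend, debug build type), might be:
    mkdir build && cd build
    cmake -GNinja -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=debug -DONEDPL_PAR_BACKEND=openmp ..
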
248 changes: 248 additions & 0 deletions _sources/dynamic_selection_api/auto_tune_policy.rst.txt
@@ -0,0 +1,248 @@
Auto-Tune Policy
################

The dynamic selection API is an experimental feature in the |onedpl_long|
(|onedpl_short|) that selects an *execution resource* based on a chosen
*selection policy*. There are several policies provided as part
of the API. Policies encapsulate the logic and any associated state needed
to make a selection.

The auto-tune policy selects resources using runtime profiling. ``auto_tune_policy``
is useful for determining which resource performs best
for a given kernel. The choice is made based on runtime performance
history, so this policy is only useful for kernels that have stable
performance. Initially, this policy acts like ``round_robin_policy``,
rotating through each resource (one or more times). Then, once it has
determined which resource is performing best, it uses that resource
thereafter. Optionally, a resampling interval can be set to return to
the profiling phase periodically.

.. code:: cpp

    namespace oneapi::dpl::experimental {

      template<typename Backend = sycl_backend>
      class auto_tune_policy {
      public:
        // useful types
        using resource_type = typename Backend::resource_type;
        using wait_type = typename Backend::wait_type;

        class selection_type {
        public:
          auto_tune_policy<Backend> get_policy() const;
          resource_type unwrap() const;
        };

        // constructors
        auto_tune_policy(deferred_initialization_t);
        auto_tune_policy(uint64_t resample_interval_in_milliseconds = 0);
        auto_tune_policy(const std::vector<resource_type>& u,
                         uint64_t resample_interval_in_milliseconds = 0);

        // deferred initializer
        void initialize(uint64_t resample_interval_in_milliseconds = 0);
        void initialize(const std::vector<resource_type>& u,
                        uint64_t resample_interval_in_milliseconds = 0);

        // queries
        auto get_resources() const;
        auto get_submission_group();

        // other implementation defined functions...
      };
    }

This policy can be used with all the dynamic selection functions, such as ``select``, ``submit``,
and ``submit_and_wait``. It can also be used with ``policy_traits``.
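
As a minimal sketch of direct use (the pair of queues below is an arbitrary
choice, not a default), ``select`` can also be called on its own and the
resulting selection unwrapped to obtain the underlying queue:

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>

    namespace ex = oneapi::dpl::experimental;

    int main() {
      std::vector<sycl::queue> r{sycl::queue{sycl::cpu_selector_v},
                                 sycl::queue{sycl::gpu_selector_v}};
      ex::auto_tune_policy p{r};

      // select returns a selection object; unwrap yields the chosen queue.
      auto s = ex::select(p);
      sycl::queue q = s.unwrap();
      // Work submitted directly to q must be reported back to the policy;
      // see the Reporting Requirements section below.
    }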

Example
-------

In the following example, an ``auto_tune_policy`` is used to dynamically select between
two queues, a CPU queue and a GPU queue.

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>
    #include <iostream>

    namespace ex = oneapi::dpl::experimental;

    int main() {
      std::vector<sycl::queue> r { sycl::queue{sycl::cpu_selector_v},
                                   sycl::queue{sycl::gpu_selector_v} };

      const std::size_t N = 10000;
      std::vector<float> av(N, 0.0);
      std::vector<float> bv(N, 0.0);
      std::vector<float> cv(N, 0.0);
      for (int i = 0; i < N; ++i) {
        av[i] = bv[i] = i;
      }

      ex::auto_tune_policy p{r}; // (1)

      {
        sycl::buffer<float> a_b(av);
        sycl::buffer<float> b_b(bv);
        sycl::buffer<float> c_b(cv);

        for (int i = 0; i < 6; ++i) {
          ex::submit_and_wait(p, [&](sycl::queue q) { // (2)
            // (3)
            std::cout << (q.get_device().is_cpu() ? "using cpu\n" : "using gpu\n");
            return q.submit([&](sycl::handler &h) { // (4)
              sycl::accessor a_a(a_b, h, sycl::read_only);
              sycl::accessor b_a(b_b, h, sycl::read_only);
              sycl::accessor c_a(c_b, h, sycl::read_write);
              h.parallel_for(N, [=](auto i) { c_a[i] = a_a[i] + b_a[i]; });
            });
          });
        }
      }

      for (int i = 0; i < N; ++i) {
        if (cv[i] != 2*i) {
          std::cout << "ERROR!\n";
        }
      }
      std::cout << "Done.\n";
    }

The key points in this example are:

#. An ``auto_tune_policy`` is constructed to select between the CPU and GPU.
#. ``submit_and_wait`` is invoked with the policy as the first argument. The selected queue will be passed to the user-provided function.
#. For clarity, the type of the selected device is printed when the example runs.
#. The queue is used in the function to perform an asynchronous offload. The SYCL event returned by the call to ``q.submit`` is returned from the function. Returning an event is required for functions passed to ``submit`` and ``submit_and_wait``.

Selection Algorithm
-------------------

The selection algorithm for ``auto_tune_policy`` uses runtime profiling
to choose the best resource for the given function. A simplified, expository
implementation of the selection algorithm follows:

.. code::

    template<typename Function, typename ...Args>
    selection_type auto_tune_policy::select(Function&& f, Args&&...args) {
      if (initialized_) {
        auto k = make_task_key(f, args...);
        auto tuner = get_tuner(k);
        auto offset = tuner->get_resource_to_profile();
        if (offset == use_best) {
          return selection_type{*this, tuner->best_resource_, tuner};
        } else {
          auto r = resources_[offset];
          return selection_type{*this, r, tuner};
        }
      } else {
        throw std::logic_error("select called before initialization");
      }
    }

where ``make_task_key`` combines the inputs, including the function and its
arguments, into a key that uniquely identifies the user function that is being
profiled. ``tuner`` is the encapsulated logic for performing runtime profiling
and choosing the best option for a given key. When the call to ``get_resource_to_profile()``
returns ``use_best``, the tuner is not in the profiling phase, and so the previously
determined best resource is used. Otherwise, the resource at index ``offset``
in the ``resources_`` vector is used and its resulting performance is profiled.
When an ``auto_tune_policy`` is initialized with a non-zero resample interval,
the policy periodically returns to the profiling phase based on the provided
interval value.
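
To make the profiling phase more concrete, the following sketch shows how a
tuner might decide between handing out resources to profile and returning
``use_best``. It is expository only: apart from ``best_resource_`` and
``use_best``, which appear above, the member names, the sentinel value, and
the resampling logic are assumptions, not the actual |onedpl_short|
implementation.

.. code:: cpp

    #include <chrono>
    #include <cstddef>
    #include <limits>
    #include <mutex>

    // Expository tuner sketch; not the real oneDPL data structure.
    struct expository_tuner {
      // Sentinel meaning "use best_resource_ instead of profiling".
      static constexpr std::size_t use_best = std::numeric_limits<std::size_t>::max();

      std::size_t num_resources_ = 2;    // resources being profiled
      std::size_t profiling_rounds_ = 1; // full rotations left in the profiling phase
      std::size_t next_offset_ = 0;      // next resource index to hand out
      std::size_t best_resource_ = 0;    // updated from reported task times (not shown)
      std::chrono::milliseconds resample_interval_{0};
      std::chrono::steady_clock::time_point next_resample_time_{};
      std::mutex m_;

      std::size_t get_resource_to_profile() {
        std::lock_guard<std::mutex> l{m_};
        if (profiling_rounds_ > 0) {
          // Profiling phase: rotate through the resources round-robin.
          std::size_t offset = next_offset_++ % num_resources_;
          if (next_offset_ % num_resources_ == 0) --profiling_rounds_;
          return offset;
        }
        auto now = std::chrono::steady_clock::now();
        if (resample_interval_.count() > 0 && now >= next_resample_time_) {
          // Non-zero resample interval: periodically re-enter the profiling phase.
          profiling_rounds_ = 1;
          next_resample_time_ = now + resample_interval_;
          return next_offset_++ % num_resources_;
        }
        return use_best; // use the best resource found so far
      }
    };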

Constructors
------------

``auto_tune_policy`` provides three constructors.

.. list-table:: ``auto_tune_policy`` constructors
:widths: 50 50
:header-rows: 1

* - Signature
- Description
* - ``auto_tune_policy(deferred_initialization_t);``
- Defers initialization. An ``initialize`` function must be called prior to use.
* - ``auto_tune_policy(uint64_t resample_interval_in_milliseconds = 0);``
- Initializes the policy to use the default set of resources. An optional resampling interval can be provided.
* - ``auto_tune_policy(const std::vector<resource_type>& u, uint64_t resample_interval_in_milliseconds = 0);``
- Overrides the default set of resources. An optional resampling interval can be provided.
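
For illustration, the two initializing constructors might be used as follows
(the 1000 ms resampling interval and the explicit queue list are arbitrary
example values; the deferred form is shown in the next section):

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>

    namespace ex = oneapi::dpl::experimental;

    int main() {
      // Default set of resources, returning to the profiling phase every 1000 ms.
      ex::auto_tune_policy p1{1000};

      // Explicit set of resources, no resampling.
      std::vector<sycl::queue> u{sycl::queue{sycl::cpu_selector_v},
                                 sycl::queue{sycl::gpu_selector_v}};
      ex::auto_tune_policy p2{u};
    }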

Deferred Initialization
-----------------------

An ``auto_tune_policy`` that was constructed with deferred initialization must be
initialized by calling one of its ``initialize`` member functions before it can be used
to select or submit.

.. list-table:: ``auto_tune_policy`` initializers
:widths: 50 50
:header-rows: 1

* - Signature
- Description
* - ``initialize(uint64_t resample_interval_in_milliseconds = 0);``
- Initializes the policy to use the default set of resources. An optional resampling interval can be provided.
* - ``initialize(const std::vector<resource_type>& u, uint64_t resample_interval_in_milliseconds = 0);``
- Overrides the default set of resources. An optional resampling interval can be provided.
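
A minimal sketch of the deferred flow (assuming the ``deferred_initialization``
tag value that accompanies ``deferred_initialization_t``; the queue list is
again an arbitrary example):

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>

    namespace ex = oneapi::dpl::experimental;

    int main() {
      // Constructed, but not yet usable with select or submit.
      ex::auto_tune_policy p{ex::deferred_initialization};

      // ... later, once the resources are known ...
      std::vector<sycl::queue> u{sycl::queue{sycl::cpu_selector_v},
                                 sycl::queue{sycl::gpu_selector_v}};
      p.initialize(u); // or p.initialize(u, 500) to also resample every 500 ms

      auto q = ex::select(p).unwrap(); // valid only after initialization
    }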

Queries
-------

An ``auto_tune_policy`` has ``get_resources`` and ``get_submission_group``
member functions.

.. list-table:: ``auto_tune_policy`` queries
:widths: 50 50
:header-rows: 1

* - Signature
- Description
* - ``std::vector<resource_type> get_resources();``
- Returns the set of resources the policy is selecting from.
* - ``auto get_submission_group();``
- Returns an object that can be used to wait for all active submissions.
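
For example (waiting on the object returned by ``get_submission_group`` via
``ex::wait`` is shown here as one plausible pattern; treat that call as an
assumption rather than a confirmed part of this page):

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>
    #include <iostream>

    namespace ex = oneapi::dpl::experimental;

    int main() {
      std::vector<sycl::queue> r{sycl::queue{sycl::cpu_selector_v},
                                 sycl::queue{sycl::gpu_selector_v}};
      ex::auto_tune_policy p{r};

      // Inspect the resources the policy is selecting from.
      for (auto& q : p.get_resources())
        std::cout << q.get_device().get_info<sycl::info::device::name>() << "\n";

      // ... ex::submit(p, ...) calls would go here ...

      // Wait for all active submissions made through the policy.
      ex::wait(p.get_submission_group());
    }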

Reporting Requirements
----------------------

If a resource returned by ``select`` is used directly without calling
``submit`` or ``submit_and_wait``, it may be necessary to call ``report``
to provide feedback to the policy. The ``auto_tune_policy`` tracks the
performance of submissions on each device via callbacks that report
the execution time. The instrumentation to report these events is included
in the implementations of ``submit`` and ``submit_and_wait``. However, if you
use ``select`` and then submit work directly to the selected resource, it
is necessary to explicitly report these events.

.. list-table:: ``auto_tune_policy`` reporting requirements
:widths: 50 50
:header-rows: 1

* - ``execution_info``
- Is reporting required?
* - ``task_submission``
- No
* - ``task_completion``
- No
* - ``task_time``
- Yes

In generic code, it is possible to perform compile-time checks to avoid
reporting overheads when reporting is not needed, while still writing
code that will work with any policy, as demonstrated below:

.. code:: cpp

    auto s = select(my_policy);
    if constexpr (report_info_v<decltype(s), execution_info::task_submission_t>)
    {
      s.report(execution_info::task_submission);
    }
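
For ``auto_tune_policy`` it is the ``task_time`` value that drives future
selections. The sketch below shows one way that work submitted directly to an
unwrapped queue might report its timing back to the policy; the two-argument
``report`` call, the ``report_value_v`` trait name, and the use of a
millisecond count for the reported time are assumptions here, not confirmed
by this page.

.. code:: cpp

    #include <oneapi/dpl/dynamic_selection>
    #include <sycl/sycl.hpp>
    #include <chrono>

    namespace ex = oneapi::dpl::experimental;

    void run_once(ex::auto_tune_policy<>& p, sycl::buffer<float>& b) {
      auto s = ex::select(p);
      sycl::queue q = s.unwrap();

      if constexpr (ex::report_info_v<decltype(s), ex::execution_info::task_submission_t>)
        s.report(ex::execution_info::task_submission);

      auto t0 = std::chrono::steady_clock::now();
      q.submit([&](sycl::handler& h) {
        sycl::accessor a(b, h, sycl::read_write);
        h.parallel_for(b.size(), [=](auto i) { a[i] += 1.0f; });
      }).wait();
      auto t1 = std::chrono::steady_clock::now();

      if constexpr (ex::report_info_v<decltype(s), ex::execution_info::task_completion_t>)
        s.report(ex::execution_info::task_completion);

      // auto_tune_policy requires the measured task time (see the table above).
      if constexpr (ex::report_value_v<decltype(s), ex::execution_info::task_time_t>)
        s.report(ex::execution_info::task_time,
                 std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    }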