Merge branch 'main' into remove_opencl_assertions
lbushi25 authored Jun 22, 2024
2 parents 5464a37 + 1e9b1b4 commit 69d536a
Showing 74 changed files with 908 additions and 718 deletions.
9 changes: 9 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,9 @@
# Run dependencies versions update
version: 2
updates:
- package-ecosystem: "pip"
directory: "/third_party" # Location of package manifests
schedule:
interval: "daily"
# Run only required security updates
open-pull-requests-limit: 0
4 changes: 2 additions & 2 deletions include/ur_api.h
Expand Up @@ -5023,8 +5023,8 @@ urKernelSetArgPointer(
ur_kernel_handle_t hKernel, ///< [in] handle of the kernel object
uint32_t argIndex, ///< [in] argument index in range [0, num args - 1]
const ur_kernel_arg_pointer_properties_t *pProperties, ///< [in][optional] pointer to USM pointer properties.
const void *pArgValue ///< [in][optional] USM pointer to memory location holding the argument
///< value. If null then argument value is considered null.
const void *pArgValue ///< [in][optional] Pointer obtained by USM allocation or virtual memory
///< mapping operation. If null then argument value is considered null.
);

///////////////////////////////////////////////////////////////////////////////
35 changes: 19 additions & 16 deletions scripts/core/PROG.rst
Expand Up @@ -77,14 +77,14 @@ Initialization and Discovery
Device handle lifetime
----------------------

The device objects are reference-counted, and there are ${x}DeviceRetain and ${x}DeviceRelease.
The ref-count of a device is automatically incremented when device is obtained by ${x}DeviceGet.
After device is no longer needed to the application it must call to ${x}DeviceRelease.
When ref-count of the underlying device handle becomes zero then that device object is deleted.
Note, that besides the application itself, the Unified Runtime may increment and decrement ref-count on its own.
So, after the call to ${x}DeviceRelease below, the device may stay alive until other
objects attached to it, like command-queues, are deleted. But application may not use the device
after it released its own reference.
Device objects are reference-counted, using ${x}DeviceRetain and ${x}DeviceRelease.
The ref-count of a device is automatically incremented when a device is obtained by ${x}DeviceGet.
After a device is no longer needed by the application it must call ${x}DeviceRelease.
When the ref-count of the underlying device handle becomes zero then that device object is deleted.
Note that a Unified Runtime adapter may internally increment and decrement a device's ref-count.
So after the call to ${x}DeviceRelease below, the device may stay active until other
objects using it, such as a command-queue, are deleted. However, an application
may not use the device after it releases its last reference.
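The reference-counting rule described above can be sketched with a toy model. This is only an illustration: `ToyDevice`, `retain` and `release` are hypothetical names, not Unified Runtime types or entry points, and the starting count of 1 is an assumption standing in for the reference the application receives when it obtains the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference-counted object; illustrative only, not a UR type.
struct ToyDevice {
  std::uint32_t RefCount = 1; // assumed: the reference held by the application
  bool Deleted = false;       // set once the last reference is dropped
};

// Models a retain call: another owner (for example a queue) takes a reference.
inline void retain(ToyDevice &D) { ++D.RefCount; }

// Models a release call: the object is deleted only when the count hits zero.
inline void release(ToyDevice &D) {
  assert(D.RefCount > 0 && "release called more often than retain");
  if (--D.RefCount == 0)
    D.Deleted = true;
}
```

In this model, if a queue has retained the device, the application's own release leaves the object alive; it is deleted only when the queue drops its reference as well. The application should still treat the handle as unusable after its own release.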

.. parsed-literal::
Expand Down Expand Up @@ -120,7 +120,7 @@ In case where the info size is only known at runtime then two calls are needed,
Device partitioning into sub-devices
------------------------------------

The ${x}DevicePartition could partition a device into sub-device. The exact representation and
${x}DevicePartition partitions a device into a sub-device. The exact representation and
characteristics of the sub-devices are device specific, but normally they each represent a
fixed part of the parent device, which can explicitly be programmed individually.

Expand Down Expand Up @@ -161,9 +161,10 @@ An implementation will return "0" in the count if no further partitioning is sup
Contexts
========

Contexts are serving the purpose of resources sharing (between devices in the same context),
and resources isolation (resources do not cross context boundaries). Resources such as memory allocations,
events, and programs are explicitly created against a context. A trivial work with context looks like this:
Contexts serve the purpose of resource sharing (between devices in the same context),
and resource isolation (ensuring that resources do not cross context
boundaries). Resources such as memory allocations, events, and programs are
explicitly created against a context.

.. parsed-literal::
Expand Down Expand Up @@ -235,18 +236,20 @@ explicit and implicit kernel arguments along with data needed for launch.
Queue and Enqueue
=================

A queue object represents a logic input stream to a device. Kernels
and commands are submitted to queue for execution using Equeue commands:
Queue objects are used to submit work to a given device. Kernels
and commands are submitted to queue for execution using Enqueue commands:
such as ${x}EnqueueKernelLaunch, ${x}EnqueueMemBufferWrite. Enqueued kernels
and commands can be executed in order or out of order depending on the
queue's property ${X}_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE when the
queue is created.
queue is created. If a queue is out of order, the queue may internally do some
scheduling of work to achieve concurrency on the device, while honouring the
event dependencies that are passed to each Enqueue command.

.. parsed-literal::
// Create an out of order queue for hDevice in hContext
${x}_queue_handle_t hQueue;
${x}QueueCreate(hContext, hDevice,
${x}QueueCreate(hContext, hDevice,
${X}_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE, &hQueue);
// Launch a kernel with 3D workspace partitioning
2 changes: 1 addition & 1 deletion scripts/core/kernel.yml
Expand Up @@ -339,7 +339,7 @@ params:
desc: "[in][optional] pointer to USM pointer properties."
- type: "const void*"
name: pArgValue
desc: "[in][optional] USM pointer to memory location holding the argument value. If null then argument value is considered null."
desc: "[in][optional] Pointer obtained by USM allocation or virtual memory mapping operation. If null then argument value is considered null."
returns:
- $X_RESULT_ERROR_INVALID_KERNEL_ARGUMENT_INDEX
- $X_RESULT_ERROR_INVALID_KERNEL_ARGUMENT_SIZE
6 changes: 5 additions & 1 deletion source/adapters/cuda/adapter.cpp
Expand Up @@ -36,8 +36,12 @@ class ur_legacy_sink : public logger::Sink {

~ur_legacy_sink() = default;
};

// FIXME: Remove the default log level when querying logging info is supported
// through UR entry points. See #1330.
ur_adapter_handle_t_::ur_adapter_handle_t_()
: logger(logger::get_logger("cuda")) {
: logger(logger::get_logger("cuda",
/*default_log_level*/ logger::Level::ERR)) {

if (std::getenv("UR_LOG_CUDA") != nullptr)
return;
15 changes: 5 additions & 10 deletions source/adapters/cuda/context.cpp
Expand Up @@ -142,16 +142,11 @@ UR_APIEXPORT ur_result_t UR_APICALL urContextGetNativeHandle(
}

UR_APIEXPORT ur_result_t UR_APICALL urContextCreateWithNativeHandle(
ur_native_handle_t hNativeContext, uint32_t numDevices,
const ur_device_handle_t *phDevices,
const ur_context_native_properties_t *pProperties,
ur_context_handle_t *phContext) {
std::ignore = hNativeContext;
std::ignore = numDevices;
std::ignore = phDevices;
std::ignore = pProperties;
std::ignore = phContext;

[[maybe_unused]] ur_native_handle_t hNativeContext,
[[maybe_unused]] uint32_t numDevices,
[[maybe_unused]] const ur_device_handle_t *phDevices,
[[maybe_unused]] const ur_context_native_properties_t *pProperties,
[[maybe_unused]] ur_context_handle_t *phContext) {
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

6 changes: 3 additions & 3 deletions source/adapters/cuda/event.cpp
Expand Up @@ -36,9 +36,9 @@ ur_event_handle_t_::ur_event_handle_t_(ur_context_handle_t Context,
CUevent EventNative)
: CommandType{UR_COMMAND_EVENTS_WAIT}, RefCount{1}, HasOwnership{false},
HasBeenWaitedOn{false}, IsRecorded{false}, IsStarted{false},
StreamToken{std::numeric_limits<uint32_t>::max()}, EventID{0},
EvEnd{EventNative}, EvStart{nullptr}, EvQueued{nullptr}, Queue{nullptr},
Stream{nullptr}, Context{Context} {
IsInterop{true}, StreamToken{std::numeric_limits<uint32_t>::max()},
EventID{0}, EvEnd{EventNative}, EvStart{nullptr}, EvQueued{nullptr},
Queue{nullptr}, Stream{nullptr}, Context{Context} {
urContextRetain(Context);
}

7 changes: 6 additions & 1 deletion source/adapters/cuda/event.hpp
Expand Up @@ -45,6 +45,8 @@ struct ur_event_handle_t_ {

bool isCompleted() const noexcept;

bool isInterop() const noexcept { return IsInterop; };

uint32_t getExecutionStatus() const noexcept {

if (!isRecorded()) {
Expand Down Expand Up @@ -141,6 +143,8 @@ struct ur_event_handle_t_ {
bool IsStarted; // Signifies whether the operation associated with the
// UR event has started or not

const bool IsInterop{false}; // Made with urEventCreateWithNativeHandle

uint32_t StreamToken;
uint32_t EventID; // Queue identifier of the event.

Expand Down Expand Up @@ -195,7 +199,8 @@ ur_result_t forLatestEvents(const ur_event_handle_t *EventWaitList,
CUstream LastSeenStream = 0;
for (size_t i = 0; i < Events.size(); i++) {
auto Event = Events[i];
if (!Event || (i != 0 && Event->getStream() == LastSeenStream)) {
if (!Event || (i != 0 && !Event->isInterop() &&
Event->getStream() == LastSeenStream)) {
continue;
}
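The skip condition in this hunk can be illustrated with a simplified, self-contained sketch. Everything here is an assumption made for illustration: `ToyEvent` and `selectEvents` are hypothetical names, streams are modelled as plain integers, and the way `LastSeenStream` is updated is a guess, because the rest of the loop is not visible in the diff. The sketch only shows the effect of the added `isInterop()` test: an interop event is never skipped merely because its stream matches the previously seen one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified event; not the adapter's ur_event_handle_t_.
struct ToyEvent {
  int Stream;     // stand-in for a CUstream
  bool IsInterop; // true if created from a native handle
};

// Returns the events that survive the filter. Null entries are skipped, and a
// non-interop event is skipped when it shares a stream with the last kept one.
inline std::vector<const ToyEvent *>
selectEvents(const std::vector<const ToyEvent *> &Events) {
  std::vector<const ToyEvent *> Kept;
  int LastSeenStream = 0;
  for (std::size_t i = 0; i < Events.size(); i++) {
    const ToyEvent *Event = Events[i];
    if (!Event || (i != 0 && !Event->IsInterop &&
                   Event->Stream == LastSeenStream)) {
      continue;
    }
    LastSeenStream = Event->Stream; // assumed update; not shown in the diff
    Kept.push_back(Event);
  }
  return Kept;
}
```

With two ordinary events on the same stream followed by two interop events that share a placeholder stream value, the sketch keeps the first ordinary event and both interop events.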

3 changes: 2 additions & 1 deletion source/adapters/cuda/kernel.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -287,7 +287,8 @@ urKernelSetArgPointer(ur_kernel_handle_t hKernel, uint32_t argIndex,
const ur_kernel_arg_pointer_properties_t *pProperties,
const void *pArgValue) {
std::ignore = pProperties;
hKernel->setKernelArg(argIndex, sizeof(pArgValue), pArgValue);
// setKernelArg is expecting a pointer to our argument
hKernel->setKernelArg(argIndex, sizeof(pArgValue), &pArgValue);
return UR_RESULT_SUCCESS;
}
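A minimal sketch of why the address of the pointer is passed here, assuming a setter that copies `Size` bytes starting at the address it receives (which is what the comment in the hunk suggests about `setKernelArg`). `ArgStore`, `setArg`, `asPointer` and `storePointerArg` are hypothetical names used only for this illustration; they are not part of the adapter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a kernel's argument storage.
struct ArgStore {
  std::vector<unsigned char> Bytes;

  // Copies Size bytes starting at Src: Src must point AT the argument value.
  void setArg(std::size_t Size, const void *Src) {
    Bytes.resize(Size);
    std::memcpy(Bytes.data(), Src, Size);
  }

  // Reinterprets the stored bytes as a pointer value.
  const void *asPointer() const {
    assert(Bytes.size() == sizeof(const void *));
    const void *P = nullptr;
    std::memcpy(&P, Bytes.data(), sizeof(P));
    return P;
  }
};

// Stores the pointer value itself by passing the address of the pointer
// variable, mirroring the &pArgValue in the hunk above.
inline const void *storePointerArg(ArgStore &Store, const void *pArgValue) {
  Store.setArg(sizeof(pArgValue), &pArgValue);
  return Store.asPointer();
}
```

Under this copying model, passing `pArgValue` directly would copy the first bytes of the pointed-to memory rather than the pointer, and a null `pArgValue` would mean reading from a null address. Passing `&pArgValue` handles both cases, including the documented null argument.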

32 changes: 10 additions & 22 deletions source/adapters/cuda/memory.cpp
Expand Up @@ -56,9 +56,6 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemBufferCreate(

auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(
new ur_mem_handle_t_{hContext, flags, AllocMode, HostPtr, size});
if (URMemObj == nullptr) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
}

// First allocation will be made at urMemBufferCreate if context only
// has one device
Expand All @@ -74,6 +71,8 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemBufferCreate(
MemObj = URMemObj.release();
} catch (ur_result_t Err) {
return Err;
} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
} catch (...) {
return UR_RESULT_ERROR_OUT_OF_RESOURCES;
}
Expand Down Expand Up @@ -102,15 +101,9 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemRelease(ur_mem_handle_t hMem) {
return UR_RESULT_SUCCESS;
}

// make sure hMem is released in case checkErrorUR throws
// Call destructor
std::unique_ptr<ur_mem_handle_t_> MemObjPtr(hMem);

if (hMem->isSubBuffer()) {
return UR_RESULT_SUCCESS;
}

UR_CHECK_ERROR(hMem->clear());

} catch (ur_result_t Err) {
Result = Err;
} catch (...) {
Expand Down Expand Up @@ -230,13 +223,12 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemImageCreate(
UR_ASSERT(pImageFormat->channelOrder == UR_IMAGE_CHANNEL_ORDER_RGBA,
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);

auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(
new ur_mem_handle_t_{hContext, flags, *pImageFormat, *pImageDesc, pHost});

UR_ASSERT(std::get<SurfaceMem>(URMemObj->Mem).PixelTypeSizeBytes,
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);

try {
auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(new ur_mem_handle_t_{
hContext, flags, *pImageFormat, *pImageDesc, pHost});
UR_ASSERT(std::get<SurfaceMem>(URMemObj->Mem).PixelTypeSizeBytes,
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);

if (PerformInitialCopy) {
for (const auto &Device : hContext->getDevices()) {
// Synchronous behaviour is best in this case
Expand All @@ -248,16 +240,12 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemImageCreate(
}
}

if (URMemObj == nullptr) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
}

*phMem = URMemObj.release();
} catch (ur_result_t Err) {
(*phMem)->clear();
return Err;
} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
} catch (...) {
(*phMem)->clear();
return UR_RESULT_ERROR_UNKNOWN;
}
return UR_RESULT_SUCCESS;
1 change: 1 addition & 0 deletions source/adapters/cuda/memory.hpp
Expand Up @@ -394,6 +394,7 @@ struct ur_mem_handle_t_ {
}

~ur_mem_handle_t_() {
clear();
if (isBuffer() && isSubBuffer()) {
urMemRelease(std::get<BufferMem>(Mem).Parent);
return;
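Moving `clear()` into the destructor means that any owner, including a `std::unique_ptr` unwinding after an exception, releases the underlying resources without an explicit call. The sketch below illustrates that idea with a hypothetical `TrackedMem` class and a counter of live allocations; neither exists in the adapter, and the idempotent `clear()` is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource holder; a counter stands in for device allocations.
class TrackedMem {
public:
  explicit TrackedMem(int &LiveAllocations) : Live(&LiveAllocations) {
    ++*Live;
  }
  TrackedMem(const TrackedMem &) = delete;
  TrackedMem &operator=(const TrackedMem &) = delete;

  // Releases the resource; safe to call more than once.
  void clear() {
    if (Live != nullptr) {
      --*Live;
      Live = nullptr;
    }
  }

  // The destructor performs the cleanup, so callers cannot forget it.
  ~TrackedMem() { clear(); }

private:
  int *Live;
};

// Creates an object, then simulates a failure. The unique_ptr destroys the
// object during unwinding, which runs clear() through the destructor.
inline bool createAndFail(int &LiveAllocations) {
  bool Ok = true;
  const int Before = LiveAllocations;
  try {
    auto Obj = std::make_unique<TrackedMem>(LiveAllocations);
    assert(LiveAllocations == Before + 1);
    throw 1; // simulated failure after construction
  } catch (...) {
    Ok = false;
  }
  return Ok;
}
```

After `createAndFail` returns, the live-allocation counter is back at its starting value even though no catch handler called `clear()` explicitly.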
12 changes: 9 additions & 3 deletions source/adapters/cuda/physical_mem.cpp
Expand Up @@ -32,8 +32,14 @@ UR_APIEXPORT ur_result_t UR_APICALL urPhysicalMemCreate(
default:
UR_CHECK_ERROR(Result);
}
*phPhysicalMem = new ur_physical_mem_handle_t_(ResHandle, hContext, hDevice);

try {
*phPhysicalMem =
new ur_physical_mem_handle_t_(ResHandle, hContext, hDevice);
} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
} catch (...) {
return UR_RESULT_ERROR_UNKNOWN;
}
return UR_RESULT_SUCCESS;
}

Expand All @@ -53,10 +59,10 @@ urPhysicalMemRelease(ur_physical_mem_handle_t hPhysicalMem) {

ScopedContext Active(hPhysicalMem->getDevice());
UR_CHECK_ERROR(cuMemRelease(hPhysicalMem->get()));
return UR_RESULT_SUCCESS;
} catch (ur_result_t err) {
return err;
} catch (...) {
return UR_RESULT_ERROR_OUT_OF_RESOURCES;
}
return UR_RESULT_SUCCESS;
}
57 changes: 37 additions & 20 deletions source/adapters/cuda/program.cpp
Expand Up @@ -187,23 +187,30 @@ ur_result_t createProgram(ur_context_handle_t hContext,
UR_RESULT_ERROR_INVALID_CONTEXT);
UR_ASSERT(size, UR_RESULT_ERROR_INVALID_SIZE);

std::unique_ptr<ur_program_handle_t_> RetProgram{
new ur_program_handle_t_{hContext, hDevice}};

if (pProperties) {
if (pProperties->count > 0 && pProperties->pMetadatas == nullptr) {
return UR_RESULT_ERROR_INVALID_NULL_POINTER;
} else if (pProperties->count == 0 && pProperties->pMetadatas != nullptr) {
return UR_RESULT_ERROR_INVALID_SIZE;
try {
std::unique_ptr<ur_program_handle_t_> RetProgram{
new ur_program_handle_t_{hContext, hDevice}};

if (pProperties) {
if (pProperties->count > 0 && pProperties->pMetadatas == nullptr) {
return UR_RESULT_ERROR_INVALID_NULL_POINTER;
} else if (pProperties->count == 0 &&
pProperties->pMetadatas != nullptr) {
return UR_RESULT_ERROR_INVALID_SIZE;
}
UR_CHECK_ERROR(
RetProgram->setMetadata(pProperties->pMetadatas, pProperties->count));
}
UR_CHECK_ERROR(
RetProgram->setMetadata(pProperties->pMetadatas, pProperties->count));
}

auto pBinary_string = reinterpret_cast<const char *>(pBinary);
auto pBinary_string = reinterpret_cast<const char *>(pBinary);

UR_CHECK_ERROR(RetProgram->setBinary(pBinary_string, size));
*phProgram = RetProgram.release();
UR_CHECK_ERROR(RetProgram->setBinary(pBinary_string, size));
*phProgram = RetProgram.release();
} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
} catch (...) {
return UR_RESULT_ERROR_UNKNOWN;
}

return UR_RESULT_SUCCESS;
}
Expand Down Expand Up @@ -317,6 +324,8 @@ urProgramLink(ur_context_handle_t hContext, uint32_t count,

} catch (ur_result_t Err) {
Result = Err;
} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
}
return Result;
}
Expand Down Expand Up @@ -345,16 +354,24 @@ urProgramGetBuildInfo(ur_program_handle_t hProgram, ur_device_handle_t hDevice,
UrReturnHelper ReturnValue(propSize, pPropValue, pPropSizeRet);

switch (propName) {
case UR_PROGRAM_BUILD_INFO_STATUS: {
case UR_PROGRAM_BUILD_INFO_STATUS:
return ReturnValue(hProgram->BuildStatus);
}
case UR_PROGRAM_BUILD_INFO_OPTIONS:
return ReturnValue(hProgram->BuildOptions.c_str());
case UR_PROGRAM_BUILD_INFO_LOG:
return ReturnValue(hProgram->InfoLog, hProgram->MaxLogSize);
case UR_PROGRAM_BUILD_INFO_BINARY_TYPE: {
return ReturnValue(hProgram->BinaryType);
case UR_PROGRAM_BUILD_INFO_LOG: {
// We only know the maximum log length, which CUDA guarantees will include
// the null terminator.
// To determine the actual length of the log, search for the first
// null terminator, not searching past the known maximum. If that does find
// one, it will return the length excluding the null terminator, so remember
// to include that.
auto LogLen =
std::min(hProgram->MaxLogSize,
strnlen(hProgram->InfoLog, hProgram->MaxLogSize) + 1);
return ReturnValue(hProgram->InfoLog, LogLen);
}
case UR_PROGRAM_BUILD_INFO_BINARY_TYPE:
return ReturnValue(hProgram->BinaryType);
default:
break;
}
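The log-length computation in the `UR_PROGRAM_BUILD_INFO_LOG` case can be checked in isolation. The sketch below is a standalone version under stated assumptions: `reportedLogSize` is a hypothetical helper name, and `std::memchr` is used in place of `strnlen` because `strnlen` is a POSIX function rather than part of standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the number of bytes to report for a log stored in a buffer of
// MaxLogSize bytes: the text before the first null terminator plus the
// terminator itself, and never more than MaxLogSize.
inline std::size_t reportedLogSize(const char *InfoLog,
                                   std::size_t MaxLogSize) {
  assert(InfoLog != nullptr);
  // Bounded search for the terminator, equivalent to strnlen.
  const void *Terminator = std::memchr(InfoLog, '\0', MaxLogSize);
  const std::size_t TextLen =
      Terminator != nullptr
          ? static_cast<std::size_t>(static_cast<const char *>(Terminator) -
                                     InfoLog)
          : MaxLogSize;
  // Add one for the terminator, clamped in case none was found.
  return std::min(MaxLogSize, TextLen + 1);
}
```

For an 8-byte buffer holding `"ok"` followed by a terminator, this reports 3 bytes instead of 8. If the buffer contains no terminator at all, the clamp keeps the result at the buffer size.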
7 changes: 3 additions & 4 deletions source/adapters/cuda/queue.cpp
Expand Up @@ -167,12 +167,11 @@ urQueueCreate(ur_context_handle_t hContext, ur_device_handle_t hDevice,

return UR_RESULT_SUCCESS;
} catch (ur_result_t Err) {

return Err;

} catch (std::bad_alloc &) {
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
} catch (...) {

return UR_RESULT_ERROR_OUT_OF_RESOURCES;
return UR_RESULT_ERROR_UNKNOWN;
}
}
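Several hunks in this commit converge on the same handler order: a thrown result code is returned as-is, `std::bad_alloc` maps to an out-of-host-memory error, and anything else maps to a generic error. A minimal sketch of that pattern follows. The `Result` enum and the `guarded` helper are hypothetical; they stand in for `ur_result_t` and are not part of the adapter.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical result codes standing in for ur_result_t values.
enum class Result { Success, OutOfHostMemory, InvalidValue, Unknown };

// Runs F and converts any exception into a result code. Handlers are ordered
// from most specific to least specific, as in the hunks above.
template <typename Fn> Result guarded(Fn &&F) noexcept {
  try {
    F();
    return Result::Success;
  } catch (Result Err) {
    return Err; // a result code thrown by an error-checking macro
  } catch (const std::bad_alloc &) {
    return Result::OutOfHostMemory; // host allocation failed
  } catch (...) {
    return Result::Unknown; // anything unexpected
  }
}
```

The order matters because `catch (...)` would swallow `std::bad_alloc` if it came first. Keeping the specific handler ahead of it is what lets allocation failures be reported distinctly.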
