Version 0.8.0 Beta 2: nvFatbin support, CUDA 12.x features, sync-async op unification, etc. #697
eyalroz announced in Announcements
Changes since v0.7.1:
Support for the nvFatbin library (#681)

cuda::fatbin_builder_t class: One creates a builder, adds various fragments of fatbin-contained content (cubin, PTX, LTO IR etc.), then finally uses the build() or build_at() method to obtain the completed, final fatbin file data in a region of memory.
cuda-api-wrappers::fatbin, which one should depend on when actually using the builder.

Support for more CUDA 12.x features
cuda::module_t, with the method unique_span<kernel_t> get_kernels() const.
kernel_t::mangled_name() (regards "Make mangled and unmangled (kernel) names more difficult to mix up", #674)

(Note: these features are not accessible if you're using the wrappers with CUDA 11.x.)
More unique_span class changes

Like a recently-cut gem, one slowly polishes it until it shines brightly... we had some work on unique_span in version 0.7.1 as well, and it continues:
swap() implementation
T to a span of const T.
Neither release(), nor our move construction, can be noexcept - removed that marking, which was based only on optimism

optional_ref & partial unification of async and non-async memory operations

optional_ref class, for passing optional arguments which are references. See this blog post by Foonathan about the problems of putting references in C++ optionals.
cuda::memory::foo() and cuda::memory::async::foo() variants now have a single variant, cuda::memory::foo(), which takes an extra optional_ref<stream_t> parameter: when it's not set, it's a synchronous(ish) operation; when it is set, the operation is asynchronous and scheduled on the stream.
copy_single() had disagreed - one took a pointer, the other a reference. With their unification, they now agree (and take a pointer).

Bug fixes
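As an aside on the optional_ref-based unification of synchronous and asynchronous memory operations described earlier, here is a minimal, self-contained sketch of the idea. It does not use the wrappers or CUDA at all: mock_stream_t, copy_bytes() and demo_unified_copy() are hypothetical names invented for this illustration, and the optional_ref shown here is a simplified stand-in rather than the library's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: it only queues work, and runs it
// when synchronize() is called. Not part of the wrappers.
struct mock_stream_t {
    std::vector<std::function<void()>> pending;
    void enqueue(std::function<void()> op) { pending.push_back(std::move(op)); }
    void synchronize()
    {
        for (auto& op : pending) { op(); }
        pending.clear();
    }
};

// Simplified optional reference (an assumption, not the library's class):
// either empty, or referring to an object which already exists elsewhere.
template <typename T>
class optional_ref {
public:
    optional_ref() noexcept : ptr_(nullptr) {}
    optional_ref(T& ref) noexcept : ptr_(&ref) {} // implicit, so a T& can be passed as-is
    explicit operator bool() const noexcept { return ptr_ != nullptr; }
    T& value() const { assert(ptr_ != nullptr); return *ptr_; }
private:
    T* ptr_;
};

// A single function covers both cases: with no stream, the copy is complete on
// return; with a stream, the copy is only scheduled on that stream.
void copy_bytes(void* dst, const void* src, std::size_t num_bytes,
                optional_ref<mock_stream_t> stream = {})
{
    if (stream) {
        stream.value().enqueue([=] { std::memcpy(dst, src, num_bytes); });
    } else {
        std::memcpy(dst, src, num_bytes);
    }
}

// Exercises both paths; returns true if each behaved as described above.
bool demo_unified_copy()
{
    const char src[] = "fatbin";
    char sync_dst[sizeof(src)] = {};
    char async_dst[sizeof(src)] = {};
    mock_stream_t stream;

    copy_bytes(sync_dst, src, sizeof(src));          // no stream: done upon return
    copy_bytes(async_dst, src, sizeof(src), stream); // stream given: only scheduled

    bool sync_done_immediately = (std::strcmp(sync_dst, src) == 0);
    bool async_still_pending = (async_dst[0] == '\0');
    stream.synchronize();
    bool async_done_after_sync = (std::strcmp(async_dst, src) == 0);
    return sync_done_immediately && async_still_pending && async_done_after_sync;
}
```

The point of the design is that the caller passes a stream object directly, or nothing, and no null-pointer or separate async namespace is needed to tell the two cases apart.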
The poor man's optional class
value_or() now returns a value...
value_or() is now const

In example programs
Other changes

Build mechanism

In the wrapper APIs themselves
size_t's in launch_config_builder_t's methods - so as to prevent narrowing-cast warnings and to check limits ourselves.

In example programs
This discussion was created from the release Version 0.8.0 Beta 2: nvFatbin support, CUDA 12.x features, sync-async op unification, etc..