-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Embree build with SYCL support and OneAPI 2024.1.0 #6808
Enable Embree build with SYCL support and OneAPI 2024.1.0 #6808
Conversation
Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes. |
Hi @lumurillo can you update the embree Python examples to use SYCL devices, if available?
|
Switch to standard sycl:: namespace for sycl API Rename internal open3d namespace to open3d::core::sy to avoid name collision TODO: Replace deprecated SYCL APIs
94a7db2
to
048008c
Compare
struct CPUImpl; | ||
#ifdef BUILD_SYCL_MODULE | ||
struct SYCLImpl; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here. To keep the ABI same for sycl and C++ builds, use a single pointer to DevImpl (device impl.) instead of keeping 2 structs. In the ctor, we can initialize this to CPUImpl * or SYCLImpl* appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to keep both, e.g. if the user explicitly wants to choose CPU, despite having a SYCL device. In this case, let's keep both all the time.
I would recommend changing each XImpl
to XImpl*
(for all), in line with the PIMPL pattern (pointer to implementation). This ensures the ABI doesn't change later, even if we add data to the XImpl
structs.
typedef Eigen::Matrix<float, 2, 1, Eigen::DontAlign> Vec2f; | ||
typedef Eigen::Vector3f Vec3f; | ||
|
||
void enablePersistentJITCache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use sycl_target_sources for files with sycl code in cmake. style fix
#else | ||
setenv("SYCL_CACHE_PERSISTENT","1",1); | ||
setenv("SYCL_CACHE_DIR","cache",1); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to SYCLUtils.cpp and enable the user to enable and disable caching. Also, we should not overwrite user preferences (if they have already set these env vars, do not change them). The SYCL_CACHE_DIR should be somewhere in the home folder. For reference, here is the CUDA ptx cache default folders:
https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/
Perhaps $HOME/.sycl/ComputeCache/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults for sycl:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md.
I think we should not change the default cache location. Enabling cache by default makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cb71f22
to
aa8f03e
Compare
aa8f03e
to
d3e68ea
Compare
f4c3a26
to
6284e3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Luis, looks good! See some review comments below.
[Performance] About queue.wait_and_throw()
- I would recommend avoiding it in each function. Instead have a sycl utils level wait_and_throw()
. This lets users launch a raycast task on the GPU, do something else on the CPU (maybe read more data, etc.) then do a queue.wait_and_throw()
to get results.
The current alternative would be multithreading (check if GIL is released for raycast ops in Python), but this needs quite a bit more effort from the user.
@@ -12,10 +12,14 @@ | |||
#include "open3d/t/geometry/RaycastingScene.h" | |||
|
|||
// This header is in the embree src dir (embree/src/ext_embree/..). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move comment
#else | ||
setenv("SYCL_CACHE_PERSISTENT","1",1); | ||
setenv("SYCL_CACHE_DIR","cache",1); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults for sycl:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md.
I think we should not change the default cache location. Enabling cache by default makes sense.
struct CPUImpl; | ||
#ifdef BUILD_SYCL_MODULE | ||
struct SYCLImpl; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to keep both, e.g. if the user explicitly wants to choose CPU, despite having a SYCL device. In this case, let's keep both all the time.
I would recommend changing each XImpl
to XImpl*
(for all), in line with the PIMPL pattern (pointer to implementation). This ensures the ABI doesn't change later, even if we add data to the XImpl
structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Optional] Does it make sense to move the SYCL code to a new file RaycastingSceneSYCL.cpp
?
queue_.wait_and_throw(); | ||
|
||
// Free the allocated memory | ||
sycl::free(previous_geom_prim_ID_tfar, queue_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a memory leak, if there's an exception in the queue above? To prevent that, set previous_geom_prim_ID_tfar=nullptr;
after free
and do another free
in the SYCLImpl
dtor if the ptr is not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -689,16 +1169,50 @@ struct RaycastingScene::Impl { | |||
LoopFn); | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use STL here for ArraySum and ArrayPartialSum, and memcpy for CopyArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -122,14 +122,10 @@ def test_test_lots_of_occlusions(device): | |||
def test_add_triangle_mesh(device): | |||
cube = o3d.t.geometry.TriangleMesh.from_legacy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove from_legacy
for mesh creation functions now.
Some updates needed to pass the CUDA CI tests. I made a branch copy and ran them here: @lumurillo I'll take care of this. |
Looks good. We can visit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Enable the SYCL support for the Embree package.
Type
Motivation and Context
This change will allow the implementation of SYCL GPU implementation for the
RayCastingScene
class.Checklist:
python util/check_style.py --apply
to apply Open3D code styleto my code.
updated accordingly.
results (e.g. screenshots or numbers) here.
Description