[GPU] Fix in-order queue synchronization issue related to OCL/OneDNN impls interaction with CPU impls #17976

sshlyapn · 2023-06-09T13:52:02Z

Details:

Aligned the behavior of the ocl_stream::enqueue_marker() method with the OpenCL specification (a marker with an empty dependency list waits for all previously enqueued tasks in the queue)
Fixed event collection in the network::execute_primitive() method. Previously, events were stored only for the out-of-order (OOO) queue and in profiling mode, even though they are also needed to synchronize GPU and CPU impls in an in-order queue
Added dependency-event preparation in the primitive_inst::execute() method for CPU implementations and for optimized-out implementations that have CPU users (previously this was done only for the OOO queue)
Return a marker from the onednn_primitive::execute() method if the next operation requires synchronization
Related tests added
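The in-order synchronization problem in the Details above can be sketched with a toy model. Everything here (`Event`, `InOrderQueue`, `cpu_impl_can_run`) is hypothetical and is not the cldnn or OpenCL API. The model shows two things: if a GPU primitive's events are never collected, a CPU user has nothing to wait on and runs too early; and a marker enqueued with an empty wait list (the OpenCL rule for `clEnqueueMarkerWithWaitList`) yields one event that covers all previously enqueued work.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy model, not the cldnn/OpenCL API. An Event is marked
// completed when the simulated device finishes the command it tracks.
struct Event {
    bool completed = false;
};
using EventPtr = std::shared_ptr<Event>;

// Toy in-order queue: commands complete strictly in submission order.
class InOrderQueue {
public:
    // Enqueue a GPU kernel (OCL or oneDNN) and return its completion event.
    EventPtr enqueue_kernel() { return push(); }

    // Modeled on the OpenCL rule for clEnqueueMarkerWithWaitList: an empty
    // wait list means "wait for all previously enqueued commands". In an
    // in-order queue the marker's position already guarantees this, so the
    // deps argument is ignored here (out-of-order queues are not modeled).
    EventPtr enqueue_marker(const std::vector<EventPtr>& /*deps*/) { return push(); }

    // Simulate the device completing the next n commands, in order.
    void advance(std::size_t n) {
        for (; n > 0 && done_ < commands_.size(); --n)
            commands_[done_++]->completed = true;
    }

private:
    EventPtr push() {
        commands_.push_back(std::make_shared<Event>());
        return commands_.back();
    }
    std::vector<EventPtr> commands_;
    std::size_t done_ = 0;
};

// Host-side check before a CPU impl runs: every dependency event must have
// completed. If events were never collected, the list is empty and the CPU
// impl would start immediately, which is the in-order race this PR fixes.
bool cpu_impl_can_run(const std::vector<EventPtr>& dep_events) {
    for (const auto& ev : dep_events)
        if (!ev->completed) return false;
    return true;
}
```

For example, after enqueuing two kernels, `cpu_impl_can_run({})` returns true even though no GPU work has finished, while a marker from `enqueue_marker({})` only reports completion once both kernels and the marker itself have been processed.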

Tickets:

ticket-id

isanghao · 2023-06-12T05:11:43Z

src/plugins/intel_gpu/src/graph/impls/onednn/primitive_onednn_base.h

+            // Enqueue marker with empty events wait list (which will trigger wait for all previously enqueued tasks) and
+            // return it as oneDNN primitive's event as it is a single option for proper synchronization
+            if (instance.is_output_event())
+                event = stream.enqueue_marker({});


Is this to ensure that all oneDNN kernels are done before a CPU layer executes, rather than for the network output? If so, could you describe that in the comment? I think it is unclear why it is necessary.

Yes, in most cases we need it to ensure that the oneDNN kernel has finished before the CPU layer executes, but a marker event will also be created if the oneDNN primitive is a network output (it may be used in the loop primitive for network synchronization, and possibly in other cases I'm not aware of).
I also think we could try to use it in the get_memory() or reset_network() methods for the in-order queue. In that case we would avoid the clFinish() call entirely, which in theory may improve performance a bit (if I'm not mistaken, a clFinish() call flushes caches and may have other side effects).
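The idea of replacing clFinish() with a wait on the output marker event can be sketched with another toy model. `ToyInOrderQueue` and `sync_for_host_read` are hypothetical names, not cldnn or OpenCL APIs; `wait()` stands in for `clWaitForEvents` on a single event and `finish()` stands in for `clFinish`. The model only captures ordering: waiting on the marker requires the commands up to and including it, while finishing the queue also waits for anything enqueued after it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy in-order queue (not the cldnn/OpenCL API). An event is
// the index of the command it tracks; commands complete in submission order.
class ToyInOrderQueue {
public:
    using Event = std::size_t;

    // Submit a command and return its event (its position in the queue).
    Event enqueue() { return submitted_++; }

    // Stand-in for clWaitForEvents on one event: in an in-order queue the
    // host blocks until that command and every earlier one have completed.
    void wait(Event ev) {
        if (ev + 1 > completed_) completed_ = ev + 1;
    }

    // Stand-in for clFinish: block until every submitted command completes.
    void finish() { completed_ = submitted_; }

    std::size_t completed() const { return completed_; }

private:
    std::size_t submitted_ = 0;
    std::size_t completed_ = 0;
};

// Hypothetical helper for a get_memory()/reset_network()-style host read:
// wait on the output marker if the primitive returned one, otherwise fall
// back to draining the whole queue. Returns true if the marker path was used.
bool sync_for_host_read(ToyInOrderQueue& q, const ToyInOrderQueue::Event* output_marker) {
    if (output_marker != nullptr) {
        q.wait(*output_marker);
        return true;
    }
    q.finish();
    return false;
}
```

With three commands where the marker is the second one, `sync_for_host_read(q, &marker)` leaves two commands completed, whereas passing `nullptr` falls back to `finish()` and completes all three. Whether this is actually faster on real hardware is the open question raised in the thread, not something the model shows.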

@isanghao, I have updated the comment and mentioned the cases in which this event is used.

Hmm, basically I agree with removing clFinish() as much as possible. However, I am curious about the "flush cache" you mentioned, because I previously experienced cache inconsistency between the CPU and dGPU on host memory even though I called clFinish().

src/plugins/intel_gpu/src/graph/primitive_inst.cpp

src/plugins/intel_gpu/src/graph/impls/common/loop.cpp

isanghao · 2023-06-14T01:13:34Z

Performance looks good:
--this /home/mingyuki/log/pr/2023-06-13--21-31-PR17976.report --ref /home/mingyuki/log/pr/2023-06-13--18-06-PR18003.report

sshlyapn added the category: GPU OpenVINO GPU plugin label Jun 9, 2023

sshlyapn added this to the 2023.1 milestone Jun 9, 2023

sshlyapn requested review from a team as code owners June 9, 2023 13:52

sshlyapn force-pushed the fix_in_order_queue_sync branch from 78e171f to 97bd3cd Compare June 9, 2023 13:53

isanghao reviewed Jun 12, 2023

[GPU] Fix in-order queue synchronization issue related to OCL/OneDNN impls interaction with CPU impls (commit 9c331b9)

sshlyapn force-pushed the fix_in_order_queue_sync branch from 97bd3cd to 9c331b9 Compare June 12, 2023 16:32

yeonbok reviewed Jun 13, 2023

src/plugins/intel_gpu/src/graph/impls/common/loop.cpp (conversation resolved)

yeonbok approved these changes Jun 13, 2023

isanghao approved these changes Jun 14, 2023

isanghao merged commit e631f65 into openvinotoolkit:master Jun 14, 2023

alvoron pushed a commit to alvoron/openvino that referenced this pull request Jun 21, 2023

[GPU] Fix in-order queue synchronization issue related to OCL/OneDNN impls interaction with CPU impls (openvinotoolkit#17976) (commit bf1033b)

isanghao Jun 12, 2023

sshlyapn Jun 12, 2023

sshlyapn Jun 12, 2023

yeonbok Jun 13, 2023

isanghao commented Jun 14, 2023
