Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow separate queue for fft and ifft #17

Merged
merged 9 commits into from
Jan 6, 2023
Merged

Conversation

isuruf
Copy link
Contributor

@isuruf isuruf commented Jun 9, 2022

No description provided.

@@ -155,20 +155,24 @@ def _make_config(self):
int(self.use_lut), int(self.keepShaderCode),
n_batch, skipx, skipy, skipz)

def fft(self, src: cla.Array, dest: cla.Array = None):
def fft(self, src: cla.Array, dest: cla.Array = None, queue: cl.CommandQueue = None):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the array also carries a queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which can be None right? Any suggestions on what to do?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be None, but I think it's reasonable to expect an array passed in here to have a valid queue.

If you were designing this API from scratch, I think the following fallback order would make sense:

  • queue (if passed)
  • src.queue and dest.queue (if given, must be the same)
  • I would not consider self.queue.

For backward compatibility, I would warn if self.queue disagrees with queue resulting from the above and say that self.queue is being used for now, but that that behavior is deprecated, with a certain fuse.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the 'simple' fft API (pyvkfft.fft.fftn, pyvkfft.fft.ifftn...) already allows to transform just by passing an array - its queue will be used as default, and a different one can be supplied. This already allows to use different queues for different transforms.

If you were designing this API from scratch, I think the following fallback order would make sense:

  • queue (if passed)
  • src.queue and dest.queue (if given, must be the same)
  • I would not consider self.queue.

For backward compatibility, I would warn if self.queue disagrees with queue resulting from the above and say that self.queue is being used for now, but that that behavior is deprecated, with a certain fuse.

There can be some strange results when different queues are used, e.g. when using a different queue from the array's, the get() method will not wait for the fft to be finished.

I would still keep VkFFTApp's self.queue as the default (assuming it's a deliberate choice from the developer), but a warning about when different queues are used will be useful.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue choice is indeed debatable.. Preferring the array queues could also make sense.. It really depends on the use case.

The main use case is probably not a single array which is transformed with different queues at different times, but rather different arrays which are transformed by the same VkFFTApp. In which case using the array queues would indeed be easier...

vd = []
vapp = []

with cl.CommandQueue(ctx) as cq:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the queue using with may be dangerous here. This means the queue should not be used anymore outside of the withblock, but a reference to that queue will still persist in the VkFFTApp object. It is not used later in the code so it is fine here, but I think it's giving a dangerous example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that it's dangerous to keep persisting the queue in VkFFTApp. Latest pyopencl will give a warning if the queue is used outside of the with block though. inducer/pyopencl#561

Test multiple FFT with queues different from the queue used
in creating the VkFFTApp
"""
for dtype in (np.complex64, np.complex128):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check for complex128/fp64 support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. I'm already checking for complex128.

pyvkfft/opencl.py Outdated Show resolved Hide resolved
@vincefn
Copy link
Owner

vincefn commented Oct 24, 2022

Hi @isuruf , @inducer - sorry for the long delay, as you can see I'm still undecided regarding which queue policy to follow, but it would be good to finalise this soon.

Using the queue supplied as a parameter to fft or ifft as the first choice is very clear.

However when no queue is passed to fft/ifft, I don't like the idea of having another two (or three) extra choices (the app, source or destiantion queues) - that will be confusing.

So in the end the two possibilities:

  1. use the FFT app queue and ignore the array queues. This has the advantage of using the same approach as the cuda API (the arrays don't have a stream attribute AFAIK), and avoids issues with mismatched source and destination arrays queue, and has no backward-compatibility issues
  2. use the source and destination arrays (hopefully matching) and deprecate the app's queue (as in the current PR). This allows to use the embedded queues and rely in with_queue but I don't use this personally

So I prefer 1) which seems clearer but if there is a widespread reliance on the embedded arrays queues then maybe 2) would be better. Let me know you thoughts.

@inducer
Copy link

inducer commented Dec 28, 2022

I think the key point is that a single app object could validly be used, concurrently, by work going on in multiple queues, and in that case, the app's queue is an unhappy default. The queue that comes with the array is much more likely to be right.

As for input and output having separate queues, pyopencl only ever considers queues on inputs, at least for now. inducer/pyopencl#668. But if the information is used, I think it's wise to check that things match, at least in __debug__ mode.

@vincefn
Copy link
Owner

vincefn commented Jan 5, 2023

I think the key point is that a single app object could validly be used, concurrently, by work going on in multiple queues, and in that case, the app's queue is an unhappy default. The queue that comes with the array is much more likely to be right.

Ok, you probably have a broader view of usage of pyopencl queues, so let's do it your way (#17 (comment)). I'll try to add warning if queues don't match the app's - with a way to disable it.

I need to finish handling strides first...

* strides:
  Update notebooks
  Add support for F-ordered array in the simple fft interface (C2C and R2C). Fox use of opencl_platform in tests
  Fix tests using F-strided arrays. Allow to select the backend also for the basic test, and add the option to select the opencl platform.
  Check also for OSError in fftp.py. Should fix vincefn#21
  Also display the platforma and device name along vkfft error check
  Non-C-contiguous arrays: add systematic test, fix DCT check, upodate some doc and outputs
  Add support for non C-ordered arrays (F-order or different types of strides). Supports both C2C and R2C (as long as the fast axis is transformed). Adresses vincefn#20
  Update changelog
  More elegant re-ordering array during accuracy test, when necessary
  During accuracy tests, make sure the array after an R2C transform still has the fastest axis along the last dimension (X) - since this is not the case after a 2D numpy rfftn transform. Fixes vincefn#19
Fix backend check in pyvkfft-test
queue = src.queue

if queue != self.queue:
self.queue.finish()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to include a finish() here , as it would break asynchronous execution.
It may create issues but anyone juggling multiple queues should manage synchronisation explicitly.

elif src.queue is None:
queue = dest.queue
else:
warnings.warn("queue is not given and the source/dest arrays are not equal. "
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will switch to a RuntimeError instead, no queue supplied and different src and destination queues is a recipe for disasters

@vincefn
Copy link
Owner

vincefn commented Jan 6, 2023

Note that I am now editing this in https://github.com/vincefn/pyvkfft/tree/queue

…nd destination arrays differ and no queue is supplied to fft() or ifft() ; use array queues also for the simple fft interface ; expand tests.

Fix use of config variables.
@vincefn vincefn merged commit 52a45f7 into vincefn:master Jan 6, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants