
.to_numpy(), .to_cupy(), etc. #55

Closed
ax3l opened this issue Aug 3, 2022 · 6 comments · Fixed by #88
Labels
backend: cuda (Specific to CUDA execution (GPUs)) · enhancement (New feature or request)

Comments

ax3l (Member) commented Aug 3, 2022

For all objects that expose the __array_interface__ protocol, we should add a helper member function called .to_numpy() that does nothing but create a view:

```python
np.array(self, copy=False, order='F')
cupy.array(self, copy=False, order='F')
```

Equivalently, we want to add .to_copy() et al. functions for #30 and, later, for DLPack interfaces.
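For illustration, a minimal self-contained sketch of how such a zero-copy view works through __array_interface__ (the Buffer class and its layout are hypothetical, not this project's API):

```python
import numpy as np

# Hypothetical container: any object exposing __array_interface__
# can be wrapped by NumPy as a view, without copying.
class Buffer:
    def __init__(self):
        self._data = np.arange(6.0)
        # Re-export the interface of the underlying storage.
        self.__array_interface__ = self._data.__array_interface__

    def to_numpy(self):
        # copy=False: NumPy wraps the exposed memory instead of copying it.
        return np.array(self, copy=False)

buf = Buffer()
view = buf.to_numpy()
view[0] = 42.0
assert buf._data[0] == 42.0  # the view shares memory with the buffer
```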

ax3l added the enhancement label Aug 3, 2022
ax3l (Member Author) commented Oct 21, 2022

Did some tests with order='F', and it is not fully obvious whether it helps. It does seem to keep the order of arguments in shape and in array index access the same...
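For reference, a small NumPy sketch of what order='F' does and does not change (stride values shown assume a default 8-byte integer dtype):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # C-ordered
f = np.asfortranarray(x)         # F-ordered copy of the same values

# Shape and index order are identical; only the memory strides differ.
assert x.shape == f.shape == (2, 3)
assert x[1, 2] == f[1, 2]
print(x.strides, f.strides)      # e.g. (24, 8) vs. (8, 24)
```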

ax3l mentioned this issue Oct 21, 2022
ax3l (Member Author) commented Jun 6, 2023

@dpgrote, @RemiLehe, and I did some performance tests of:

```python
import cupy as cp

x = cp.random.rand(10000, 20000)   # C-ordered by default
f = cp.copy(x, order='F')          # F-ordered copy

x.flags
#   C_CONTIGUOUS : True
#   F_CONTIGUOUS : False
#   OWNDATA : True

f.flags
#   C_CONTIGUOUS : False
#   F_CONTIGUOUS : True
#   OWNDATA : True

r = x**2; cp.cuda.runtime.deviceSynchronize()
%timeit -n 10 r = x**2; cp.cuda.runtime.deviceSynchronize()
# 283 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

r = f**2; cp.cuda.runtime.deviceSynchronize()
%timeit -n 10 r = f**2; cp.cuda.runtime.deviceSynchronize()
# 409 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

Here (as we also know from NumPy), the kernels always loop contiguously over what is assumed to be the fastest-running index in C order. Thus, we would not do the user a favor by returning our data with:

  • strides updated for F order,
  • shape updated for F order, or
  • np.array(self, copy=False, order='F') (so that arr.flags reports F_CONTIGUOUS).
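One way to see this: an F-contiguous array is just a C-contiguous buffer indexed in reverse order, which is exactly what a transposed view provides. Continuing the CuPy session above:

```python
# f from above is F-contiguous; its transpose is a C-contiguous *view*
# of the same device memory -- no data movement involved.
assert f.T.flags.c_contiguous
assert not f.T.flags.owndata
```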

ax3l (Member Author) commented Jun 7, 2023

Idea: we add an order='F' argument to our .to_... functions, doing the conventional/convenient (F) thing by default, and document the fast (C) option as the one to prefer when tuning for external libraries. :)
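A minimal sketch of that idea (function name and signature are assumptions; the real interface is what lands in #88):

```python
import numpy as np

def to_numpy(self, copy=False, order="F"):
    # Hypothetical helper: `self` exposes a C-ordered buffer
    # via __array_interface__.
    arr = np.array(self, copy=copy)
    if order == "F":
        # .T is a transposed view: F-contiguous flags, reversed shape,
        # same memory -- the convenient default for Fortran-style indexing.
        return arr.T
    return arr  # order="C": the fast path for C-ordered external kernels
```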

ax3l (Member Author) commented Aug 7, 2023

It turns out this is pretty easy to achieve via .T, which returns a view rather than a copy:

```python
In [1]: import numpy as np

In [2]: x = np.array([[1, 2, 3], [4, 5, 6]])

In [3]: x.flags
Out[3]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [4]: x.T.flags
Out[4]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
```

Seen in an implementation by @dpgrote
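The same trick carries over to CuPy unchanged (a small sketch, assuming a device array):

```python
import cupy as cp

x = cp.random.rand(4, 5)  # C-ordered device array
f_view = x.T              # F-contiguous view: no device-side copy
assert f_view.flags.f_contiguous
assert not f_view.flags.owndata
```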

ax3l (Member Author) commented Aug 7, 2023

Updated #88.

ax3l (Member Author) commented Aug 7, 2023

Opened a performance request in CuPy: cupy/cupy#7783

ax3l closed this as completed in #88 Sep 21, 2023
ax3l added the backend: cuda label Oct 3, 2023