cuDF-style operations & NVTX annotations for local CuPy benchmark #548

Merged
4 commits merged into rapidsai:branch-0.19 on Mar 12, 2021

Conversation

charlesbluca
Member

This PR adds the following to the local CuPy benchmark:

  • Binary column sum and gather operations to simulate cuDF workloads (sketched briefly below)
  • NVTX annotations for the array creation and the execution of the operations
  • Use of args.rmm_pool_size in the worker memory pool setup, so that user-defined pool sizes take effect
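
For context, a rough standalone sketch of the kind of operations and annotations being added (not the PR's exact code; the sizes and chunking here are arbitrary, and start_range/end_range are assumed to be the nvtx package helpers used in the diff):

import cupy
import dask.array as da
from nvtx import start_range, end_range

rs = da.random.RandomState(RandomState=cupy.random.RandomState)

# Array creation wrapped in an NVTX range
rng = start_range(message="make array(s)", color="green")
x = rs.normal(10, 1, (10000,), chunks=1000).persist()
y = rs.normal(10, 1, (10000,), chunks=1000).persist()
idx = rs.randint(0, len(x), (10000,), chunks=1000).persist()
end_range(rng)

# cuDF-style operations, each the kind of workload the benchmark times
rng = start_range(message="compute", color="blue")
col_sum = (x + y).persist()    # binary column sum
col_gather = x[idx].persist()  # gather x at the positions in idx
end_range(rng)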

Some questions:

  • Any other binary operations that would be good to add here?
  • Is it useful to annotate the creation of the arrays or would we only want to focus on the execution?

cc @shwina

@charlesbluca added the "python" (python code needed), "3 - Ready for Review" (Ready for review by team), and "non-breaking" (Non-breaking change) labels on Mar 9, 2021
@charlesbluca requested a review from a team as a code owner on March 9, 2021 at 19:41
@charlesbluca mentioned this pull request on Mar 9, 2021
@charlesbluca added the "improvement" (Improvement / enhancement to an existing function) label on Mar 9, 2021
@charlesbluca
Member Author

charlesbluca commented Mar 10, 2021

I'm getting several compute failures from the workers when running col_gather on GPU:

distributed.worker - WARNING -  Compute Failed
Function:  subgraph_callable
args:      (array([11.08869371,  9.76416487,  9.31292166, ..., 10.41836123,
        9.80410377,  9.77784505]), array([2291, 5547, 8535, 1871, 2223, 4458, 6611, 7833, 8261, 7254, 6056,
       7562, 9498, 5647, 7596, 8377, 4880, 1084, 2684,  963, 5022, 9411,
       6386, 4742, 3783, 2484, 3130, 2164,  819, 3724, 8617, 8715, 1461,
         17, 5364, 7686, 1637, 6863, 6874, 3560, 4886, 4124, 2118, 6952,
       2808, 6576, 4916, 6581, 9481, 8495, 5226, 6423, 5373, 9632, 8108,
       3473, 6938, 7232, 1036, 2322, 8367, 2768, 4547, 4037, 7740, 4796,
       5866, 3676, 3729, 3463, 7181, 4787, 9370,  307, 1434, 4229,  679,
       2565, 9295, 7488, 5522,  178, 9472, 3680, 5059, 8768,  722, 8821,
        517, 5554, 4750, 6523, 3201, 3481, 8359, 3766, 4612, 2760, 7938,
       8729,  613, 6512, 4262, 4310, 7651, 3464, 3890, 8028, 1525, 7431,
       8875, 2472, 8936, 4221, 6307, 4995, 1810, 2369, 3319, 2230, 9494,
       6608, 8302, 8743, 9187, 2860, 1606, 5046, 4743, 7703, 4226, 9117,
       2999,  700, 1127,
kwargs:    {}
Exception: TypeError("Unsupported type <class 'numpy.ndarray'>")

From a cursory glance, it looks like something is happening in slice_with_int_dask_array that is returning a Dask array with NumPy chunks from our CuPy chunked input.

@pentschev
Member

@charlesbluca dask/dask#7364 should fix the issue above.

@charlesbluca
Member Author

Thanks @pentschev!

@jakirkham
Member

rerun tests

@jakirkham
Member

It looks like there is a style issue. Charles, could you please run black locally and commit the changes?

@charlesbluca
Member Author

Done! Thanks for the catch @jakirkham 🙂

@jakirkham
Member

@pentschev do you have any more thoughts here? 🙂


func_args = (x, idx)

func = lambda x, idx: x[idx]
Member Author

Should we be using .copy() here like we do with the slicing operation?
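i.e. something like this hypothetical one-line change, mirroring the slicing operation (not what the PR currently does):

func = lambda x, idx: x[idx].copy()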

Member

Good point - I remember that when I wrote the CuPy benchmarks, I had to add .copy() to ensure it actually sliced the array instead of returning a view. I'm not totally sure whether there's a case where Dask would only return a view - do you know, @jakirkham?

Member

That depends on what computational backend Dask is using. If we are using the threaded scheduler, it is probably a view. If we are using the Distributed scheduler, it may be a view if the data was already on that worker. Otherwise it wouldn't be a view.

Member

I guess the safest here is to actually profile both cases. Ideally, assuming neither case returns just a view, what we would see is:

  1. With .copy(): the same kernels, with a copy (or copies) at the end;
  2. Without .copy(): the same kernels as above, but no copies at the end.

@charlesbluca
Member Author

It looks like the compute failures are still happening even with @pentschev's fix - I'll try to dig more into this.

@jakirkham
Member

Where are you seeing failures? Is this locally? AFAICT CI passed (though haven't dug into the logs)

@pentschev
Member

It looks like the compute failures are still happening even with @pentschev's fix - I'll try to dig more into this.

Do you have more details?

This is the simplified version of your code that I used as a sample to test that Dask PR:

import numpy as np, cupy as cp, dask.array as da

rs = da.random.RandomState(RandomState=cp.random.RandomState)

x = rs.normal(10, 1, (1000,))
idx = rs.randint(0, len(x), (1000,))

print(x[idx].compute())
print(type(x[idx].compute()))

@charlesbluca
Member Author

Where are you seeing failures? Is this locally? AFAICT CI passed (though haven't dug into the logs)

This is local, using the col_gather operation with default arguments (though the results are the same with UCX enabled).

Do you have more details?

It seems like this is an issue with persist(); the compute failures come up when trying to return a result there. Could this be a problem with the creation of the Dask graph there?

@jakirkham
Member

Could you please reduce this to an MRE and file it on the Distributed repo?

@charlesbluca
Member Author

Sure! Would this be better suited for Distributed or Dask? It seems like this is a problem regardless of whether a cluster is in use:

import cupy
import dask.array as da

rs = da.random.RandomState(RandomState=cupy.random.RandomState)

x = rs.normal(10, 1, (1000,))
idx = rs.randint(0, len(x), (1000,))

x[idx].persist()
Outputs:
Traceback (most recent call last):
  File "mre.py", line 9, in <module>
    x[idx].persist()
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/base.py", line 256, in persist
    (result,) = persist(self, traverse=False, **kwargs)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/base.py", line 770, in persist
    results = schedule(dsk, keys, **kwargs)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/local.py", line 487, in get_async
    raise_exception(exc, tb)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/local.py", line 317, in reraise
    raise exc
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 115, in _execute_task
    return [_execute_task(a, cache) for a in arg]
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 115, in <listcomp>
    return [_execute_task(a, cache) for a in arg]
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/optimization.py", line 963, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/utils.py", line 35, in apply
    return func(*args, **kwargs)
  File "/datasets/charlesb/miniconda3/envs/ptds-bench/lib/python3.8/site-packages/dask/array/chunk.py", line 317, in slice_with_int_dask_array
    idx = idx - offset
  File "cupy/core/core.pyx", line 1079, in cupy.core.core.ndarray.__sub__
  File "cupy/core/core.pyx", line 1466, in cupy.core.core.ndarray.__array_ufunc__
  File "cupy/core/_kernel.pyx", line 1060, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 109, in cupy.core._kernel._preprocess_args
TypeError: Unsupported type <class 'numpy.ndarray'>

@pentschev
Member

Can you please double-check you indeed have dask/dask#7364 in your install? The example you posted works for me after that PR.

@charlesbluca
Member Author

Just checked and I do have those commits in my install. I also see now that your tests are failing for me locally too, so this is probably something with my local env.

What version of CuPy are you using? I realized I had been using a version with cupy#4322 but I'm still getting the same errors with 8.5.0.

@jakirkham
Member

I would make sure that any existing dask install has been removed before installing dask in development mode.

@charlesbluca
Member Author

I made sure dask was entirely uninstalled and removed from the env before doing the dev install, but I'm still getting the same error - I think this might be an issue with my version/installation of CuPy.

@pentschev
Member

You also need NumPy>=1.20; can you confirm you have that too?

@charlesbluca
Member Author

Yup, that was it - thanks for the help!

Member

@pentschev left a comment


LGTM, and since the issues @charlesbluca was having were resolved, I'm gonna go ahead and merge this. Thanks for working on this!

Comment on lines +32 to +35
rng = start_range(message="make array(s)", color="green")
x = rs.random((args.size, args.size), chunks=args.chunk_size).persist()
await wait(x)
end_range(rng)
Member

Given it's only two lines, I'm not sure it's worth the trouble, but it feels like the start_range/end_range pair would be a perfect candidate for a decorator function. Just saying for future consideration, no action required. 🙂

Member

Or using a contextmanager?

cc @shwina (in case you have thoughts on how to do this 🙂)
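
A minimal sketch of the contextmanager idea, wrapping the existing start_range/end_range helpers (hypothetical, not part of this PR):

from contextlib import contextmanager

from nvtx import start_range, end_range

@contextmanager
def nvtx_range(message, color="blue"):
    # Open an NVTX range for the duration of the enclosed block
    rng = start_range(message=message, color=color)
    try:
        yield
    finally:
        end_range(rng)

# Usage, e.g. for the array creation above:
# with nvtx_range("make array(s)", color="green"):
#     x = rs.random((args.size, args.size), chunks=args.chunk_size).persist()
#     await wait(x)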

Member Author

I was considering the same - I'll look into that if we end up extending the benchmark again 😁

@pentschev
Member

@gpucibot merge

@rapids-bot (bot) merged commit 09196cb into rapidsai:branch-0.19 on Mar 12, 2021
@jakirkham
Member

Thanks Charles for the PR and Peter for the review! 😄

@charlesbluca
Member Author

Thanks for the reviews and environment help!

@jakirkham
Member

Just realized we missed the column masking case; submitted PR ( #553 ) to include that (roughly sketched below).
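
For reference, a hypothetical sketch of a column masking operation in the same style as the gather lambda above (the actual code in #553 may differ):

func_args = (x,)

func = lambda x: x[x > 10]  # boolean mask: keep only the elements where the condition holds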
