Avoid `cudf.dtype` internally in favor of pre-defined, supported types #17839

mroeschke · 2025-01-28T18:48:32Z

Description

cudf.dtype is useful when a user passes a dtype argument to cudf: it performs inference on the input to make it cudf-compatible. Internally, we don't need this inference because we already know the exact types being passed and that they are supported by cudf (columns), so this PR avoids calling cudf.dtype internally.

Generally:

Define CUDF_STRING_DTYPE as a definitive cudf Python string type instead of cudf/np.dtype("O"/"object", "str")
Prefer using np.<type> instead of "<type>" (using np. like an enum namespace)
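
To illustrate the two conventions above, here is a minimal sketch using only NumPy. The name STRING_DTYPE and the helper is_string_dtype are hypothetical stand-ins for illustration; the actual CUDF_STRING_DTYPE definition lives in cudf and may differ.

```python
import numpy as np

# Hypothetical stand-in for the constant described in this PR;
# the real CUDF_STRING_DTYPE is defined inside cudf and may differ.
STRING_DTYPE = np.dtype("object")


def is_string_dtype(dtype: np.dtype) -> bool:
    """Compare against one pre-defined dtype instead of re-parsing strings."""
    return dtype == STRING_DTYPE


# np.<type> acts like an enum member: a typo such as np.int46 fails
# immediately with AttributeError, and no string parsing is involved.
int64_from_type = np.dtype(np.int64)
int64_from_name = np.dtype("int64")
same_int64 = int64_from_type == int64_from_name

# "O" and "object" are two spellings of the same NumPy dtype, which is
# why a single named constant is less ambiguous than either string.
same_object = np.dtype("O") == np.dtype("object")
```

Both spellings resolve to the same dtype object, so the change is about consistency and skipping inference rather than behavior.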

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

vyasr · 2025-02-04T01:10:34Z

python/cudf/cudf/core/column/datetime.py

+            int64 = np.dtype(np.int64)
+            max_int = np.iinfo(int64).max


For the record I did some quick timings to see if these are worth caching (they're probably not):

In [2]: %timeit np.dtype(np.int64)
129 ns ± 0.738 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [6]: %timeit np.iinfo(np.dtype(np.int64))
563 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit np.iinfo(np.dtype(np.int64)).max
633 ns ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
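
For anyone who wants to reproduce the comparison outside IPython, the following is a rough sketch using the standard-library timeit module. The names INT64, MAX_INT64, max_int_uncached, and max_int_cached are made up for this example and are not part of the PR; absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Hypothetical module-level cache, shown only for comparison.
INT64 = np.dtype(np.int64)
MAX_INT64 = np.iinfo(INT64).max


def max_int_uncached() -> int:
    # Rebuilds the dtype and the iinfo object on every call, as in the diff.
    return np.iinfo(np.dtype(np.int64)).max


def max_int_cached() -> int:
    # Returns the value computed once at import time.
    return MAX_INT64


n = 10_000
uncached_ns = timeit.timeit(max_int_uncached, number=n) / n * 1e9
cached_ns = timeit.timeit(max_int_cached, number=n) / n * 1e9
print(f"uncached: {uncached_ns:.0f} ns per call")
print(f"cached:   {cached_ns:.0f} ns per call")
```

At a few hundred nanoseconds per call, the uncached version is unlikely to matter unless it sits in a very hot loop, which is consistent with the conclusion that caching is probably not worthwhile.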

vyasr · 2025-02-04T01:12:08Z

/merge

mroeschke added 6 commits January 27, 2025 10:50

Use less cudf.dtype

24bc57a

Merge remote-tracking branch 'upstream/branch-25.02' into cln/dtype

72ba09c

use less cudf.dtype

27c2c8a

Merge remote-tracking branch 'upstream/branch-25.02' into cln/dtype

5f3540c

Fix some typos

9f6e8d3

Merge remote-tracking branch 'upstream/branch-25.02' into cln/dtype

ca82449

mroeschke added the Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Jan 28, 2025

mroeschke self-assigned this Jan 28, 2025

mroeschke requested a review from a team as a code owner January 28, 2025 18:48

mroeschke requested review from bdice and Matt711 January 28, 2025 18:48

Merge remote-tracking branch 'upstream/branch-25.02' into cln/dtype

0dee036

mroeschke changed the base branch from branch-25.02 to branch-25.04 January 31, 2025 22:13

vyasr approved these changes Feb 4, 2025

rapids-bot bot merged commit a7e0257 into rapidsai:branch-25.04 Feb 4, 2025
109 checks passed

mroeschke deleted the cln/dtype branch February 4, 2025 01:26

mroeschke mentioned this pull request Feb 5, 2025

More avoid cudf.dtype internally in favor of pre-defined, supported types #17918

Open

3 tasks
