Avoid cudf.dtype internally in favor of pre-defined, supported types #17839

Merged (7 commits) on Feb 4, 2025

@mroeschke mroeschke commented Jan 28, 2025

Description

xref #12494 and #12495

cudf.dtype is useful when a user passes a dtype argument to cudf: it performs inference on the input to make it cudf-compatible. Internally, we don't need this inference because we already know the exact types being passed and that they are supported by cudf (columns), so this PR avoids calling cudf.dtype internally.

Generally:

  • Define CUDF_STRING_DTYPE as a definitive cudf Python string type instead of cudf/np.dtype("O"/"object", "str")
  • Prefer using np.<type> instead of "<type>" (using np. like an enum namespace)
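A rough sketch of the two conventions above, using NumPy only so it runs without a GPU. The value assigned to CUDF_STRING_DTYPE here is an assumption for illustration (the object dtype), and `is_string_dtype_sketch` is a hypothetical helper, not a function from cudf:

```python
import numpy as np

# Assumption: a single module-level constant stands in for the string dtype,
# replacing ad hoc spellings such as np.dtype("O") or np.dtype("object").
CUDF_STRING_DTYPE = np.dtype("object")

# Use the np.<type> attribute (like an enum member) rather than the "<type>" string.
INT64 = np.dtype(np.int64)


def is_string_dtype_sketch(dtype: np.dtype) -> bool:
    """Hypothetical helper: compare against the one predefined constant."""
    return dtype == CUDF_STRING_DTYPE


# Both spellings resolve to the same dtype; the attribute form avoids string parsing.
assert INT64 == np.dtype("int64")
assert is_string_dtype_sketch(np.dtype("O"))
assert not is_string_dtype_sketch(INT64)
```

Comparing against one named constant keeps internal call sites free of dtype inference, which is the point of the change described above.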

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 28, 2025
@mroeschke mroeschke self-assigned this Jan 28, 2025
@mroeschke mroeschke requested a review from a team as a code owner January 28, 2025 18:48
@mroeschke mroeschke requested review from bdice and Matt711 January 28, 2025 18:48
@mroeschke mroeschke changed the base branch from branch-25.02 to branch-25.04 January 31, 2025 22:13
Comment on lines +830 to +831
int64 = np.dtype(np.int64)
max_int = np.iinfo(int64).max
For the record I did some quick timings to see if these are worth caching (they're probably not):

In [2]: %timeit np.dtype(np.int64)
129 ns ± 0.738 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [6]: %timeit np.iinfo(np.dtype(np.int64))
563 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit np.iinfo(np.dtype(np.int64)).max
633 ns ± 5.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
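For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module. The names `INT64`, `MAX_INT64`, `max_int_uncached`, `max_int_cached`, and `time_ns_per_call` are hypothetical and not part of this PR; absolute numbers will vary by machine and NumPy version:

```python
import timeit

import numpy as np

# Hypothetical module-level cache of the two values from the diff above.
INT64 = np.dtype(np.int64)
MAX_INT64 = np.iinfo(INT64).max


def max_int_uncached() -> int:
    # Rebuilds the dtype and the iinfo object on every call.
    return np.iinfo(np.dtype(np.int64)).max


def max_int_cached() -> int:
    # Returns the precomputed module-level constant.
    return MAX_INT64


def time_ns_per_call(func, number: int = 100_000) -> float:
    # Average wall-clock time per call, in nanoseconds.
    return timeit.timeit(func, number=number) / number * 1e9


if __name__ == "__main__":
    print(f"uncached: {time_ns_per_call(max_int_uncached):.0f} ns per call")
    print(f"cached:   {time_ns_per_call(max_int_cached):.0f} ns per call")
```

At a few hundred nanoseconds per call, the uncached form only matters if it sits in a very hot loop, which matches the "probably not" above.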

@vyasr
vyasr commented Feb 4, 2025

/merge

@rapids-bot rapids-bot bot merged commit a7e0257 into rapidsai:branch-25.04 Feb 4, 2025
109 checks passed
@mroeschke mroeschke deleted the cln/dtype branch February 4, 2025 01:26