CUDA kernels that are implemented but not optimal #2987

Open
1 of 16 tasks
jpivarski opened this issue Jan 25, 2024 · 0 comments
Labels: performance (Works, but not fast enough or uses too much memory)

jpivarski commented Jan 25, 2024

This is primarily for record-keeping, so that we don't forget about CUDA kernels that should be revisited someday. To be in this list, a kernel must be implemented correctly (in main or an impending PR), but have some reason to be rewritten. The list is to help us stick to the policy that existence is the first priority and optimization is second, without the temptation to go down a rabbit-hole of optimizing every kernel before moving on to the next one.

Variable-length inner loop:

  • awkward_IndexedArray_ranges_next_64
  • awkward_IndexedArray_ranges_carry_next_64
  • awkward_ListArray_getitem_jagged_numvalid
  • awkward_ListArray_getitem_next_range_spreadadvanced
  • awkward_ListArray_broadcast_tooffsets
  • awkward_ListArray_localindex
  • awkward_ListOffsetArray_drop_none_indexes
  • awkward_ListOffsetArray_reduce_local_nextparents_64
  • awkward_ListArray_rpad_axis1
  • awkward_ListOffsetArray_rpad_axis1
  • awkward_ListArray_combinations_length
  • awkward_NumpyArray_pad_zero_to_length
  • awkward_NumpyArray_rearrange_shifted
  • awkward_UnionArray_flatten_combine
  • awkward_UnionArray_nestedfill_tags_index
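
To illustrate what "variable-length inner loop" means here, the following is a minimal CPU-side sketch in plain Python, not actual kernel code. It assumes a local-index operation on an offsets-based list (each element receives its position within its own list) and assumes `offsets[0] == 0`. The function names are hypothetical, and the issue does not say how any listed kernel is written or how it should be rewritten. The first function models one GPU thread per list, so each thread's loop length depends on the data. The second models one thread per element, where each thread does a bounded search instead.

```python
from bisect import bisect_right


def localindex_thread_per_list(offsets):
    """Model of one thread per list: the inner loop length varies per thread."""
    out = [0] * offsets[-1]
    work = []  # loop iterations each modeled "thread" performs
    for i in range(len(offsets) - 1):  # each i stands in for one GPU thread
        start, stop = offsets[i], offsets[i + 1]
        for j in range(stop - start):  # variable-length inner loop
            out[start + j] = j
        work.append(stop - start)
    return out, work


def localindex_thread_per_element(offsets):
    """Model of one thread per element: find the owning list by binary search."""
    out = [0] * offsets[-1]
    for k in range(offsets[-1]):  # each k stands in for one GPU thread
        i = bisect_right(offsets, k) - 1  # last list whose start is <= k
        out[k] = k - offsets[i]
    return out


offsets = [0, 3, 3, 5, 12]  # list lengths 3, 0, 2, 7
per_list, work = localindex_thread_per_list(offsets)
per_element = localindex_thread_per_element(offsets)
print(per_list == per_element, work)
```

In the per-list model the `work` values differ from thread to thread (here one thread loops 7 times while another does nothing), which on a GPU would leave threads in the same warp idle while the longest loop finishes. The per-element model gives every thread a similar amount of work, at the cost of a search per element. Whether that trade-off pays off for any particular kernel in the list would have to be measured.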