test #7

Open
wants to merge 1,699 commits into
base: branch-24.06

galipremsagar
Owner

No description provided.

@github-actions github-actions bot added the ci label Mar 27, 2024
@galipremsagar galipremsagar reopened this Mar 27, 2024
galipremsagar pushed a commit that referenced this pull request Nov 8, 2024
Fixes the call to the `data_type{}` ctor in `json_test.cpp`. The 2-parameter ctor is for fixed-point types only and will assert in a debug build if used incorrectly: https://github.com/rapidsai/cudf/blob/2db58d58b4a986c2c6fad457f291afb1609fd458/cpp/include/cudf/types.hpp#L277-L280

Partial stack trace from a gdb run
```
#5  0x000077b1530bc71b in __assert_fail_base (fmt=0x77b153271130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x58c3e4baaa98 "id == type_id::DECIMAL32 || id == type_id::DECIMAL64 || id == type_id::DECIMAL128",
    file=0x58c3e4baaa70 "/cudf/cpp/include/cudf/types.hpp", line=279, function=<optimized out>) at ./assert/assert.c:92
#6  0x000077b1530cde96 in __GI___assert_fail (
    assertion=0x58c3e4baaa98 "id == type_id::DECIMAL32 || id == type_id::DECIMAL64 || id == type_id::DECIMAL128",
    file=0x58c3e4baaa70 "/cudf/cpp/include/cudf/types.hpp", line=279, function=0x58c3e4baaa38 "cudf::data_type::data_type(cudf::type_id, int32_t)")
    at ./assert/assert.c:101
#7  0x000058c3e48ba594 in cudf::data_type::data_type (this=0x7fffdd3f7530, id=cudf::type_id::STRING, scale=0) at /cudf/cpp/include/cudf/types.hpp:279
#8  0x000058c3e49215d9 in JsonReaderTest_MixedTypesWithSchema_Test::TestBody (this=0x58c3e5ea13a0) at /cudf/cpp/tests/io/json/json_test.cpp:2887

```

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: rapidsai#17273
mroeschke and others added 19 commits December 13, 2024 18:30
Follow up to rapidsai#16715.

Now that the usages of the `masked` keyword in RAPIDS have been addressed (rapidsai/cuspatial#1496 is the only one I could find), I think we can remove this keyword altogether in this method.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17530
From an offline conversation, fixes the following discrepancy between cudf and pandas:

```python
In [1]: import cudf

In [2]: import numpy as np

In [3]: ser = cudf.Series([np.nan, np.nan, 0.9], nan_as_null=False)

In [4]: ser
Out[4]: 
0    NaN
1    NaN
2    0.9
dtype: float64

In [5]: ser.quantile(0.9)
Out[5]: np.float64(nan)

In [6]: import pandas as pd

In [7]: ser = pd.Series([np.nan, np.nan, 0.9])

In [8]: ser.quantile(0.9)
Out[8]: np.float64(0.9)
```
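The pandas result in the session above is what one gets by ignoring NaN entries before interpolating. A minimal pure-Python sketch of that behavior, assuming linear interpolation between order statistics; `quantile_skipna` is a hypothetical helper for illustration, not a cudf or pandas function:

```python
import math


def quantile_skipna(values, q):
    """Linear-interpolation quantile that ignores NaN entries (hypothetical helper)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    # Drop NaNs first, so they cannot poison the interpolation.
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    # Fractional position of the requested quantile in the sorted data.
    pos = q * (len(data) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


result = quantile_skipna([math.nan, math.nan, 0.9], 0.9)
```

With only one non-NaN value left after filtering, every quantile collapses to that value, which matches the pandas output shown above.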

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17593
This PR exposes all JSON reader options in pylibcudf and enables them via kwargs in `cudf.read_json`.
Since kwargs cannot be used in Cython, they are passed to Cython as a dict.
These options are intentionally hidden from the docs, as they are currently used mostly for testing feature requests from the Spark JSON reader. They are expected to change.
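The kwargs-to-dict forwarding described above can be sketched in plain Python. This is only an illustration of the pattern: the function names and the option names (`example_flag`, `example_limit`) are hypothetical and are not the actual cudf or pylibcudf reader options.

```python
from typing import Any

# Hypothetical option names, used only for illustration; they are not
# the real JSON reader options.
_KNOWN_OPTIONS = {"example_flag": False, "example_limit": 0}


def _read_json_lowlevel(path: str, options: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the compiled layer: accepts one explicit dict of options."""
    unknown = set(options) - set(_KNOWN_OPTIONS)
    if unknown:
        raise TypeError(f"unknown reader options: {sorted(unknown)}")
    # Defaults first, then caller overrides.
    return {**_KNOWN_OPTIONS, **options, "path": path}


def read_json(path: str, **kwargs: Any) -> dict[str, Any]:
    """Python-facing wrapper: collects keyword arguments and forwards them as a dict."""
    return _read_json_lowlevel(path, dict(kwargs))


resolved = read_json("data.json", example_flag=True)
```

Validating the dict at the lower layer keeps a misspelled option from being silently ignored, which matters for options that are not documented.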

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Matthew Murray (https://github.com/Matt711)

URL: rapidsai#17563
Contributes to rapidsai#17317

More can be removed once my other cudf._lib PRs are in

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17586
Contributes to rapidsai#17317

Also, I found that `PackedColumns` was not being used anywhere. It appears it was added back in rapidsai#8153 for dask_cudf, but I cannot see it being used there anymore.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17548
… `/src`) (rapidsai#17550)

Replaced the calls to `cudaMemcpyAsync` with the new `cuda_memcpy`/`cuda_memcpy_async` utility, which optionally avoids using the copy engine.

Also took the opportunity to use cudf::detail::host_vector and its factories to enable wider pinned memory use.

Remaining instances are either not viable (e.g. copying `h_needs_fallback`, interop) or D2D copies.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#17550
…, ...)` type (rapidsai#17604)

From an offline discussion, a pandas object with a `category[interval[...]]` type would incorrectly be interpreted as a `category[struct[...]]` type. This can cause further problems with `cudf.pandas`, as a `category[struct[...]]` type cannot be properly interpreted by pandas.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17604
Clang-tidy does not like `[[nodiscard]]` after `__device__` and I don't like red squiggly lines.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: rapidsai#17608
Recent changes in dask and dask-expr have broken `dask_cudf.read_csv` (dask/dask-expr#1178, dask/dask#11603). Fortunately, the breaking changes help us avoid legacy CSV code in the long run.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#17612
…i#17611)

A recent nightly failure discovered by @davidwendt here: https://github.com/rapidsai/cudf/actions/runs/12367794950/job/34543121050 indicates that an environment cannot be created with `pytorch>=2.4.0` and `pyarrow` 14.0.0 or 14.0.1, hence this bump to `14.0.2`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17611
This PR has two fixes:
- Since we're pinning to a commit, a shallow clone will start failing as soon as HEAD gets bumped on the main branch (which will happen next when cuml/raft logging features are merged). We need to stop using shallow clones.
- The CMake code for setting the default logging levels was setting the wrong macro name.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17588
…apidsai#17610)

Fixes memcheck error found in nightly build checks in the STREAM_REPLACE_TEST's `ReplaceTest.NormalizeNansAndZerosMutable` gtest. The mutable-view passed to the `cudf::normalize_nans_and_zeros` API was pointing to invalidated data.

The following line created the invalid view
```
cudf::mutable_column_view mutable_view = cudf::column(input, cudf::test::get_default_stream());
```
The temporary `cudf::column` is destroyed once the `mutable_view` is created so this view would now point to a freed column. The view must be created from a non-temporary column and also must be non-temporary itself so that it is not implicitly converted to a `column_view`.

Error introduced by rapidsai#17436

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#17610
Forward-merge branch-25.02 into branch-25.04
@galipremsagar galipremsagar force-pushed the test branch 2 times, most recently from 2b55b8b to e627f5e Compare January 31, 2025 21:44
mroeschke and others added 25 commits January 31, 2025 22:28
This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#17877
Moving forward with removal of the (redundant) `gpu` namespace in cuIO.
Also moved the entire ORC implementation to `cudf::io::orc::detail`, leaving only the implementation of the public API in `cudf::io::orc`.

Also removed a few unused headers, or moved them to be included in the right files.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: rapidsai#17891
## Description
This PR fixes cudf ci nightly test failures:
https://github.com/rapidsai/cudf/actions/runs/13097249137/job/36541039646

## Checklist
- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Forward-merge branch-25.02 into branch-25.04
`data` attribute of numpy should be marked private as it actually points to the underlying memory and it will be distinct for a cupy array.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17890
Forward-merge branch-25.02 into branch-25.04
rapidsai#17839)

xref rapidsai#12494 and rapidsai#12495

`cudf.dtype` is useful when cudf is passed a `dtype` argument from a user, to perform inference on the input and make it cudf-compatible. Internally, we don't need this inference because we know the exact types to be passed and that are supported by cudf (columns), so this PR avoids calling `cudf.dtype` internally.

Generally:

* Define `CUDF_STRING_DTYPE` as a definitive cudf Python string type instead of `cudf/np.dtype("O"/"object", "str")`
* Prefer using `np.<type>` instead of `"<type>"` (using `np.` like an enum namespace)

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17839
Contributes to rapidsai/build-planning#146

Proposes:

* setting `[tool.scikit-build].ninja.make-fallback = false`, so `scikit-build-core` will not silently fall back to using GNU Make if `ninja` is not available

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17894
…sistency (rapidsai#17908)

Some older code in the ORC reader/writer uses PascalCase, which is not used in the rest of libcudf. This PR renames such functions and types to align the style with the rest of the code base.

The types that are based on the ORC specs are kept as PascalCase to make it easy to identify such types.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17908
This is a small change adding a script to run pylibcudf tests, like we have for other Python libraries in this repository.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17882
Forward-merge branch-25.02 into branch-25.04
Run the CI for `spark-rapids-jni` to ensure that we don't break their build.

Resolves rapidsai#17337

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - Gera Shegalov (https://github.com/gerashegalov)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#17781
…es in cudf::io (rapidsai#17734)

As part of the improvement effort discussed in rapidsai#15907, this merge request removes some of the excessive `std::string` copies and uses `std::string_view` in place of `std::string` when the lifetime semantics are clear.

`std::string` is only replaced in this MR in linear functions and constructors, but not in structs as there's no established ownership or lifetime semantics to guarantee the `string_view`s will not outlive their source.
There were also some cases of excessive copies. For example, consider:

```cpp
#include <string>

struct source_info {
  // Takes a const reference, then copies it into the member.
  source_info(std::string const& s) : str{s} {}

 private:
  std::string str;
};
```

In the above example, the string is likely to be allocated twice if a temporary or string literal is used to construct `s`: once for the temporary and once for the copy into `str`. Taking the parameter by value and moving it avoids the second allocation:

```cpp
#include <string>
#include <utility>

struct source_info {
  // Takes the string by value and moves it into the member.
  source_info(std::string s) : str{std::move(s)} {}

 private:
  std::string str;
};
```

With the by-value parameter and `std::move`, the string is only allocated once in all scenarios.
This also applies to `std::vector`, where it is arguably worse because `std::vector` has no small-buffer optimization comparable to `std::string`'s small-string optimization (SSO).

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - David Wendt (https://github.com/davidwendt)

URL: rapidsai#17734
…apidsai#17859)

The PTX parser replaces PTX code with inline PTX code (using inline ASM blocks).
It treats a branch label and the instruction immediately following it as a single unit to process.
During the ASM->CUDA transform step, it searches the string for the `ret` instruction and replaces the whole statement rather than only the `ret;` substring. This means that an expression like:

```asm
BB0_1:
ret;
```

is parsed as:

```asm
BB0_1: ret;
```

and then transformed to:

```asm
bra RETTGT
```

instead of:

```asm
BB0_1: bra RETTGT
```

This merge request fixes this bug.
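
The difference between the two behaviors can be sketched in Python. This is an illustration of the idea only, not the parser's actual code: the function names and the regular expression are assumptions, and the replacement text simply mirrors the `bra RETTGT` shown above.

```python
import re

# Matches the `ret;` instruction as a whole word (assumed pattern).
RET_PATTERN = re.compile(r"\bret\s*;")


def replace_whole_statement(stmt: str) -> str:
    """Buggy behavior: a statement containing `ret;` is replaced wholesale,
    so any label in front of it is lost."""
    if RET_PATTERN.search(stmt):
        return "bra RETTGT"
    return stmt


def replace_ret_only(stmt: str) -> str:
    """Fixed behavior: only the `ret;` substring is rewritten; the label stays."""
    return RET_PATTERN.sub("bra RETTGT", stmt)


stmt = "BB0_1: ret;"
buggy = replace_whole_statement(stmt)
fixed = replace_ret_only(stmt)
```

Here `buggy` drops the `BB0_1:` label, so any branch targeting it would no longer resolve, while `fixed` keeps the label in place.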

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Shruti Shivakumar (https://github.com/shrshi)

URL: rapidsai#17859
Fixes incorrect pylibcudf/libcudf example created in rapidsai#17803.

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#17912
Forward-merge branch-25.02 into branch-25.04
Adds multi-partition `Join` support following the same design as rapidsai#17441

In order to support parallel joins, this PR also introduces a special `Shuffle` node.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#17518
Currently pylibcudf does not export a dependency on libcudf at all, which is incorrect.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#17915
Forward-merge branch-25.02 into branch-25.04