GH-36845: [C++][Python] Allow type promotion on pa.concat_tables #36846

Merged: 93 commits merged into apache:main from the arrow-14705 branch on Oct 10, 2023


@Fokko (Contributor) commented Jul 24, 2023

Revival of #12000

Rationale for this change

It would be great to support type promotion when concatenating tables, such as:

def test_concat_tables_with_promotion_int():
    import pyarrow as pa
    t1 = pa.Table.from_arrays(
        [pa.array([1, 2], type=pa.int64())], ["int"])
    t2 = pa.Table.from_arrays(
        [pa.array([3, 4], type=pa.int32())], ["int"])

    result = pa.concat_tables([t1, t2], promote=True)

    assert result.equals(pa.Table.from_arrays([
        pa.array([1, 2, 3, 4], type=pa.int64())
    ], ["int"]))

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Fokko requested a review from westonpace as a code owner on July 24, 2023 14:19
@github-actions

⚠️ GitHub issue #36845 has been automatically assigned in GitHub to PR creator.

@Fokko changed the title from "GH-36845: [" to "GH-36845: [C++][Python] Allow type promotion on pa.concat_tables" on Jul 24, 2023
@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Oct 6, 2023
    t2 = pa.Table.from_arrays(
        [pa.array([1.0, 2.0], type=pa.float32())], ["float_field"])

    result = pa.concat_tables([t1, t2], promote=True)
Member

Suggested change
-    result = pa.concat_tables([t1, t2], promote=True)
+    with pytest.warns(FutureWarning):
+        result = pa.concat_tables([t1, t2], promote=True)

This asserts that the warning is raised and also keeps it from showing up unnecessarily in the pytest logs.
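
To illustrate the mechanism behind this suggestion, here is a minimal sketch that uses only the standard library's `warnings` module rather than `pytest.warns`, so it runs without pytest or pyarrow installed. The function `concat_with_deprecated_flag` is a hypothetical stand-in for an API with a deprecated keyword and is not pyarrow's implementation; the helper `call_and_check_warning` is likewise an assumption made for the example.

```python
import warnings


def concat_with_deprecated_flag(chunks, promote=False):
    """Hypothetical stand-in for an API whose `promote` keyword is deprecated."""
    if promote:
        warnings.warn("promote is deprecated", FutureWarning, stacklevel=2)
    # Flatten the chunks; this is a placeholder for the real work.
    return [item for chunk in chunks for item in chunk]


def call_and_check_warning():
    # Record warnings instead of letting them reach the test log output.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = concat_with_deprecated_flag([[1, 2], [3, 4]], promote=True)
    # Fail if the expected FutureWarning was not emitted.
    assert any(issubclass(w.category, FutureWarning) for w in caught)
    return result, len(caught)
```

In a pytest suite, `with pytest.warns(FutureWarning):` plays the same role in a single line: it captures warnings raised inside the block and fails the test if no matching warning was raised.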

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Oct 9, 2023
@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label on Oct 9, 2023
@jorisvandenbossche merged commit 5f57219 into apache:main on Oct 10, 2023
@jorisvandenbossche removed the "awaiting change review" label on Oct 10, 2023
@github-actions bot added the "awaiting merge" label on Oct 10, 2023
@jorisvandenbossche
Member

Thanks @Fokko! (and @lidavidm for the initial PR)

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5f57219.

There were 2 benchmark results indicating a performance regression.

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.

@jorisvandenbossche
Member

FYI, those reported performance regressions were just flakes. The timings are still stable at the same level for later commits.

@Fokko deleted the arrow-14705 branch on October 17, 2023 13:18
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…es` (apache#36846)

Revival of apache#12000

### Rationale for this change

It would be great to be able to do promotions when `concat`'ing a table, such as:

```python
def test_concat_tables_with_promotion_int():
    import pyarrow as pa
    t1 = pa.Table.from_arrays(
        [pa.array([1, 2], type=pa.int64())], ["int"])
    t2 = pa.Table.from_arrays(
        [pa.array([3, 4], type=pa.int32())], ["int"])

    result = pa.concat_tables([t1, t2], promote=True)

    assert result.equals(pa.Table.from_arrays([
        pa.array([1, 2, 3, 4], type=pa.int64())
    ], ["int"]))
```

### What changes are included in this PR?

### Are these changes tested?

### Are there any user-facing changes?

* Closes: apache#36845

Lead-authored-by: Fokko Driesprong <fokko@tabular.io>
Co-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Successfully merging this pull request may close these issues.

[Python] Allow promotion from int32 to int64