[Data] Dataset.write_parquet(**arrow_parquet_args) not work #45493

Closed
bbtfr opened this issue May 22, 2024 · 2 comments
Labels
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
P1: Issue that should be fixed within a few weeks

Comments

bbtfr commented May 22, 2024

What happened + What you expected to happen

ray.data.Dataset.write_parquet(**arrow_parquet_args)

arrow_parquet_args no longer has any effect, because it is not passed to pq.ParquetWriter here:

with pq.ParquetWriter(file, schema) as writer:

The same applies to arrow_parquet_args_fn.

Versions / Dependencies

Ubuntu 22.04 LTS
Python 3.10.14
Ray 2.22.0

Reproduction script

import ray

# input_path, output_path, and columns are placeholders for your own values.
ray.data.read_parquet(
    input_path,
    columns=columns,
).write_parquet(
    output_path,
    compression="ZSTD",
    compression_level=100,
)

Issue Severity

High: It blocks me from completing my task.

@bbtfr bbtfr added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels May 22, 2024
@bbtfr bbtfr changed the title [Data] Dataset.write_parquet() [Data] Dataset.write_parquet(**arrow_parquet_args) not work May 22, 2024
@anyscalesam anyscalesam added the data Ray Data-related issues label May 24, 2024
Contributor

raulchen commented Jun 3, 2024

Thanks for reporting. Would you like to submit a fix? I can review the PR.

@raulchen raulchen added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jun 3, 2024
@MaxVanDijck
Contributor

@raulchen I have opened a PR to resolve this: #45772

4 participants