string[pyarrow] dtype does not roundtrip in P2P shuffling #7420

Closed
hendrikmakait opened this issue Dec 19, 2022 · 3 comments · Fixed by #7879
Labels
bug (Something is broken), shuffle

Comments

@hendrikmakait
Member

As discovered in #7410, string[pyarrow] columns do not round-trip through a P2P shuffle; they are converted to string[python] instead.
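Until this is fixed, one possible user-side workaround is to record the column dtypes before shuffling and cast the shuffled result back to them. The snippet below is a minimal sketch under that assumption. `restore_dtypes` is a hypothetical helper, not part of dask or distributed, and it is not the fix that eventually closed this issue. It relies only on pandas' `DataFrame.astype`, which accepts a column-to-dtype mapping.

```python
import pandas as pd


def restore_dtypes(result: pd.DataFrame, reference_dtypes: pd.Series) -> pd.DataFrame:
    """Cast columns of ``result`` back to the dtypes in ``reference_dtypes``.

    Hypothetical helper: ``reference_dtypes`` is assumed to be the ``.dtypes``
    of the frame captured before the shuffle. Columns that are missing from
    ``result`` are skipped, and columns not listed keep their current dtype.
    """
    # Only cast columns that actually exist in the shuffled result.
    target = {col: dtype for col, dtype in reference_dtypes.items() if col in result.columns}
    # astype with a dict casts each listed column and returns a new frame.
    return result.astype(target)
```

For example, `restore_dtypes(shuffled, original.dtypes)` would cast a column that came back as string[python] to string[pyarrow] again. For a Dask DataFrame, the same function could be applied per partition, e.g. via `map_partitions`.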

@hendrikmakait added the bug and shuffle labels Dec 19, 2022
@jrbourbeau
Member

Might be pandas-dev/pandas#50074?

@mrocklin
Member

mrocklin commented Jun 1, 2023

Any thoughts or plans on how we should resolve this? This seems fairly important to me.

@hendrikmakait
Member Author

Not yet. So far, we have not seen much urgency in fixing this. dask/dask#10326 finally highlighted its impact.

For now, I am prioritizing two other items over this: getting P2P rechunking into a shape where it could become the default (dask/dask#10226), and failure recovery (#7353).
