pyarrow.dataset.write_dataset does not preserve order #39030
Comments
Just curious, does …
Yes.
Hello, any updates on this? Thanks!
Interested in this as well. It would be great if there were a way to ensure row ordering when writing datasets.
…s, because (as of now) it does not preserve ordering on a filesystem write. apache/arrow#26818 apache/arrow#39030
This seems like quite a serious bug: a good portion of my tools rely on the assumption that …
I finally worked around this by adding an index column to the data. Not great, but it makes the pipeline much more robust.
Thanks for the suggestion, Adam. It seems that indeed …
Describe the bug, including details regarding any error messages, version, and platform.
As described, when writing a file with `pyarrow.dataset.write_dataset`, the row order is not preserved. I have tested this with both the `parquet` and `csv` file formats.
Component(s)
Python