-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move determine_partitions and helper methods to io.pyarrow #906
Move determine_partitions and helper methods to io.pyarrow #906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soumya-ghosh Thanks for taking this. It looks great! Just have 3 minor requests. Please let me know what you think.
Changes done as per review comments. |
4fdf236
to
e4eb226
Compare
@HonahX Resolved merge conflict with pyiceberg/table/init.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @HonahX for the prompt review and comments. |
commit 1ed3abd Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Wed Jul 17 02:04:52 2024 -0400 Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order, and support type promotion on write (apache#921) * merge * thanks @HonahX :) Co-authored-by: Honah J. <undefined.newdb.newtable@gmail.com> * support promote * revert promote * use a visitor * support promotion on write * fix * Thank you @Fokko ! Co-authored-by: Fokko Driesprong <fokko@apache.org> * revert * add-files promotiontest * support promote for add_files * add tests for uuid * add_files subset schema test --------- Co-authored-by: Honah J. <undefined.newdb.newtable@gmail.com> Co-authored-by: Fokko Driesprong <fokko@apache.org> commit 0f2e19e Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jul 15 23:25:08 2024 -0700 Bump zstandard from 0.22.0 to 0.23.0 (apache#934) Bumps [zstandard](https://github.com/indygreg/python-zstandard) from 0.22.0 to 0.23.0. - [Release notes](https://github.com/indygreg/python-zstandard/releases) - [Changelog](https://github.com/indygreg/python-zstandard/blob/main/docs/news.rst) - [Commits](indygreg/python-zstandard@0.22.0...0.23.0) --- updated-dependencies: - dependency-name: zstandard dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit ec73d97 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jul 15 23:24:47 2024 -0700 Bump griffe from 0.47.0 to 0.48.0 (apache#933) Bumps [griffe](https://github.com/mkdocstrings/griffe) from 0.47.0 to 0.48.0. - [Release notes](https://github.com/mkdocstrings/griffe/releases) - [Changelog](https://github.com/mkdocstrings/griffe/blob/main/CHANGELOG.md) - [Commits](mkdocstrings/griffe@0.47.0...0.48.0) --- updated-dependencies: - dependency-name: griffe dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit d05a423 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jul 15 23:24:16 2024 -0700 Bump mkdocs-material from 9.5.28 to 9.5.29 (apache#932) Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.28 to 9.5.29. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](squidfunk/mkdocs-material@9.5.28...9.5.29) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit e27cd90 Author: Yair Halevi (Spock) <118175475+spock-abadai@users.noreply.github.com> Date: Sun Jul 14 22:11:04 2024 +0300 Allow empty `names` in mapped field of Name Mapping (apache#927) * Remove check_at_least_one field validator Iceberg spec permits an emtpy list of names in the default name mapping. check_at_least_one is therefore unnecessary. * Remove irrelevant test case * Fixing pydantic model No longer requiring minimum length of names list to be 1. * Added test case for empty names in name mapping * Fixed formatting error commit 3f44dfe Author: Soumya Ghosh <ghoshsoumya92@gmail.com> Date: Sun Jul 14 00:35:38 2024 +0530 Lowercase bool values in table properties (apache#924) commit b11cdb5 Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Fri Jul 12 16:45:04 2024 -0400 Deprecate to_requested_schema (apache#918) * deprecate to_requested_schema * prep for release commit a3dd531 Author: Honah J <honahx@apache.org> Date: Fri Jul 12 13:14:40 2024 -0700 Glue endpoint config variable, continue apache#530 (apache#920) Co-authored-by: Seb Pretzer <24555985+sebpretzer@users.noreply.github.com> commit 32e8f88 Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Fri Jul 12 15:26:00 2024 -0400 support PyArrow timestamptz with Etc/UTC (apache#910) Co-authored-by: Fokko Driesprong <fokko@apache.org> commit f6d56e9 Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Fri Jul 12 05:31:06 2024 -0400 fix invalidation logic (apache#911) commit 6488ad8 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu Jul 11 22:56:48 2024 -0700 Bump coverage from 7.5.4 to 7.6.0 (apache#917) Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.5.4 to 7.6.0. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@7.5.4...7.6.0) --- updated-dependencies: - dependency-name: coverage dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit dceedfa Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Thu Jul 11 20:32:14 2024 -0400 Check if schema is compatible in `add_files` API (apache#907) Co-authored-by: Fokko Driesprong <fokko@apache.org> commit aceed2a Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu Jul 11 15:52:06 2024 +0200 Bump mypy-boto3-glue from 1.34.136 to 1.34.143 (apache#912) Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.34.136 to 1.34.143. - [Release notes](https://github.com/youtype/mypy_boto3_builder/releases) - [Commits](https://github.com/youtype/mypy_boto3_builder/commits) --- updated-dependencies: - dependency-name: mypy-boto3-glue dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 1b9b884 Author: Fokko Driesprong <fokko@apache.org> Date: Thu Jul 11 12:45:20 2024 +0200 PyArrow: Don't enforce the schema when reading/writing (apache#902) * PyArrow: Don't enforce the schema PyIceberg struggled with the different type of arrow, such as the `string` and `large_string`. They represent the same, but are different under the hood. My take is that we should hide these kind of details from the user as much as possible. Now we went down the road of passing in the Iceberg schema into Arrow, but when doing this, Iceberg has to decide if it is a large or non-large type. This PR removes passing down the schema in order to let Arrow decide unless: - The type should be evolved - In case of re-ordering, we reorder the original types * WIP * Reuse Table schema * Make linter happy * Squash some bugs * Thanks Sung! Co-authored-by: Sung Yun <107272191+syun64@users.noreply.github.com> * Moar code moar bugs * Remove the variables wrt file sizes * Linting * Go with large ones for now * Missed one there! --------- Co-authored-by: Sung Yun <107272191+syun64@users.noreply.github.com> commit 8f47dfd Author: Soumya Ghosh <ghoshsoumya92@gmail.com> Date: Thu Jul 11 11:52:55 2024 +0530 Move determine_partitions and helper methods to io.pyarrow (apache#906) commit 5aa451d Author: Soumya Ghosh <ghoshsoumya92@gmail.com> Date: Thu Jul 11 07:57:05 2024 +0530 Rename data_sequence_number to sequence_number in ManifestEntry (apache#900) commit 77a07c9 Author: Honah J <honahx@apache.org> Date: Wed Jul 10 03:56:13 2024 -0700 Support MergeAppend operations (apache#363) * add ListPacker + tests * add merge append * add merge_append * fix snapshot inheritance * test manifest file and entries * add doc * fix lint * change test name * address review comments * rename _MergingSnapshotProducer to _SnapshotProducer * fix a serious bug * update the doc * remove merge_append as public API * make default to false * add test description * fix merge conflict * fix snapshot_id issue commit 66b92ff Author: Fokko Driesprong <fokko@apache.org> Date: Wed Jul 10 10:09:20 2024 +0200 GCS: Fix incorrect token description (apache#909) commit c25e080 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue Jul 9 20:50:29 2024 -0700 Bump zipp from 3.17.0 to 3.19.1 (apache#905) Bumps [zipp](https://github.com/jaraco/zipp) from 3.17.0 to 3.19.1. - [Release notes](https://github.com/jaraco/zipp/releases) - [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst) - [Commits](jaraco/zipp@v3.17.0...v3.19.1) --- updated-dependencies: - dependency-name: zipp dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 301e336 Author: Sung Yun <107272191+syun64@users.noreply.github.com> Date: Tue Jul 9 23:35:11 2024 -0400 Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write (apache#848) commit 3f574d3 Author: Fokko Driesprong <fokko@apache.org> Date: Tue Jul 9 11:36:43 2024 +0200 Support partial deletes (apache#569) * Add option to delete datafiles This is done through the Iceberg metadata, resulting in efficient deletes if the data is partitioned correctly * Pull in main * WIP * Change DataScan to accept Metadata and io For the partial deletes I want to do a scan on in memory metadata. Changing this API allows this. * fix name-mapping issue * WIP * WIP * Moar tests * Oops * Cleanup * WIP * WIP * Fix summary generation * Last few bits * Fix the requirement * Make ruff happy * Comments, thanks Kevin! * Comments * Append rather than truncate * Fix merge conflicts * Make the tests pass * Add another test * Conflicts * Add docs (apache#33) * docs * docs * Add a partitioned overwrite test * Fix comment * Skip empty manifests --------- Co-authored-by: HonahX <honahx@apache.org> Co-authored-by: Sung Yun <107272191+syun64@users.noreply.github.com> commit cdc3e54 Author: Fokko Driesprong <fokko@apache.org> Date: Tue Jul 9 08:28:27 2024 +0200 Disallow writing empty Manifest files (apache#876) * Disallow writing empty Avro files/blocks Raising an exception when doing this might look extreme, but there is no real good reason to allow this. * Relax the constaints a bit commit b68e109 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jul 8 22:16:23 2024 -0700 Bump fastavro from 1.9.4 to 1.9.5 (apache#904) Bumps [fastavro](https://github.com/fastavro/fastavro) from 1.9.4 to 1.9.5. - [Release notes](https://github.com/fastavro/fastavro/releases) - [Changelog](https://github.com/fastavro/fastavro/blob/master/ChangeLog) - [Commits](fastavro/fastavro@1.9.4...1.9.5) --- updated-dependencies: - dependency-name: fastavro dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 90547bb Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Jul 8 22:15:39 2024 -0700 Bump moto from 5.0.10 to 5.0.11 (apache#903) Bumps [moto](https://github.com/getmoto/moto) from 5.0.10 to 5.0.11. - [Release notes](https://github.com/getmoto/moto/releases) - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md) - [Commits](getmoto/moto@5.0.10...5.0.11) --- updated-dependencies: - dependency-name: moto dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 7dff359 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun Jul 7 07:50:19 2024 +0200 Bump tenacity from 8.4.2 to 8.5.0 (apache#898) commit 4aa469e Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Sat Jul 6 22:30:59 2024 +0200 Bump certifi from 2024.2.2 to 2024.7.4 (apache#899) Bumps [certifi](https://github.com/certifi/python-certifi) from 2024.2.2 to 2024.7.4. - [Commits](certifi/python-certifi@2024.02.02...2024.07.04) --- updated-dependencies: - dependency-name: certifi dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit aa7ad78 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Sat Jul 6 20:37:51 2024 +0200 Bump deptry from 0.16.1 to 0.16.2 (apache#897) Bumps [deptry](https://github.com/fpgmaas/deptry) from 0.16.1 to 0.16.2. - [Release notes](https://github.com/fpgmaas/deptry/releases) - [Changelog](https://github.com/fpgmaas/deptry/blob/main/CHANGELOG.md) - [Commits](fpgmaas/deptry@0.16.1...0.16.2) --- updated-dependencies: - dependency-name: deptry dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Implements #896
Moves
_determine_partitions
and helper method_get_table_partitions
andTablePartition
class topyiceberg,io.pyarrow
module.The tests for
_determine_partitions
method are moved totest_pyarrow.py