
Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write #848

Merged · 8 commits · Jul 10, 2024

Conversation

sungwy
Collaborator

@sungwy sungwy commented Jun 22, 2024

Closes #541 and #840

Question: Are timestamp_ns and timestamptz_ns already supported? If so, should we just limit this PR to casting 's' and 'ms' to 'us' precision, and instead introduce the new timestamp_ns types?

@Fokko
Contributor

Fokko commented Jun 23, 2024

Thanks @syun64 for working on this! 🙌

Question: Are timestamp_ns and timestamptz_ns already supported? If so, should we just limit this PR to casting 's' and 'ms' to 'us' precision, and instead introduce the new timestamp_ns types?

Nanosecond timestamps are supported in V3. In order to write nanoseconds without downcasting, we need to check whether it is a V3 table.

Contributor

@HonahX HonahX left a comment


@syun64 It is great to have an optional flag to add more compatibility around nanosecond timestamps before V3. Thanks for working on this! I have one comment on the effect of this change on the read side. Please let me know what you think!

# Supported types, will be upcast automatically to 'us'
pass
elif primitive.unit == "ns":
if Config().get_bool("downcast-ns-timestamp-on-write"):
Contributor

@HonahX HonahX Jun 24, 2024


How about making downcast_ns_timestamp a parameter of pyarrow_to_schema, and reading the Config from the yaml only when we use this API on write? pyarrow_to_schema itself seems to be a useful public API, so it may be good to expose the optional downcast explicitly. This will also help mitigate an edge case:

Since pyarrow_to_schema is used on both the read and write paths, enabling this option also allows unit ns to pass the schema conversion when reading. For example, if users add a parquet file with ns timestamps and try to read the table as arrow, the read will pass the pyarrow_to_schema check and then stop at to_requested_schema with:

 pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data:

Collaborator Author


Thank you for raising this, @HonahX - this is an important failure case to consider.

I actually don't think it will stop at to_requested_schema: it will detect that the pyarrow types are different but that their IcebergTypes are the same, and silently cast on read, which drops the precision silently:

elif (target_type := schema_to_pyarrow(field.field_type, include_field_ids=False)) != values.type:

This logic was introduced to support casting small and large types interchangeably: different pyarrow types that map to the same IcebergType (string, large_string) can be cast and read through as the same PyArrow type.

The only thing that currently blocks this write from succeeding is the pyarrow_to_schema call, which fails to generate a corresponding Iceberg Schema from the provided pyarrow schema - which is what this PR seeks to fix.

I do think that the silent downcasting of data is problematic - but it isn't the only problematic aspect of the add_files API. add_files does not check the validity of the schema, because we pass a list of files into the API. Currently, it is up to the user to ensure that the files they want to add are in the correct format, and to own the risk of introducing bad data files into the table. We note that the API is intended only for expert users, similar to the warnings we have on the other existing table migration procedures.

Do you think it would be helpful to decouple this concern from this PR, and track it as an optional schema check for the add_files procedure?

Collaborator Author


On a tangent, I'd like to raise another point for discussion:

If we are aware that nanoseconds will be introduced as a separate IcebergType, would introducing a pa.timestamp(unit="ns") -> TimestampType mapping add too much complexity, since we would have to maintain a one-to-many mapping from pa.timestamp(unit="ns") to TimestampType or TimestampNsType based on the format-version of the Iceberg table? Is automated conversion of ns-precision timestamps really worth the complexity we would be introducing in the near future?


@corleyma corleyma Jun 27, 2024


I still think @HonahX raises a good point about the schema_to_pyarrow method being a useful public API, and it would be nice for its behavior to not be too tightly coupled to pyiceberg config. That is, I agree that it's wiser to parameterize the behavior and determine the correct parameter to use via config at the call sites.

Collaborator Author

@sungwy sungwy Jun 28, 2024


Thank you for your input, @corleyma. Just to clarify - what enables us to write ns into TimestampType in PyIceberg is this proposed change in ConvertToIceberg, which is not in schema_to_pyarrow but in pyarrow_to_schema, the function used to check schema compatibility on write. Once the data file is written, we assume that TimestampType is all in 'us' precision, or that it is safe to cast to 'us' precision, because the writer has already made the decision to write 'us'-precision timestamps.

If we are aware that nanoseconds will be introduced as a separate IcebergType, would introducing a pa.timestamp(unit="ns") -> TimestampType mapping add too much complexity, since we would have to maintain a one-to-many mapping from pa.timestamp(unit="ns") to TimestampType or TimestampNsType based on the format-version of the Iceberg table? Is automated conversion of ns-precision timestamps really worth the complexity we would be introducing in the near future?

@Fokko, @HonahX and @corleyma: I'd like to gather some feedback on this point before committing to introducing this flag. My worry is that since the V3 spec introduces a new type that will actually be in 'ns', enabling 'ns' casting onto the existing 'us'-precision TimestampType will complicate the type conversions: we would have to check the type (TimestampType, TimestampNsType), the downcast-to-ns boolean flag, and the format-version whenever we cast timestamps. I'd like us to weigh that trade-off carefully and decide whether supporting this conversion is worth the complexity it introduces into the conversion functions.

Contributor


Thanks @HonahX for giving the example, I just gave this a spin and ran into the following:

@pytest.mark.integration
def test_timestamp_tz(
    session_catalog: Catalog, format_version: int, mocker: MockerFixture
) -> None:
    nanoseconds_schema_iceberg = Schema(
        NestedField(1, "quux", TimestamptzType())
    )

    nanoseconds_schema = pa.schema([
        ("quux", pa.timestamp("ns", tz="UTC")),
    ])

    arrow_table = pa.Table.from_pylist(
        [
            {
                "quux": 1615967687249846175,  # 2021-03-17 07:54:47.249846175
            }
        ],
        schema=nanoseconds_schema,
    )
    mocker.patch.dict(os.environ, values={"PYICEBERG_DOWNCAST_NS_TIMESTAMP_ON_WRITE": "True"})

    identifier = f"default.abccccc{format_version}"

    try:
        session_catalog.drop_table(identifier=identifier)
    except NoSuchTableError:
        pass

    tbl = session_catalog.create_table(
        identifier=identifier,
        schema=nanoseconds_schema_iceberg,
        properties={"format-version": str(format_version)},
        partition_spec=PartitionSpec(),
    )

    file_paths = [f"s3://warehouse/default/test_timestamp_tz/v{format_version}/test-{i}.parquet" for i in range(5)]
    # write parquet files
    for file_path in file_paths:
        fo = tbl.io.new_output(file_path)
        with fo.create(overwrite=True) as fos:
            with pq.ParquetWriter(fos, schema=nanoseconds_schema) as writer:
                writer.write_table(arrow_table)

    # add the parquet files as data files
    tbl.add_files(file_paths=file_paths)

    print(tbl.scan().to_arrow())

I think we can force the cast to be unsafe:

return values.cast(target_type, safe=False)

We might want to check that we only apply this when casting nanos to micros; I'm not sure what happens with other lossy conversions.

Contributor

@Fokko Fokko Jul 5, 2024


I also got some issues with the nanosecond timestamp when collecting statistics:

>   ???
E   ValueError: Nanosecond resolution temporal type 1615967687249846175 is not safely convertible to microseconds to convert to datetime.datetime. Install pandas to return as Timestamp with nanosecond support or access the .value attribute.

At the lines:

col_aggs[field_id].update_min(statistics.min)
col_aggs[field_id].update_max(statistics.max)

This got fixed after updating this to:

                    col_aggs[field_id].update_min(statistics.min_raw)
                    col_aggs[field_id].update_max(statistics.max_raw)

Collaborator Author


Hi folks - thank you all for the valuable feedback. So it sounds like we want the behavior to be controlled by configuration, but with the flag passed as a parameter to the public APIs so that their behavior is fully determined by their inputs.

I've made the following changes:

  1. Introduced downcast_ns_timestamp_to_us as a new input parameter to pyarrow_to_schema and to_requested_schema public APIs
  2. Now table and catalog level functions infer the flag from the Config on write. (e.g. _check_schema_compatible and _convert_schema_if_needed)
  3. Always downcast ns to us on read, if there is ns timestamp in the parquet file (we will want to revise this behavior when we introduce nanosecond support in V3 spec, but until then, I think it's a reasonable assumption that data files that are in Iceberg will only be read with microseconds precision). https://github.com/apache/iceberg-python/pull/848/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR1030-R1033

Collaborator Author


I also got some issues with the nanosecond timestamp when collecting statistics:

>   ???
E   ValueError: Nanosecond resolution temporal type 1615967687249846175 is not safely convertible to microseconds to convert to datetime.datetime. Install pandas to return as Timestamp with nanosecond support or access the .value attribute.

At the lines:

col_aggs[field_id].update_min(statistics.min)
col_aggs[field_id].update_max(statistics.max)

This got fixed after updating this to:

                    col_aggs[field_id].update_min(statistics.min_raw)
                    col_aggs[field_id].update_max(statistics.max_raw)

I tried making this change and realized that it breaks our serialization, because it introduces raw bytes values into our statistics that our serialization does not handle. I will need to spend a bit more time figuring out the right change to StatsAggregator to support this. I also failed to reproduce the issue in my environment (possibly because it has pandas installed), so I'm reverting this change for now.

Contributor

@Fokko Fokko Jul 6, 2024


I also failed to reproduce this issue in my environment (possibly because it has pandas installed) so I'm reverting this change for now.

Ah, of course. One of the few upsides of having a fresh Macbook.

elif primitive.tz is None:
return TimestampType()
if primitive.unit in ("s", "ms", "us"):
# Supported types, will be upcast automatically to 'us'
Contributor


This is nice 👍

Contributor

@Fokko Fokko left a comment


This looks good to me. The V3 support can be added in a separate PR 👍

@@ -675,8 +675,11 @@ def _convert_schema_if_needed(schema: Union[Schema, "pa.Schema"]) -> Schema:

from pyiceberg.io.pyarrow import _ConvertToIcebergWithoutIDs, visit_pyarrow

downcast_ns_timestamp_to_us = Config().get_bool("downcast-ns-timestamp-to-us-on-write") or False
Contributor


Nit: we can move "downcast-ns-timestamp-to-us-on-write" into a constant, and reuse it in pyarrow.py
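A sketch of the suggestion (the constant name is assumed, mirroring the key used in this PR):

```python
# Module-level constant so the config key is spelled once and can be
# imported both here and in pyarrow.py.
DOWNCAST_NS_TIMESTAMP_TO_US_ON_WRITE = "downcast-ns-timestamp-to-us-on-write"
```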

Collaborator Author


Thank you for the review! I've adopted this in the new commits

@Fokko Fokko requested a review from HonahX July 8, 2024 18:39
Contributor

@HonahX HonahX left a comment


LGTM!

@HonahX HonahX merged commit 301e336 into apache:main Jul 10, 2024
7 checks passed
felixscherz added a commit to felixscherz/iceberg-python that referenced this pull request Jul 17, 2024
Successfully merging this pull request may close these issues.

Pyarrow type error
4 participants