POC Parquet GEOMETRY logical type implementation #2

@Kontinuation Kontinuation commented Sep 4, 2024

This is a continuation of apache#43196.

One major issue is that the handling of sort orders for geometry logical types is special-cased everywhere. I'm not sure whether it would be better to define a separate sort order for geometry so that the sort order is not UNKNOWN.

Another issue is related to Arrow-IO (parquet::arrow::FileWriter and parquet::arrow::FileReader). There are no canonical extension types for geometry in Arrow, so we cannot write Parquet files containing geometry columns using the parquet-arrow table writers.
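
To illustrate why a sort order for geometry is hard to pin down, here is a minimal Python sketch. It is not part of this PR, and `wkb_point`, `decode_wkb_point` and `bounding_box` are hypothetical helpers. It compares a byte-wise min/max over WKB-encoded points, which is roughly what a plain byte-array ordering would produce, with a bounding box, which is the kind of summary that carries spatial meaning.

```python
import struct

def wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # 1 = little-endian byte-order flag, 1 = WKB geometry type "Point".
    return struct.pack("<BIdd", 1, 1, x, y)

def decode_wkb_point(buf: bytes) -> tuple[float, float]:
    """Decode a point produced by wkb_point back into (x, y)."""
    _, _, x, y = struct.unpack("<BIdd", buf)
    return (x, y)

def bounding_box(points):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

points = [(1.0, 5.0), (-2.0, 3.0), (0.5, -4.0)]
encoded = [wkb_point(x, y) for x, y in points]

# Byte-wise ordering, as a plain byte-array comparison would do it.
byte_min, byte_max = min(encoded), max(encoded)
print("byte-wise min/max:", decode_wkb_point(byte_min), decode_wkb_point(byte_max))
print("bounding box:", bounding_box(points))
```

The byte-wise extremes depend on the raw IEEE 754 bytes of the coordinates rather than on their numeric values, so they need not say anything useful about where the geometries lie.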

joellubi and others added 30 commits August 12, 2024 16:49
### Rationale for this change

Go implementation of apache#43234

### What changes are included in this PR?

- Go implementation of the `Bool8` extension type
- Minor refactor of existing extension builder interfaces

### Are these changes tested?

Yes, unit tests and basic read/write benchmarks are included.

### Are there any user-facing changes?

- A new extension type is added
- Custom extension builders no longer need another builder created and released separately.

* GitHub Issue: apache#17682

Authored-by: Joel Lubinitsky <joellubi@gmail.com>
Signed-off-by: Joel Lubinitsky <joellubi@gmail.com>
…EqualVisitor integration (apache#43642)

### Rationale for this change

LargeListViewVector requires `RangeEqualVisitor` and `TypeEqualVisitor` to support the C Data interface. 

### What changes are included in this PR?

Adding `RangeEqualVisitor`, `TypeEqualVisitor` and the corresponding test cases. 
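
As a rough illustration of what a range-equality check over a list-view layout involves, here is a toy Python sketch. It is not the Arrow Java implementation: `ListView` and `range_equals` are hypothetical names, and validity (null) handling is left out.

```python
from dataclasses import dataclass

@dataclass
class ListView:
    """Toy list-view layout: element i is values[offsets[i] : offsets[i] + sizes[i]]."""
    offsets: list
    sizes: list
    values: list

    def element(self, i):
        start = self.offsets[i]
        return self.values[start:start + self.sizes[i]]

def range_equals(left, left_start, right, right_start, length):
    """Compare `length` logical elements, ignoring how they are laid out in `values`."""
    for k in range(length):
        if left.element(left_start + k) != right.element(right_start + k):
            return False
    return True

# Same logical content ([1, 2], [], [3, 4, 5]) stored with different offsets.
a = ListView(offsets=[0, 2, 2], sizes=[2, 0, 3], values=[1, 2, 3, 4, 5])
b = ListView(offsets=[3, 0, 0], sizes=[2, 0, 3], values=[3, 4, 5, 1, 2])
print(range_equals(a, 0, b, 0, 3))  # True
```

The point of the sketch is that two list-view vectors can be equal even when their offsets differ, so the comparison has to work on logical elements rather than on the raw buffers.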

### Are these changes tested?

Yes. 

### Are there any user-facing changes?

No
* GitHub Issue: apache#43638

Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
…apache#43606)

### Rationale for this change

This is done by passing an extra flag when building the Cython extension modules. It is needed so that the GIL is not dynamically reenabled when importing `pyarrow.lib`.

### What changes are included in this PR?

Changes to CMake so that the extra flag is passed when building Cython extension modules.

* GitHub Issue: apache#43536

Lead-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ng javadocs (apache#43674)

### Rationale for this change

Apparently some Maven plugins are not thread-safe and started throwing errors in the `test-debian-12-docs` CI job when building javadocs.

### What changes are included in this PR?

* Remove multithreading config when building javadocs

### Are these changes tested?

CI

### Are there any user-facing changes?

No
* GitHub Issue: apache#43378

Authored-by: Dane Pitkin <dpitkin@apache.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…system (apache#43098)

### Rationale for this change

See apache#43097.

### What changes are included in this PR?
Implements `AzureFS::PathFromUri` using existing URI parsing and path extraction inside the `AzureOptions`.
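
For readers unfamiliar with the idea, a path-from-URI helper can be sketched in Python as below. This is not the C++ implementation: `path_from_uri` is a hypothetical function, and the `abfs://<container>@<host>/<path>` shape is an assumption made for the example; the real filesystem may accept other URI forms.

```python
from urllib.parse import urlparse

def path_from_uri(uri: str) -> str:
    """Hypothetical helper: map `abfs://container@host/path` to `container/path`."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    container = parsed.username  # the part before '@' in the authority
    if not container:
        raise ValueError("URI does not name a container")
    path = parsed.path.strip("/")
    return f"{container}/{path}" if path else container

print(path_from_uri("abfs://data@myaccount.dfs.core.windows.net/dir/file.parquet"))
# prints "data/dir/file.parquet"
```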

### Are these changes tested?
Yes, added a unit test.

### Are there any user-facing changes?
No, but calling `PathFromUri` will now work instead of throwing due to no implementation provided.
* GitHub Issue: apache#43097

Authored-by: Oliver Layer <o.layer@celonis.de>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
….6.0 in /go (apache#43647)

Bumps [github.com/substrait-io/substrait-go](https://github.com/substrait-io/substrait-go) from 0.5.0 to 0.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/substrait-io/substrait-go/releases">github.com/substrait-io/substrait-go's releases</a>.</em></p>
<blockquote>
<h1>v0.6.0 (2024-08-11)</h1>
<h3>Features</h3>
<ul>
<li><strong><code>type</code></strong> add support for type PrecisionTimestamp and PrecisionTimestampTz (<a href="https://github.com/substrait-io/substrait-go/issues/41">#41</a>) (<a href="https://github.com/substrait-io/substrait-go/commit/5040d09319e2ec3067da2ee0f1f354cc07e8a41a">5040d09</a>)</li>
<li><strong><code>substrait</code></strong> Update to Substrait v0.53.0 (<a href="https://github.com/substrait-io/substrait-go/issues/40">#40</a>) (<a href="https://github.com/substrait-io/substrait-go/commit/0ea5482e061033854f9931e2134a1bf91a5bbb54">0ea5482</a>)
<blockquote>
<ul>
<li>Update substrait dependency to v0.53.0</li>
<li>Accommodate UserDefined Literal changes where literal value became oneof in proto instead of direct value</li>
<li>Fix AdvanceExtension interface to accommodate breaking change in AdvanceExtensionProto</li>
<li>Add linter to ignore internal use of deprecated methods.</li>
</ul>
</blockquote>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/substrait-io/substrait-go/commit/5040d09319e2ec3067da2ee0f1f354cc07e8a41a"><code>5040d09</code></a> feat(type): add support for type PrecisionTimestamp and PrecisionTimestampTz ...</li>
<li><a href="https://github.com/substrait-io/substrait-go/commit/0ea5482e061033854f9931e2134a1bf91a5bbb54"><code>0ea5482</code></a> feat(substrait): Update to Substrait v0.53.0 (<a href="https://github.com/substrait-io/substrait-go/issues/40">#40</a>)</li>
<li><a href="https://github.com/substrait-io/substrait-go/commit/2fc8f586848be8a97ba473c6172a95a82d5c943e"><code>2fc8f58</code></a> ci(build-test): Use grep to exclude protobuf from coverage report (<a href="https://github.com/substrait-io/substrait-go/issues/38">#38</a>)</li>
<li><a href="https://github.com/substrait-io/substrait-go/commit/b3aa515f9b50a728d8404e3b8113f2d3528df928"><code>b3aa515</code></a> ci(build-test): Update codecov to ignore protobuf files</li>
<li><a href="https://github.com/substrait-io/substrait-go/commit/15314a88001ef860031092b8c78b2e3cc06f2e62"><code>15314a8</code></a> ci(build-test): Add codecov and release branch action badges. (<a href="https://github.com/substrait-io/substrait-go/issues/36">#36</a>)</li>
<li><a href="https://github.com/substrait-io/substrait-go/commit/663c26d98efa6578b96ef1e04092625bcc5498b8"><code>663c26d</code></a> ci(build-test): Add codecov reports (<a href="https://github.com/substrait-io/substrait-go/issues/35">#35</a>)</li>
<li>See full diff in <a href="https://github.com/substrait-io/substrait-go/compare/v0.5.0...v0.6.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/substrait-io/substrait-go&package-manager=go_modules&previous-version=0.5.0&new-version=0.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@ dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@ dependabot rebase` will rebase this PR
- `@ dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@ dependabot merge` will merge this PR after your CI passes on it
- `@ dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@ dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@ dependabot reopen` will reopen this PR if it is closed
- `@ dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@ dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@ dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@ dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@ dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

</details>

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
… .cc and .h files (apache#43678)

### Rationale for this change

One way of learning about a codebase is reading the tests. As it is now, it's hard to see the minimal `FlightServerBase` sub-class in `flight/test_util.cc`, so I moved it to its own file.

### What changes are included in this PR?

 - Renaming `FlightTestServer` to `TestFlightServer`
 - Moving the class to `test_flight_server.{h,cc}`
 - Bonus: Moving the server and client auth handlers to `test_auth_handlers.{h,cc}`

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

`ExampleTestServer` is removed from the testing library in favor of `TestFlightServer::Make`.
* GitHub Issue: apache#43677

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
…ag usage (apache#43583)

### Rationale for this change

The `getBuffers` method can clear the buffers in the vector, but this has not been properly tested, and the `clear` flag is not used properly in the implementations across the various vector types.

### What changes are included in this PR?

Updating the vector `getBuffers` method to use `clear` flag as expected and adding corresponding test cases. 
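
The intended behaviour of the flag can be sketched with a toy Python class. This is not the Arrow Java API: `ToyVector` is hypothetical, and the semantics shown (with `clear` set, the caller ends up holding the buffers and the vector is left empty) are an assumption based on the description above.

```python
class ToyVector:
    """Toy stand-in for a vector that owns a list of buffers."""

    def __init__(self, buffers):
        self._buffers = list(buffers)

    def get_buffers(self, clear: bool):
        """Return the underlying buffers; if `clear` is set, the vector gives them up."""
        result = list(self._buffers)
        if clear:
            # After clearing, the returned list is the only holder of the buffers.
            self._buffers = []
        return result

    def buffer_count(self):
        return len(self._buffers)

vec = ToyVector([bytearray(b"validity"), bytearray(b"data")])
peek = vec.get_buffers(clear=False)
print(len(peek), vec.buffer_count())   # 2 2
taken = vec.get_buffers(clear=True)
print(len(taken), vec.buffer_count())  # 2 0
```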

### Are these changes tested?

Yes, via existing test cases and new test cases. 

### Are there any user-facing changes?

Yes
* GitHub Issue: apache#43577

Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
…ache#43458)

### Rationale for this change

Add the newly ratified extension type.

### What changes are included in this PR?

The C++/Python implementation only.

### Are these changes tested?

Yes

### Are there any user-facing changes?

No.
* GitHub Issue: apache#43454

Lead-authored-by: David Li <li.davidm96@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
…r implementation (apache#43637)

### Rationale for this change

Integrating the `transferPair` and `copyFrom` functionality into `LargeListViewVector`.

- [X] apache#41292

### What changes are included in this PR?

This PR includes the `TransferPairImpl`, corresponding functions and test cases. 

### Are these changes tested?

Yes

### Are there any user-facing changes?

No
* GitHub Issue: apache#41291

Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
)

### Rationale for this change

The newly introduced `LargeListViewVector` requires IPC integration, primarily so that the IPC format includes this type, and also because the C Data integration tests depend on it.

### What changes are included in this PR?

Includes the `JsonFileWriter` and `JsonFileReader` along with the corresponding test cases. 

### Are these changes tested?

Yes, using existing tests but adding new configurations. 

### Are there any user-facing changes?

No
* GitHub Issue: apache#43643

Authored-by: Vibhatha Abeykoon <vibhatha@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
… Ubuntu 24.04 (apache#43619)

Install the clang-rt libraries that are necessary to link Thread Sanitizer-enabled binaries. Also fix use of deprecated `BufferReader` constructor in some tests, so that compilation with Clang 18 succeeds.

Note that the C++ test suite still fails on Flight tests, as tracked in apacheGH-36552.

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…apache#43649)

### Rationale for this change

See apache#43627 (comment)

### What changes are included in this PR?

An extra `dplyr::select()`

### Are these changes tested?

Conbench should show that the performance is much better

### Are there any user-facing changes?

Not slow
* GitHub Issue: apache#43627
1. Add fuzz seeds with newer datatypes such as Run-End Encoded and String Views
2. Add fuzz seeds with buffer compression
3. Build seed corpus generation utilities even when fuzzing isn't enabled, for convenience

* GitHub Issue: apache#38041

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…43540)

### Rationale for this change

For better reference safety under Python free-threaded builds (i.e. with the GIL removed), we should be using `Py(List|Dict)_GetItemRef` that return strong references and are implemented in a thread-safe manner.

### What changes are included in this PR?

- Vendor a copy of https://github.com/python/pythoncapi-compat
- Port to strong reference APIs for lists and dicts

### Are these changes tested?

I ran the tests with the free-threaded build before and after, and the same expected failures occur in both runs.

* GitHub Issue: apache#43536

Lead-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
… for testing (apache#43708)

### Rationale for this change

Introducing more bad_data for testing

### What changes are included in this PR?

* Upgrade parquet-testing
* Introduce more bad_data
* Update fuzz generation

### Are these changes tested?

They're tests :-)

### Are there any user-facing changes?

no

* GitHub Issue: apache#43703

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…Unsafe in /csharp (apache#43651)

Bumps [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet) and [System.Runtime.CompilerServices.Unsafe](https://github.com/dotnet/runtime). These dependencies needed to be updated together.
Updates `BenchmarkDotNet` from 0.13.12 to 0.14.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/dotnet/BenchmarkDotNet/releases">BenchmarkDotNet's releases</a>.</em></p>
<blockquote>
<h2>0.14.0</h2>
<p>Full changelog: <a href="https://benchmarkdotnet.org/changelog/v0.14.0.html">https://benchmarkdotnet.org/changelog/v0.14.0.html</a></p>
<h2>Highlights</h2>
<ul>
<li>Introduce <code>BenchmarkDotNet.Diagnostics.dotMemory</code> <a href="https://github.com/dotnet/BenchmarkDotNet/pull/2549">#2549</a>: memory allocation profile of your benchmarks using <a href="https://www.jetbrains.com/dotmemory/">dotMemory</a>, see <a href="https://github.com/BenchmarkDotNet"><code>@​BenchmarkDotNet</code></a>.Samples.IntroDotMemoryDiagnoser</li>
<li>Introduce <code>BenchmarkDotNet.Exporters.Plotting</code> <a href="https://github.com/dotnet/BenchmarkDotNet/pull/2560">#2560</a>: plotting via <a href="https://scottplot.net/">ScottPlot</a> (initial version)</li>
<li>Multiple bugfixes</li>
<li>The default build toolchains have been updated to pass <code>IntermediateOutputPath</code>, <code>OutputPath</code>, and <code>OutDir</code> properties to the <code>dotnet build</code> command. This change forces all build outputs to be placed in a new directory generated by BenchmarkDotNet, and fixes many issues that have been reported with builds. You can also access these paths in your own <code>.csproj</code> and <code>.props</code> from those properties if you need to copy custom files to the output.</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li>Fixed multiple build-related bugs including passing MsBuildArguments and .Net 8's <code>UseArtifactsOutput</code>.</li>
</ul>
<h2>Breaking Changes</h2>
<ul>
<li><code>DotNetCliBuilder</code> removed <code>retryFailedBuildWithNoDeps</code> constructor option.</li>
<li><code>DotNetCliCommand</code> removed <code>RetryFailedBuildWithNoDeps</code> property and <code>BuildNoRestoreNoDependencies()</code> and <code>PublishNoBuildAndNoRestore()</code> methods (replaced with <code>PublishNoRestore()</code>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/cf882d378d51a6998aad43ca9caa29c19d122b87"><code>cf882d3</code></a> Add macOS Sequoia in OsBrandStringHelper</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/17cf3b0a71b7fa41e83e2db16307219420f4a4f8"><code>17cf3b0</code></a> [docs] Prepare v0.14.0 changelog</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/b3fbe7c489c2b6e354f736ba4c0854e4f1daacfb"><code>b3fbe7c</code></a> Set next BenchmarkDotNet version: 0.14.0</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/23e6c523cfe638d53508d6ca8212ca23501049ce"><code>23e6c52</code></a> Fix InvalidOperationException in DotMemoryDiagnoser</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/3d34edb219b84a68a377cb38b833dd30241fd5c8"><code>3d34edb</code></a> Bump JetBrains.Profiler.SelfApi: 2.5.2-&gt;2.5.9</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/bf0a49d1f5756cd2f7cb1da56974a7ee6a5a6fdf"><code>bf0a49d</code></a> fix(CI): Deprecation issues (<a href="https://github.com/dotnet/BenchmarkDotNet/issues/2605">#2605</a>)</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/0275649d350bcdc6953215598eca775b4882ece5"><code>0275649</code></a> Fixed crash from TaskbarProgress when BuiltInComInteropSupport is disabled. ...</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/15200d46a1395ef6e69c39c6f3371ab0e0d96e5c"><code>15200d4</code></a> [build] Add BenchmarkDotNet.Exporters.Plotting.Tests to unit-tests</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/834417a7dbec1dbb22a99cbb5f45c9cd474e483e"><code>834417a</code></a> Improve logging in ScottPlotExporterTests</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/f8082a2138b7cf1bda1eab8dca98d7d3c43b9946"><code>f8082a2</code></a> Fix IntroSummaryStyle compilation</li>
<li>Additional commits viewable in <a href="https://github.com/dotnet/BenchmarkDotNet/compare/v0.13.12...v0.14.0">compare view</a></li>
</ul>
</details>
<br />

Updates `System.Runtime.CompilerServices.Unsafe` from 4.7.1 to 5.0.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/dotnet/runtime/releases">System.Runtime.CompilerServices.Unsafe's releases</a>.</em></p>
<blockquote>
<h2>.NET 5</h2>
<p><a href="https://github.com/dotnet/core/blob/master/release-notes/5.0/5.0.0/5.0.0.md">Release Notes</a>
<a href="https://github.com/dotnet/core/blob/master/release-notes/5.0/5.0.0/5.0.0-install-instructions.md">Install Instructions</a></p>
<h1>Repo</h1>
<ul>
<li><a href="https://github.com/dotnet/core/releases/tag/v5.0.0">Core</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/dotnet/runtime/commits/v5.0.0">compare view</a></li>
</ul>
</details>
<br />


Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
…me.CompilerServices.Unsafe in /csharp (apache#43711)

Bumps [BenchmarkDotNet.Diagnostics.Windows](https://github.com/dotnet/BenchmarkDotNet) and [System.Runtime.CompilerServices.Unsafe](https://github.com/dotnet/runtime). These dependencies needed to be updated together.
Updates `BenchmarkDotNet.Diagnostics.Windows` from 0.13.12 to 0.14.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/dotnet/BenchmarkDotNet/releases">BenchmarkDotNet.Diagnostics.Windows's releases</a>.</em></p>
<blockquote>
<h2>0.14.0</h2>
<p>Full changelog: <a href="https://benchmarkdotnet.org/changelog/v0.14.0.html">https://benchmarkdotnet.org/changelog/v0.14.0.html</a></p>
<h2>Highlights</h2>
<ul>
<li>Introduce <code>BenchmarkDotNet.Diagnostics.dotMemory</code> <a href="https://github.com/dotnet/BenchmarkDotNet/pull/2549">#2549</a>: memory allocation profile of your benchmarks using <a href="https://www.jetbrains.com/dotmemory/">dotMemory</a>, see <a href="https://github.com/BenchmarkDotNet"><code>@​BenchmarkDotNet</code></a>.Samples.IntroDotMemoryDiagnoser</li>
<li>Introduce <code>BenchmarkDotNet.Exporters.Plotting</code> <a href="https://github.com/dotnet/BenchmarkDotNet/pull/2560">#2560</a>: plotting via <a href="https://scottplot.net/">ScottPlot</a> (initial version)</li>
<li>Multiple bugfixes</li>
<li>The default build toolchains have been updated to pass <code>IntermediateOutputPath</code>, <code>OutputPath</code>, and <code>OutDir</code> properties to the <code>dotnet build</code> command. This change forces all build outputs to be placed in a new directory generated by BenchmarkDotNet, and fixes many issues that have been reported with builds. You can also access these paths in your own <code>.csproj</code> and <code>.props</code> from those properties if you need to copy custom files to the output.</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li>Fixed multiple build-related bugs including passing MsBuildArguments and .Net 8's <code>UseArtifactsOutput</code>.</li>
</ul>
<h2>Breaking Changes</h2>
<ul>
<li><code>DotNetCliBuilder</code> removed <code>retryFailedBuildWithNoDeps</code> constructor option.</li>
<li><code>DotNetCliCommand</code> removed <code>RetryFailedBuildWithNoDeps</code> property and <code>BuildNoRestoreNoDependencies()</code> and <code>PublishNoBuildAndNoRestore()</code> methods (replaced with <code>PublishNoRestore()</code>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/cf882d378d51a6998aad43ca9caa29c19d122b87"><code>cf882d3</code></a> Add macOS Sequoia in OsBrandStringHelper</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/17cf3b0a71b7fa41e83e2db16307219420f4a4f8"><code>17cf3b0</code></a> [docs] Prepare v0.14.0 changelog</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/b3fbe7c489c2b6e354f736ba4c0854e4f1daacfb"><code>b3fbe7c</code></a> Set next BenchmarkDotNet version: 0.14.0</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/23e6c523cfe638d53508d6ca8212ca23501049ce"><code>23e6c52</code></a> Fix InvalidOperationException in DotMemoryDiagnoser</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/3d34edb219b84a68a377cb38b833dd30241fd5c8"><code>3d34edb</code></a> Bump JetBrains.Profiler.SelfApi: 2.5.2-&gt;2.5.9</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/bf0a49d1f5756cd2f7cb1da56974a7ee6a5a6fdf"><code>bf0a49d</code></a> fix(CI): Deprecation issues (<a href="https://github.com/dotnet/BenchmarkDotNet/issues/2605">#2605</a>)</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/0275649d350bcdc6953215598eca775b4882ece5"><code>0275649</code></a> Fixed crash from TaskbarProgress when BuiltInComInteropSupport is disabled. ...</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/15200d46a1395ef6e69c39c6f3371ab0e0d96e5c"><code>15200d4</code></a> [build] Add BenchmarkDotNet.Exporters.Plotting.Tests to unit-tests</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/834417a7dbec1dbb22a99cbb5f45c9cd474e483e"><code>834417a</code></a> Improve logging in ScottPlotExporterTests</li>
<li><a href="https://github.com/dotnet/BenchmarkDotNet/commit/f8082a2138b7cf1bda1eab8dca98d7d3c43b9946"><code>f8082a2</code></a> Fix IntroSummaryStyle compilation</li>
<li>Additional commits viewable in <a href="https://github.com/dotnet/BenchmarkDotNet/compare/v0.13.12...v0.14.0">compare view</a></li>
</ul>
</details>
<br />

Updates `System.Runtime.CompilerServices.Unsafe` from 4.7.1 to 5.0.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a href="https://github.com/dotnet/runtime/commits/v5.0.0">compare view</a></li>
</ul>
</details>
<br />


Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Curt Hagenlocher <curt@hagenlocher.org>
…inMax512AggKernels (apache#43704)

### Rationale for this change

See apache#43687

### What changes are included in this PR?

Change Registered AVX2 to AVX512

### Are these changes tested?

No

### Are there any user-facing changes?

maybe bugfix

* GitHub Issue: apache#43687

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
…lue metadata from/to ColumnChunkMetaData (apache#41580)

### Rationale for this change
Parquet standard allows reading/writing key-value metadata from/to ColumnChunkMetaData, but there is no way to do that with Parquet C++.

### What changes are included in this PR?
Support reading/writing key-value metadata from/to ColumnChunkMetaData with Parquet C++ reader/writer. Support reading key-value metadata from ColumnChunkMetaData with pyarrow.parquet.

### Are these changes tested?
Yes, unit tests are added

### Are there any user-facing changes?
Yes.
- Users can read or write key-value metadata for column chunks with Parquet C++.
- Users can read key-value metadata for column chunks with PyArrow.
- parquet-reader tool prints key-value metadata in column chunks when `--print-key-value-metadata` option is used.

* GitHub Issue: apache#41579

Lead-authored-by: Chungmin Lee <chungminlee@microsoft.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
…#43710)

### Rationale for this change

Currently, the `ubuntu-lint` Docker build would install its Python dependencies directly into the system Python, which can fail depending on existing system Python packages.

See example here:
https://github.com/apache/arrow/actions/runs/10400929007/job/28802420047?pr=43539 where pip's dependency resolution fails with the following error message:
```
packaging.version.InvalidVersion: Invalid version: '2013-02-16'
```

### What changes are included in this PR?

This PR switches to use a virtual environment, guaranteeing that we're not interfering with the system Python and that we're not bound by already installed Python packages.
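
For illustration only, the same isolation idea can be expressed with Python's standard `venv` module. The PR itself changes the Docker build; `create_isolated_env` and the `lint-env` directory name are hypothetical.

```python
import os
import tempfile
import venv

def create_isolated_env(target_dir: str) -> str:
    """Create a virtual environment and return the path to its config file."""
    # with_pip=False keeps this sketch fast; a real lint image would want pip.
    venv.EnvBuilder(with_pip=False, clear=True).create(target_dir)
    return os.path.join(target_dir, "pyvenv.cfg")

with tempfile.TemporaryDirectory() as tmp:
    cfg = create_isolated_env(os.path.join(tmp, "lint-env"))
    print(os.path.exists(cfg))
```

Packages installed into such an environment live under its own directory, so they neither overwrite nor depend on whatever the system Python already has installed.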

### Are these changes tested?

By CI.

### Are there any user-facing changes?

No.

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…apache#43706)

### Rationale for this change

Snappy's CMakeLists.txt unconditionally disables RTTI. This is incompatible with some other options, such as activating UBSAN for a fuzzing build:
google/snappy#189

### What changes are included in this PR?

Add `-frtti` at the end of compiler options when compiling a bundled Snappy build.

### Are these changes tested?

On CI; also manually checked that this allows enabling Snappy on OSS-Fuzz builds.

### Are there any user-facing changes?

No.

* GitHub Issue: apache#43688

Lead-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…che#43721)

### Rationale for this change

Old re2.pc adds "-std=c++11", which causes a build error because Apache Arrow C++ requires C++17.

### What changes are included in this PR?

Remove "-std=c++11" as a workaround. We can remove this workaround when we drop support for Ubuntu 20.04.
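
The effect of the workaround can be sketched in Python as a filter over a pkg-config style flags string. This only illustrates the idea: the actual change lives in the build system, and `drop_cxx11_flag` is a hypothetical helper.

```python
def drop_cxx11_flag(cflags: str) -> str:
    """Remove the `-std=c++11` token from a pkg-config style Cflags string."""
    return " ".join(flag for flag in cflags.split() if flag != "-std=c++11")

print(drop_cxx11_flag("-std=c++11 -pthread -I/usr/include"))
# prints "-pthread -I/usr/include"
```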

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: apache#41396

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…s::is_{min,max}_exact (apache#43595)

### Rationale for this change

We don't need an "unknown" state. If they aren't set, we can treat them as not exact.

### What changes are included in this PR?

Remove `std::optional` from `arrow::ArrayStatistics::is_{min,max}_exact`.
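
The difference between the two designs can be sketched in Python. The change itself is in C++; `StatsBefore`, `StatsAfter` and `can_trust_min` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StatsBefore:
    """Tri-state flag: None means "unknown"."""
    is_min_exact: Optional[bool] = None

@dataclass
class StatsAfter:
    """Two-state flag: an unset flag simply reads as "not exact"."""
    is_min_exact: bool = False

def can_trust_min(stats) -> bool:
    # With the tri-state version, callers must fold None into False themselves.
    return bool(stats.is_min_exact)

print(can_trust_min(StatsBefore()), can_trust_min(StatsAfter()))
# prints "False False"
```

Since an unknown flag and a false flag lead to the same decision for a consumer, the two-state version loses nothing and removes one case to handle.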

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: apache#43594

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
)

### Rationale for this change

The `LargeListType` is missing in the Data Types docs: https://arrow.apache.org/docs/python/api/datatypes.html#type-classes

### What changes are included in this PR?

This PR adds the `LargeListType` to the Data Types docs.

### Are these changes tested?

The change only affects the docs. I have generated the docs locally and they appear as expected. See comment below with screenshot: apache#43597 (comment)

### Are there any user-facing changes?

The change is indeed an update in the docs.

Authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

### What changes are included in this PR?

### Are these changes tested?

### Are there any user-facing changes?

Authored-by: Xin Hao <haoxinst@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
…undtrip data to Tables + Parquet files (apache#43634)

### Rationale for this change

Add coverage for objects that might have issues roundtripping to Arrow Tables or Parquet files.

### What changes are included in this PR?

A new test file + a crossbow job that ensures these other packages are installed so the tests run.

### Are these changes tested?

The changes are tests

### Are there any user-facing changes?

No
* GitHub Issue: apache#43633

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
…e bundled Azure SDK for C++ to azure-identity_1.9.0 (apache#43723)

### Rationale for this change

Some of our CI jobs (such as the conda-based jobs) use a recent Azure SDK for C++, which requires the latest Azurite. We need to update Azurite for these jobs.

I wanted to use the latest Azurite in all environments but didn't, because I want to keep using `apt install nodejs` on old Ubuntu for ease of maintenance.

### What changes are included in this PR?

* Use the latest Azurite if possible
* Use `--skipApiVersionCheck` for old Azurite
* Update the bundled Azure SDK for C++
  * This is not required. It's for detecting this problem in many CI jobs.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: fix apache#41505
* GitHub Issue: apache#43702

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…pache#43724)

### Rationale for this change

We can't use threads or `%z` on Emscripten. Some CSV tests use them.

### What changes are included in this PR?

Skip CSV tests that use threads or `%z`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#43175

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Oops, I should have caught this in apache#43633. Removes `data.table::` since the namespace is loaded. Also fixes some linting errors and frees up space on the force tests run.

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
@Kontinuation
Author

> > One major issue is that we're special casing the handling of sort-orders for geometry logical types everywhere. I'm not sure if it is better to define another sort-order for geometry so that the sort order is not UNKNOWN.
>
> I think we can use the default SortOrder which is specific to the logical type. IIRC, SortOrder specifies how two values of the same type can be compared and ordered. In the case of Geometry type, I'm not sure if it is comparable. If not, we'd better disable generating min/max values for this type.
>
> > Another issue is related to Arrow-IO (parquet::arrow::FileWriter and parquet::arrow::FileReader). There are no canonical extension types for geometry in Arrow, so we cannot write parquet files containing geometry columns using the parquet-arrow table writers.
>
> I think we can just write/read geometry values as binary type. We can add another layer to verify they are WKB values as expected. WDYT?

Sort order for geometry values is not specified, and we're using GeometryStatistics for data skipping. Currently the min/max values are generated as if they were plain binary values; I'll disable generating min/max values for geometry values.
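
To make the data-skipping idea concrete, here is a minimal sketch of bounding-box based pruning. `SketchBBox` is a hypothetical 2D type invented for this example; it is not the GeometryStatistics class from this PR, and it assumes statistics reduce to an axis-aligned box:

```cpp
#include <algorithm>
#include <limits>

// Hypothetical 2D bounding box; not the GeometryStatistics class from the PR.
struct SketchBBox {
  double xmin = std::numeric_limits<double>::infinity();
  double ymin = std::numeric_limits<double>::infinity();
  double xmax = -std::numeric_limits<double>::infinity();
  double ymax = -std::numeric_limits<double>::infinity();

  // Expands the box to cover one more coordinate.
  void Update(double x, double y) {
    xmin = std::min(xmin, x);
    ymin = std::min(ymin, y);
    xmax = std::max(xmax, x);
    ymax = std::max(ymax, y);
  }

  // True when the two boxes overlap or touch. A box that has seen no
  // coordinates never intersects a finite box.
  bool Intersects(const SketchBBox& other) const {
    return xmin <= other.xmax && other.xmin <= xmax &&
           ymin <= other.ymax && other.ymin <= ymax;
  }
};
```

A reader could skip a row group or page whenever its box does not intersect the query box, which works without any total ordering on geometry values.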

We have some very basic tests for plain parquet read/write without using arrow vectors/tables, but we have to implement canonical extension types for geometries to support Arrow-IO. I'll follow @paleolimbot's guidance and implement an official canonical extension type based on geoarrow.wkb.
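
For the WKB verification layer suggested in the quoted comment, a first step might be a header check like the sketch below. `LooksLikeWkbHeader` is a hypothetical helper, not part of this PR. It assumes ISO WKB type codes 1 to 7 (Point through GeometryCollection), optionally offset by 1000, 2000, or 3000 for Z, M, or ZM, and it does not validate the coordinate payload:

```cpp
#include <cstddef>
#include <cstdint>

// Checks only the 5-byte WKB header: a byte-order marker (0 = big-endian,
// 1 = little-endian) followed by a 32-bit geometry type code.
// Assumption: only ISO type codes 1..7, optionally offset by
// 1000/2000/3000 for Z/M/ZM, are accepted.
inline bool LooksLikeWkbHeader(const uint8_t* data, size_t size) {
  if (data == nullptr || size < 5) return false;
  const uint8_t byte_order = data[0];
  if (byte_order > 1) return false;

  // Decode the type code according to the declared byte order.
  uint32_t type = 0;
  for (int i = 0; i < 4; ++i) {
    const uint8_t b = data[1 + i];
    if (byte_order == 1) {
      type |= static_cast<uint32_t>(b) << (8 * i);
    } else {
      type = (type << 8) | b;
    }
  }

  const uint32_t base = type % 1000;  // geometry kind
  const uint32_t dims = type / 1000;  // 0 = XY, 1 = Z, 2 = M, 3 = ZM
  return base >= 1 && base <= 7 && dims <= 3;
}
```

A fuller validator would also walk the coordinate data and nested geometries; this sketch only rejects values that cannot be WKB at all.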

@Kontinuation Kontinuation force-pushed the kontinuation-parquet-geometry branch 2 times, most recently from 66ffa83 to fd7b3c1 Compare September 6, 2024 04:48
@Kontinuation Kontinuation force-pushed the kontinuation-parquet-geometry branch from fd7b3c1 to 2f4329e Compare September 6, 2024 07:08
@wgtmac

wgtmac commented Sep 8, 2024

Canonical extension type for geometry is out of the scope of Parquet and may require a separate discussion and vote process on the Arrow side. It would be better if they are not mixed in the same PR, IMHO.

@Kontinuation Kontinuation force-pushed the kontinuation-parquet-geometry branch from 6216dfd to f782e30 Compare September 11, 2024 13:38
@paleolimbot
Owner

@Kontinuation I wonder if this is ready for picking or squashing the commits on my branch and your branch here and making a PR into apache/arrow? That would ensure the right people can keep an eye on this if they would like to 🙂

@Kontinuation
Author

apache#43977

I've submitted apache#43977, and all subsequent reviews and modifications will be done within this PR. Is there anything else that needs to be done?

@paleolimbot
Owner

Ah, sorry I missed that! I'm still getting pinged for this PR when you push, so maybe close this PR?

@Kontinuation
Author

> Ah, sorry I missed that! I'm still getting pinged for this PR when you push, so maybe close this PR?

Sure. I'll close the PR.

paleolimbot pushed a commit that referenced this pull request Jan 8, 2025
…n timezone (apache#45051)

### Rationale for this change

If the timezone database is present on the system, but does not contain a timezone referenced in an ORC file, the ORC reader will crash with an uncaught C++ exception.

This can happen for example on Ubuntu 24.04 where some timezone aliases have been moved from the main `tzdata` package to a `tzdata-legacy` package. If `tzdata-legacy` is not installed, trying to read an ORC file that references e.g. the "US/Pacific" timezone would crash.

Here is a backtrace excerpt:
```
#12 0x00007f1a3ce23a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x00007f1a3ce39391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007f1a3f4accc4 in orc::loadTZDB(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#15 0x00007f1a3f4ad392 in std::call_once<orc::LazyTimezone::getImpl() const::{lambda()#1}>(std::once_flag&, orc::LazyTimezone::getImpl() const::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#16 0x00007f1a4298bec3 in __pthread_once_slow (once_control=0xa5ca7c8, init_routine=0x7f1a3ce69420 <__once_proxy>) at ./nptl/pthread_once.c:116
#17 0x00007f1a3f4a9ad0 in orc::LazyTimezone::getEpoch() const ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#18 0x00007f1a3f4e76b1 in orc::TimestampColumnReader::TimestampColumnReader(orc::Type const&, orc::StripeStreams&, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#19 0x00007f1a3f4e84ad in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#20 0x00007f1a3f4e8dd7 in orc::StructColumnReader::StructColumnReader(orc::Type const&, orc::StripeStreams&, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#21 0x00007f1a3f4e8532 in orc::buildReader(orc::Type const&, orc::StripeStreams&, bool, bool, bool) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#22 0x00007f1a3f4925e9 in orc::RowReaderImpl::startNextStripe() ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#23 0x00007f1a3f492c9d in orc::RowReaderImpl::next(orc::ColumnVectorBatch&) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
#24 0x00007f1a3e6b251f in arrow::adapters::orc::ORCFileReader::Impl::ReadBatch(orc::RowReaderOptions const&, std::shared_ptr<arrow::Schema> const&, long) ()
   from /tmp/arrow-HEAD.ArqTs/venv-wheel-3.12-manylinux_2_17_x86_64.manylinux2014_x86_64/lib/python3.12/site-packages/pyarrow/libarrow.so.1900
```

### What changes are included in this PR?

Catch C++ exceptions when iterating ORC batches instead of letting them slip through.
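
The general pattern can be sketched as follows. This is not the actual Arrow ORC adapter code: `SketchStatus` is a hypothetical stand-in for `arrow::Status`, and `CallGuarded` is a made-up helper name used only to show how an exception can be turned into an error value at the library boundary:

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical minimal status type standing in for arrow::Status.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Runs `fn` and converts any C++ exception into an error status so that
// the exception cannot propagate out and terminate the process.
template <typename Fn>
SketchStatus CallGuarded(Fn&& fn) {
  try {
    fn();
    return {true, ""};
  } catch (const std::exception& e) {
    return {false, e.what()};
  } catch (...) {
    return {false, "unknown C++ exception"};
  }
}
```

Wrapping each batch read this way would let a missing timezone surface as an ordinary error to the caller instead of reaching `std::terminate()`.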

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#40633

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>