Fix checks and merge with master #3

Merged: 18 commits, Aug 5, 2024

Commits on Jul 26, 2024

  1. Merge 53.0.0-dev dev branch to main (apache#6126)

    * bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (apache#6041)
    
    * bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    * fix example tests
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    ---------
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    * Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (apache#6043)
    
    * deprecate auto copy, ask explicit reference
    
    * update comments
    
    * make cargo doc happy
    
    * Make display of interval types more pretty (apache#6006)
    
    * improve display for interval.
    
    * update test in pretty, and fix display problem.
    
    * tmp
    
    * fix tests in arrow-cast.
    
    * fix tests in pretty.
    
    * fix style.
    
    * Update snafu (apache#5930)
    
    * Update Parquet thrift generated structures (apache#6045)
    
    * update to latest thrift (as of 11 Jul 2024) from parquet-format
    
    * pass None for optional size statistics
    
    * escape HTML tags
    
    * don't need to escape brackets in arrays
    
    * Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (apache#5933)
    
    This reverts commit 22e0b44.
    
    * Revert "Update snafu (apache#5930)" (apache#6069)
    
    This reverts commit 756b1fb.
    
    * Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (apache#6075)
    
    * Update pyo3 requirement from 0.21.1 to 0.22.1
    
    Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
    - [Release notes](https://github.com/pyo3/pyo3/releases)
    - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
    - [Commits](PyO3/pyo3@v0.21.1...v0.22.1)
    
    ---
    updated-dependencies:
    - dependency-name: pyo3
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    
    * refactor: remove deprecated `FromPyArrow::from_pyarrow`
    
    "GIL Refs" are being phased out.
    
    * chore: update `pyo3` in integration tests
    
    ---------
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    
    * remove repeated code to make the code more concise. (apache#6080)
    
    * Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (apache#6068)
    
    * update to latest thrift (as of 11 Jul 2024) from parquet-format
    
    * pass None for optional size statistics
    
    * escape HTML tags
    
    * don't need to escape brackets in arrays
    
    * add support for unencoded_byte_array_data_bytes
    
    * add comments
    
    * change sig of ColumnMetrics::update_variable_length_bytes()
    
    * rename ParquetOffsetIndex to OffsetSizeIndex
    
    * rename some functions
    
    * suggestion from review
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * add Default trait to ColumnMetrics as suggested in review
    
    * rename OffsetSizeIndex to OffsetIndexMetaData
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * Update pyo3 requirement from 0.21.1 to 0.22.2 (apache#6085)
    
    Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
    - [Release notes](https://github.com/pyo3/pyo3/releases)
    - [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
    - [Commits](PyO3/pyo3@v0.21.1...v0.22.2)
    
    ---
    updated-dependencies:
    - dependency-name: pyo3
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    
    * Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (apache#6095)
    
    * deprecate read_page_locations
    
    * add to_thrift() to OffsetIndexMetaData
    
    * Update parquet/src/column/writer/mod.rs
    
    Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
    
    ---------
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: Bugen Zhao <i@bugenzhao.com>
    Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
    Co-authored-by: kamille <caoruiqiu.crq@antgroup.com>
    Co-authored-by: Jesse <github@jessebakker.com>
    Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
    Co-authored-by: Marco Neumann <marco@crepererum.net>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    8 people authored Jul 26, 2024
    613e93e
  2. Add support for level histograms added in PARQUET-2261 to `ParquetMetaData` (apache#6105)
    
    * bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight` (apache#6041)
    
    * bump `tonic` to 0.12 and `prost` to 0.13 for `arrow-flight`
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    * fix example tests
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    ---------
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    
    * Remove `impl<T: AsRef<[u8]>> From<T> for Buffer`  that easily accidentally copies data (apache#6043)
    
    * deprecate auto copy, ask explicit reference
    
    * update comments
    
    * make cargo doc happy
    
    * Make display of interval types more pretty (apache#6006)
    
    * improve display for interval.
    
    * update test in pretty, and fix display problem.
    
    * tmp
    
    * fix tests in arrow-cast.
    
    * fix tests in pretty.
    
    * fix style.
    
    * Update snafu (apache#5930)
    
    * Update Parquet thrift generated structures (apache#6045)
    
    * update to latest thrift (as of 11 Jul 2024) from parquet-format
    
    * pass None for optional size statistics
    
    * escape HTML tags
    
    * don't need to escape brackets in arrays
    
    * Revert "Revert "Write Bloom filters between row groups instead of the end  (#…" (apache#5933)
    
    This reverts commit 22e0b44.
    
    * Revert "Update snafu (apache#5930)" (apache#6069)
    
    This reverts commit 756b1fb.
    
    * Update pyo3 requirement from 0.21.1 to 0.22.1 (fixed) (apache#6075)
    
    * Update pyo3 requirement from 0.21.1 to 0.22.1
    
    Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
    - [Release notes](https://github.com/pyo3/pyo3/releases)
    - [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
    - [Commits](PyO3/pyo3@v0.21.1...v0.22.1)
    
    ---
    updated-dependencies:
    - dependency-name: pyo3
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    
    * refactor: remove deprecated `FromPyArrow::from_pyarrow`
    
    "GIL Refs" are being phased out.
    
    * chore: update `pyo3` in integration tests
    
    ---------
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    
    * remove repeated code to make the code more concise. (apache#6080)
    
    * Add `unencoded_byte_array_data_bytes` to `ParquetMetaData` (apache#6068)
    
    * update to latest thrift (as of 11 Jul 2024) from parquet-format
    
    * pass None for optional size statistics
    
    * escape HTML tags
    
    * don't need to escape brackets in arrays
    
    * add support for unencoded_byte_array_data_bytes
    
    * add comments
    
    * change sig of ColumnMetrics::update_variable_length_bytes()
    
    * rename ParquetOffsetIndex to OffsetSizeIndex
    
    * rename some functions
    
    * suggestion from review
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * add Default trait to ColumnMetrics as suggested in review
    
    * rename OffsetSizeIndex to OffsetIndexMetaData
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * deprecate read_page_locations
    
    * add level histograms to metadata
    
    * add to_thrift() to OffsetIndexMetaData
    
    * Update pyo3 requirement from 0.21.1 to 0.22.2 (apache#6085)
    
    Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
    - [Release notes](https://github.com/pyo3/pyo3/releases)
    - [Changelog](https://github.com/PyO3/pyo3/blob/v0.22.2/CHANGELOG.md)
    - [Commits](PyO3/pyo3@v0.21.1...v0.22.2)
    
    ---
    updated-dependencies:
    - dependency-name: pyo3
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    
    * Deprecate read_page_locations() and simplify offset index in `ParquetMetaData` (apache#6095)
    
    * deprecate read_page_locations
    
    * add to_thrift() to OffsetIndexMetaData
    
    * move valid test into ColumnIndexBuilder::append_histograms
    
    * move update_histogram() inside ColumnMetrics
    
    * Update parquet/src/column/writer/mod.rs
    
    Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
    
    * Implement LevelHistograms as a struct
    
    * formatting
    
    * fix error in docs
    
    ---------
    
    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: Bugen Zhao <i@bugenzhao.com>
    Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com>
    Co-authored-by: kamille <caoruiqiu.crq@antgroup.com>
    Co-authored-by: Jesse <github@jessebakker.com>
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    Co-authored-by: Marco Neumann <marco@crepererum.net>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    8 people authored Jul 26, 2024
    b06ffce
  3. f42d242
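
For readers unfamiliar with the level histograms mentioned in commit 2 above: in the Parquet format, such a histogram has one bucket per possible repetition or definition level (`max_level + 1` buckets), and each bucket counts how often that level was observed. The following is a minimal std-only sketch of that idea. `ToyLevelHistogram` and its methods are hypothetical names chosen for illustration; they are not the `parquet` crate's API.

```rust
/// Toy histogram of repetition or definition levels (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct ToyLevelHistogram {
    /// counts[i] is the number of values observed with level `i`.
    counts: Vec<i64>,
}

impl ToyLevelHistogram {
    /// Creates a histogram with `max_level + 1` zeroed buckets.
    /// Returns `None` for a negative `max_level`.
    fn new(max_level: i16) -> Option<Self> {
        if max_level < 0 {
            return None;
        }
        Some(Self {
            counts: vec![0; max_level as usize + 1],
        })
    }

    /// Adds one count for every level in `levels`.
    /// Panics if a level is outside `0..=max_level`.
    fn update(&mut self, levels: &[i16]) {
        for &level in levels {
            self.counts[level as usize] += 1;
        }
    }
}

fn main() {
    let mut hist = ToyLevelHistogram::new(2).expect("non-negative max level");
    // Levels seen while writing five values of a column with max level 2.
    hist.update(&[0, 2, 2, 1, 2]);
    println!("{:?}", hist.counts);
}
```

With the input above, level 0 and level 1 are each seen once and level 2 three times.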

Commits on Jul 27, 2024

  1. Implement data_part for intervals (apache#6071)

    Signed-off-by: Nick Cameron <nrc@ncameron.org>
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    nrc and alamb authored Jul 27, 2024
    e815d06
  2. 705d341

Commits on Jul 28, 2024

  1. Remove automatic buffering in ipc::reader::FileReader for consistent buffering (apache#6132)
    
    * change ipc::reader and writer APIs for consistent buffering
    
    The current writer API automatically wraps the supplied std::io::Write
    impl in a BufWriter.
    It is cleaner and more idiomatic to have the default be using the
    supplied impl directly, as the user might already have a BufWriter
    or an impl that doesn't actually benefit from buffering at all.
    
    StreamReader does a similar thing, but it also exposes a `try_new_unbuffered`
    that bypasses the internal wrap.
    
    Here we propose a consistent and non-buffered by default API:
    - `try_new` does not wrap the passed reader/writer,
    - `try_new_buffered` is a convenience function that does wrap
      the reader/writer into a BufReader/BufWriter,
    - all four publicly exposed IPC reader/writers follow the above consistently,
      i.e. `StreamReader`, `FileReader`, `StreamWriter`, `FileWriter`.
    
    Those are breaking changes.
    
    An additional tweak: removed the generic type bounds from struct definitions
    on the four types, as that is the idiomatic Rust approach (see e.g. stdlib's
    HashMap that has no bounds on the struct definition, only the impl requires
    Hash + Eq).
    
    See apache#6099 for the discussion.
    
    * improvements to docs in `arrow::ipc::reader` and `writer`
    
    Applied a few suggestions, made `Error` sections more consistent.
    V0ldek authored Jul 28, 2024
    5f5a82c
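
The buffering convention described in the commit message above (apache#6132) can be sketched with a toy writer. This is a std-only illustration built around a hypothetical `ToyWriter` type with made-up message framing; it is not the `arrow::ipc` implementation, and only the constructor names `try_new` and `try_new_buffered` are taken from the commit message. The struct definition carries no trait bounds; `Write` is required only on the `impl` blocks, as the message recommends.

```rust
use std::io::{self, BufWriter, Write};

/// Toy stand-in for an IPC writer. No bounds on the struct itself.
struct ToyWriter<W> {
    inner: W,
}

impl<W: Write> ToyWriter<W> {
    /// Uses the supplied writer directly, without adding a buffer.
    fn try_new(inner: W) -> io::Result<Self> {
        Ok(Self { inner })
    }

    /// Writes one length-prefixed message (illustrative framing only).
    fn write_message(&mut self, payload: &[u8]) -> io::Result<()> {
        self.inner.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.inner.write_all(payload)
    }

    /// Flushes and returns the underlying writer.
    fn into_inner(mut self) -> io::Result<W> {
        self.inner.flush()?;
        Ok(self.inner)
    }
}

impl<W: Write> ToyWriter<BufWriter<W>> {
    /// Convenience constructor that wraps the writer in a `BufWriter`.
    fn try_new_buffered(inner: W) -> io::Result<Self> {
        Self::try_new(BufWriter::new(inner))
    }
}

fn main() -> io::Result<()> {
    // Unbuffered: bytes go straight to the Vec.
    let mut plain = ToyWriter::try_new(Vec::<u8>::new())?;
    plain.write_message(b"abc")?;
    let bytes = plain.into_inner()?;

    // Buffered: same bytes, but writes are batched by BufWriter.
    let mut buffered = ToyWriter::try_new_buffered(Vec::<u8>::new())?;
    buffered.write_message(b"abc")?;
    let buf_writer = buffered.into_inner()?;
    assert_eq!(buf_writer.get_ref(), &bytes);
    Ok(())
}
```

A caller who already holds a `BufWriter`, or a sink that gains nothing from buffering, would use `try_new` and avoid double wrapping.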

Commits on Jul 29, 2024

  1. Use LevelHistogram in PageIndex (apache#6135)

    * use LevelHistogram in PageIndex and ColumnIndexBuilder
    
    * revert changes to OffsetIndexBuilder
    etseidl authored Jul 29, 2024
    80ed712
  2. Fix comparison kernel benchmarks (apache#6147)

    * fix comparison kernel benchmarks
    
    * add comment as suggested by @alamb
    samuelcolvin authored Jul 29, 2024
    11f2bb8
  3. Implement exponential block size growing strategy for `StringViewBuilder` (apache#6136)
    
    * new block size growing strategy
    
    * Update arrow-array/src/builder/generic_bytes_view_builder.rs
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    
    * update function name, deprecate old function
    
    * update comments
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    XiangpengHao and alamb authored Jul 29, 2024
    bd1e76b
  4. 0e99e3a
  5. Improve LIKE performance for "contains" style queries (apache#6128)

    * improve "contains" performance
    
    * add tests
    
    * cargo fmt 😞
    
    ---------
    
    Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
    samuelcolvin and alamb authored Jul 29, 2024
    bf9ce47
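
The commit message for apache#6128 does not say how the "contains" speed-up is achieved. One common approach, sketched below as an assumption rather than a description of the actual change, is to recognise patterns of the form `%needle%` that contain no other wildcards or escapes and answer them with a plain substring search instead of a general pattern matcher. The function names are hypothetical.

```rust
/// If `pattern` is exactly `%needle%` with no other LIKE wildcards
/// (`%`, `_`) or backslash escapes inside, returns the needle.
fn contains_needle(pattern: &str) -> Option<&str> {
    let inner = pattern.strip_prefix('%')?.strip_suffix('%')?;
    if inner.contains(|c: char| matches!(c, '%' | '_' | '\\')) {
        return None;
    }
    Some(inner)
}

/// Evaluates `value LIKE pattern` for the fast-path case only.
/// Returns `None` when the pattern needs a general matcher.
fn like_fast_path(value: &str, pattern: &str) -> Option<bool> {
    contains_needle(pattern).map(|needle| value.contains(needle))
}

fn main() {
    // Qualifies for the fast path: a single needle between two `%`.
    println!("{:?}", like_fast_path("arrow-flight", "%flight%"));
    // Does not qualify: a prefix pattern needs a different strategy.
    println!("{:?}", like_fast_path("arrow-flight", "arrow%"));
}
```

Patterns that fall outside the fast path return `None`, so a caller can fall back to whatever general matcher it already has.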

Commits on Jul 30, 2024

  1. improvements to (i)starts_with and (i)ends_with performance (apache#6118)
    
    * improvements to "starts_with" and "ends_with"
    
    * add tests and refactor slightly
    
    * add comments
    samuelcolvin authored Jul 30, 2024
    bf0ea91
  2. Add BooleanArray::new_from_packed and BooleanArray::new_from_u8 (apache#6127)
    
    * Support constructing BooleanArray from &[u8]
    
    * fix doc
    
    * add new_from_packed and new_from_u8; delete impl From<&[u8]> for BooleanArray and BooleanBuffer
    chloro-pn authored Jul 30, 2024
    6e893b5
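
The two constructor names in apache#6127 suggest two input layouts: bit-packed bytes, and one byte per value. The std-only sketch below illustrates the difference using plain `Vec<bool>`. The helper names are hypothetical, and the signatures of the real `BooleanArray::new_from_packed` and `BooleanArray::new_from_u8` are not reproduced here. It assumes Arrow's least-significant-bit-first numbering for packed bits and, as a further assumption, treats any non-zero byte as `true` in the one-byte-per-value layout.

```rust
/// Reads `len` booleans from bit-packed `bytes`, starting at bit `offset`.
/// Bit `i` lives in byte `i / 8` at position `i % 8` (LSB first).
/// Panics if the requested range runs past the end of `bytes`.
fn unpack_bits(bytes: &[u8], offset: usize, len: usize) -> Vec<bool> {
    (offset..offset + len)
        .map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}

/// Interprets each byte as one boolean: non-zero means `true`.
fn bools_from_bytes(bytes: &[u8]) -> Vec<bool> {
    bytes.iter().map(|&b| b != 0).collect()
}

fn main() {
    // One packed byte holds up to eight values.
    let packed = unpack_bits(&[0b0000_0101], 0, 3);
    // Three bytes hold exactly three values.
    let per_byte = bools_from_bytes(&[1, 0, 7]);
    println!("{:?} {:?}", packed, per_byte);
}
```

Both calls yield the same three values, which is why a single `From<&[u8]>` impl would be ambiguous about the intended layout.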

Commits on Jul 31, 2024

  1. Update object store MSRV to 1.64 (apache#6123)

    * Update MSRV to 1.64
    
    * Revert "clippy ignore"
    
    This reverts commit 7a4b760.
    alamb authored Jul 31, 2024
    2905ce6
  2. 4d1651c

Commits on Aug 1, 2024

  1. Upgrade protobuf definitions to flightsql 17.0 (apache#6133) (apache#6169)
    
    * Update FlightSql.proto to version 17.0
    
    Adds new message CommandStatementIngest and removes `experimental` from
    other messages.
    
    * Regenerate flight sql protocol
    
    This upgrades the file to version 17.0 of the protobuf definition.
    
    Co-authored-by: Douglas Anderson <djanderson@users.noreply.github.com>
    alamb and djanderson authored Aug 1, 2024
    c14ade2
  2. 8691903
  3. 241ee02