Update Logical Types Branch #14241

Merged: 770 commits merged into apache:logical-types on Jan 23, 2025

tobixdev (Contributor)

Which issue does this PR close?

Updates the logical-types branch to main for #12622.
In the last PR (#14202), a problem caused unintended changes to show up when diffing logical-types against main.

In this diff between the upstream main and my logical-types branch, only the intended changes are visible. Hopefully it works this time, so we can start addressing the other tasks in #12622.

Rationale for this change

Updates the logical-types branch to main for #12622.

What changes are included in this PR?

Apply Scalar type to new code and resolve issues between the logical-types branch and main.

Are these changes tested?

Are there any user-facing changes?

cc @jayzhan211

Eason0729 and others added 30 commits December 11, 2024 07:21
* fix: Fix parse_sql_expr not handling alias

* cargo fmt

* fix parse_sql_expr example(remove alias)

* add testing

* add SUM udaf to TestContextProvider and modify test_sql_to_expr_with_alias for function

* revert change on example `parse_sql_expr`
apache#13730)

Debug trait is useful for understanding what something is and how it's
configured, especially if the implementation is behind dyn trait.
…13660)

* add `unnest_as_table_factor` and `UnnestRelationBuilder`

* unparse unnest as table factor

* fix typo

* add tests for the default configs

* add a static const for unnest_placeholder

* fix tests

* fix tests
…apache#13727)

* Update apache-avro requirement from 0.16 to 0.17

---
updated-dependencies:
- dependency-name: apache-avro
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix compatibility changes schema handling apache-avro 0.17

- Handle ArraySchema struct
- Handle MapSchema struct
- Map BigDecimal => LargeBinary
- Map TimestampNanos => Timestamp(TimeUnit::Nanosecond, None)
- Map LocalTimestampNanos => todo!()
- Add Default to FixedSchema test

* Update Cargo.lock file for apache-avro 0.17

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marc Droogh <marc.droogh@imc.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Minor: Add doc example to RecordBatchStreamAdapter

* Update datafusion/physical-plan/src/stream.rs

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>

---------

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
…13581)

* Implement GroupsAccumulator for corr(x,y)

* feedbacks

* fix CI MSRV

* review

* avoid collect in accumulation

* add back cast
* fix union serialisation order in proto

* clippy

* address comments
…apache#13733)

* Minor: make unsupported `nanosecond` part a real (not internal) error

* fmt

* Improve wording to refer to date part
…nes (apache#13732)

* Add tests for date_part on columns + timestamps with / without timezones

* Add tests from apache#13372

* remove trailing whitespace
* Optimize performance of initcap (~2x faster)

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>

* format

---------

Signed-off-by: Tai Le Manh <manhtai.lmt@gmail.com>
Before the change, the request to use PostgreSQL was simply ignored when
`--complete` flag was present.
…pache#13739)

* doc-gen: migrate window functions documentation

Signed-off-by: zjregee <zjregee@gmail.com>

* fix: update Cargo.lock

---------

Signed-off-by: zjregee <zjregee@gmail.com>
…pache#13751)

* Refactor JoinLeftData structure by removing unused memory reservation field in hash join implementation

* Add Debug and Clone derives for HashJoinStreamState and ProcessProbeBatchState enums

This commit enhances the HashJoinStreamState and ProcessProbeBatchState structures by implementing the Debug and Clone traits, allowing for easier debugging and cloning of these state representations in the hash join implementation.
* Add big decimal formatting test cases with potential trailing zeros

* Rename and simplify decimal rendering functions

- add `decimal` to function name
- drop `precision` parameter as it is not supposed to affect the result

* Update to bigdecimal 0.4.7

Utilize new `to_plain_string` function
* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Clean up dependencies

* CI: Clean up dependencies
* plan implicit lateral if table factor is UNNEST

* check for outer references in `create_relation_subquery`

* add sqllogictest

* fix lateral constant test to not expect a subquery node

* replace sqllogictest in favor of logical plan test

* update lateral join sqllogictests

* add sqllogictests

* fix logical plan test
* Minor: improve the Deprecation / API health policy

* prettier

* Update docs/source/library-user-guide/api-health.md

Co-authored-by: Jonah Gao <jonahgao@msn.com>

* Add version guidance and make more copy/paste friendly

* prettier

* better

* rename to guidelines

---------

Co-authored-by: Jonah Gao <jonahgao@msn.com>
* fix: specify roottype in fieldreference

Signed-off-by: MBWhite <whitemat@uk.ibm.com>

* Fix formatting

Signed-off-by: MBWhite <whitemat@uk.ibm.com>

* review suggestion

Signed-off-by: MBWhite <whitemat@uk.ibm.com>

---------

Signed-off-by: MBWhite <whitemat@uk.ibm.com>
…nction signature (apache#13372)

* add type sig class

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* timestamp

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* date part

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fmt

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* taplo format

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* tpch test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* msrc issue

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* msrc issue

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* explicit hash

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Enhance type coercion and function signatures

- Added logic to prevent unnecessary casting of string types in `native.rs`.
- Introduced `Comparable` variant in `TypeSignature` to define coercion rules for comparisons.
- Updated imports in `functions.rs` and `signature.rs` for better organization.
- Modified `date_part.rs` to improve handling of timestamp extraction and fixed query tests in `expr.slt`.
- Added `datafusion-macros` dependency in `Cargo.toml` and `Cargo.lock`.

These changes improve type handling and ensure more accurate function behavior in SQL expressions.

* fix comment

Signed-off-by: Jay Zhan <jayzhan211@gmail.com>

* fix signature

Signed-off-by: Jay Zhan <jayzhan211@gmail.com>

* fix test

Signed-off-by: Jay Zhan <jayzhan211@gmail.com>

* Enhance type coercion for timestamps to allow implicit casting from strings. Update SQL logic tests to reflect changes in timestamp handling, including expected outputs for queries involving nanoseconds and seconds.

* Refactor type coercion logic for timestamps to improve readability and maintainability. Update the `TypeSignatureClass` documentation to clarify its purpose in function signatures, particularly regarding coercible types. This change enhances the handling of implicit casting from strings to timestamps.

* Fix SQL logic tests to correct query error handling for timestamp functions. Updated expected outputs for `date_part` and `extract` functions to reflect proper behavior with nanoseconds and seconds. This change improves the accuracy of test cases in the `expr.slt` file.

* Enhance timestamp handling in TypeSignature to support timezone specification. Updated the logic to include an additional DataType for timestamps with a timezone wildcard, improving flexibility in timestamp operations.

* Refactor date_part function: remove redundant imports and add missing not_impl_err import for better error handling

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: Jay Zhan <jayzhan211@gmail.com>
* Minor: Add some more blog posts to the readings page

* prettier

* prettier

* Update docs/source/user-guide/concepts-readings-events.md

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
)

Fixing `GroupsAccumulator` trait name in its docs
* Improve deprecation guidelines more

* prettier
…ingArrayBuilder` (apache#13758)

* fix: add `null_buffer` check for `LargeStringArray`

Add a safety check to ensure that the buffers cannot get out of
alignment. This introduces a runtime assertion that panics if they are
not aligned.

* fix: remove value_buffer assertion

These buffers can be misaligned without causing problems; it is the
`null_buffer` that we need to be of the same length.

* feat: add `null_buffer` check to `StringArray`

This is in a similar vein to `LargeStringArray`, as the code is the
same, except for `i32`'s instead of `i64`.

* feat: use `row_count` var to avoid drift
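
The null-buffer check described in the commit above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the DataFusion implementation: `SketchStringBuilder`, `append`, and `build` are hypothetical names, the null buffer is modelled as a plain `Vec<bool>`, and only the `i32`-offset case is shown (a `LargeStringArray`-style builder would presumably look the same with `i64` offsets).

```rust
/// Hypothetical stand-in for a string array builder with `i32` offsets.
/// Not the real DataFusion type; it only models offsets and value bytes.
struct SketchStringBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl SketchStringBuilder {
    fn new() -> Self {
        // Offsets start with a leading 0, so rows = offsets.len() - 1.
        Self { offsets: vec![0], values: Vec::new() }
    }

    fn append(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }

    /// Returns the number of rows. Panics if a null buffer is supplied
    /// whose length differs from the row count.
    fn build(self, null_buffer: Option<Vec<bool>>) -> usize {
        // Compute the row count once and reuse it, so the check and the
        // returned value cannot drift apart.
        let row_count = self.offsets.len() - 1;
        if let Some(nulls) = &null_buffer {
            assert_eq!(
                nulls.len(),
                row_count,
                "null buffer length must match the number of rows"
            );
        }
        row_count
    }
}
```

Note that the sketch deliberately makes no assertion about the value buffer, mirroring the follow-up commit that removed that check.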
* fix: restore memory reservation in JoinLeftData for accurate memory accounting in HashJoin

This commit reintroduces the `_reservation` field in the `JoinLeftData` structure to ensure proper tracking of memory resources during join operations. The absence of this field could lead to inconsistent memory usage reporting and potential out-of-memory issues as upstream operators increase their memory consumption.

* fmt

Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>

---------

Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
* Update documentation guidelines for contribution content

* Apply suggestions from code review

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* clarify discussions and remove requirements note

* prettier

* Update docs/source/contributor-guide/index.md

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>

---------

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
* Add Round trip tests for Array <--> ScalarValue

* String dictionary test

* remove unnecessary value

* Improve comments
cj-zhukov and others added 14 commits January 18, 2025 12:35
apache#14168)

* Add a hint about expected extension in error message in register_csv, register_parquet, register_json, register_avro (apache#14144)

* Add tests for error

* fix test

* fmt

* Fix issues causing GitHub checks to fail

* revert datafusion-testing change

---------

Co-authored-by: Sergey Zhukov <szhukov@aligntech.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
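
As an illustration of the idea behind the extension hint in the commit above, here is a minimal sketch using only the standard library. The function name `check_extension`, its signature, and the wording of the message are assumptions made for this example; they are not taken from the `register_csv` family of functions.

```rust
use std::path::Path;

/// Hypothetical helper: succeeds when `path` ends in `expected_ext`,
/// otherwise returns an error message that names the expected extension.
fn check_extension(path: &str, expected_ext: &str) -> Result<(), String> {
    // `extension()` yields the part after the final dot, if any.
    let actual = Path::new(path).extension().and_then(|e| e.to_str());
    if actual == Some(expected_ext) {
        Ok(())
    } else {
        Err(format!(
            "file '{path}' does not match the expected extension '.{expected_ext}'"
        ))
    }
}
```

Calling `check_extension("data.txt", "csv")` in this sketch returns an `Err` whose message mentions `.csv`, which is the kind of hint the commit title describes.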
…#14142)

* External memory limit validation for sort

* add bug tracker

* cleanup

* Update submodule

* reviews

* fix CI

* move feature to module level
…nction (apache#14183)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
…tion function (apache#14181)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
* Added job board as a separate header in the documentation

* Update docs/source/contributor-guide/communication.md

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update docs/source/contributor-guide/communication.md

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* prettier

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…T FROM` (apache#14187)

* Mapped the Spaceship operator with IsNotDistinctFrom

* Added tests for Spaceship Operator <=>

* Added sanity test for Spaceship Operator <=>
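
To make the semantics behind the commit above concrete: `<=>` is a null-safe equality operator, and mapping it to `IsNotDistinctFrom` means two NULLs compare as equal instead of yielding NULL. The sketch below models SQL NULL with `Option`; the helper names `is_not_distinct_from` and `sql_eq` are hypothetical and are not DataFusion APIs.

```rust
/// Null-safe equality, modelling `a <=> b` / `a IS NOT DISTINCT FROM b`.
/// Always returns a definite boolean, never NULL.
fn is_not_distinct_from<T: PartialEq>(a: Option<T>, b: Option<T>) -> bool {
    match (a, b) {
        (None, None) => true,              // NULL <=> NULL is true
        (Some(x), Some(y)) => x == y,      // ordinary comparison
        _ => false,                        // exactly one side is NULL
    }
}

/// Ordinary SQL `=` for contrast: the result is NULL (`None`)
/// whenever either operand is NULL.
fn sql_eq<T: PartialEq>(a: Option<T>, b: Option<T>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}
```

The difference only shows up when NULLs are involved: with two non-NULL operands both functions agree.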
* feat: Use `SchemaRef` in `JoinFilter`

* Update datafusion/core/src/physical_optimizer/projection_pushdown.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…ns-nested functions (apache#14201)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
github-actions bot added the following labels on Jan 22, 2025: documentation, sql, development-process, logical-expr, physical-expr, optimizer, core, sqllogictest, substrait, catalog, common, execution, proto, functions
jayzhan211 merged commit 25f02a7 into apache:logical-types on Jan 23, 2025
27 checks passed
jayzhan211 (Contributor) commented Jan 23, 2025

I guess something is wrong with GitHub: the files changed shown in Compare include additional unexpected changes. I forked logical-types to another branch, logical-types-v2, and the comparison in the PR looks correct 🤔

Anyway, I think we can keep working on the other tasks on the logical-types branch.
