fix: Fix Spark offline store type conversion to arrow #3071

Merged: 2 commits, Aug 11, 2022

@niklasvm niklasvm commented Aug 11, 2022

What this PR does / why we need it:

Fixes integration tests for the Spark offline store that failed when empty list data was converted from a Spark DataFrame to Arrow. This PR fixes 5 failing tests.

The current implementation converts the data in two steps: Spark DataFrame --> pandas --> Arrow.
The new implementation writes the data to a temporary Parquet file and then loads that file with Arrow.

Which issue(s) this PR fixes:

None

Signed-off-by: niklasvm <niklasvm@gmail.com>
@niklasvm niklasvm changed the title WIP: Fix some spark unit tests Chore: Fix some spark unit tests Aug 11, 2022
@niklasvm niklasvm changed the title Chore: Fix some spark unit tests chore: Fix some spark unit tests Aug 11, 2022
@niklasvm niklasvm marked this pull request as ready for review August 11, 2022 14:41
@adchia adchia changed the title chore: Fix some spark unit tests fix: Fix Spark offline store type conversion to arrow Aug 11, 2022

@adchia adchia left a comment


/lgtm

@feast-ci-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, niklasvm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov-commenter

codecov-commenter commented Aug 11, 2022

Codecov Report

Merging #3071 (2ad0093) into master (36747aa) will decrease coverage by 9.30%.
The diff coverage is 40.00%.

@@            Coverage Diff             @@
##           master    #3071      +/-   ##
==========================================
- Coverage   67.35%   58.05%   -9.31%     
==========================================
  Files         169      202      +33     
  Lines       14834    16774    +1940     
==========================================
- Hits         9992     9738     -254     
- Misses       4842     7036    +2194     
| Flag | Coverage Δ |
|---|---|
| integrationtests | ? |
| unittests | 58.05% <40.00%> (?) |
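
As a sanity check on the figures in the coverage diff, the percentages can be recomputed from the hit and line counts. This is a small sketch, not Codecov's own calculation: the helper `coverage_pct` is hypothetical, and truncating to two decimals (rather than rounding) is an assumption chosen because it matches the percentages printed in the report.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return coverage as a percentage truncated to two decimals."""
    # Integer arithmetic avoids float rounding; truncation is an assumption
    # chosen because it matches the figures printed in the report.
    return (hits * 10_000 // lines) / 100


base = {"hits": 9992, "misses": 4842, "lines": 14834}  # master (36747aa)
head = {"hits": 9738, "misses": 7036, "lines": 16774}  # #3071 (2ad0093)

for side in (base, head):
    # Every tracked line is either hit or missed.
    assert side["hits"] + side["misses"] == side["lines"]

base_pct = coverage_pct(base["hits"], base["lines"])
head_pct = coverage_pct(head["hits"], head["lines"])
print(f"master: {base_pct:.2f}%  head: {head_pct:.2f}%")
print(f"delta:  {head_pct - base_pct:+.2f} points")
```

Note that the number of hits fell by only 254 while the number of tracked lines grew by 1940 and the `integrationtests` flag shows no data, so most of the reported drop comes from newly counted, uncovered lines rather than from the change to `spark.py`.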


| Impacted Files | Coverage Δ |
|---|---|
| ...ffline_stores/contrib/spark_offline_store/spark.py | 37.97% <40.00%> (ø) |
| ...sts/integration/registration/test_universal_cli.py | 20.20% <0.00%> (-79.80%) ⬇️ |
| ...ts/integration/offline_store/test_offline_write.py | 26.08% <0.00%> (-73.92%) ⬇️ |
| ...fline_store/test_universal_historical_retrieval.py | 28.75% <0.00%> (-71.25%) ⬇️ |
| ...ests/integration/e2e/test_python_feature_server.py | 29.50% <0.00%> (-70.50%) ⬇️ |
| ...dk/python/tests/integration/e2e/test_validation.py | 27.55% <0.00%> (-69.30%) ⬇️ |
| ...s/integration/registration/test_universal_types.py | 32.25% <0.00%> (-67.75%) ⬇️ |
| sdk/python/feast/infra/online_stores/redis.py | 28.39% <0.00%> (-66.58%) ⬇️ |
| sdk/python/tests/integration/e2e/test_usage_e2e.py | 33.87% <0.00%> (-66.13%) ⬇️ |
| sdk/python/tests/data/data_creator.py | 34.78% <0.00%> (-65.22%) ⬇️ |
... and 156 more


@adchia adchia merged commit b26566d into feast-dev:master Aug 11, 2022
franciscojavierarceo pushed a commit to franciscojavierarceo/feast that referenced this pull request Aug 13, 2022
* Fix unit tests related to empty list types

Signed-off-by: niklasvm <niklasvm@gmail.com>

* formatting

Signed-off-by: niklasvm <niklasvm@gmail.com>

Signed-off-by: niklasvm <niklasvm@gmail.com>
Signed-off-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
adchia pushed a commit that referenced this pull request Aug 15, 2022
## [0.22.4](v0.22.3...v0.22.4) (2022-08-15)

### Bug Fixes

* Fix field mapping logic during feature inference ([#3067](#3067)) ([3668702](3668702))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([dd46d45](dd46d45))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([a44fe66](a44fe66))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([8e6a6b1](8e6a6b1))
adchia pushed a commit that referenced this pull request Aug 15, 2022
## [0.23.2](v0.23.1...v0.23.2) (2022-08-15)

### Bug Fixes

* Fix field mapping logic during feature inference ([#3067](#3067)) ([eb885b1](eb885b1))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0ff0ec4](0ff0ec4))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([a32d247](a32d247))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([a49f70c](a49f70c))
kevjumba pushed a commit that referenced this pull request Aug 25, 2022
# [0.24.0](v0.23.0...v0.24.0) (2022-08-25)

### Bug Fixes

* Check if on_demand_feature_views is an empty list rather than None for snowflake provider ([#3046](#3046)) ([9b05e65](9b05e65))
* FeatureStore.apply applies BatchFeatureView correctly ([#3098](#3098)) ([41be511](41be511))
* Fix Feast Java inconsistency with int64 serialization vs python ([#3031](#3031)) ([4bba787](4bba787))
* Fix feature service inference logic ([#3089](#3089)) ([4310ed7](4310ed7))
* Fix field mapping logic during feature inference ([#3067](#3067)) ([cdfa761](cdfa761))
* Fix incorrect on demand feature view diffing and improve Java tests ([#3074](#3074)) ([0702310](0702310))
* Fix Java helm charts to work with refactored logic. Fix FTS image ([#3105](#3105)) ([2b493e0](2b493e0))
* Fix on demand feature view output in feast plan + Web UI crash ([#3057](#3057)) ([bfae6ac](bfae6ac))
* Fix release workflow to release 0.24.0 ([#3138](#3138)) ([a69aaae](a69aaae))
* Fix Spark offline store type conversion to arrow ([#3071](#3071)) ([b26566d](b26566d))
* Fixing Web UI, which fails for the SQL registry ([#3028](#3028)) ([64603b6](64603b6))
* Force Snowflake Session to Timezone UTC ([#3083](#3083)) ([9f221e6](9f221e6))
* Make infer dummy entity join key idempotent ([#3115](#3115)) ([1f5b1e0](1f5b1e0))
* More explicit error messages ([#2708](#2708)) ([e4d7afd](e4d7afd))
* Parse inline data sources ([#3036](#3036)) ([c7ba370](c7ba370))
* Prevent overwriting existing file during `persist` ([#3088](#3088)) ([69af21f](69af21f))
* Register BatchFeatureView in feature repos correctly ([#3092](#3092)) ([b8e39ea](b8e39ea))
* Return an empty infra object from sql registry when it doesn't exist ([#3022](#3022)) ([8ba87d1](8ba87d1))
* Teardown tables for Snowflake Materialization testing ([#3106](#3106)) ([0a0c974](0a0c974))
* UI error when saved dataset is present in registry. ([#3124](#3124)) ([83cf753](83cf753))
* Update sql.py ([#3096](#3096)) ([2646a86](2646a86))
* Updated snowflake template ([#3130](#3130)) ([f0594e1](f0594e1))

### Features

* Add authentication option for snowflake connector ([#3039](#3039)) ([74c75f1](74c75f1))
* Add Cassandra/AstraDB online store contribution ([#2873](#2873)) ([feb6cb8](feb6cb8))
* Add Snowflake materialization engine ([#2948](#2948)) ([f3b522b](f3b522b))
* Adding saved dataset capabilities for Postgres ([#3070](#3070)) ([d3253c3](d3253c3))
* Allow passing repo config path via flag ([#3077](#3077)) ([0d2d951](0d2d951))
* Contrib azure provider with synapse/mssql offline store and Azure registry store ([#3072](#3072)) ([9f7e557](9f7e557))
* Custom Docker image for Bytewax batch materialization ([#3099](#3099)) ([cdd1b07](cdd1b07))
* Feast AWS Athena offline store (again) ([#3044](#3044)) ([989ce08](989ce08))
* Implement spark offline store `offline_write_batch` method ([#3076](#3076)) ([5b0cc87](5b0cc87))
* Initial Bytewax materialization engine ([#2974](#2974)) ([55c61f9](55c61f9))
* Refactor feature server helm charts to allow passing feature_store.yaml in environment variables ([#3113](#3113)) ([85ee789](85ee789))