feat: Pandas v2 compatibility #3957

Merged
merged 9 commits into from
Mar 4, 2024

sudohainguyen
Collaborator

What this PR does / why we need it:
Resume works from #3928

Which issue(s) this PR fixes:

Fixes #3709

@sudohainguyen
Collaborator Author

sudohainguyen commented Feb 18, 2024

Pending on snowflakedb/snowflake-connector-python#1872, which needs to be released to resolve dependency conflicts.

Note: test cases won't pass unless pandas 2.2.0 is installed.

setup.py Outdated Show resolved Hide resolved
@sudohainguyen
Collaborator Author

Pending on snowflakedb/snowflake-connector-python#1872, which needs to be released to resolve dependency conflicts.

Note: test cases won't pass unless pandas 2.2.0 is installed.

Decided to skip some tests conditionally.
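
A minimal sketch of what such a conditional skip can look like, using only the standard library to decide whether an installed pandas version falls inside a skipped range. The bounds (2.0.0 inclusive to 2.2.0 exclusive) and the helper names are assumptions for illustration; the exact condition used in this PR is not shown in the conversation.

```python
import re
from typing import Tuple

# Hypothetical bounds: skip for pandas >= 2.0.0 and < 2.2.0.
SKIP_FROM = (2, 0, 0)
SKIP_UNTIL = (2, 2, 0)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.1.4' (or '2.2.0rc0') into a comparable tuple like (2, 1, 4)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad so that '2.1' compares as (2, 1, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def should_skip(pandas_version: str) -> bool:
    """True when the given pandas version falls inside the skipped range."""
    return SKIP_FROM <= version_tuple(pandas_version) < SKIP_UNTIL
```

In a test module the result would typically be passed to pytest.mark.skipif together with a reason string, for example should_skip(pd.__version__), so the test still runs on every pandas version outside the range.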

@sudohainguyen sudohainguyen changed the title feat: Support pandas v2 feat: Pandas v2 compatibility Feb 19, 2024
@sudohainguyen
Collaborator Author

sudohainguyen commented Feb 19, 2024

good, all passed now @tokoko

@sudohainguyen sudohainguyen requested review from shuchu and removed request for tokoko February 19, 2024 07:20
@tokoko
Collaborator

tokoko commented Feb 20, 2024

@sudohainguyen probably best to wait for the Snowflake release anyway, don't you think? Right now this PR would bump pandas to 2.2.0, but CI is testing 2.1.4 (and with some tests being skipped on top of that). Btw, we should probably consider running tests independently for each "backend" at some point in the future to avoid these scenarios.

@sudohainguyen
Collaborator Author

I think we can merge, but our release will stay pending until Snowflake completes their upgrade.

The skip marks should be kept so other contributors are aware of the issue.

@sudohainguyen
Collaborator Author

we should probably consider running tests independently for each "backend" at some point in the future to avoid these scenarios.

Agreed, but that would take a lot of work.

@@ -28,6 +28,11 @@
@pytest.mark.integration
@pytest.mark.universal_offline_stores
@pytest.mark.parametrize("pass_as_path", [True, False], ids=lambda v: str(v))
@pytest.mark.skipif(
Collaborator

Does 2.0.3 also contain this bug? If so, even after the Snowflake connector is released, the Python 3.8 tests will still fail once we remove the skipifs, because the latest pandas release available for Python 3.8 is 2.0.3. Will we need to pin pandas below 2.0.0 for Python 3.8 only, using environment markers?

Collaborator Author

Well, actually the bug only appears when running pandas testing;
it does not affect features, though.

Collaborator

Still, what do you think about adding something like this to our CI extras: pandas<2.0.0; python_version == '3.8'? That would be better than skipping the tests entirely.
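
The suggestion can be sketched as a setuptools extras definition that uses PEP 508 environment markers. The extra name "ci" and the exact version bounds are assumptions for illustration, not the project's actual setup.py.

```python
# Requirement strings with PEP 508 environment markers: the part after ';'
# is evaluated by pip against the interpreter doing the install.
CI_PANDAS_REQUIREMENTS = [
    # Python 3.8 only: stay on pandas 1.x (assumed pin).
    'pandas<2.0.0; python_version == "3.8"',
    # Newer interpreters: allow pandas 2.x (assumed lower bound).
    'pandas>=2.0.0; python_version >= "3.9"',
]

# Hypothetical extras mapping, to be passed as setup(extras_require=EXTRAS).
EXTRAS = {"ci": CI_PANDAS_REQUIREMENTS}


def split_requirement(requirement: str):
    """Split 'spec; marker' into (spec, marker) for inspection."""
    spec, _, marker = requirement.partition(";")
    return spec.strip(), marker.strip()
```

With markers like these, a single extra resolves to a different pandas range per interpreter, so the Python 3.8 job keeps running its tests instead of skipping them.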

@sudohainguyen
Collaborator Author

@tokoko pandas 2.2.0 updated and tests passed 🙂

@tokoko
Collaborator

tokoko commented Feb 23, 2024

great, one last point... skips can be removed now, can't they? none of the ci pandas versions fall within their range, anyway.

@sudohainguyen
Collaborator Author

ok one minute

@sudohainguyen
Collaborator Author

This is good to go @franciscojavierarceo
Thanks boss

@jeremyary
Collaborator

Thanks @sudohainguyen! If you can assist with conflicts, I'd be happy to roll it in.

@sudohainguyen
Collaborator Author

@jeremyary sure will do 😄
think you can have a look at #3950 first, need your help as well

@jeremyary
Collaborator

@sudohainguyen thanks! I'm working a vulnerability issue this morning, but will add 3950 to today's list for myself to TAL.

@jeremyary
Collaborator

@sudohainguyen got 3950 all buttoned up yesterday. 👍 I'll keep an eye here for if/when you get a chance to look at conflicts.

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
@sudohainguyen
Collaborator Author

@jeremyary all good now 😄

Collaborator

@jeremyary jeremyary left a comment

changes proposed were already approved prior in diff review, and now this lgtm on conflict resolution. Thanks for the contrib!

@jeremyary jeremyary merged commit 64459ad into feast-dev:master Mar 4, 2024
18 checks passed
@sudohainguyen sudohainguyen deleted the deps branch March 4, 2024 13:54
tqtensor pushed a commit to tqtensor/feast that referenced this pull request Mar 11, 2024
* feat: Support pandas v2

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* fix: Prune dependencies

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Re-compile reqs py310

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* fix: Mark test skip with conditions

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Re-compile reqs py39

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Update skip reason

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Re-compile reqs py38

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Bump snowflake connector

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

* chore: Remove test skip

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>

---------

Signed-off-by: Hai Nguyen <quanghai.ng1512@gmail.com>
@matankley

Hi guys, thanks for the great work.
Do you know when this is planned to be released? I see it was merged, but no release has been published yet.

@sudohainguyen
Collaborator Author

Hi guys, thanks for the great work.

Do you know when this is planned to be released? I see it was merged, but no release has been published yet.

@franciscojavierarceo you know this best 👀

@franciscojavierarceo
Member

I can release this weekend 👍

@TomSteenbergen
Contributor

@franciscojavierarceo Any news on when we can expect the new release? 🙏

@franciscojavierarceo
Member

@TomSteenbergen so sorry for not updating you on this sooner. I introduced a bug in the latest commit that @tokoko discovered, and I'm working to fix it.

@TomSteenbergen
Contributor

I see, no worries and thanks for the update @franciscojavierarceo! Keep us posted and let me know if I can help.

franciscojavierarceo pushed a commit that referenced this pull request Apr 16, 2024
# [0.36.0](v0.35.0...v0.36.0) (2024-04-16)

### Bug Fixes

* Add __eq__, __hash__ to SparkSource for correct comparison ([#4028](#4028)) ([e703b40](e703b40))
* Add conn.commit() to Postgresonline_write_batch.online_write_batch ([#3904](#3904)) ([7d75fc5](7d75fc5))
* Add missing __init__.py to embedded_go ([#4051](#4051)) ([6bb4c73](6bb4c73))
* Add missing init files in infra utils ([#4067](#4067)) ([54910a1](54910a1))
* Added registryPath parameter documentation in WebUI reference ([#3983](#3983)) ([5e0af8f](5e0af8f)), closes [#3974](#3974)
* Adding missing init files in materialization modules ([#4052](#4052)) ([df05253](df05253))
* Allow truncated timestamps when converting ([#3861](#3861)) ([bdd7dfb](bdd7dfb))
* Azure blob storage support in Java feature server ([#2319](#2319)) ([#4014](#4014)) ([b9aabbd](b9aabbd))
* Bugfix for grabbing historical data from Snowflake with array type features. ([#3964](#3964)) ([1cc94f2](1cc94f2))
* Bytewax materialization engine fails when loading feature_store.yaml ([#3912](#3912)) ([987f0fd](987f0fd))
* CI unittest warnings ([#4006](#4006)) ([0441b8b](0441b8b))
* Correct the returning class proto type of StreamFeatureView to StreamFeatureViewProto instead of FeatureViewProto. ([#3843](#3843)) ([86d6221](86d6221))
* Create index only if not exists during MySQL online store update ([#3905](#3905)) ([2f99a61](2f99a61))
* Disable minio tests in workflows on master and nightly ([#4072](#4072)) ([c06dda8](c06dda8))
* Disable the Feast Usage feature by default. ([#4090](#4090)) ([b5a7013](b5a7013))
* Dump repo_config by alias ([#4063](#4063)) ([e4bef67](e4bef67))
* Extend SQL registry config with a sqlalchemy_config_kwargs key ([#3997](#3997)) ([21931d5](21931d5))
* Feature Server image startup in OpenShift clusters ([#4096](#4096)) ([9efb243](9efb243))
* Fix copy method for StreamFeatureView ([#3951](#3951)) ([cf06704](cf06704))
* Fix for materializing entityless feature views in Snowflake ([#3961](#3961)) ([1e64c77](1e64c77))
* Fix type mapping spark ([#4071](#4071)) ([3afa78e](3afa78e))
* Fix typo as the cli does not support shortcut-f option. ([#3954](#3954)) ([dd79dbb](dd79dbb))
* Get container host addresses from testcontainers ([#3946](#3946)) ([2cf1a0f](2cf1a0f))
* Handle ComplexFeastType to None comparison ([#3876](#3876)) ([fa8492d](fa8492d))
* Hashlib md5 errors in FIPS for python 3.9+ ([#4019](#4019)) ([6d9156b](6d9156b))
* Making the query_timeout variable as optional int because upstream is considered to be optional ([#4092](#4092)) ([fd5b620](fd5b620))
* Move gRPC dependencies to an extra ([#3900](#3900)) ([f93c5fd](f93c5fd))
* Prevent spamming pull busybox from dockerhub ([#3923](#3923)) ([7153cad](7153cad))
* Quickstart notebook example ([#3976](#3976)) ([b023aa5](b023aa5))
* Raise error when not able read of file source spark source ([#4005](#4005)) ([34cabfb](34cabfb))
* remove not use input parameter in spark source ([#3980](#3980)) ([7c90882](7c90882))
* Remove parentheses in pull_latest_from_table_or_query ([#4026](#4026)) ([dc4671e](dc4671e))
* Remove proto-plus imports ([#4044](#4044)) ([ad8f572](ad8f572))
* Remove unnecessary dependency on mysqlclient ([#3925](#3925)) ([f494f02](f494f02))
* Restore label check for all actions using pull_request_target ([#3978](#3978)) ([591ba4e](591ba4e))
* Revert mypy config ([#3952](#3952)) ([6b8e96c](6b8e96c))
* Rewrite Spark materialization engine to use mapInPandas ([#3936](#3936)) ([dbb59ba](dbb59ba))
* Run feature server w/o gunicorn on windows ([#4024](#4024)) ([584e9b1](584e9b1))
* SqlRegistry _apply_object update statement ([#4042](#4042)) ([ef62def](ef62def))
* Substrait ODFVs for online ([#4064](#4064)) ([26391b0](26391b0))
* Swap security label check on the PR title validation job to explicit permissions instead ([#3987](#3987)) ([f604af9](f604af9))
* Transformation server doesn't generate files from proto ([#3902](#3902)) ([d3a2a45](d3a2a45))
* Trino as an OfflineStore Access Denied when BasicAuthentication ([#3898](#3898)) ([49d2988](49d2988))
* Trying to import pyspark lazily to avoid the dependency on the library ([#4091](#4091)) ([a05cdbc](a05cdbc))
* Typo Correction in Feast UI Readme ([#3939](#3939)) ([c16e5af](c16e5af))
* Update actions/setup-python from v3 to v4 ([#4003](#4003)) ([ee4c4f1](ee4c4f1))
* Update typeguard version to >=4.0.0 ([#3837](#3837)) ([dd96150](dd96150))
* Upgrade sqlalchemy from 1.x to 2.x regarding PVE-2022-51668. ([#4065](#4065)) ([ec4c15c](ec4c15c))
* Use CopyFrom() instead of __deepcopy__() for creating a copy of protobuf object. ([#3999](#3999)) ([5561b30](5561b30))
* Using version args to install the correct feast version ([#3953](#3953)) ([b83a702](b83a702))
* Verify the existence of Registry tables in snowflake before calling CREATE sql command. Allow read-only user to call feast apply. ([#3851](#3851)) ([9a3590e](9a3590e))

### Features

* Add duckdb offline store ([#3981](#3981)) ([161547b](161547b))
* Add Entity df in format of a Spark Dataframe instead of just pd.DataFrame or string for SparkOfflineStore ([#3988](#3988)) ([43b2c28](43b2c28))
* Add gRPC Registry Server ([#3924](#3924)) ([373e624](373e624))
* Add local tests for s3 registry using minio ([#4029](#4029)) ([d82d1ec](d82d1ec))
* Add python bytes to array type conversion support proto ([#3874](#3874)) ([8688acd](8688acd))
* Add python client for remote registry server ([#3941](#3941)) ([42a7b81](42a7b81))
* Add Substrait-based ODFV transformation ([#3969](#3969)) ([9e58bd4](9e58bd4))
* Add support for arrays in snowflake ([#3769](#3769)) ([8d6bec8](8d6bec8))
* Added delete_table to redis online store ([#3857](#3857)) ([03dae13](03dae13))
* Adding support for Native Python feature transformations for ODFVs ([#4045](#4045)) ([73bc853](73bc853))
* Bumping requirements ([#4079](#4079)) ([1943056](1943056))
* Decouple transformation types from ODFVs ([#3949](#3949)) ([0a9fae8](0a9fae8))
* Dropping Python 3.8 from local integration tests and integration tests ([#3994](#3994)) ([817995c](817995c))
* Dropping python 3.8 requirements files from the project. ([#4021](#4021)) ([f09c612](f09c612))
* Dropping the support for python 3.8 version from feast ([#4010](#4010)) ([a0f7472](a0f7472))
* Dropping unit tests for Python 3.8 ([#3989](#3989)) ([60f24f9](60f24f9))
* Enable Arrow-based columnar data transfers ([#3996](#3996)) ([d8d7567](d8d7567))
* Enable Vector database and retrieve_online_documents API ([#4061](#4061)) ([ec19036](ec19036))
* Kubernetes materialization engine written based on bytewax ([#4087](#4087)) ([7617bdb](7617bdb))
* Lint with ruff ([#4043](#4043)) ([7f1557b](7f1557b))
* Make arrow primary interchange for offline ODFV execution ([#4083](#4083)) ([9ed0a09](9ed0a09))
* Pandas v2 compatibility ([#3957](#3957)) ([64459ad](64459ad))
* Pull duckdb from contribs, add to CI ([#4059](#4059)) ([318a2b8](318a2b8))
* Refactor ODFV schema inference ([#4076](#4076)) ([c50a9ff](c50a9ff))
* Refactor registry caching logic into a separate class ([#3943](#3943)) ([924f944](924f944))
* Rename OnDemandTransformations to Transformations ([#4038](#4038)) ([9b98eaf](9b98eaf))
* Revert updating dependencies so that feast can be run on 3.11. ([#3968](#3968)) ([d3c68fb](d3c68fb)), closes [#3958](#3958)
* Rewrite ibis point-in-time-join w/o feast abstractions ([#4023](#4023)) ([3980e0c](3980e0c))
* Support s3gov schema by snowflake offline store during materialization ([#3891](#3891)) ([ea8ad17](ea8ad17))
* Update odfv test ([#4054](#4054)) ([afd52b8](afd52b8))
* Update pyproject.toml to use Python 3.9 as default ([#4011](#4011)) ([277b891](277b891))
* Update the Pydantic from v1 to v2 ([#3948](#3948)) ([ec11a7c](ec11a7c))
* Updating dependencies so that feast can be run on 3.11. ([#3958](#3958)) ([59639db](59639db))
* Updating protos to separate transformation ([#4018](#4018)) ([c58ef74](c58ef74))

### Reverts

* Reverting bumping requirements ([#4081](#4081)) ([1ba65b4](1ba65b4)), closes [#4079](#4079)
* Verify the existence of Registry tables in snowflake… ([#3907](#3907)) ([c0d358a](c0d358a)), closes [#3851](#3851)
Development

Successfully merging this pull request may close these issues.

Pandas 2.0 support
6 participants