Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional BigQuery type maps #2028

Closed
wants to merge 17 commits into from
Closed

Conversation

judahrand
Copy link
Member

@judahrand judahrand commented Nov 12, 2021

What this PR does / why we need it:
Currently support for BigQuery DATE as well as various ARRAY subtypes are missing due to a lack of type mapping.

Which issue(s) this PR fixes:

Fixes #2053

Does this PR introduce a user-facing change?:

* Add support for Python `date` type
* Fix BigQuery ARRAY type inference
* Support BigQuery DATE and additional ARRAY types

@feast-ci-bot
Copy link
Collaborator

Hi @judahrand. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@codecov-commenter
Copy link

codecov-commenter commented Nov 12, 2021

Codecov Report

Merging #2028 (a6395f2) into master (f279a7d) will decrease coverage by 26.08%.
The diff coverage is 28.98%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #2028       +/-   ##
===========================================
- Coverage   84.57%   58.48%   -26.09%     
===========================================
  Files         102      102               
  Lines        8173     8222       +49     
===========================================
- Hits         6912     4809     -2103     
- Misses       1261     3413     +2152     
Flag Coverage Δ
integrationtests ?
unittests 58.48% <28.98%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...thon/feast/infra/offline_stores/bigquery_source.py 41.93% <10.00%> (-37.61%) ⬇️
sdk/python/feast/type_map.py 40.93% <24.24%> (-31.42%) ⬇️
...s/integration/registration/test_universal_types.py 35.75% <26.66%> (-62.96%) ⬇️
...n/feature_repos/universal/data_sources/redshift.py 51.51% <57.14%> (-48.49%) ⬇️
...thon/feast/infra/offline_stores/redshift_source.py 46.53% <66.66%> (-36.12%) ⬇️
sdk/python/tests/data/data_creator.py 34.78% <100.00%> (-65.22%) ⬇️
.../integration/online_store/test_universal_online.py 15.34% <0.00%> (-82.80%) ⬇️
.../integration/online_store/test_online_retrieval.py 17.39% <0.00%> (-82.61%) ⬇️
...fline_store/test_universal_historical_retrieval.py 19.18% <0.00%> (-80.82%) ⬇️
sdk/python/tests/utils/online_read_write_test.py 20.68% <0.00%> (-79.32%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f279a7d...a6395f2. Read the comment docs.

@woop
Copy link
Member

woop commented Nov 17, 2021

@judahrand this is great, thanks. Would you mind adding tests? It's hard to validate whether this PR fixes a problem without it.

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@adchia
Copy link
Collaborator

adchia commented Dec 8, 2021

@pyalex FYI

@tsotnet tsotnet assigned tsotnet and unassigned felixwang9817 Dec 8, 2021
@tsotnet
Copy link
Collaborator

tsotnet commented Dec 8, 2021

Hey @judahrand, sorry for the wait. I'll be taking a look at this PR (including failures) today, so I changed the assignee to myself.

Copy link
Collaborator

@tsotnet tsotnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I suspect what is happening here is that Redshift data creator is silently failing to upload the local dataframe to Redshift, which later surfaces as table not found error. This seems to happen only for datetime type, not date. So there could be some type-related issue (e.g. Redshift doesn't support this type). I'll look more closely into this tomorrow.

sdk/python/feast/type_map.py Show resolved Hide resolved
sdk/python/feast/type_map.py Outdated Show resolved Hide resolved
sdk/python/feast/type_map.py Outdated Show resolved Hide resolved
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: judahrand
To complete the pull request process, please assign tsotnet after the PR has been reviewed.
You can assign the PR to them by writing /assign @tsotnet in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
judahrand and others added 7 commits December 10, 2021 13:08
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
…ROTO_VALUE

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@tsotnet
Copy link
Collaborator

tsotnet commented Dec 17, 2021

Hey @judahrand, I've been working on debugging this for the last couple of days on and off. I haven't found the root cause, but here are my findings:

  1. Removing UNIX_TIMESTAMP_LIST from PYTHON_LIST_VALUE_TYPE_TO_PROTO_VALUE caused some tests to fail, so I added it back here 05f1c67
  2. There were some intermittent failures caused by Redshift connection errors, which got fixed after I increased the Redshift cluster size.
  3. Looks like RedshiftDataSourceCreator.table field had a class-level initialization with an empty list. This caused some interference between tests, which I fixed by this: 8d3d5d4. I hoped this would fix the recurring exception (Redshift table not existing), but unfortunately it didn't. (To clarify, this is a legit bug fix, it's just that this wasn't causing any issues for this PR).
  4. I tried reproducing the error locally, but to no avail. I was initially trying to run a single test locally, but towards the end I ran the entire integration test suite a couple of times with the exact parameters as in the Github Action, suspecting some kind of race condition between individual tests, but I wasn't able to get any failures locally.
  5. I added some logging to see when the tables are created, destroyed and requested. The timeline aligns in all cases. I even checked the CREATE & DROP SQL query timelines in Redshift (using DescribeStatement API), but unless I missed anything, nothing seems to be out of place.
  6. The errors are consistently happening with 3 tests:
FAILED sdk/python/tests/integration/registration/test_universal_types.py::test_entity_inference_types_match[TypeTestConfig(entity_type=<ValueType.INT32: 3>, feature_dtype='datetime', feature_is_list=False, has_empty_list=False, test_repo_config=Provider: aws-Redshif-dynamodb)]
FAILED sdk/python/tests/integration/registration/test_universal_types.py::test_feature_get_online_features_types_match[TypeTestConfig(entity_type=<ValueType.INT32: 3>, feature_dtype='datetime', feature_is_list=False, has_empty_list=False, test_repo_config=Provider: aws-Redshif-dynamodb)]
FAILED sdk/python/tests/integration/registration/test_universal_types.py::test_feature_get_online_features_types_match[TypeTestConfig(entity_type=<ValueType.INT32: 3>, feature_dtype='datetime', feature_is_list=False, has_empty_list=False, test_repo_config=Provider: aws-Redshif-redis)]

Notice all of them are datetime features in the same file. I'm not sure if there's something weird going on with these 3 tests specifically, just calling it out in case you want to take another look. I don't have leads as to why these tests could be consistently failing in github actions and not on my laptop (I even tried using ubuntu to match the host os, but nothing changed), so at this point, I'll stop my investigation.

Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@judahrand
Copy link
Member Author

I tried reproducing the error locally, but to no avail. I was initially trying to run a single test locally, but towards the end I ran the entire integration test suite a couple of times with the exact parameters as in the Github Action, suspecting some kind of race condition between individual tests, but I wasn't able to get any failures locally.

You mean running the test locally but still connecting to Redshift?

@tsotnet
Copy link
Collaborator

tsotnet commented Dec 23, 2021

You mean running the test locally but still connecting to Redshift?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ODFV does not allow feast.ValueType.UNIX_TIMESTAMP in RequestDataSource schema
7 participants