Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema parameter to RedshiftSource #1847

Merged
merged 1 commit into from
Sep 9, 2021

Conversation

felixwang9817
Copy link
Collaborator

@felixwang9817 felixwang9817 commented Sep 8, 2021

Signed-off-by: Felix Wang wangfelix98@gmail.com

What this PR does / why we need it: The RedshiftSource implementation currently does not support a schema parameter. This PR adds support for a schema parameter.

Which issue(s) this PR fixes:

Fixes #1767

Does this PR introduce a user-facing change?:

schema is now an optional parameter in RedshiftSource

Signed-off-by: Felix Wang <wangfelix98@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2021

Codecov Report

Merging #1847 (809ce38) into master (0dc13f0) will decrease coverage by 20.72%.
The diff coverage is 50.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1847       +/-   ##
===========================================
- Coverage   84.45%   63.72%   -20.73%     
===========================================
  Files          90       90               
  Lines        6773     6784       +11     
===========================================
- Hits         5720     4323     -1397     
- Misses       1053     2461     +1408     
Flag Coverage Δ
integrationtests ?
unittests 63.72% <50.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...thon/feast/infra/offline_stores/redshift_source.py 45.91% <50.00%> (-35.70%) ⬇️
.../integration/online_store/test_online_retrieval.py 17.39% <0.00%> (-82.61%) ⬇️
.../integration/online_store/test_universal_online.py 18.03% <0.00%> (-81.97%) ⬇️
sdk/python/tests/utils/online_read_write_test.py 18.18% <0.00%> (-81.82%) ⬇️
...fline_store/test_universal_historical_retrieval.py 20.00% <0.00%> (-80.00%) ⬇️
...gration/registration/test_feature_service_apply.py 31.25% <0.00%> (-68.75%) ⬇️
sdk/python/feast/infra/online_stores/datastore.py 30.32% <0.00%> (-63.12%) ⬇️
sdk/python/feast/infra/online_stores/redis.py 30.35% <0.00%> (-62.50%) ⬇️
sdk/python/tests/data/data_creator.py 38.09% <0.00%> (-61.91%) ⬇️
sdk/python/feast/infra/online_stores/dynamodb.py 30.37% <0.00%> (-60.76%) ⬇️
... and 41 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dc13f0...809ce38. Read the comment docs.

@felixwang9817
Copy link
Collaborator Author

/kind bug

super().__init__(
event_timestamp_column,
created_timestamp_column,
field_mapping,
date_partition_column,
)

self._redshift_options = RedshiftOptions(table=table, query=query)
# The default Redshift schema is named "public".
_schema = "public" if table and not schema else schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it appropriate for us to set the default or defer to Amazon for the schema? I'm not super familiar with how AWS deals with the lack of a schema.

Copy link
Collaborator Author

@felixwang9817 felixwang9817 Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redshift comes with a default schema named public. By default, operations occur in the public schema; for example, if you reference a table without specifying a schema, Redshift will look in the public schema.

We could just defer to Redshift's default, but I think it's better to be explicit here. It will make it easier for users to understand (especially users who aren't familiar with Redshift), and easier to debug.

Copy link
Member

@woop woop Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Redshift uses public by default, what difference does setting the schema explicitly on our side make? Why would it be easier to understand or debug? It seems like we are moving the debugging experience from Redshift to Feast.

I am not trying to block this PR. Happy to merge. Just curious.

@woop
Copy link
Member

woop commented Sep 8, 2021

My only real question is how we know that we have solved the problem without changing any tests. Do we consider this functionality to be far enough off the happy path that changing our tests or updating our tests isn't worth it?

@felixwang9817
Copy link
Collaborator Author

My only real question is how we know that we have solved the problem without changing any tests. Do we consider this functionality to be far enough off the happy path that changing our tests or updating our tests isn't worth it?

Yeah that's a fair question. Since this PR has added "public" as the default schema, the existing tests are now querying Redshift with both a table and a schema, instead of just relying on Redshift's default. However, we are not testing the case where a user might specify a different schema; I consider this far enough off the happy path that we shouldn't modify our tests.

@woop
Copy link
Member

woop commented Sep 9, 2021

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: felixwang9817, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add schema parameter to RedshiftSource to distinguish between database schemas
4 participants