Add schema parameter to RedshiftSource #1847
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #1847 +/- ##
===========================================
- Coverage 84.45% 63.72% -20.73%
===========================================
Files 90 90
Lines 6773 6784 +11
===========================================
- Hits 5720 4323 -1397
- Misses 1053 2461 +1408
/kind bug
super().__init__(
    event_timestamp_column,
    created_timestamp_column,
    field_mapping,
    date_partition_column,
)

self._redshift_options = RedshiftOptions(table=table, query=query)
# The default Redshift schema is named "public".
_schema = "public" if table and not schema else schema
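For readers skimming the diff, here is a minimal standalone sketch of what that last expression does. The helper name `resolve_schema` and the table and schema names are hypothetical and not part of the PR; only the conditional expression is taken from the diff.

```python
from typing import Optional


def resolve_schema(table: Optional[str], schema: Optional[str]) -> Optional[str]:
    """Hypothetical helper mirroring the defaulting expression in the diff."""
    # Fall back to "public" only when a table is given and no schema was
    # supplied; in every other case the schema is passed through unchanged.
    return "public" if table and not schema else schema


print(resolve_schema("driver_stats", None))         # public
print(resolve_schema("driver_stats", "analytics"))  # analytics
print(resolve_schema(None, None))                   # None
```

Note that a source defined only by a query (no table) keeps a schema of `None` under this expression.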
Is it appropriate for us to set the default or defer to Amazon for the schema? I'm not super familiar with how AWS deals with the lack of a schema.
Redshift comes with a default schema named `public`. By default, operations occur in the `public` schema; for example, if you reference a table without specifying a schema, Redshift will look in the `public` schema.
We could just defer to Redshift's default, but I think it's better to be explicit here. It will make it easier for users to understand (especially users who aren't familiar with Redshift), and easier to debug.
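One way to picture the "explicit" option is a sketch like the following, which always emits a schema-qualified table reference so the schema is visible in any logged query. The helper `qualified_table_ref` and the table name are hypothetical, and this is not necessarily how Feast builds its SQL; it also assumes that unqualified names in Redshift are resolved through the session's `search_path`, which is why a qualified name is less ambiguous.

```python
from typing import Optional


def quote_identifier(identifier: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def qualified_table_ref(table: str, schema: Optional[str] = None) -> str:
    """Hypothetical helper: build a schema-qualified table reference."""
    # Use the same "public" default as the diff when no schema is given.
    resolved = schema or "public"
    return f"{quote_identifier(resolved)}.{quote_identifier(table)}"


print(f"SELECT * FROM {qualified_table_ref('driver_stats')}")
# SELECT * FROM "public"."driver_stats"
```

With the schema spelled out, a failing query shows exactly which schema was targeted instead of depending on session settings.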
If Redshift uses public by default, what difference does setting the schema explicitly on our side make? Why would it be easier to understand or debug? It seems like we are moving the debugging experience from Redshift to Feast.
I am not trying to block this PR. Happy to merge. Just curious.
My only real question is how we know that we have solved the problem without changing any tests. Do we consider this functionality to be far enough off the happy path that adding or updating tests isn't worth it?
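For what it's worth, a unit test for the defaulting behavior could be quite small. The sketch below uses a hypothetical stand-in class, `FakeRedshiftSource`, that models only the schema logic from the diff; it is not the real `RedshiftSource` and it does not touch Redshift.

```python
from typing import Optional


class FakeRedshiftSource:
    """Hypothetical stand-in for RedshiftSource; only the schema logic is modeled."""

    def __init__(
        self,
        table: Optional[str] = None,
        schema: Optional[str] = None,
        query: Optional[str] = None,
    ):
        self.table = table
        self.query = query
        # Same defaulting expression as in the diff under review.
        self.schema = "public" if table and not schema else schema


def test_schema_defaults():
    # A table without a schema falls back to "public".
    assert FakeRedshiftSource(table="driver_stats").schema == "public"
    # An explicitly supplied schema is preserved.
    assert FakeRedshiftSource(table="driver_stats", schema="analytics").schema == "analytics"
    # A query-only source gets no schema.
    assert FakeRedshiftSource(query="SELECT 1").schema is None


test_schema_defaults()
```

A test against the real class would need to follow its actual constructor signature, which this sketch does not assume.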
Yeah, that's a fair question. Since this PR has added
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: felixwang9817, woop
Signed-off-by: Felix Wang <wangfelix98@gmail.com>
What this PR does / why we need it: The RedshiftSource implementation currently does not support a `schema` parameter. This PR adds support for a `schema` parameter.
Which issue(s) this PR fixes:
Fixes #1767
Does this PR introduce a user-facing change?: