-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add motherduck support for duckdb plugin #2680
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2680 +/- ##
===========================================
- Coverage 94.08% 83.05% -11.04%
===========================================
Files 32 333 +301
Lines 1842 26287 +24445
Branches 0 4071 +4071
===========================================
+ Hits 1733 21832 +20099
- Misses 109 3756 +3647
- Partials 0 699 +699 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are linting issues: https://github.com/flyteorg/flytekit/actions/runs/10374544320/job/28722185574?pr=2680
Can you run make fmt
? (Also you can configure pre-commit to automatically lint your files.
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
"""Connect to local DuckDB.""" | ||
return duckdb.connect(":memory:") | ||
|
||
def _connect_motherduck(self, hosted_secret: Secret): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some test cases?
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
inputs: Optional[Dict[str, Union[StructuredDataset, list]]] = None, | ||
provider: DuckDBProvider = DuckDBProvider.LOCAL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an enum restricts someone from passing a callable. Lets say someone wants to add a new keyword to duckdb.connect
for motherduck:
def custom_connect_motherduck(token: str):
"""Connect to MotherDuck."""
return duckdb.connect("md:", config={"motherduck_token": token, "another_config": "hello"})
They should be a to pass it into provider:
DuckDBQuery(..., provider=custom_connect_motherduck)
If you prefer an enum, I think we also open it up to callables:
provider: Union[DuckDBProvider, Callable]
In the above callable API, I propose making the input just be a string. To pass in the secret the user API becomes:
secret = Secret(key="my-key", group="my-group")
DuckDBQuery(..., connect_secret=secret)
then in `DuckDBQuery._connect_to_duckdb:
def _connect_to_duckdb(self):
connect_token = None
if self._connect_secret:
connect_token = current_context().secrets.get(
group=self._connect_secret.group,
key=self._connect_secret.key,
group_version=self._connect_secret.group_version,
)
if isinstance(self.provider, DuckDBProvider):
return self.provider.value(connect_token)
else: # callable
return self.provider(connect_token)
This way the callable does not need to be responsible for handling current_context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!! Thank you.
I will also use secret_requests
from PythonAutoContainerTask
and check for it in the constructor rather than adding a new secret argument.
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Why are the changes needed?
We would like to integrate motherduck support with the current duckdb plugin such that a
DuckDBQuery
can query the motherduck data warehouse and/or in memory files at the same time.What changes were proposed in this pull request?
Simply add a connection to motherduck and authenticate via a secret.
duckdb
handles the rest.How was this patch tested?
Tested on a motherduck free trial with a combinarion query like:
Check all the applicable boxes
Related PRs
Docs link