Allow users to disable schema check and creation on load_file
#1922
Codecov Report
@@ Coverage Diff @@
## main #1922 +/- ##
==========================================
- Coverage 85.31% 84.81% -0.51%
==========================================
Files 104 104
Lines 5952 5959 +7
Branches 677 678 +1
==========================================
- Hits 5078 5054 -24
- Misses 735 762 +27
- Partials 139 143 +4
@tatiana Do you think a global default setting would be helpful, such as an environment variable?
@utkarsharma2 that's a great idea, I'll add it to this PR.
@utkarsharma2 I added the global config, please let me know your thoughts!
This is a follow-up for #1922. In that PR we allowed users to skip schema check & creation for `aql.load_file`, but we missed the fact that `aql.transform` and `aql.transform_file` had the same issue. This PR aims to address this limitation.

Changes included in this PR:
* Rename config `load_table_schema_exists` to `assume_schema_exists`
* Rename (`load_file`) argument `schema_exists` to `assume_schema_exists`
* Refactor where the check for `assume_schema_exists` happens. Before, it happened only inside `load_file_to_table`. Now, it is part of `create_schema_if_applicable`. This makes this feature available in the `aql.transform` task as well
* Rename `Database.create_schema_if_needed` to `Database.create_schema_if_applicable`
* Expose `assume_schema_exists` in `aql.transform`
* Release 1.7.0a2
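The refactor described above can be pictured with a small stand-alone sketch. It is not the SDK's code: `ToyDatabase`, its `schema_exists` method, and the `queries_run` log are hypothetical names made up for illustration. Only `create_schema_if_applicable` and `assume_schema_exists` come from the change list. The point is that once the flag is checked inside `create_schema_if_applicable`, any caller that goes through that method can skip both the lookup and the creation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyDatabase:
    """Hypothetical stand-in for a database wrapper; not the SDK's class."""

    existing_schemas: Set[str] = field(default_factory=set)
    queries_run: List[str] = field(default_factory=list)

    def schema_exists(self, schema: str) -> bool:
        # Stands in for the potentially slow information-schema lookup.
        self.queries_run.append(f"CHECK SCHEMA {schema}")
        return schema in self.existing_schemas

    def create_schema_if_applicable(
        self, schema: Optional[str], assume_schema_exists: bool = False
    ) -> None:
        # When the caller vouches for the schema, issue no queries at all.
        if assume_schema_exists or schema is None:
            return
        if not self.schema_exists(schema):
            self.queries_run.append(f"CREATE SCHEMA {schema}")
            self.existing_schemas.add(schema)


db = ToyDatabase()

# With the flag set, neither the check nor the creation runs.
db.create_schema_if_applicable("analytics", assume_schema_exists=True)
skipped = list(db.queries_run)

# Without the flag, the schema is checked and then created.
db.create_schema_if_applicable("analytics")
checked = list(db.queries_run)
```

In this toy model `skipped` stays empty while `checked` records one lookup followed by one creation, which mirrors the behaviour the change list describes for `aql.load_file` and `aql.transform`.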
Support running `load_file` without checking if the table schema exists or trying to create it.

Recently a user reported that the cost of checking if the schema exists is very high for Snowflake:

"I have a (`load_file`) task that took 1:36 minutes to run, and it was 1:30 running the information schema query."

This is likely happening for other databases as well.
Introduce two ways of disabling schema checks:

1. On a per-task basis, by exposing the argument `schema_exists` in `aql.load_file`. When this argument is `True`, the SDK will not check if the schema exists or try to create it. It is `False` by default, in which case the Python SDK behaves as in 1.6 (running the schema check and, if needed, trying to create the schema).
2. Globally, by exposing the Airflow configuration `load_table_schema_exists` in the `[astro-sdk]` section. This can also be set using the environment variable `AIRFLOW__ASTRO_SDK__LOAD_TABLE_SCHEMA_EXISTS`. The global configuration can be overridden per task, using [1].

Closes: #1921
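The precedence between the two settings can be sketched in plain Python. This is an illustration under stated assumptions, not the SDK's implementation: `resolve_schema_exists` is a hypothetical helper, the use of `None` to mean "not set on the task" is an assumption, and so is the set of strings treated as true. Only the environment variable name is taken from the description above.

```python
import os
from typing import Mapping, Optional

# Environment variable name as given in the PR description.
ENV_VAR = "AIRFLOW__ASTRO_SDK__LOAD_TABLE_SCHEMA_EXISTS"

# Assumption: these case-insensitive strings count as true.
TRUTHY = {"1", "t", "true"}


def resolve_schema_exists(
    task_value: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Hypothetical helper: a per-task value wins over the global setting."""
    if task_value is not None:
        # An explicit per-task argument overrides the global configuration.
        return task_value
    # Otherwise fall back to the global setting, which is off when unset.
    return env.get(ENV_VAR, "").strip().lower() in TRUTHY


# Global default enabled, but one task opts back into the schema check.
global_on = {ENV_VAR: "True"}
uses_global = resolve_schema_exists(env=global_on)
task_override = resolve_schema_exists(False, env=global_on)
```

Here `uses_global` is `True` and `task_override` is `False`: the task-level value takes priority, and the global value applies only when the task does not set one.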