Workaround job race bug on BigQuery to GCS transfer #24330
cc: @wojsamjan
Not well-versed here, but looks legit.
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
* Doc: Add column names for DB Migration Reference (apache#23853) Before the automation: https://airflow.apache.org/docs/apache-airflow/2.2.5/migrations-ref.html Currently (with missing column names): https://airflow.apache.org/docs/apache-airflow/2.3.0/migrations-ref.html
* Fix exception trying to display moved table warnings (apache#23837) If you still have an old dangling table from the 2.2 migration this would fail. Make it more resilient and cope with both styles of moved table name.
* Update sample dag and doc for RDS (apache#23651)
* Fix DataprocJobBaseOperator not being compatible with dotted names (apache#23439). (apache#23791)
* job_name parameter is now sanitized, replacing dots by underscores.
* Upgrade `pip` to 22.1.1 version (just released) (apache#23854)
* Add better feedback to Breeze users about expected action timing (apache#23827) There are a few actions in Breeze that might take more or less time when invoked. This is mostly when you need to upgrade Breeze or update to the latest version of the image because some dependencies were added or the image was modified. While we have significantly improved the waiting time involved (and caching problems have been fixed to make it as fast as possible), there are still a few situations where you need good connectivity and a little time to run the upgrade, which is often not something you would like to lose your time on when you need to do things fast. Usually Breeze does not force the user to perform such long actions - it allows continuing without doing them (either by timeout or by letting the user answer "no" to the question asked). Previously Breeze did not inform the user about the expected time of running such an operation, but with this change it tells what the expected delay is - thus allowing the user to make an informed decision on whether they want to run the upgrade or not.
* Fix UnboundLocalError when sql is empty list in DbApiHook (apache#23816) * Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (apache#23815) * Add number of node params only for single-node cluster in RedshiftCreateClusterOperator (apache#23839) * Sql to gcs with exclude columns (apache#23695) * Add support for associating custom tags to job runs submitted via EmrContainerOperator (apache#23769) Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com> * Add Deferrable Databricks operators (apache#19736) * Fix Amazon EKS example DAG raises warning during Imports (apache#23849) Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> * Fix databricks tests (apache#23856) * Add __wrapped__ property to _TaskDecorator (apache#23830) Co-authored-by: Sanjay Pillai <sanjaypillai11 [at] gmail.com> * Highlight task states by hovering on legend row (apache#23678) * Rework the legend row and add the hover effect. * Move horevedTaskState to state and fix merge conflicts. * Add tests. * Order of item in the LegendRow, add no_status support * Clean up f-strings in logging calls (apache#23597) * update K8S-KIND to 0.14.0 (apache#23859) * Replaced all days_ago functions with datetime functions (apache#23237) Co-authored-by: Dev232001 <thedevhooda@gmail.com> * Add clear DagRun endpoint. (apache#23451) * Ignore the DeprecationWarning in test_days_ago (apache#23875) Co-authored-by: alexkru <alexkru@wix.com> * Speed up Breeze experience on Mac OS (apache#23866) This change should significantly speed up Breeze experience (and especially iterating over a change in Breeze for MacOS users - independently if you are using x86 or arm architecture. The problem with MacOS with docker is particularly slow filesystem used to map sources from Host to Docker VM. It is particularly bad when there are multiple small files involved. 
The improvements come from two areas: removing duplicate pycache cleaning, and moving the MyPy cache to a docker volume. When entering breeze we are - just in case - cleaning .pyc and __pycache__ files potentially generated outside of the docker container - this is particularly useful if you use a local IDE and you do not have bytecode generation disabled (we have it disabled in Breeze). Generating python bytecode might lead to various problems when you are switching branches and Python versions, so for Breeze development where the files change often anyway, disabling them and removing them when they are found is important. This happens when entering breeze and it might take a second or two depending on whether you have locally generated files. It could happen that the __init script was called twice (depending on which script was called), therefore the time could be double the one that was actually needed. Also if you ever generated provider packages, the time could be much longer, because node_modules generated in provider sources were not excluded from searching (and on MacOS it takes a LOT of time). This also led to duplicate time of exit as the initialization code installed traps that were also run twice. The traps however were rather fast so had no negative influence on performance. The change adds a guard so that initialization is only ever executed once.
The second part of the change is moving the cache of mypy to a docker volume rather than being used from the local source folder (default when complete sources are mounted). We were already using selective mount to make sure MacOS filesystem slowness affects us in a minimal way - but with this change, the cache will be stored in a docker volume that does not suffer from the same problems as mounting volumes from the host. The Docker volume is preserved until the `docker stop` command is run - which means that iterating over a change should be WAY faster now - observed speed-ups were around 5x for MyPy pre-commit.
* Add default task retry delay config (apache#23861) * Move MappedOperator tests to mirror code location (apache#23884) At some point during the development of AIP-42 we moved the code for MappedOperator out of baseoperator.py to mappedoperator.py, but we didn't move the tests at the same time * Enable clicking on DAG owner in autocomplete dropdown (apache#23804) PR#18991 introduced directly navigating to a DAG when selecting one from the typeahead search results. Unfortunately, the search results also includes DAG owner names, and selecting one of those navigates to a DAG with that name, which almost certainly doesn't exist. This extends the autocompletion endpoint to return the type of result, and adjusts the typeahead selection to use this to know which way to navigate. * Document LocalKubernetesExecutor support in chart (apache#23876) * Avoid extra questions in `breeze build image` command. (apache#23898) Fixes: apache#23867 * Update INTHEWILD.md (apache#23892) * Split contributor's quick start into separate guides. (apache#23762) The foldable parts were not good. They made links not to work as well as they were not too discoverable. Fixes: apache#23174 * Avoid printing exception when exiting tests command (apache#23897) Fixes: apache#23868 * Move string arg evals to `execute()` in `EksCreateClusterOperator` (apache#23877) Currently there are string-value evaluations of `compute`, `nodegroup_role_arn`, and `fargate_pod_execution_role_arn` args in the constructor of `EksCreateClusterOperator`. These args are all listed as a template fields so it's entirely possible that the value(s) passed in to the operator is a Jinja expression or an `XComArg`. Either of these value types could cause a false-negative `ValueError` (in the case of unsupported `compute` values) or a `false-positive` (in the the cases of explicit checks for the *arn values) since the values themselves have not been rendered. This PR moves the evaluations of these args to the `execute()` scope. 
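The `EksCreateClusterOperator` change above (apache#23877) can be illustrated with a minimal sketch. It does not use Airflow itself: `EagerOperator`, `LazyOperator`, `render` and `SUPPORTED_COMPUTE` are hypothetical stand-ins, and the toy `render` function only imitates what template rendering does before `execute()` runs. The point is only that a check in the constructor sees the unrendered Jinja string, while a check in `execute()` sees the real value.

```python
from typing import Any, Dict

# Hypothetical set of accepted values, for illustration only.
SUPPORTED_COMPUTE = {"nodegroup", "fargate"}


def render(value: Any, context: Dict[str, Any]) -> Any:
    """Toy renderer: turns '{{ key }}' into context[key]; other values pass through."""
    if isinstance(value, str) and value.startswith("{{") and value.endswith("}}"):
        return context[value[2:-2].strip()]
    return value


class EagerOperator:
    """Validates in the constructor, i.e. before any template is rendered."""

    def __init__(self, compute: str) -> None:
        if compute not in SUPPORTED_COMPUTE:
            raise ValueError(f"Unsupported compute: {compute!r}")
        self.compute = compute


class LazyOperator:
    """Stores the raw value and validates only once it has been rendered."""

    template_fields = ("compute",)

    def __init__(self, compute: str) -> None:
        self.compute = compute

    def execute(self, context: Dict[str, Any]) -> str:
        # Simplification: rendering is done here instead of by the framework.
        compute = render(self.compute, context)
        if compute not in SUPPORTED_COMPUTE:
            raise ValueError(f"Unsupported compute: {compute!r}")
        return compute
```

With a templated argument such as `"{{ compute_type }}"`, `EagerOperator` raises `ValueError` at construction time even though the rendered value would be valid, whereas `LazyOperator` accepts it and validates the rendered value in `execute()`.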
* Update .readthedocs.yml (apache#23903) String instead of Int see https://docs.readthedocs.io/en/stable/config-file/v2.html
* Make --file command in static-checks autocomplete file name (apache#23896) The --verbose and --dry-run commands caused the --files command to fail and the flag was "artificial" - it was equivalent to a bool flag; the actual files were taken from arguments. This PR fixes it by turning the arguments into multiple ``--file`` commands - each with its own completion for local files.
* Chart: Update default airflow version to `2.3.1` (apache#23913)
* Fix Breeze documentation typo (apache#23919)
* Update environments documentation links (apache#23920)
* `2.3.1` has been released (apache#23912)
* Make CI and PROD image builds consistent (apache#23841) Simple refactoring to make the jobs more consistent.
* Alphabetizes two tables (apache#23923) The rest of the page has consistently alphabetized tables. This commit fixes three `extras` that were not alphabetized.
* Use "remote" pod when patching KPO pod as "checked" (apache#23676) When patching as "checked", we have to use the current version of the pod otherwise we may get an error when trying to patch it, e.g.:
```
Operation cannot be fulfilled on pods \"test-kubernetes-pod-db9eedb7885c40099dd40cd4edc62415\": the object has been modified; please apply your changes to the latest version and try again"
```
This error would not cause a failure of the task, since errors in `cleanup` are suppressed. However, it would fail to patch. I believe one scenario when the pod may be updated is when retrieving xcom, since the sidecar is terminated after extracting the value. Concerning some changes in the tests re the "already_checked" label, it was added to a few "expected pods" recently, when we changed it to patch even in the case of a successful pod. Since we are changing the "patch" code to patch with the latest read on the pod that we have (i.e.
using the `remote_pod` variable), and no longer the pod object stored on `k.pod`, the label no longer shows up in those tests (that's because in k.pod isn't actually a read of the remote pod, but just happens to get mutated in the patch function before it is used to actually patch the pod). Further, since the `remote_pod` is a local variable, we can't observe it in tests. So we have to read the pod using k8s api. _But_, our "find pod" function excludes "already checked" pods! So we have to make this configurable. So, now we have a proper integration test for the "already_checked" behavior (there was already a unit test). * Clarify manual merging of PR in release doc (apache#23928) It was not clear to me what this really means * Fix broken main (apache#23940) main breaks with `Traceback: /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module return _bootstrap._gcd_import(name[level:], package, level) tests/providers/amazon/aws/hooks/test_cloud_formation.py:31: in <module> class TestCloudFormationHook(unittest.TestCase): tests/providers/amazon/aws/hooks/test_cloud_formation.py:67: in TestCloudFormationHook @mock_cloudformation /usr/local/lib/python3.7/site-packages/moto/__init__.py:30: in f module = importlib.import_module(module_name, "moto") /usr/local/lib/python3.7/importlib/__init__.py:127: in import_module return _bootstrap._gcd_import(name[level:], package, level) /usr/local/lib/python3.7/site-packages/moto/cloudformation/__init__.py:1: in <module> from .models import cloudformation_backends /usr/local/lib/python3.7/site-packages/moto/cloudformation/models.py:18: in <module> from .parsing import ResourceMap, OutputMap /usr/local/lib/python3.7/site-packages/moto/cloudformation/parsing.py:17: in <module> from moto.apigateway import models # noqa # pylint: disable=all /usr/local/lib/python3.7/site-packages/moto/apigateway/__init__.py:1: in <module> from .models import apigateway_backends 
/usr/local/lib/python3.7/site-packages/moto/apigateway/models.py:9: in <module> from openapi_spec_validator import validate_spec E ModuleNotFoundError: No module named 'openapi_spec_validator' ` Fix is already in placed in moto getmoto/moto#5165 but version 3.1.11 wasn't released yet * Update INSTALL_PROVIDERS_FROM_SOURCES instructions. (apache#23938) * Add typing to Azure Cosmos Client Hook (apache#23941) New release of Azure Cosmos library has added typing information and it broke main builds with mypy verification. * Remove redundant register exit signals in `dag-processor` command (apache#23886) * Disable rebase workflow (apache#23943) The change of the release workflow in apache#23928 removed the reason why we should have rebase workflow possible. We only needed to do rebase when we merged test branch into stable branch and since we are doing it manually, there is no more reeason to have it in the GitHub UI. * Prevent UI from crashing if grid task instances are null (apache#23939) * UI fix for null task instances * improve tests without global vars * fix test data * Grid fix details button truncated and small UI tweaks (apache#23934) * Show details button and wrap on LegendRow. * Update following brent review * Fix display on small width * Rotate icon for a 'ReadLess' effect * Fix and speed up grid view (apache#23947) This fetches all TIs for a given task across dag runs, leading to signifincatly faster response times. It also fixes a bug where Nones were being passed to the UI when a new task was added to a DAG with exiting runs. * Removes duplicate code block (apache#23952) There's are two code blocks with identical text in the helm-chart docs. This commit removes one of them. * Update dep for databricks apache#23917 (apache#23927) * Use '--subdir' argument value for standalong dag processor. (apache#23864) * Revert "Add limit for JPype1 (apache#23847)" (apache#23953) This turned out to be mistake in manual submission. Fixed on JPype1 side. 
This reverts commit 3699be4. * Faster grid view (apache#23951) * Disallow calling expand with no arguments (apache#23463) * [FEATURE] KPO use K8S hook (apache#22086) * Add cascade to `dag_tag` to `dag` foreignkey (apache#23444) Bulk delete does not work if the cascade behaviour of a foreignkey is set on python side(relationship configuration). To allow bulk delete of dags we need to setup cascade deletion in the DB. The warning on query.delete at https://docs.sqlalchemy.org/en/14/orm/session_basics.html#selecting-a-synchronization-strategy stated that: The operations do not offer in-Python cascading of relationships - it is assumed that ON UPDATE CASCADE and/or ON DELETE CASCADE is configured for any foreign key references which require it, otherwise the database may emit an integrity violation if foreign key references are being enforced. Another alternative is avoiding bulk delete of dags but I prefer we support bulk deletes. This will break offline sql generation for mssql(already broken before now :) ). Also, since there's only one foreign key in `dag_tag` table, I assume that the foreign key would be named `dag_tag_ibfk_1` in `mysql`. This avoided having to query the db for the name. The foreignkey is explicitly named now, would be easy for future upgrades * DagFileProcessorManager: Start a new process group only if current process not a session leader (apache#23872) * Introduce `flake8-implicit-str-concat` plugin to static checks (apache#23873) * Fix UnboundLocalError when sql is empty list in ExasolHook (apache#23812) * Fix inverted section levels in best-practices.rst (apache#23968) This PR fixes inverted levels in the sections added to the "Best Practices" document in apache#21879. 
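The `dag_tag` cascade change above (apache#23444) relies on the database, not Python-side relationship configuration, to remove child rows during a bulk delete. A minimal sketch of that behaviour, assuming an in-memory SQLite database and a simplified two-table schema (the column layout and the constraint name `dag_tag_dag_id_fkey` are illustrative, not taken from the Airflow migration):

```python
import sqlite3
from typing import List, Tuple


def bulk_delete_demo() -> List[Tuple[str, str]]:
    """Bulk-delete a DAG row and return the dag_tag rows that survive."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE dag_tag ("
        " name TEXT NOT NULL,"
        " dag_id TEXT NOT NULL,"
        " PRIMARY KEY (name, dag_id),"
        " CONSTRAINT dag_tag_dag_id_fkey FOREIGN KEY (dag_id)"
        " REFERENCES dag (dag_id) ON DELETE CASCADE)"
    )
    conn.executemany("INSERT INTO dag VALUES (?)", [("a",), ("b",)])
    conn.executemany(
        "INSERT INTO dag_tag VALUES (?, ?)",
        [("x", "a"), ("y", "a"), ("x", "b")],
    )
    # A plain bulk DELETE: no per-object, in-Python cascading takes place.
    conn.execute("DELETE FROM dag WHERE dag_id IN ('a')")
    rows = conn.execute("SELECT name, dag_id FROM dag_tag ORDER BY name").fetchall()
    conn.close()
    return rows


remaining_tags = bulk_delete_demo()
```

Without `ON DELETE CASCADE` on the foreign key, the same `DELETE` would fail with an integrity error while foreign keys are enforced, which is the situation the SQLAlchemy warning quoted above describes.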
* Add support to specify language name in PapermillOperator (apache#23916) * Add support to specify language name in PapermillOperator * Replace getattr() with simple attribute access * [23945] Icons in grid view for different dag types (apache#23970) * Helm logo no longer a link (apache#23977) * Fix links in documentation (apache#23975) * fix links * added right link to breeze * Add TaskInstance State 'REMOVED' to finished states and success states (apache#23797) Now that we support dynamic task mapping, we should have the 'REMOVED' state of task instances as a finished state because for dynamic tasks with a removed task instance, the dagrun would be stuck in running state if 'REMOVED' state is not in finished states. * Remove `xcom_push` from `DockerOperator` (apache#23981) * Fix missing shorthand for docker buildx rm -f (apache#23984) Latest version of buildx removed -f as shorthand for --force flag. * use explicit --mount with types of mounts rather than --volume flags (apache#23982) The --volume flag is an old style of specifying mounts used by docker, the newer and more explicit version is --mount where you have to specify type, source, destination in the form of key/value pairs. This is more explicit and avoids some guesswork when volumes are mounted (for example seems that on WSL2 volume name might be guessed as path wrongly). The change explicitly specifies which of the mounts are bind mounts and which are volume mounts. Another nice side effect of this change is that when source is missing, docker will not automatically create directories with the missing name but it will fail. This is nicer because before it led to creating directories when they were missing (for example .bash_aliases and similar). This allows us to avoid some cleanups to account for those files being created - instead we simply skip those mounts if the file/folder does not exist. 
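The `--mount` change above (apache#23982) can be sketched as a small helper that builds explicit bind and volume mounts and skips bind mounts whose source is missing. This is only an illustration of the idea, not the Breeze code: `mount_args` and the example paths are hypothetical, while the `type=`, `src=` and `dst=` keys are the standard key/value fields of docker's `--mount` flag.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def mount_args(
    bind_mounts: Iterable[Tuple[str, str]],
    volume_mounts: Iterable[Tuple[str, str]],
) -> List[str]:
    """Build `docker run` arguments using explicit --mount key/value pairs."""
    args: List[str] = []
    for src, dst in bind_mounts:
        source = Path(src)
        # With --mount, docker fails on a missing bind source instead of
        # creating it, so missing files/folders are simply skipped here.
        if not source.exists():
            continue
        args += ["--mount", f"type=bind,src={source.resolve()},dst={dst}"]
    for name, dst in volume_mounts:
        # Named volumes are managed by docker, so no existence check is needed.
        args += ["--mount", f"type=volume,src={name},dst={dst}"]
    return args
```

For example, `mount_args([("./airflow", "/opt/airflow/airflow"), ("~/.bash_aliases", "/root/.bash_aliases")], [("mypy-cache-volume", "/opt/airflow/.mypy_cache")])` would leave out the second bind mount when that path does not exist, so no stray directory has to be cleaned up afterwards.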
* Force colors in yarn test output in CI (apache#23986) * Fix breeze failures when there is no buildx installed on Mac (apache#23988) If you have no buildx plugin installed on Mac (for example when you use colima instead of Docker Desktop) the breeze check was failing - but buildx in fact is not needed to run typical breeze commands, and breeze already has support for it - it was just wrongly handled. * Replace generation of docker volumes to be done from python (apache#23985) The pre-commit to generate docker volumes in docker compose file is now written in Python and it also uses the newer "volume:" syntax to define the volumes mounted in the docker-compose. * Replace `use_task_execution_date` with `use_task_logical_date` (apache#23983) * Replace `use_task_execution_date` with `use_task_logical_date` We have some operators/sensors that use `*_execution_date` as the class parameters. This PR deprecate the usage of these parameters and replace it with `logical_date`. There is no change in functionality, under the hood the functionality already uses `logical_date` this is just about the parameters name as exposed to the users. * Remove pinning for xmltodict (apache#23992) We have now moto 3.1.9+ in constraints so we should remove the limit. Fixes: apache#23576 * Remove fixing cncf.kubernetes provider when generating constraints (apache#23994) When we yanked cncf.kubernetes provider, we pinned 3.1.2 temporarily for provider generation. This removes the pinning as we are already at 4.0.2 version * Add better diagnostics capabilities for pre-commits run via CI image (apache#23980) The pre-commits that require CI image run docker command under the hood that is highly optimized for performance (only mounts files that are necessary to be mounted) - in order to improve performance on Mac OS and make sure that artifacts are not left in the source code of Airflow. 
However that makes the command slightly more difficult to debug because they generate dynamically the docker command used, including the volumens that should be mounted when the docker command is run. This PR adds better diagnostics to the pre-commit scripts allowing VERBOSE="true" and DRY_RUN="true" variables that can help with diagnosing problems such as running the scripts on WSL2. It also fixes a few documentation bugs that have been missed after changing names of the image-related static checks and thanks to separating the common code to utility function it allows to set SKIP_IMAGE_PRE_COMMITS variable to true which will skip running all pre-commit checks that require breeze image to be available locally. * Disable fail-fast on pushing images to docker cache (apache#24005) There is an issue with pushing cache to docker registry that is connected to containerd bug but started to appear more frequently recently (as evidenced for example by https://git.luolix.topmunity/t/buildx-failed-with-error-cannot-reuse-body-request-must-be-retried/253178 ). The issue is still open in containerd: containerd/containerd#5978. Until it if fixed, we disable fail-fast on pushing cache so that even if it happens, we just have to re-run that single python version that actually failed. Currently there is a much lower chance of success because all 4 build have to succeed. * Add automated retries on retryable condition for building images in CI (apache#24006) There is a flakiness in pushing cache images to ghcr.io, therefore we want to add automated retries when the images fail intermittently. The root cause of the problem is tracked in containerd: containerd/containerd#5978 * Ensure @contextmanager decorates generator func (apache#23103) * Revert "Add automated retries on retryable condition for building images in CI (apache#24006)" (apache#24016) This reverts commit 7cf0e43. 
* Cleanup `BranchDayOfWeekOperator` example dag (apache#24007) * Cleanup BranchDayOfWeekOperator example dag There is no need for `dag=dag` when using context manager. * Added missing project_id to the wait_for_job (apache#24020) * Only run separate per-platform build when preparing build cache (apache#24023) Apparently pushing multi-platform images when building cache on CI has some problems recently, connected with ghcr.io being more vulnerable to race condition described in this issue: containerd/containerd#5978 Apparently when two, different platform layers are pushed about the same time to ghcr.io, the error "cannot reuse body, request must be retried" is generated. However we actually do not even need to build the multiplatform latest images because as of recently we have separate cache for each platform, and the ghcr.io/:latest images are not used any more not even for docker builds. We we always build images rather than pull and we use --from-cache for that - specific per platform. The only image pulling we do is when we pull the :COMMIT_HASH images in CI- but those are single-platform images (amd64) and even if we add tests for arm, they will have different tag. Hopefully we can still build release images without causing the race condition too frequently - this is more likely because when we build images for cache we use machines with different performance characteristics and the same layers are pushed at different times from different platforms. * Preparing buildx cache is allowed without --push-image flag (apache#24028) The previous version of buildx cache preparation implied --push-image flag, but now this is completely separated (we do not push image, we just prepare cache), so when mutli-platform buildx preparation is run we should also allow the cache to run without --push-image flag. 
* Add partition related methods to GlueCatalogHook: (apache#23857) * "get_partition" to retrieve a Partition * "create_partition" to create a Partition * Adds foldable CI group for command output (apache#24026) * Add foldable groups in CI outputs in commands that need it (apache#24035) This is follow-up after apache#24026 which added capability of selectively deciding for each breeze command, whether the output of the command should be "foldable" group. All CI output has been reviewed, and the commands which "need" it were identified. This also fixes a problem introduced there - that the command itself was not "foldable" group itself. * Increase size of ARM build instance (apache#24036) Our ARM cache builds started to hang recently at yarn prod step. The most likely reason are limited resources we had for the ARM instance to run the docker build - it was rather small instance with 2GB RAM and it is likely not nearly enought to cope with recent changes related to Grid View where we likely need much more memory during the yarn build step. This change increases the instance memory to 8 GB (c6g.xlarge). Also this instance type gives 70% cost saving and has very low probability of being evicted (it's not in high demand in Ohio Region of AWS. Also the AMI used is refreshed with latest software (docker) * Remove unused [github_enterprise] from ref docs (apache#24033) * Add enum validation for [webserver]analytics_tool (apache#24032) * Support impersonation service account parameter for Dataflow runner (apache#23961) * Fix closing connection dbapi.get_pandas_df (apache#23452) * Light Refactor and Clean-up AWS Provider (apache#23907) * Removing magic numbers from exceptions (apache#23997) * Removing magic numbers from exceptions * Running pre-commit * Upgrade to pip 22.1.2 (apache#24043) Pip has been upgraded to version 22.1.2 12 minutes ago. Time to catch up. 
* Shaves-off about 3 minutes from usage of ARM instances on CI (apache#24052) Preparing airflow packages and provider packages does not need to be done on ARM and actually the ARM instance is idle while they are prepared during cache building. This change moves preparation of the packages to before the ARM instance is started which saves about 3 minutes of ARM instance time. * SSL Bucket, Light Logic Refactor and Docstring Update for Alibaba Provider (apache#23891) * Use KubernetesHook to create api client in KubernetesPodOperator (apache#20578) Add support for k8s hook in KPO; use it always (even when no conn id); continue to consider the core k8s settings that KPO already takes into account but emit deprecation warning about them. KPO historically takes into account a few settings from core airflow cfg (e.g. verify ssl, tcp keepalive, context, config file, and in_cluster). So to use the hook to generate the client, somehow the hook has to take these settings into account. But we don't want the hook to consider these settings in general. So we read them in KPO and if necessary patch the hook and warn. * Re-add --force-build flag (apache#24061) After apache#24052 we also need to add --force-build flag as for Python 3.7 rebuilding CI cache would have been silently ignored as no image building would be needed * Fix grid view for mapped tasks (apache#24059) * Fix StatD timing metric units (apache#21106) Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> Co-authored-by: Tzu-ping Chung <tp@astronomer.io> * Drop Python 3.6 compatibility objects/modules (apache#24048) * Remove hack from BigQuery DTS hook (apache#23887) * Spanner assets & system tests migration (AIP-47) (apache#23957) * Run the `check_migration` loop at least once (apache#24068) This is broken since 2.3.0. that's if a user specifies a migration_timeout of 0 then no migration is run at all. 
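The `check_migration` fix above (apache#24068) is an instance of a general pattern: a polling loop bounded by a timeout should still perform its check once when the timeout is 0. A minimal sketch under that assumption; `wait_until_ready` and its parameters are hypothetical names, not the Airflow function:

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready` about once per second for at most `timeout` seconds.

    The check always runs at least once, even when `timeout` is 0.
    A `for _ in range(timeout)` loop would run zero times in that case.
    """
    ticker = 0
    while True:
        if is_ready():
            return True
        if ticker >= timeout:
            return False
        ticker += 1
        sleep(1)
```

Calling `wait_until_ready(check, timeout=0)` evaluates `check` exactly once and returns its result, instead of silently skipping the check.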
* Bump eventsource from 1.0.7 to 1.1.1 in /airflow/ui (apache#24062) Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.0.7 to 1.1.1. - [Release notes](https://github.com/EventSource/eventsource/releases) - [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md) - [Commits](EventSource/eventsource@v1.0.7...v1.1.1) --- updated-dependencies: - dependency-name: eventsource dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Remove certifi limitations from eager upgrade limits (apache#23995) The certifi limitation was introduced to keep snowflake happy while performing eager upgrade because it added limits on certifi. However seems like it is not limitation any more in latest versions of snowflake python connector, so we can safely remove it from here. The only remaining limit is dill but this one still holds. * fix style of example block (apache#24078) * Handle occasional deadlocks in trigger with retries (apache#24071) Fixes: apache#23639 * Adds Pura Scents, edits The Dyrt (apache#24086) * Migrate Yandex example DAGs to new design AIP-47 (apache#24082) closes: apache#22470 * set color to operators in cloud_sql.py (apache#24000) * Migrate HTTP example DAGs to new design AIP-47 (apache#23991) closes: apache#22448 , apache#22431 * Make expand() error vague so it's not misleading (apache#24018) * Use github for postgres chart index (apache#24089) Bitnami's CloudFront CDN is seemingly having issues, so point at github direct instead until it is resolved. 
* Fix the link to google workplace (apache#24080)
* Bring MappedOperator members in sync with BaseOperator (apache#24034)
* Add note about Docker volume remount issues in WSL 2 (apache#24094)
* Convert Athena Sample DAG to System Test (apache#24058)
* Self-update pre-commit to latest versions (apache#24106)
* Temporarily fix bitnami index problem (apache#24112) We started to experience "Internal Error" when installing Helm chart and apparently bitnami "solved" the problem by removing from their index software older than 6 months(!). This makes our CI fail, but it is much worse than that: it renders all our charts useless for people to install. This is terribly wrong, and I raised this in the issue here: bitnami/charts#10539 (comment)
* Fix small typos in static code checks doc (apache#24113) - Trivial typo fix in the command to run static checks on the last commit - Update "run all tests" to "run all checks" where applicable for consistency
* Really workaround bitnami chart problem (apache#24115) The original fix in apache#24112 did not work due to a lock that was not updated and EOL characters at the end of a multiline long URL. This PR fixes it.
* Reduce grid view API calls (apache#24083)
* Reduce API calls from /grid - Separate /grid_data from /grid - Remove need for formatData - Increase default query stale time to prevent extra fetches - Fix useTask query keys
* consolidate grid data functions
* fix www tests test grid_data instead of /grid
* Removing magic status code numbers from api_connexion (apache#24050)
* Do not support MSSQL less than v2017 in code (apache#24095) Our experimental support for MSSQL starts from v2017 (in README.md) but we still support 2000 & 2005 in code. This PR removes this support, allowing us to use mssql.DATETIME2 in all MSSQL DB.
* Rename Permissions to Permission Pairs.
(apache#24065) * Note that yarn dev needs webserver in debug mode (apache#24119) * Note that yarn dev needs webserver -d * Update CONTRIBUTING.rst Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * Use -D * Revert "Use -D" This reverts commit 94d63ad. Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> * fixing SSHHook bug when using allow_host_key_change param (apache#24116) * Adds mssql volumes to "all" backends selection (apache#24123) The "stop" command of Breeze uses "all" backend to remove all volumes - but mssql has special approach where the volumes defined depend on the filesystem used and we need to add the specific docker-compose files to list of files used when we use stop command. * Breeze must create `hooks\` and `dags\` directories for bind mounts (apache#24122) Now that breeze uses --mount instead of --volume (the former of which does not create missing mount dirs like the latter does see docs here: https://docs.docker.com/storage/bind-mounts/#differences-between--v-and---mount-behavior) we need to create these directories explicitly. * AIP-47 | Migrate Trino example DAGs to new design (apache#24118) * Update production-deployment.rst (apache#24121) The sql_alchemy_conn option is in the database section, not the core section. Simple typo fix. 
* Migrate Zendesk example DAGs to new design apache#22471 (apache#24129) * Migrate JDBC example DAGs to new design apache#22450 (apache#24137) * Migrate Jenkins example DAGs to new design apache#22451 (apache#24138) * Migrate Microsoft example DAGs to new design apache#22452 - mssql (apache#24139) * Migrate MySQL example DAGs to new design apache#22453 (apache#24142) * Migrate Opsgenie example DAGs to new design apache#22455 (apache#24144) * Migrate Presto example DAGs to new design apache#22459 (apache#24145) * Migrate Plexus example DAGs to new design apache#22457 (apache#24147) * Migrate SQLite example DAGs to new design apache#22461 (apache#24150) * Migrate Telegram example DAGs to new design apache#22468 (apache#24126) * AIP-47 - Migrate Tableau DAGs to new design (apache#24125) * Migrate Salesforce example DAGs to new design apache#22463 (apache#24127) * Update credentials when using ADC in Compute Engine (apache#23773) * Improve Windows development compatibility for breeze (apache#24098) * Migrate Asana example DAGs to new design apache#22440 (apache#24131) * Migrate Neo4j example DAGs to new design apache#22454 (apache#24143) * Workflows assets & system tests migration (AIP-47) (apache#24105) * Workflows assets & system tests migration (AIP-47) Co-authored-by: Wojciech Januszek <januszek@google.com> * Add disabled_algorithms as an extra parameter for SSH connections (apache#24090) * Migrate Postgres example DAGs to new design apache#22458 (apache#24148) * Migrate Postgres example DAGs to new design apache#22458 * Fix static checks * Migrate Snowflake system tests to new design apache#22434 (apache#24151) * Migrate Snowflake system tests to new design apache#22434 * Fix flake8 * Migrate Qubole example DAGs to new design apache#22460 (apache#24149) * Migrate Qubole example DAGs to new design apache#22460 * Migrate Microsoft example DAGs to new design apache#22452 - azure (apache#24141) * Migrate Microsoft example DAGs to new design apache#22452 - azure * 
Migrate Microsoft example DAGs to new design apache#22452 - winrm (apache#24140) * Migrate Microsoft example DAGs to new design apache#22452 - winrm * Fix static checks * Migrate Influx example DAGs to new design apache#22449 (apache#24136) * Migrate Influx example DAGs to new design apache#22449 * Fix static checks * Migrate DingTalk example DAGs to new design apache#22443 (apache#24133) * Migrate DingTalk example DAGs to new design apache#22443 * Migrate Cncf.Kubernetes example DAGs to new design apache#22441 (apache#24132) * Migrate Cncf.Kubernetes example DAGs to new design apache#22441 * Migrate Alibaba example DAGs to new design apache#22437 (apache#24130) * Migrate Alibaba example DAGs to new design apache#22437 * Pass connection extra parameters to wasb BlobServiceClient (apache#24154) * fix BigQueryInsertJobOperator (apache#24165) * Migrate Singularity example DAGs to new design apache#22464 (apache#24128) * Better summary of status of AIP-47 (apache#24169) Result is here: apache#24168 * Remove old Athena Sample DAG (apache#24170) * removed old files (apache#24172) * Chart: Default to Airflow 2.3.2 (apache#24184) * Update 'rich' to latest version across the board. (apache#24186) That Also includes regenerating the breeze output images. 
* Fix BigQuery system tests (apache#24013) * Change execution_date to data_interval_start in BigQueryInsertJobOperator job_id Change-Id: Ie1f3bba701169ceb2b39d693da320564de145c0c * Change jinja template path to relative path Change-Id: I6cced215124f69e9f4edf8ac08bb71d3ec3c8afc Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> * `2.3.2` has been released (apache#24182) * Add verification step to image release process (apache#24177) * Added impersonation_chain for DataflowStartFlexTemplateOperator and DataflowStartSqlJobOperator (apache#24046) * Add key_secret_project_id parameter which specifies a project with KeyFile (apache#23930) * Add built-in Extrenal Link for ExternalTaskMarker operator (apache#23964) * fix: DatabricksSubmitRunOperator and DatabricksRunNowOperator cannot define .json as template_ext (apache#23622) (apache#23641) * fix: StepFunctionHook ignores explicit set `region_name` (apache#23976) * Remove `GithubOperator` use in `GithubSensor.__init__()`` (apache#24214) The constructor for `GithubSensor` was instantiating `GitHubOperator` to use its `execute()` method as the driver for the result of the sensor's `poke()` logic. However, this could yield a `DuplicateTaskIdFound` when used in DAGs. This PR updates the `GithubSensor` to use the `GithubHook` instead. 
* Mac M1 postgress and doc fix (apache#24200) * AIP-47 - Migrate dbt DAGs to new design apache#22472 (apache#24202) * AIP-47 - Migrate databricks DAGs to new design apache#22442 (apache#24203) * AIP-47 - Migrate hive DAGs to new design apache#22439 (apache#24204) * AIP-47 - Migrate kylin DAGs to new design apache#22439 (apache#24205) * AIP-47 - Migrate drill DAGs to new design apache#22439 (apache#24206) * AIP-47 - Migrate druid DAGs to new design apache#22439 (apache#24207) * AIP-47 - Migrate cassandra DAGs to new design apache#22439 (apache#24209) * AIP-47 - Migrate spark DAGs to new design apache#22439 (apache#24210) * AIP-47 - Migrate apache pig DAGs to new design apache#22439 (apache#24212) * Migrate GitHub example DAGs to new design apache#22446 (apache#24134) * Remove warnings when starting breeze (apache#24183) Breeze when started produced three warnings that were harmless, but we should fix them to remove "false positives". * AIP-47 - Migrate livy DAGs to new design apache#22439 (apache#24208) * Remove escaping which is wrong in latest rich version (apache#24217) Latest rich makes escaping not needed for extra `[` needed in Markdown URLs. * Parse error for task added to multiple groups (apache#23071) This raises an exception if a task already belonging to a task group (including added to a DAG, since such task is automatically added to the DAG's root task group). Also, according to the issue response, manually calling TaskGroup.add() is not considered a supported way to add a task to group. So a meta-marker is added to the function docstring to prevent it from showing up in documentation and users from trying to use it. 
* Fix xfail test in test_scheduler.py (apache#23731) * Migrate Papermill example DAGs to new design apache#22456 (apache#24146) * Migrate Asana system tests to new design AIP-47 (apache#24226) closes: apache#22428 related: apache#22440 * Migrate Microsoft system tests to new design AIP-47 (apache#24225) closes: apache#22432 related: apache#22452 * Migrate CNCF system tests to new design AIP-47 (apache#24224) closes: apache#22429 related: apache#22441 * Migrate Postgres system tests to new design (apache#24223) closes: apache#22433 related: apache#22458 * AIP-47 - Migrate beam DAGs to new design apache#22439 (apache#24211) * AIP-47 - Migrate beam DAGs to new design apache#22439 * Add explanatory note for contributors about updating Changelog (apache#24229) * Fix backwards-compatibility introduced by fixing mypy problems (apache#24230) There was a backwards-incompatibility introduced by apache#23716 in two providers by using get_mandatory_value config method. This PR corrects that backwards compatibility and updates 2.1 compatibility pre-commit to check for forbidden usage of get_mandatory_value. * Bump moto version (apache#24222) * Bump moto version version 3.1.10 broke main but the issue was fixed since in moto related: getmoto/moto#5165 * fix moto * Add `PrestoToSlackOperator` (apache#23979) * Add `PrestoToSlackOperator` Adding the funcitonality to run a single query against presto and send the result as slack message. 
Similar to `SnowflakeToSlackOperator` * Fix BigQuery Sensors system test (apache#24245) Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> * adding AWS_DEFAULT_REGION to the docs, boto3 expects this to be in the env variables (apache#24181) * Unify return_code interface for task runner (apache#24093) * Update dbt.py (apache#24218) * Fix GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix (apache#24039) * Adding fnmatch type regex to SFTPSensor (apache#24084) * docs: amazon-provider retry modes (apache#23906) * Cloud Storage assets & StorageLink update (apache#23865) Co-authored-by: Wojciech Januszek <januszek@google.com> * Fix useTasks crash on error (apache#24152) * Prevent UI from crashing on Get API error * add test * don't show API errors in test logs * use setMinutes inline * Refactor GlueJobHook get_or_create_glue_job method. (apache#24215) When invoked, create_job takes into account the provided 'Command' argument instead of having it hardcoded. 
* Fix delete_cluster no use TriggerRule.ALL_DONE (apache#24213) related: apache#24082 * docker new system test (apache#23167) * chore: Refactoring and Cleaning Apache Providers (apache#24219) * Fix await_container_completion condition (apache#23883) * Migrate Apache Beam system tests to new design AIP-47 (apache#24256) closes: apache#22427 * Migrate Apache Beam system tests to new design apache#22427 (apache#24241) * Migrate Google leveldb system tests to new design AIP-47 (apache#24255) related: apache#22447, apache#22430 * Add param docs to KubernetesHook and KubernetesPodOperator (apache#23955) (apache#24054) * Enable dbt Cloud provider to interact with single tenant instances (apache#24264) * Enable provider to interact with single tenant * Define single tenant arg on Operator * Add test for single tenant endpoint * Enable provider to interact with single tenant * Define single tenant arg on Operator * Add test for single tenant endpoint * Code linting from black * Code linting from black * Pass tenant to dbtCloudHook in DbtCloudGetJobRunArtifactOperator class * Make Tenant a connection-level setting * Remove tenant arg from Operator * Make tenant connection-level param that defaults to 'cloud' * Remove tenant param from sensor * Remove leftover param string from hook * Update airflow/providers/dbt/cloud/hooks/dbt.py Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> * Parameterize test_init_hook to test single and multi tenant connections * Integrate test simplification suggestion * Add connection to TestDbtCloudJobRunSesnor Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> * Apply per-run log templates to log handlers (apache#24153) * AIP-47 - Migrate google leveldb DAGs to new design #apache#22447 (apache#24233) * Fix choosing backend versions in breeze's command line (apache#24228) Choosing version of backend were broken when command line switches were used. 
The _VERSION variables were "hard-coded" to defaults rather than taken from command line. This is a remnant of initial implementation and converting the parameters to "cacheable" ones. While looking at the versions we also found that PARAM_NAME_FLAG is not used any more so we took the opportunity to remove it. * Fix link broken after apache#24082 (apache#24276) apache#24082 * Add command to regenerate breeze command output images (apache#24216) * Make numpy effectively an optional dependency for Oracle provider (apache#24272) Better fix to apache#23132 * Add SMAP Energy to list of companies using Airflow (apache#24268) * fix command and typo (apache#24282) * Update doc and sample dag for EMR Containers (apache#24087) * scheduleinterval nullable true added in openapi (apache#24253) * Check that edge nodes actually exist (apache#24166) * Prepare docs for May 2022 provider's release (apache#24231) This documentation update also (following the rule agreed in https://github.com/apache/airflow/blob/main/README.md#support-for-providers) bumps mininimum supported version of Airflow for all providers to 2.2 and it constitutes a breaking change and major version bump for all providers. * pydocstyle D202 added (apache#24221) * Update provider templates for new Airflow 2.2+ req (apache#24291) I imagine we could update this somewhat programmatically and/or add this update to instructions somewhere. Let me know what you think. 
* Update package description to remove double min-airflow specification (apache#24292) * Airflow UI fix vulnerabilities - Prototype Pollution (apache#24201) * Mention context variables and logging (apache#24304) * Mention context variables and logging * Fix static checks * Remove limit of presto-python-client version (apache#24305) * Fix langauge override in papermill operator (apache#24301) * Also mention airflow 2 only in readme template (apache#24296) * Fix permission issue for dag that has dot in name (apache#23510) How we determine if a DAG is a subdag in airflow.security.permissions.resource_name_for_dag is not right. If a dag_id contains a dot, the permission is not recorded correctly. The current solution makes a query every time we check for permission for dags that has a dot in the name. Not that I like it but I think it's better than other options I considered such as changing how we name dags for subdag. That's not good in UX. Another option I considered was making a query when parsing, that's not good and it's avoided by passing root_dag to resource_name_for_dag Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> * Check bag DAG schedule_interval match tiemtable (apache#23113) This guards against the DAG's timetable or schedule_interval from being changed after it's created. Validation is done by creating a timetable and check its summary matches schedule_interval. The logic is not bullet-proof, especially if a custom timetable does not provide a useful summary. But this is the best we can do. * fix: patches apache#24215. Won't raise KeyError when 'create_job_kwargs' contains the 'Command' key. 
(apache#24308) * Fix D202 issue (apache#24322) * Check for run_id for grid group summaries (apache#24327) * Workaround job race bug on biguery to gcs transfer (apache#24330) Fixes: apache#24277 * Update release notes for RC2 release of Providers for May 2022 (apache#24307) Also updates links to example dags to work properly following apache#24331 * feat(README): add a custom README (#1) * feat(README): add a custom README * fix(README): add the custom readme content above the original readme Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com> Co-authored-by: Ash Berlin-Taylor <ash@apache.org> Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com> Co-authored-by: Guilherme Martins Crocetti <24530683+gmcrocetti@users.noreply.github.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Dmytro Kazanzhy <dkazanzhy@gmail.com> Co-authored-by: pankajastro <98807258+pankajastro@users.noreply.github.com> Co-authored-by: 서재권(Data Platform) <90180644+jaegwonseo@users.noreply.github.com> Co-authored-by: Sandeep <sandeep.kadyan@gmail.com> Co-authored-by: Sandeep Kadyan <sandeep.kadyan@publicissapient.com> Co-authored-by: Eugene Karimov <13220923+eskarimov@users.noreply.github.com> Co-authored-by: Vedant Bhamare <55763604+Dark-Knight11@users.noreply.github.com> Co-authored-by: eladkal <45845474+eladkal@users.noreply.github.com> Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com> Co-authored-by: sanjayp <sanjaypillai11@gmail.com> Co-authored-by: Josh Fell <48934154+josh-fell@users.noreply.github.com> Co-authored-by: raphaelauv <raphaelauv@users.noreply.github.com> Co-authored-by: Tzu-ping Chung <tp@astronomer.io> Co-authored-by: Dev232001 <thedevhooda@gmail.com> Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com> Co-authored-by: Alex Kruchkov <36231027+alexkruc@users.noreply.github.com> Co-authored-by: alexkru <alexkru@wix.com> Co-authored-by: Sumit Maheshwari <msumit@users.noreply.github.com> Co-authored-by: Mark Norman Francis <norm@201created.com> Co-authored-by: 
Jed Cunningham <66968678+jedcunningham@users.noreply.github.com> Co-authored-by: Vincent Koc <koconder@users.noreply.github.com> Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com> Co-authored-by: Igor Tavares <igorborgest@gmail.com> Co-authored-by: Marty Jackson <mfjackson2008@gmail.com> Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com> Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> Co-authored-by: Brent Bovenzi <brent.bovenzi@gmail.com> Co-authored-by: mhenc <mhenc@google.com> Co-authored-by: Kengo Seki <sekikn@apache.org> Co-authored-by: John Green <nhojjohn@users.noreply.github.com> Co-authored-by: David Skoda <dskoda1@binghamton.edu> Co-authored-by: Edith Puclla <58795858+edithturn@users.noreply.github.com> Co-authored-by: Łukasz Wyszomirski <wyszomirski@google.com> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com> Co-authored-by: Hubert Pietroń <94397721+hubert-pietron@users.noreply.github.com> Co-authored-by: Bernardo Couto <35502483+bernardocouto@users.noreply.github.com> Co-authored-by: viktorvia <86823020+viktorvia@users.noreply.github.com> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> Co-authored-by: henriqueribeiro <henriqueribeiro@users.noreply.github.com> Co-authored-by: Wojciech Januszek <wjanuszek@sigma.ug.edu.pl> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ishiis <ishii.shunichi@gmail.com> Co-authored-by: Chenglong Yan <alanx.yan@gmail.com> Co-authored-by: François de Metz <francois@2metz.fr> Co-authored-by: Paul Williams <pdw@udel.edu> Co-authored-by: D. 
Ferruzzi <ferruzzi@amazon.com> Co-authored-by: James Timmins <james@astronomer.io> Co-authored-by: Niko <onikolas@amazon.com> Co-authored-by: chethanuk-plutoflume <chethanuk@outlook.com> Co-authored-by: DataFusion4All <101581331+DataFusion4All@users.noreply.github.com> Co-authored-by: chethanuk-plutoflume <chethan.umesha@tessian.com> Co-authored-by: Maksim <maksimy@google.com> Co-authored-by: Wojciech Januszek <januszek@google.com> Co-authored-by: Paul Williams <pauldalewilliams@gmail.com> Co-authored-by: Tanel Kiis <tanelk@users.noreply.github.com> Co-authored-by: Bowrna <mailbowrna@gmail.com> Co-authored-by: Bartłomiej Hirsz <bartek.hirsz@gmail.com> Co-authored-by: Bartlomiej Hirsz <bartomiejh@google.com> Co-authored-by: Jonathan Simon Prates <jonathan.simonprates@gmail.com> Co-authored-by: Rafael Carrasco <rafacarrasco07@gmail.com> Co-authored-by: Ping Zhang <pingzh@umich.edu> Co-authored-by: GitStart-AirFlow <101595287+gitstart-airflow@users.noreply.github.com> Co-authored-by: akakakakakaa <akstn3023@naver.com> Co-authored-by: Maria Sumedre <maria.sumedre@3pillarglobal.com> Co-authored-by: Elize Papineau <elizepapineau@gmail.com> Co-authored-by: peter-volkov <peter.r.volkov@yandex.ru> Co-authored-by: Hank Ehly <henry.ehly@gmail.com> Co-authored-by: Malthe Borch <mborch@gmail.com> Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com> Co-authored-by: socar-dini <89070514+socar-dini@users.noreply.github.com>
Fixes: #24277
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst`, in newsfragments.