[TWTTR] Don't enqueue tasks again if already queued #60
https://github.com/twitter-forks/airflow/pull/27/files
It looks like this change was explicitly added to deal with a problem with the CeleryExecutor; take a look at the context above.
So my question would be: can we merge this if we're also using the CeleryExecutor on-prem? I would be very nervous about undoing this, because I think Dan put a lot of thought into that change when it was merged. Maybe checking for the executor type is a safer way to handle this?
+1 to Sean's comment. I might add a check to my forking change so that this code only runs if the executor is Celery. It's possible the issue has been fixed, but I don't think it's very likely. If you want to remove the change, I would make sure to turn on debug logging for Celery and prepare on-call resources to address customer task hangs just in case, or ideally come up with metrics to detect this automatically (which is quite hard to do).
@aoen I'm not sure why we don't see duplicate runs for Celery, since two workers could pick up the same task at the same time, both find it in the queued state, and both start running it. But as I don't know the full history of celery-mysql here, I'm overriding the
FYI @msumit, the Task class's "run" method prevents the double triggers from happening (it acquires a DB lock to change the state to running).
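To illustrate the guard described in the comment above, here is a minimal sketch of a state transition that only one worker can win. It is not Airflow's code: the comment says Airflow acquires a DB lock, whereas this sketch swaps in a conditional `UPDATE` (compare-and-set) on an in-memory SQLite database to get a similar effect. The table layout and the `try_start` helper are hypothetical names used only for illustration.

```python
import sqlite3


def try_start(conn: sqlite3.Connection, task_id: str) -> bool:
    """Atomically move a task from 'queued' to 'running'.

    Returns True only for the caller that wins the transition.
    Hypothetical helper; not part of Airflow.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE task_instance SET state = 'running' "
            "WHERE task_id = ? AND state = 'queued'",
            (task_id,),
        )
        # rowcount is 1 only if the row was still in the 'queued' state
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO task_instance VALUES ('example_task', 'queued')")
conn.commit()

# Two workers that both saw the task as queued: only the first one proceeds.
first_worker = try_start(conn, "example_task")
second_worker = try_start(conn, "example_task")
```

With a guard of this kind, a worker that loses the race sees that the task is no longer queued and skips it instead of running it a second time.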
[TWTTR] Don't enqueue tasks again if already queued for K8sExecutor (twitter-forks#60)
Basically reverting commit 87fcc1c and making changes specifically into the Celery Executor class only.
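The idea in this commit message (keep the requeue behaviour for Celery only, and let every other executor skip tasks it already holds) could be sketched roughly as below. This is an assumption about the shape of the change, not the actual diff: the class names, the `should_enqueue` hook, and the simplified `queue_command` signature are all hypothetical.

```python
class BaseExecutorSketch:
    """Toy stand-in for a base executor (hypothetical, not Airflow's class)."""

    def __init__(self):
        self.queued_tasks = {}  # task key -> command waiting to be launched
        self.running = set()    # task keys already handed to a worker

    def should_enqueue(self, key):
        # Default: never enqueue a task the executor already knows about.
        return key not in self.queued_tasks and key not in self.running

    def queue_command(self, key, command):
        if not self.should_enqueue(key):
            return False
        self.queued_tasks[key] = command
        return True


class CeleryLikeExecutorSketch(BaseExecutorSketch):
    """Assumed Celery-specific override: allow re-sending a queued task."""

    def should_enqueue(self, key):
        # Still refuse tasks that are running, but accept queued ones again.
        return key not in self.running


k8s_like = BaseExecutorSketch()
celery_like = CeleryLikeExecutorSketch()
results = [
    (ex.queue_command("t1", "cmd"), ex.queue_command("t1", "cmd"))
    for ex in (k8s_like, celery_like)
]
```

Putting the decision in an overridable method keeps the special case inside the Celery-like class, so other executors are unaffected by it.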
* EWT-569 : Initial Commit for migrations
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 76fe7ac from 1.10.4
* CP Contains fb64f2e: [TWTR][AIRFLOW-XXX] Twitter Airflow Customizations + Fixup job scheduling without explicit_defaults_for_timestamp
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 91d2b00 [CP][EWT-548][AIRFLOW-6527] Make send_task_to_executor timeout configurable (#63)
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 91d2b00 CP contains [EWT-16]: Airflow fix for manual trigger during version upgrade (#13)
* [EWT-16]: Airflow fix for manual trigger during version upgrade
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 91d2b00 [CP][EWT-548][AIRFLOW-6527] Make send_task_to_executor timeout configurable (#63) CP of f757a54
* CP(55bb579) [AIRFLOW-5597] Linkify urls in task instance log (#16)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 94cdcf6 [CP] Contains [AIRFLOW-5597] Linkify urls in task instance log CP of f757a54
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 4ce8d4c from 1.10.4 CP contains [TWTTR] Fix for rendering code on UI (#34)
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 299b4d8 from 1.10.4 CP contains [TWTR] CP from 1.10+twtr (#35)
* 99ee040: CP from 1.10+twtr
* 2e01c24: CP from 1.10.4 ([TWTR][AIRFLOW-4939] Fixup use of fallback kwarg in conf.getint)
* 00cb4ae: [TWTR][AIRFLOW-XXXX] Cherry-pick d4a83bc and bump version (#21)
* CP 51b1aee: Relax version requiremets (#24)
* CP 67a4d1c: [CX-16266] Change with reference to 1a4c164 commit in open source (#25)
* CP 54bd095: [TWTR][CX-17516] Queue tasks already being handled by the executor (#26)
* CP 87fcc1c: [TWTR][CX-17516] Requeue tasks in the queued state (#27)
* CP 98a1ca9: [AIRFLOW-6625] Explicitly log using utf-8 encoding (apache#7247) (#31)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : f7050fb CP Contains Experiment API path fix (#37)
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 8a689af from 1.10.4 CP Contains Export scheduler env variable into worker pods. (#38)
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick 5875a15 from 1.10.4 Cp Contains [EWT-115][EWT-118] Initialise dag var to None and fix for DagModel.fileloc (missed in EWT-16) (#39)
* [EWT-569] Airflow Upgrade to 1.10.14, Cherry-Pick a68e2b3 from 1.10.4 [CX-16591] Fix regex to work with impersonated clusters like airflow_scheduler_ddavydov (#42)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : e9642c2 [CP][EWT-128] Fetch task logs from worker pods (19ac45a) (#43)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : d5d0a07 [CP][AIRFLOW-6561][EWT-290]: Adding priority class and default resource for worker pod. (#47)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 9b58c88 [CP][EWT-302]Patch Pool.DEFAULT_POOL_NAME in BaseOperator (apache#8587) (#49) Open source commit id: b37ce29
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 7b52a71 [CP][AIRFLOW-3121] Define closed property on StreamLogWriter (apache#3955) (#52) CP of 2d5b8a5
* [EWT-361] Fix broken regex pattern for extracting dataflow job id (#51) Update the dataflow URL regex as per AIRFLOW-9323
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 4b5b977 EWT-370: Use python3 to launch the dataflow job. (#53)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 596e24f
* [EWT-450] fixing sla miss triggering duplicate alerts every minute (#56)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : b3d7fb4 [CP] Handle IntegrityErrors for trigger dagruns & add Stacktrace when DagFileProcessorManager gets killed (#57) CP of faaf179 - from master CP of 2102122 - from 1.10.12
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : bac4acd [TWTR][EWT-472] Add lifecycle support while launching worker pods (#59)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 6162402 [TWTTR] Don't enqueue tasks again if already queued for K8sExecutor(#60) Basically reverting commit 87fcc1c and making changes specifically into the Celery Executor class only.
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 1991419 [CP][TWTR][EWT-377] Fix DagBag bug when a Dag has invalid schedule_interval (#61) CP of 5605d10 & apache#11462
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : 48be0f9 [TWTR][EWT-350] Reverting the last commit partially (#62)
* [EWT-569] Airflow Upgrade to 1.10.14 [CP] from 1.10.4+twtr : d8c473e [CP][EWT-548][AIRFLOW-6527] Make send_task_to_executor timeout configurable (#63) CP of f757a54
Found that some tasks are being run twice, and sometimes even three times, with the K8sExecutor. The fix in this PR already exists in upstream master, but I could not track down the source JIRA/issue for it.
https://github.com/apache/airflow/blob/master/airflow/executors/base_executor.py#L71
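As a rough illustration of why skipping already-known tasks prevents the double and triple runs described above, the following toy queue follows one task through repeated submissions. It is a simplified sketch, not the code at the linked line: `DedupingQueue` and its methods are hypothetical names, and a real executor tracks more state than this.

```python
from collections import OrderedDict


class DedupingQueue:
    """Toy executor queue that ignores tasks it already holds (hypothetical)."""

    def __init__(self):
        self.queued = OrderedDict()  # task key -> command, in submission order
        self.running = set()         # task keys that have been launched
        self.launched = []           # launch history, for inspection

    def submit(self, key, command):
        # A scheduler loop may submit the same task on every pass;
        # accept it only if it is neither queued nor running.
        if key in self.queued or key in self.running:
            return False
        self.queued[key] = command
        return True

    def launch_next(self):
        if not self.queued:
            return None
        key, _command = self.queued.popitem(last=False)  # oldest first
        self.running.add(key)
        self.launched.append(key)
        return key

    def finish(self, key):
        self.running.discard(key)


q = DedupingQueue()
accepted = [q.submit("t1", "cmd"), q.submit("t1", "cmd")]  # second is a repeat
q.launch_next()
accepted.append(q.submit("t1", "cmd"))  # repeat while the task is running
q.launch_next()                         # nothing left to launch
```

Without the membership check in `submit`, each repeated submission would add another entry, and the same task would be launched once per entry.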