2.1.3/4 queued dag runs changes catchup=False behaviour #18487
Comments
What is your understanding of catchup? This does not match my understanding of it. https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html?highlight=catchup#catchup From the documentation's description, catchup only matters for runs that missed their turn. But since the scheduler is able to schedule those runs at the pace of once a minute, those successfully scheduled runs did not miss their turn, so catchup does not matter to them. The behaviour you described for 2.1.2 and below sounds like a bug to me.
Per the documentation: "When turned off, the scheduler creates a DAG run only for the latest interval." With the queued state, it creates dag runs for the latest interval and the next 16 intervals (by default).
I'm having exactly the same issue in the latest version, and it breaks many of our DAGs that rely on the catchup setting. We have DAGs scheduled every minute. Most of them should finish within 1 minute, but sometimes a run takes longer, e.g. because of a network issue or a slow external service. When this happens, DagRuns that we don't expect accumulate. We already have an internal backfill mechanism. The documentation says "Turning catchup off is great if your DAG performs catchup internally." in the "catchup" section.
Can you try the setting |
@ephraimbuddy thanks, I'll try this out. I can accept it as a workaround, but it still conflicts with the idea of catchup.
@uranusjr
Oftentimes the usage of An example would be an incremental process. On each dag run, you process the new data that has arrived since the last run. So if your dag is scheduled every 10 minutes, say, and a run takes an hour for some reason, you don't want it to create 6 new queued dag runs while it is waiting. It should queue up one at most. Many kinds of pipelines use this pattern. But the current behavior (verified in 2.2.0 as well) is that, even when catchup=False, it will queue up a bunch of dag runs (up to max_queued_runs_per_dag). This should not happen. I encountered this today with a stuck dag (a task had been stuck in the running state for a day or so for some reason): 16 dag runs were queued up and waiting to go. Along these lines... @ephraimbuddy in your PR #18897 you explicitly mention the
The PR captures both cases where catchup is False or true. It's annoying that it broke. For now limit it with |
OK nice! I figured, but just wanted to mention it in case. Thanks
I have also been working on a fix for this, and PR #18897 does not fix the issue. It's very simple to recreate with a test dag:

import logging
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def some_task():
    # Sleep briefly so each run takes a noticeable amount of time.
    time.sleep(5)
    logging.info("Slept for 5")


# Start with catchup=True and a start_date in the past so the dag
# falls behind schedule.
dag = DAG(
    dag_id="aa_catchup_1",
    schedule_interval='@daily',
    catchup=True,
    start_date=datetime(2021, 9, 9),
    max_active_runs=1,
    max_active_tasks=1,
)

with dag:
    PythonOperator(
        task_id="some_task",
        python_callable=some_task
    )

@jedcunningham or @ephraimbuddy would you kindly re-open this issue, and I can provide a PR?
Since this is catchup True and max_active_runs 1, |
If you take a look at the instructions beneath the example dag in my comment this is only used to get the dag behind schedule. It should then be amended to |
That's true.
Here is the fix #19130 |
When a dag has catchup=False we want to select the max value from the earliest date and the end of the last execution period. Thus if a dag is behind schedule it will select the latest execution date.
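The selection rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from #19130: the helper name `next_run_start`, its parameters, and the fixed-length `interval` are all hypothetical, and "earliest date" is assumed to mean the start of the most recent fully elapsed interval.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run_start(
    first_start: datetime,
    last_period_end: Optional[datetime],
    now: datetime,
    interval: timedelta,
    catchup: bool,
) -> datetime:
    """Pick where the next run's interval starts (hypothetical helper, not Airflow's API)."""
    if catchup:
        # With catchup, continue right where the previous run's period ended.
        return last_period_end if last_period_end is not None else first_start

    # Without catchup, the earliest allowed start is assumed to be the start
    # of the most recent interval that has fully elapsed.
    elapsed_intervals = (now - first_start) // interval
    earliest = first_start + max(elapsed_intervals - 1, 0) * interval
    if last_period_end is None:
        return earliest
    # Take the max, so a dag that is behind schedule jumps to the latest interval.
    return max(earliest, last_period_end)


if __name__ == "__main__":
    start = datetime(2021, 9, 9)
    now = datetime(2021, 9, 20, 12)
    behind = datetime(2021, 9, 11)  # last period ended long ago
    print(next_run_start(start, behind, now, timedelta(days=1), catchup=False))
    print(next_run_start(start, behind, now, timedelta(days=1), catchup=True))
```

With a daily interval, a dag whose last period ended on 2021-09-11 would skip ahead when `catchup=False`, while `catchup=True` would resume from 2021-09-11.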
Apache Airflow version
2.1.4 (latest released)
Operating System
Amazon Linux 2
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
What happened
Say, for example, you have a DAG that has a sensor. This DAG is set to run every minute, with max_active_runs=1, and catchup=False.
This sensor may pass 1 or more times per day.
Previously, when this sensor is satisfied once per day, there is 1 DAG run for that given day. When the sensor is satisfied twice per day, there are 2 DAG runs for that given day.
With the new queued dag run state, new dag runs will be queued for each minute (up to AIRFLOW__CORE__MAX_QUEUED_RUNS_PER_DAG), which seems to go against the spirit of catchup=False.
This means that if a dag run is waiting on a sensor for longer than the schedule_interval, it will still in effect 'catchup' due to the queued dag runs.
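To make the effect concrete, here is a toy model of how many runs pile up behind one blocked run. It is a sketch under simplifying assumptions, not Airflow's scheduler logic: the function name and parameters are hypothetical, the cap of 16 is the default mentioned in the comments, and `skip_missed=True` stands in for the behaviour reporters expected from catchup=False.

```python
def queued_runs_while_blocked(
    blocked_minutes: int,
    interval_minutes: int = 1,
    max_queued: int = 16,
    skip_missed: bool = False,
) -> int:
    """Count runs waiting behind one blocked run (toy model, not Airflow code)."""
    # Number of schedule intervals that came due while the run was blocked.
    due = blocked_minutes // interval_minutes
    if skip_missed:
        # Expected catchup=False behaviour: at most one run waits.
        return min(due, 1)
    # Reported behaviour: every due interval is queued, up to the per-dag cap.
    return min(due, max_queued)


if __name__ == "__main__":
    # A sensor that blocks for an hour on a dag scheduled every minute.
    print(queued_runs_while_blocked(60))
    # The same situation with missed intervals skipped.
    print(queued_runs_while_blocked(60, skip_missed=True))
```

In this model, a dag scheduled every 10 minutes whose run takes an hour ends up with 6 queued runs, which matches the incremental-processing example given in the comments.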
What you expected to happen
No response
How to reproduce
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct