Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect DAG scheduling after DST #7999

Closed
gor-obr opened this issue Mar 30, 2020 · 11 comments · Fixed by #35887
Closed

Incorrect DAG scheduling after DST #7999

gor-obr opened this issue Mar 30, 2020 · 11 comments · Fixed by #35887
Labels
AIP-39 Timetables area:Scheduler including HA (high availability) scheduler kind:bug This is a clearly a bug

Comments

@gor-obr
Copy link

gor-obr commented Mar 30, 2020

Apache Airflow version: 1.10.9, 2.0.0dev

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
    What happened:

If DAG's cron contains "non trivial" hours section, Scheduler will not schedule DAG correctly immediately after DST switch. In this case, "non-trivial" means either having an interval (0 7-8 * * *) or multiple values (0 7,9 * * *).

If DST occurs in morning (2-3 AM of March 8th), following following executions will occur:

0 7-8 * * *
"2020-03-07T07:00:00-08:00"
"2020-03-07T08:00:00-08:00" # DST switch after this run
"2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM

0 7,9 * * *

"2020-03-07T07:00:00-08:00"
"2020-03-07T09:00:00-08:00"
"2020-03-08T07:00:00-07:00" # DST switch after this run
"2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM

Cause for this is the method is_fixed_time_schedule in dag.py, which tests whether cron is "fixed" (i.e. "execute exactly at this time") or "relative" (i.e. "execute on each n hours"). Method relies on a quite crude test, it calculates two subsequent times and checks whether both hours and minutes of them are the same:

        now = datetime.now()
        cron = croniter(self._schedule_interval, now)

        start = cron.get_next(datetime)
        cron_next = cron.get_next(datetime)

        if cron_next.minute == start.minute and cron_next.hour == start.hour:
            return True

        return False

This is not satisfied in case of above examples (it is executed at two different hours during each day, so hours in subsequent executions are never the same).

Based on this, method following_schedule in dag.py thinks that this DAG should be executed "on each n hours", and explicitly works around DST. It calculates the amount of time which needs to pass until next run, and adds that amount of time to the previous run, thus ignoring DST.

            # We assume that DST transitions happen on the minute/hour
            if not self.is_fixed_time_schedule():
                # relative offset (eg. every 5 minutes)
                delta = cron.get_next(datetime) - naive
                following = dttm.in_timezone(self.timezone).add_timedelta(delta)
            else:
                # absolute (e.g. 3 AM)
                naive = cron.get_next(datetime)
                tz = pendulum.timezone(self.timezone.name)
                following = timezone.make_aware(naive, tz)

Note that only the first execution (or first few executions) will be affected. After them, the calculation will stabilize and will work correctly going forward.

This is describes the same issue as https://issues.apache.org/jira/browse/AIRFLOW-7039

What you expected to happen:

How to reproduce it:

A failing unit test which demonstrates the issue is pushed here:

gor-obr@0d021a1

Anything else we need to know:

The immediate issue can be fixed in several ways:

  • Alter the logic of is_fixed_time_schedule so that it parses the cron string and checks whether it contains / in minute or hour sections.

  • Extend the Dag class so that its constructor accepts an optional bool argument is_fixed_time_schedule. If this argument is provided, it will be used instead of the logic in is_fixed_time_schedule method. If argument is not provided, the behavior will continue to be as is now. This way user can work around the issue, while existing DAGs still working as they used to.

However, both of these fixes are a bit hacky in their nature and I'd say that they don't address the underlying issue.

Problem is in the fact that cron actually doesn't recognize "relative" schedules. In cron, all schedules are absolute, and say 0 */6 * * * is just a syntactic sugar for 0 0,6,12,18 * * *. In cron these two schedules are equivalent, and in Airflow this syntactic sugar is overloaded with semantic meaning (first is assumed to be a "relative" schedule and second is assumed to be "fixed" schedule).

This is highlighted by the fact that Airflow uses croniter to calculate schedule, but then (in higher level of abstraction) works around this library and fiddles with calculated results (there is even a vague comment: # we don't want to rely on the transitions created by croniter as they are not always correct).

Maybe the cleanest solution would be to accept that all cron schedules are in their nature absolute, and to extract the feature of relative scheduling from the cron. Leave cron to work with absolute schedules as it was designed to work, and introduce different syntax to allow for relative scheduling feature.

The obvious downside of this approach is that it would change the behavior of already existing DAGs. Some good news is that in most cases absolute and relative scheduling actually have the same result (they differ mostly in handling the DST).

@gor-obr gor-obr added the kind:bug This is a clearly a bug label Mar 30, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 30, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@mik-laj
Copy link
Member

mik-laj commented Mar 30, 2020

Do you have any proposition for solutions or workarounds for this problem??

@mik-laj mik-laj added area:Scheduler including HA (high availability) scheduler and removed requires triage labels Mar 30, 2020
gor-obr pushed a commit to gor-obr/airflow that referenced this issue Mar 30, 2020
@gor-obr
Copy link
Author

gor-obr commented Mar 30, 2020

Do you have any proposition for solutions or workarounds for this problem??

I added a broader context with some possible solutions in the description of the bug. As I don't have any experience with the project, probably someone from the community is better suited to evaluate the alternatives.

@bxnxiong
Copy link

Hi we have run into this issue during last DST transition on 2020-11-01 as well, so will be nice if this can be addressed.
I'm wondering one of the solutions can be: if the schedule_interval is provided as a cron, then Airflow should respect that and not use relative delta at all? if schedule_interval is provided with a datetime.timedelta or relative timedelta object, only then the DAG uses a relative offset. What do you think?

@gor-obr
Copy link
Author

gor-obr commented Jan 22, 2021

Unfortunately as far as I'm aware, there is no systematic solution for this issue.

The best workaround I can suggest would be to extend Dag class and override the is_fixed_time_schedule to simply return true. This would solve the issue as the root cause is is_fixed_time_schedule incorrectly classifying a dag as "not fixed" time schedule. After this when defining dags simply instantiate the subclass instead of Dag. All dags that use overridden class should behave correctly.

@bxnxiong
Copy link

👍 yes that's how I'm patching it, guess I'll just use that for now

@eladkal
Copy link
Contributor

eladkal commented Dec 24, 2021

I think this issue is covered by AIP 39 which allows you to set your own scheduling logic.

@uranusjr
Copy link
Member

Yes. Note that the AIP-39 implementation by itself does not solve this problem—cron expression by design does not work well with DST, and Airflow is alreay stretching the functionality the best it can. You need to implement a custom timetable using the API exposed by AIP-39 if you need more sophisticated handling of timezone-specific logic.

@eladkal
Copy link
Contributor

eladkal commented Dec 24, 2021

Closing as no action item on this issue.

@timkpaine
Copy link
Contributor

timkpaine commented Mar 13, 2023

@gor-obr / @bxnxiong I believe #30083 has a more durable fix. @eladkal / @uranusjr I think this issue actually should be reopened as even though cron schedules are brittle, 0 9 * * * and 0 9,10 * * * work differently right now (one is fixed and accounts for DST, the other is interval and does not) and I think this is technically a bug.

@timkpaine
Copy link
Contributor

Changes now in #35887

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-39 Timetables area:Scheduler including HA (high availability) scheduler kind:bug This is a clearly a bug
Projects
None yet
7 participants