AssertionError: next schedule shouldn't be earlier around DST change #30088

Closed

joshzana opened this issue Mar 14, 2023 · 1 comment
Labels
area:core, duplicate, kind:bug

@joshzana

Apache Airflow version

2.5.1

What happened

We have a DAG on a cron schedule that runs every 4 hours, set up like this:

CronDataIntervalTimetable("20 */4 * * *", timezone=pst)

For about 40 minutes the other night, the scheduler repeatedly crashed with this error:

AssertionError: next schedule shouldn't be earlier
  File "airflow/models/dag.py", line 916, in next_dagrun_info
    info = self.timetable.next_dagrun_info(
  File "airflow/timetables/interval.py", line 87, in next_dagrun_info
    earliest = self._skip_to_latest(earliest)
  File "airflow/timetables/interval.py", line 154, in _skip_to_latest
    raise AssertionError("next schedule shouldn't be earlier")

The crashes occurred on 3/12/2023, which is the day daylight saving time took effect here in the US.
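The transition matters because the cron expression is evaluated in local time. The following standard-library sketch (`zoneinfo`, Python 3.9+) lists the fire times of `20 */4 * * *` on 3/12/2023 and shows that the first two runs look 4 hours apart on the wall clock but are only 3 hours apart in real time. It assumes `America/Los_Angeles`, the canonical IANA name behind `US/Pacific`, and illustrates only the time arithmetic, not Airflow's own code.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

# Assumption: "America/Los_Angeles" is the canonical IANA zone that "US/Pacific" links to.
PACIFIC = ZoneInfo("America/Los_Angeles")
UTC = timezone.utc

# Local fire times of "20 */4 * * *" on 2023-03-12. Clocks jump from
# 02:00 PST straight to 03:00 PDT between the first and second run.
fire_times = [datetime(2023, 3, 12, hour, 20, tzinfo=PACIFIC) for hour in range(0, 24, 4)]

for local in fire_times:
    print(f"{local:%H:%M %Z} = {local.astimezone(UTC):%H:%M} UTC")

run_a, run_b = fire_times[0], fire_times[1]
# Gap as read off the wall clock (tzinfo dropped).
wall_clock_gap = run_b.replace(tzinfo=None) - run_a.replace(tzinfo=None)
# Real elapsed gap: convert to UTC first, because subtracting two datetimes
# that share the same tzinfo object ignores their UTC offsets.
real_gap = run_b.astimezone(UTC) - run_a.astimezone(UTC)
print("wall clock:", wall_clock_gap, "| elapsed:", real_gap)
```

The 00:20 PST run is 08:20 UTC, which matches the `end` of `last_automated_data_interval` in the test below, and the mocked crash time of 11:19 UTC is 04:19 PDT, just under a minute before the 04:20 PDT run.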

What you think should happen instead

The scheduler should not crash around a DST transition, regardless of the cron expression used.

How to reproduce

I reconstructed the call stack from our monitoring and reproduced the error in a simple unit test:

from unittest import TestCase
from unittest.mock import patch
from airflow.timetables.base import TimeRestriction, DataInterval
from airflow.timetables.interval import CronDataIntervalTimetable, DateTime
from pendulum import timezone


class AirflowExceptionTest(TestCase):
    def test_airflow_schedule(self):
        pst = timezone("US/Pacific")
        utc = timezone("UTC")
        # "Now", frozen at the time of one of the observed scheduler crashes.
        mock_date = DateTime(2023, 3, 12, 11, 19, 1, 728990, tzinfo=utc)
        timetable = CronDataIntervalTimetable("20 */4 * * *", timezone=pst)
        # Last run and restriction, as captured from the scheduler's call stack.
        last_automated_data_interval = DataInterval(
            start=DateTime(2023, 3, 12, 4, 20, 0, tzinfo=utc),
            end=DateTime(2023, 3, 12, 8, 20, 0, tzinfo=utc),
        )
        restriction = TimeRestriction(
            earliest=DateTime(2021, 12, 1, 8, 0, 0, tzinfo=utc),
            latest=None,
            catchup=False,
        )
        # Raises AssertionError("next schedule shouldn't be earlier").
        with patch.object(DateTime, "utcnow", return_value=mock_date):
            timetable.next_dagrun_info(
                last_automated_data_interval=last_automated_data_interval,
                restriction=restriction,
            )

Note that I mock the current time with the time of one of our crashes; all the other inputs come from the call stack the scheduler used, and I have no control over any of them.

Running this test with pytest raises the same assertion:

self = <airflow.timetables.interval.CronDataIntervalTimetable object at 0x7f8adb26d400>
earliest = DateTime(2021, 12, 1, 8, 0, 0, tzinfo=Timezone('UTC'))

    def _skip_to_latest(self, earliest: DateTime | None) -> DateTime:
        """Bound the earliest time a run can be scheduled.
    
        The logic is that we move start_date up until one period before, so the
        current time is AFTER the period end, and the job can be created...
    
        This is slightly different from the delta version at terminal values.
        If the next schedule should start *right now*, we want the data interval
        that start now, not the one that ends now.
        """
        current_time = DateTime.utcnow()
        last_start = self._get_prev(current_time)
        next_start = self._get_next(last_start)
        if next_start == current_time:  # Current time is on interval boundary.
            new_start = last_start
        elif next_start > current_time:  # Current time is between boundaries.
            new_start = self._get_prev(last_start)
        else:
>           raise AssertionError("next schedule shouldn't be earlier")
E           AssertionError: next schedule shouldn't be earlier

/home/airflow/.local/lib/python3.9/site-packages/airflow/timetables/interval.py:154: AssertionError
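To see how the `else` branch can be reached, here is a self-contained model of the branch logic in `_skip_to_latest` above. All helpers (`naive_prev`, `naive_next`, `get_prev_elapsed`, `get_next_elapsed`, `skip_to_latest`) are hypothetical, written only for this illustration, and the cron expression is hard-coded instead of going through croniter. The key assumption, not checked against the Airflow 2.5.1 source, is that for an hour field like `*/4` the previous/next fire time is found by measuring the distance on the local wall clock and then applying it as real elapsed time. Under that assumption, the mocked crash time hits the assertion:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")  # assumed canonical name for US/Pacific
STEP = timedelta(hours=4)


def _block_start(naive: datetime) -> datetime:
    # Minute 20 of the current 4-hour block (hours 0, 4, 8, ...).
    return naive.replace(hour=naive.hour // 4 * 4, minute=20, second=0, microsecond=0)


def naive_prev(naive: datetime) -> datetime:
    """Latest wall-clock fire time of '20 */4 * * *' strictly before `naive`."""
    candidate = _block_start(naive)
    return candidate - STEP if candidate >= naive else candidate


def naive_next(naive: datetime) -> datetime:
    """Earliest wall-clock fire time of '20 */4 * * *' strictly after `naive`."""
    candidate = _block_start(naive)
    return candidate + STEP if candidate <= naive else candidate


def get_prev_elapsed(current: datetime) -> datetime:
    """HYPOTHETICAL: step back by the wall-clock distance, treated as real elapsed time."""
    naive = current.astimezone(PACIFIC).replace(tzinfo=None)
    return current - (naive - naive_prev(naive))


def get_next_elapsed(current: datetime) -> datetime:
    """HYPOTHETICAL: step forward by the wall-clock distance, treated as real elapsed time."""
    naive = current.astimezone(PACIFIC).replace(tzinfo=None)
    return current + (naive_next(naive) - naive)


def skip_to_latest(current_time: datetime) -> datetime:
    """Mirrors the branch structure of the quoted `_skip_to_latest`."""
    last_start = get_prev_elapsed(current_time)
    next_start = get_next_elapsed(last_start)
    if next_start == current_time:  # on an interval boundary
        return last_start
    if next_start > current_time:  # between boundaries
        return get_prev_elapsed(last_start)
    raise AssertionError("next schedule shouldn't be earlier")


# Mocked crash time: 11:19:01 UTC is 04:19:01 PDT on 2023-03-12.
crash_time = datetime(2023, 3, 12, 11, 19, 1, 728990, tzinfo=timezone.utc)
# Stepping back 3h59m of real time overshoots the 00:20 PST run (08:20 UTC)
# by the skipped hour, landing on 07:20 UTC; the "next" run from there is
# 08:20 UTC, which is earlier than crash_time, so the else branch fires.
try:
    skip_to_latest(crash_time)
except AssertionError as exc:
    print("DST day:", exc)

# Same local time one day later, with no transition in between: no error.
print("next day:", skip_to_latest(datetime(2023, 3, 13, 11, 19, tzinfo=timezone.utc)))
```

By contrast, localizing the wall-clock results directly would give 00:20 PST = 08:20 UTC and 04:20 PDT = 11:20 UTC for the crash time, so `next_start` would land after the current time and the `elif` branch would be taken instead.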

Operating System

Ubuntu 20.04.3 LTS (Focal Fossa)

Versions of Apache Airflow Providers

n/a

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@eladkal
Contributor

eladkal commented Mar 16, 2023

I'm going to close this one, since a fix for #7999 will also cover this issue, as suggested in #30083.

eladkal closed this as not planned on Mar 16, 2023
eladkal added the duplicate label and removed the needs-triage label on Mar 16, 2023