[AIRFLOW-2027] Only trigger sleep in scheduler after all files have parsed #2986
Force-pushed from 1d27d2c to f8edf4e.
Please rebase against master to trigger a new build. The CI should be fixed.
@aoen can you trigger the build again please?
ping @aoen
Force-pushed from f8edf4e to 2685227.
Sorry for the delay, I've been working on another PR to parallelize Celery syncing in Airflow, which should have a higher impact on scheduler performance. Re-pushed.
Force-pushed from 2685227 to 0ea7a85.
Can you rebase onto master? We had some issues with the
Force-pushed from 0ea7a85 to 24411e9.
Codecov Report
@@            Coverage Diff            @@
##           master   #2986    +/-   ##
==========================================
+ Coverage   73.07%   73.08%   +<.01%
==========================================
  Files         180      180
  Lines       12578    12586      +8
==========================================
+ Hits         9191     9198      +7
- Misses       3387     3388      +1
@Fokko done, and thanks for looking into the apache-beam issues!
There seem to be a couple of problems that cause the sleep not to trigger and the scheduler heartbeats/logs to be spammed:
To unblock the release I'm reverting this PR for now. It should be re-added with tests/mocking. @bolkedebruin @yrqls21
@aoen unblock release? I don't follow?
This bug can cause severe issues if the particular edge cases are hit. I guess no one reported it in the current RC, but it's there.
K... but I assume "unblocking" means 1.10.1 not 1.10.0 as the vote has already passed?
Ah I missed that, would be for 1.10.1 then, yeah.
@aoen I'm not so sure about the monotonically increasing part. What I can see is that if we parse files faster than we set the file paths, and a file was removed or has an import error so that its last_finish_time is not updated in the 2nd parsing round, we'll still use the last_finish_time from the 1st round and get an undesirable duration value.
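To make the stale-timestamp concern concrete, here is a toy model only, not the scheduler's actual code. The function `durations_since`, the dictionary layout, and the timestamps are all assumptions made for illustration; only the name last_finish_time comes from the comment above.

```python
def durations_since(round_start, last_finish_time):
    """Hypothetical helper: per-file duration measured from the round start."""
    return {path: finish - round_start
            for path, finish in last_finish_time.items()}


# Round 1 starts at t=100 and both files finish parsing.
last_finish_time = {"a.py": 103.0, "b.py": 105.0}
round1 = durations_since(100.0, last_finish_time)

# Round 2 starts at t=110. a.py is parsed again, but b.py was removed
# (or hit an import error), so its entry is never refreshed.
last_finish_time["a.py"] = 112.0
round2 = durations_since(110.0, last_finish_time)

print(round1)  # both durations are positive
print(round2)  # b.py still carries the round-1 timestamp, so its duration is negative
```

In this model the stale entry yields a negative duration for b.py in the second round, which is one way an undesirable value could arise if such entries are not cleared or refreshed.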
I hit this in production during testing of 1.10 but managed to avoid it by tweaking the sleep configurations. It was triggered by having only Solved by setting
…arsed Closes apache#2986 from aoen/ddavydov--open_source_disable_unecessary_sleep_in_scheduler_loop
JIRA
Description
The scheduler loop unnecessarily sleeps for 1 second on every iteration. Remove this sleep to slightly speed up scheduling, and instead sleep once all files have been parsed. The delay adds up because it occurs in every scheduler loop, and the loop runs (number of DAGs to parse / scheduler parallelism) times per full pass. We have seen it increase scheduling times by 10% in our production environment.
Also remove the unnecessarily increased file processing interval in tests, which slows them down.
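As a rough illustration of the arithmetic above, the following sketch compares the total sleep per full parsing pass before and after the change. The function names and the example numbers (100 files, parallelism of 4) are hypothetical and are not taken from the Airflow code base.

```python
import math


def total_sleep_per_loop(num_files, parallelism, sleep_seconds=1.0):
    """Old behavior (as described): sleep once in every scheduler loop.

    A full pass needs ceil(num_files / parallelism) loops.
    """
    loops_per_pass = math.ceil(num_files / parallelism)
    return loops_per_pass * sleep_seconds


def total_sleep_per_pass(num_files, sleep_seconds=1.0):
    """New behavior (as described): sleep once, after all files are parsed."""
    return sleep_seconds if num_files > 0 else 0.0


print(total_sleep_per_loop(100, 4))  # 25.0 seconds of sleep per pass
print(total_sleep_per_pass(100))     # 1.0 second of sleep per pass
```

The gap between the two grows with the number of files and shrinks with higher parallelism, which matches the claim that the per-loop sleep adds up.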
Tests
Can't test this very well since it is a sleep interval; might mock out the sleep and test the calculation of sleep_length, though. It has been running on the Airbnb Airflow cluster for many weeks now.
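One possible shape for such a test is sketched below, using the standard library's unittest.mock. The helpers `compute_sleep_length` and `sleep_after_pass`, and the formula they use, are assumptions for illustration; the real scheduler may calculate sleep_length differently.

```python
import time
from unittest import mock


def compute_sleep_length(elapsed, min_interval):
    """Hypothetical formula: sleep only for the remainder of the interval."""
    return max(0.0, min_interval - elapsed)


def sleep_after_pass(elapsed, min_interval):
    """Sleep once after a full parsing pass and return the length used."""
    sleep_length = compute_sleep_length(elapsed, min_interval)
    if sleep_length > 0:
        time.sleep(sleep_length)
    return sleep_length


# Patch time.sleep so the check runs instantly and the argument can be inspected.
with mock.patch("time.sleep") as fake_sleep:
    sleep_after_pass(elapsed=0.25, min_interval=1.0)
    fake_sleep.assert_called_once_with(0.75)
```

Keeping the calculation in a pure function separate from the call to time.sleep means the edge cases (for example, a pass that takes longer than the interval) can be tested without any patching at all.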
Commits
My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
Passes git diff upstream/master -u -- "*.py" | flake8 --diff
@saguziel