
watchdog runs too many times / repeatedly when many files are changed #542

Closed
MareoRaft opened this issue Mar 7, 2019 · 8 comments · Fixed by #940

@MareoRaft

I am using watchdog to watch a git repository and rerun a build process every time the repo changes.

I have observed that performing a git pull, for example, can cause watchdog to run repeatedly: the pull changes a very large number of files, and watchdog runs once per file modification.

For me, this is not the desired behavior. If 5 files are modified at the same time, for example, I only need watchdog to run the build process once. I am hoping we can add a configuration option to watchdog that lets somebody opt in to the following proposed behavior, "lazy run":

Proposed behavior "lazy run":

  • Watchdog maintains a queue of file change events. This happens concurrently.
  1. Check the queue.
  2. If the queue is nonempty, empty the queue and run the process.
  3. (When the process is finished,) return to step 1.

Example of "lazy run" in action:

Suppose that the process to run takes a long time, and 5 files are modified in rapid succession.

  1. First file is modified.
  2. Modification 1 goes in the queue.
  3. Watchdog empties the queue and begins running the process.
  4. Second file is modified.
  5. Third file is modified.
  6. Fourth file is modified.
  7. Fifth file is modified.
  8. Modifications 2 through 5 are now in the queue.
  9. The process completes and watchdog checks the queue again.
  10. The modifications are discarded from the queue and watchdog begins running the process again.
  11. The process completes again.

As you can see, in this example, even though 5 files were modified, the process only runs 2 times.
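
A minimal sketch of this proposed loop, using watchdog's public Observer / FileSystemEventHandler API (run_build and the make command are hypothetical placeholders, not part of watchdog):

import queue
import subprocess

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

events = queue.Queue()

class QueueingHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        events.put(event)  # changes accumulate concurrently on the observer thread

def run_build():
    subprocess.run(["make", "build"])  # hypothetical build command

observer = Observer()
observer.schedule(QueueingHandler(), ".", recursive=True)
observer.start()
try:
    while True:
        events.get()               # steps 1-2: block until the queue is nonempty...
        while not events.empty():  # ...then empty it completely
            events.get_nowait()
        run_build()                # run the process once for the whole batch
        # step 3: when the process is finished, return to checking the queue
finally:
    observer.stop()
    observer.join()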

@libcthorne

I think the --drop flag can achieve what you want: https://github.com/gorakhargosh/watchdog/blob/v0.9.0/src/watchdog/watchmedo.py#L415

@oprypin
Contributor

oprypin commented May 28, 2021

I confirm this finding, although I'm not sure which exact way to use Watchdog you're referring to.

Assuming watchmedo shell-command:

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' &
$ touch {1..3}.md
start
start
start
start
start
start
finish
finish
finish
finish
finish
finish

It runs the command not just 3 times (once per file), but 6! It's easy to guess that this regressed when the Linux emitter was improved to emit both FileModified and FileClosed events.
And indeed the commands are run concurrently!
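
(A Python API user could sidestep the doubled runs by filtering out the close events in a handler. A minimal sketch, assuming the duplication indeed comes from FileClosedEvent; handle_change is a hypothetical per-change callback:)

from watchdog.events import FileClosedEvent, FileSystemEventHandler

class IgnoreCloseHandler(FileSystemEventHandler):
    def __init__(self, handle_change):
        self.handle_change = handle_change  # hypothetical per-change callback

    def on_any_event(self, event):
        if isinstance(event, FileClosedEvent):
            return  # skip the duplicate close notification from the Linux emitter
        self.handle_change(event)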

Then let's try --drop.

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' --drop &
$ touch {1..3}.md
start
finish

The extra events do get dropped.
But it's not really what was asked for, because presumably it should have run a 2nd time to catch the follow-up events.

The --wait flag actually seems promising, or at least I would expect it to work similarly to what was asked. But it doesn't...

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' --wait &
$ touch {1..3}.md
start
finish
start
finish
start
finish
start
finish
start
finish
start
finish

So indeed, there is no solution currently.

@oprypin
Contributor

oprypin commented May 28, 2021

But let me propose an even better behavior (and a complete example with it):

import time
import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class RebuildLoop(FileSystemEventHandler):
    def __init__(self, func, debounce=0.1):
        self.func = func
        self.debounce = debounce
        self.want_rebuild = False
        self.condition = threading.Condition()

    def on_any_event(self, event):
        # Called on the observer thread: flag a pending rebuild and wake the loop.
        with self.condition:
            self.want_rebuild = True
            self.condition.notify_all()

    def run(self):
        while True:
            with self.condition:
                self.condition.wait_for(lambda: self.want_rebuild)
                print("Detected file changes")
                # wait() returns True when notified, i.e. another event arrived
                # within the debounce window; keep waiting until events stop.
                while self.condition.wait(timeout=self.debounce):
                    print("Waiting for file changes to stop happening")
                # Reset only after the burst has settled, so the events that
                # extended the window don't trigger a second, spurious rebuild.
                self.want_rebuild = False
            self.func()


def do_rebuild():
    print("start")
    time.sleep(1)
    print("finish")


rebuild_loop = RebuildLoop(do_rebuild)

observer = Observer()
observer.schedule(rebuild_loop, ".")
observer.start()
try:
    rebuild_loop.run()
finally:
    observer.stop()
    observer.join()

  • Basically, it collapses any chain of events with less than 0.1 sec between consecutive events, and then does 1 rebuild.
    • A necessary consequence of this is that a build will start, at the earliest, 0.1 sec after an event is detected.
    • And the huge upside, of course, is that it very nicely waits for all updates to stop happening and only then does 1 rebuild.
  • Moreover, it waits for the rebuild to finish before doing anything else.
    • But it is still aware of events that happened during the build, and initiates the next build ASAP.

Consider a shell session with two bursts of events:

$ python this_example.py &
$ for i in {1..3}; do (set -x; touch $i.md); sleep 0.05; done; \
  sleep 0.5; \
  for i in {1..3}; do (set -x; touch $i.md); sleep 0.05; done; \
  fg
+ touch 1.md
Detected file changes
+ touch 2.md
Waiting for file changes to stop happening
+ touch 3.md
Waiting for file changes to stop happening
start
+ touch 1.md
+ touch 2.md
+ touch 3.md
finish
Detected file changes
start
finish

I already implemented this in a project, in a more advanced form, here: mkdocs/mkdocs@a444c43#diff-1d9feb257a726960a5811814b912c90718f0935d46a7504c4d47268ccd67bb50R98

@libcthorne

> I confirm this finding, although I'm not sure which exact way to use Watchdog you're referring to.

For my own case of avoiding the git checkout issue, I used a slightly hacky approach combining --drop with an intermediate file; I'll share it here in case it's useful:

# Celery with auto-restart using watchdog.
# An intermediate buffer directory (/tmp/celery_restart_trigger) is used for
# auto-restarting, to avoid a git branch change triggering multiple restarts
# due to many files being changed at once. The "sleep 1" combined with the
# --drop parameter enforces an interval between checks of the trigger directory.

mkdir -p /tmp/celery_restart_trigger

watchmedo shell-command \
          --drop \
          --pattern='*.py' \
          --recursive /scope/foodnet \
          -c 'touch /tmp/celery_restart_trigger && sleep 1' &

watchmedo auto-restart \
          --directory=/tmp/celery_restart_trigger -- \
          celery -A foodnet -l debug worker --concurrency=5

@BoboTiG
Collaborator

BoboTiG commented Jan 14, 2023

@oprypin, @MareoRaft, @libcthorne: would you mind trying the patch from #940 to see if it fits your use cases 🙏?

@nicbou

nicbou commented Jan 18, 2023

@BoboTiG, what a timely patch! I just started using watchdog, and I'm facing the same issue as the people above.

This seems like a patch to watchmedo, but not to the observers themselves, correct? It does not fit my use case, unfortunately. My static site generator has a --watch argument that uses watchdog. Saving multiple files at once triggers a long chain of rebuilds that keep my computer busy for a while.

My use case would be to wait for changes to finish (as above), to prevent multiple successive builds.

@oprypin
Contributor

oprypin commented Jan 18, 2023

Ah, yes, that patch is only about watchmedo. Although you could just use the EventDebouncer from it for yourself as well.
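
(For instance, a sketch of wiring that up yourself. The import path and argument names here are my assumption from that patch, not a documented API: an EventDebouncer thread that takes a debounce interval and a callback receiving the batched events.)

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer
from watchdog.utils.event_debouncer import EventDebouncer  # assumed path, added by #940

def on_batch(events):
    print(f"rebuilding after {len(events)} events")  # one call per settled burst

debouncer = EventDebouncer(debounce_interval_seconds=0.5, events_callback=on_batch)
debouncer.start()

class Forwarder(FileSystemEventHandler):
    def on_any_event(self, event):
        debouncer.handle_event(event)  # hand each raw event to the debouncer

observer = Observer()
observer.schedule(Forwarder(), ".", recursive=True)
observer.start()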

Anyway, as I commented above, I already made my own debouncer that's better integrated into my server: the server also knows to pause serving any files while the rebuild is ongoing.

@nicbou

nicbou commented Jan 19, 2023

You're right. After spending a few days on this, I can't quite grok it. I understand what the code does, but not where to hook it up when using a simple Observer().

I'll try to extend FileSystemEventHandler and see where it takes me. The goal is to group multiple file change events to avoid multiple lengthy rebuilds.
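
(One way to hook that up with a plain Observer() is a handler that restarts a timer on every event, so a burst collapses into one callback. A minimal sketch; DebouncedHandler and rebuild are hypothetical names:)

import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, callback, delay=0.5):
        self.callback = callback
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def on_any_event(self, event):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a new event restarts the countdown
            self._timer = threading.Timer(self.delay, self.callback)
            self._timer.start()

def rebuild():
    print("rebuilding site")  # placeholder for the lengthy rebuild

observer = Observer()
observer.schedule(DebouncedHandler(rebuild), ".", recursive=True)
observer.start()

Unlike the RebuildLoop above, the callback here runs on a timer thread, and a rebuild is not serialized with later events; for long builds, the condition-variable loop is the safer pattern.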
