
watchdog runs too many times / repeatedly when many files are changed #542

Closed
MareoRaft opened this issue Mar 7, 2019 · 8 comments · Fixed by #940

@MareoRaft

I am using watchdog to watch a git repository and rerun a build process every time the repo changes.

I have observed that performing a git pull, for example, can cause watchdog to run repeatedly: the pull changes a very large number of files, and watchdog runs once per file modification.

For me, this is not the desired behavior. If 5 files are modified at the same time, for example, I only need watchdog to run the build process once. I am hoping we can add a configuration option to watchdog that lets somebody opt in to the following proposed behavior, "lazy run":

Proposed behavior "lazy run":

  • Watchdog maintains a queue of file change events. This happens concurrently.
  1. Check the queue.
  2. If the queue is nonempty, empty the queue and run the process.
  3. (When the process is finished,) return to step 1.

Example of "lazy run" in action:

Suppose that the process to run takes a long time, and 5 files are modified in rapid succession.

  1. First file is modified.
  2. Modification 1 goes in the queue.
  3. Watchdog empties the queue and begins running the process.
  4. Second file is modified.
  5. Third file is modified.
  6. Fourth file is modified.
  7. Fifth file is modified.
  8. Modifications 2 through 5 are now in the queue.
  9. The process completes and watchdog checks the queue again.
  10. The modifications are discarded from the queue and watchdog begins running the process again.
  11. The process completes again.

As you can see, in this example, even though 5 files were modified, the process only runs 2 times.
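
A minimal sketch of this proposed loop, using watchdog's public Observer / FileSystemEventHandler API (run_build and the make command are hypothetical placeholders, not part of watchdog):

import queue
import subprocess

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

events = queue.Queue()

class QueueingHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        events.put(event)  # changes accumulate concurrently on the observer thread

def run_build():
    subprocess.run(["make", "build"])  # hypothetical build command

observer = Observer()
observer.schedule(QueueingHandler(), ".", recursive=True)
observer.start()
try:
    while True:
        events.get()               # steps 1-2: block until the queue is nonempty...
        while not events.empty():  # ...then empty it completely
            events.get_nowait()
        run_build()                # run the process once for the whole batch
        # step 3: when the process is finished, return to checking the queue
finally:
    observer.stop()
    observer.join()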

@libcthorne

I think the --drop flag can achieve what you want: https://github.com/gorakhargosh/watchdog/blob/v0.9.0/src/watchdog/watchmedo.py#L415

@oprypin
Contributor

oprypin commented May 28, 2021

I confirm this finding, although I'm not sure which exact way to use Watchdog you're referring to.

Assuming watchmedo shell-command:

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' &
$ touch {1..3}.md
start
start
start
start
start
start
finish
finish
finish
finish
finish
finish

It runs the command not just 3 times (once per file), but 6! It's easy to guess that this regressed when the Linux emitter was improved to emit both FileModified and FileClosed events.
And indeed the commands are run concurrently!
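
(A Python API user could sidestep the doubled runs by filtering out the close events in a handler. A minimal sketch, assuming the duplication indeed comes from FileClosedEvent; handle_change is a hypothetical per-change callback:)

from watchdog.events import FileClosedEvent, FileSystemEventHandler

class IgnoreCloseHandler(FileSystemEventHandler):
    def __init__(self, handle_change):
        self.handle_change = handle_change  # hypothetical per-change callback

    def on_any_event(self, event):
        if isinstance(event, FileClosedEvent):
            return  # skip the duplicate close notification from the Linux emitter
        self.handle_change(event)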

Then let's try --drop.

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' --drop &
$ touch {1..3}.md
start
finish

The extra events do get dropped.
But it's not really what was asked for, because presumably it should have run a 2nd time to catch the follow-up events.

The --wait flag actually seems promising, or at least I would expect it to work similarly to what was asked. But it doesn't...

$ watchmedo shell-command --command='echo start; sleep 0.1; echo finish' --wait &
$ touch {1..3}.md
start
finish
start
finish
start
finish
start
finish
start
finish
start
finish

So indeed, there is no solution currently.

@oprypin
Contributor

oprypin commented May 28, 2021

But let me propose an even better behavior (and a complete example with it):

import time
import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class RebuildLoop(FileSystemEventHandler):
    def __init__(self, func, debounce=0.1):
        self.func = func
        self.debounce = debounce
        self.want_rebuild = False
        self.condition = threading.Condition()

    def on_any_event(self, event):
        # Called on the observer thread: flag a pending rebuild and wake the loop.
        with self.condition:
            self.want_rebuild = True
            self.condition.notify_all()

    def run(self):
        while True:
            with self.condition:
                self.condition.wait_for(lambda: self.want_rebuild)
                print("Detected file changes")
                # wait() returns True when notified, i.e. another event arrived
                # within the debounce window; keep waiting until events stop.
                while self.condition.wait(timeout=self.debounce):
                    print("Waiting for file changes to stop happening")
                # Reset only after the burst has settled, so the events that
                # extended the window don't trigger a second, spurious rebuild.
                self.want_rebuild = False
            self.func()


def do_rebuild():
    print("start")
    time.sleep(1)
    print("finish")


rebuild_loop = RebuildLoop(do_rebuild)

observer = Observer()
observer.schedule(rebuild_loop, ".")
observer.start()
try:
    rebuild_loop.run()
finally:
    observer.stop()
    observer.join()

  • Basically, it collapses any chain of events with less than 0.1 sec between consecutive events, and then does 1 rebuild.
    • A necessary consequence of this is that a build will start, at the earliest, 0.1 sec after an event is detected.
    • And the huge upside, of course, is that it very nicely waits for all updates to stop happening and only then does 1 rebuild.
  • Moreover, it waits for the rebuild to finish before doing anything else.
    • But it is still aware of events that happened during the build, and initiates the next build ASAP.

Consider a shell session with two bursts of events:

$ python this_example.py &
$ for i in {1..3}; do (set -x; touch $i.md); sleep 0.05; done; \
  sleep 0.5; \
  for i in {1..3}; do (set -x; touch $i.md); sleep 0.05; done; \
  fg
+ touch 1.md
Detected file changes
+ touch 2.md
Waiting for file changes to stop happening
+ touch 3.md
Waiting for file changes to stop happening
start
+ touch 1.md
+ touch 2.md
+ touch 3.md
finish
Detected file changes
start
finish

I already implemented this in a project, in a more advanced form, here: mkdocs/mkdocs@a444c43#diff-1d9feb257a726960a5811814b912c90718f0935d46a7504c4d47268ccd67bb50R98

@libcthorne

> I confirm this finding, although I'm not sure which exact way to use Watchdog you're referring to.

For my own case of avoiding the git checkout issue, I used a slightly hacky approach combining --drop with an intermediate file; I'll share it here in case it's useful:

# Celery with auto-restart using watchdog.
# An intermediate buffer directory (/tmp/celery_restart_trigger) is used for
# auto-restarting, to avoid a git branch change triggering multiple restarts
# due to many files being changed at once. The "sleep 1" combined with the
# --drop parameter enforces an interval between checks of the trigger directory.

mkdir -p /tmp/celery_restart_trigger

watchmedo shell-command \
          --drop \
          --pattern='*.py' \
          --recursive /scope/foodnet \
          -c 'touch /tmp/celery_restart_trigger && sleep 1' &

watchmedo auto-restart \
          --directory=/tmp/celery_restart_trigger -- \
          celery -A foodnet -l debug worker --concurrency=5

@BoboTiG
Collaborator

BoboTiG commented Jan 14, 2023

@oprypin, @MareoRaft, @libcthorne: would you mind trying the patch from #940 to see if it fits your use cases 🙏?

@nicbou

nicbou commented Jan 18, 2023

@BoboTiG, what a timely patch! I just started using watchdog, and I'm facing the same issue as the people above.

This seems like a patch to watchmedo, but not to the observers themselves, correct? It does not fit my use case, unfortunately. My static site generator has a --watch argument that uses watchdog. Saving multiple files at once triggers a long chain of rebuilds that keep my computer busy for a while.

My use case would be to wait for changes to finish (as above), to prevent multiple successive builds.

@oprypin
Contributor

oprypin commented Jan 18, 2023

Ah, yes, that patch is only about watchmedo. Although you could just use the EventDebouncer from it for yourself as well.
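
(For instance, a sketch of wiring that up yourself. The import path and argument names here are my assumption from that patch, not a documented API: an EventDebouncer thread that takes a debounce interval and a callback receiving the batched events.)

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer
from watchdog.utils.event_debouncer import EventDebouncer  # assumed path, added by #940

def on_batch(events):
    print(f"rebuilding after {len(events)} events")  # one call per settled burst

debouncer = EventDebouncer(debounce_interval_seconds=0.5, events_callback=on_batch)
debouncer.start()

class Forwarder(FileSystemEventHandler):
    def on_any_event(self, event):
        debouncer.handle_event(event)  # hand each raw event to the debouncer

observer = Observer()
observer.schedule(Forwarder(), ".", recursive=True)
observer.start()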

Anyway, as I commented above, I already made my own debouncer that's better integrated into my server: the server also knows to pause serving any files while the rebuild is ongoing.

@nicbou

nicbou commented Jan 19, 2023

You're right. After spending a few days on this, I can't quite grok it. I understand what the code does, but not where to hook it up when using a simple Observer().

I'll try to extend FileSystemEventHandler and see where it takes me. The goal is to group multiple file change events to avoid multiple lengthy rebuilds.
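
(One way to hook that up with a plain Observer() is a handler that restarts a timer on every event, so a burst collapses into one callback. A minimal sketch; DebouncedHandler and rebuild are hypothetical names:)

import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class DebouncedHandler(FileSystemEventHandler):
    def __init__(self, callback, delay=0.5):
        self.callback = callback
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def on_any_event(self, event):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a new event restarts the countdown
            self._timer = threading.Timer(self.delay, self.callback)
            self._timer.start()

def rebuild():
    print("rebuilding site")  # placeholder for the lengthy rebuild

observer = Observer()
observer.schedule(DebouncedHandler(rebuild), ".", recursive=True)
observer.start()

Unlike the RebuildLoop above, the callback here runs on a timer thread, and a rebuild is not serialized with later events; for long builds, the condition-variable loop is the safer pattern.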
