
Fix data race in run loop usage #486

Merged: 1 commit into ReactiveX:master on Mar 6, 2019

Conversation

@lebdron (Contributor) commented Feb 18, 2019

Thread sanitizer detects a data race in the run loop usage example from #154:
https://gist.github.com/lebdron/ee3e3e74250c27b2595a732bae5a9a22
Since the queue can be accessed both by the current thread and by another thread that calls schedule(), the queue accessors have to be protected by a lock.

Signed-off-by: Andrei Lebedev <lebdron@gmail.com>
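
For illustration, a minimal sketch of the kind of guard this change adds (member names here are placeholders, not the actual rxcpp identifiers):

    #include <mutex>
    #include <queue>

    // Sketch only: protect the read-only accessors with the same mutex
    // that schedule() uses when it mutates the queue.
    struct run_loop_sketch {
        mutable std::mutex lock;        // guards q
        std::priority_queue<int> q;     // stand-in for the schedulable queue

        bool empty() const {
            std::unique_lock<std::mutex> guard(lock);
            return q.empty();
        }

        const int& peek() const {
            std::unique_lock<std::mutex> guard(lock);
            // Note: the returned reference outlives the lock, so it is only
            // valid while no other thread pops the queue (discussed below).
            return q.top();
        }
    };
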
@iam (Contributor) commented Feb 20, 2019

This fixes empty(), but I think peek() might still be racy?

If another thread mutates the queue concurrently, then perhaps the returned reference could become invalid?

@kirkshoop (Member) commented

Interesting. Here was my logic for these.

I did not care whether empty() or peek() were accurate. They are read-only operations and are allowed to be wrong without breaking code, as long as callers are on the same thread that calls dispatch() and there is only one thread that calls dispatch(). I do see value in having some validation of this requirement. I was trying to avoid taking the lock on every iteration of a main loop.

As long as empty() eventually resolves to false after an item is enqueued, dispatch() will eventually be called. dispatch() uses the lock to protect all access, so it always has a coherent view of the queue contents. An atomic size member could be used to make empty() perform an atomic load rather than take a lock, but I wanted to avoid the overhead of an atomic load on every iteration of the main loop.
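
For concreteness, a sketch of the atomic-size alternative mentioned above (hypothetical names, not the rxcpp source):

    #include <atomic>
    #include <cstddef>
    #include <mutex>
    #include <queue>

    struct run_loop_sketch {
        std::mutex lock;                     // guards q
        std::priority_queue<int> q;
        std::atomic<std::size_t> count{0};   // mirrors q.size()

        // empty() becomes a single atomic load; it may be momentarily stale,
        // which is acceptable under the single-dispatch-thread contract.
        bool empty() const {
            return count.load(std::memory_order_acquire) == 0;
        }

        void push(int v) {                   // called by schedule() on any thread
            std::unique_lock<std::mutex> guard(lock);
            q.push(v);
            count.store(q.size(), std::memory_order_release);
        }

        void dispatch() {                    // single thread, coherent view under lock
            std::unique_lock<std::mutex> guard(lock);
            if (q.empty()) return;
            int item = q.top();
            q.pop();
            count.store(q.size(), std::memory_order_release);
            (void)item;                      // run the schedulable here
        }
    };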

@iam (Contributor) commented Feb 27, 2019

EDIT: see the next comment for a possible solution; this is just me thinking out loud.

Is there a larger context for this? I think I may be missing some detail needed to formulate a better reply.

Run loop implementations I've seen often have a layer of mutexes/condition variables, which are necessary to properly send and receive messages across threads. High-throughput run loops could use lock-free queues [but tuning this is highly CPU-count- and architecture-specific].
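
A generic sketch of that mutex/condition-variable pattern (illustrative only, not rxcpp's run_loop):

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>

    // Producers post work from any thread; the loop-owning thread blocks on
    // a condition variable until work arrives instead of polling.
    struct blocking_loop {
        std::mutex lock;
        std::condition_variable wake;
        std::queue<std::function<void()>> q;

        void post(std::function<void()> fn) {      // any thread
            {
                std::lock_guard<std::mutex> guard(lock);
                q.push(std::move(fn));
            }
            wake.notify_one();                     // wake the loop thread
        }

        void run_one() {                           // loop-owning thread only
            std::unique_lock<std::mutex> guard(lock);
            wake.wait(guard, [this] { return !q.empty(); });
            auto fn = std::move(q.front());
            q.pop();
            guard.unlock();                        // run the callback outside the lock
            fn();
        }
    };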

Judging by the Qt example here (#438), it looks like the Qt main loop would occasionally call into the Rx run loop through a polling mechanism?

For sparsely scheduled actions, this seems inherently less efficient than an event-based mechanism to feed the run loop?


Also, re: thread safety, my under-informed gut reaction is that perhaps the API could be split: a set of functions callable from the loop-owning thread [perhaps always under a lock, or always outside of one] and another set of functions callable from any thread. This is only a guess without full context of the above.

Usually the only way to "avoid" all atomics would be to invoke user callbacks while a lock is held; otherwise there would be races from data manipulation coming from other threads.

@iam (Contributor) commented Feb 27, 2019

After looking at the API again, I'm not sure it needs to have peek or empty at all.

I believe it is already assumed that the TimePoint is globally ordered across all threads (for schedule, queue order, etc.), so we can take advantage of this assumption.

I'm not exactly sure yet how dispatch works (I filed #489 to clarify), but it seems to be one of the following:

  • dispatch pops one item from the queue and processes it (with when < now)
  • dispatch pops all items from the queue and processes all of them (with when < now)

The first use case would look like

   while (queue has more items) {
     rl.dispatch();
   }

The second use case simply looks like:

  rl.dispatch();

Either way, I think dispatch could be changed to return the time of the next item to be scheduled, and this would be equally correct:

    std::optional<TimePoint> next = rl.dispatch();
    if (next && *next > rl.now()) {
      user_timer_schedule_at(*next);
    }

From what I understand so far, this would obviate the need to ever call empty(), peek(), etc. from user code. Of course the functions could still remain, made thread-safe, and be discouraged for use in non-debugging code.

@kirkshoop (Member) commented

I like your proposed change to dispatch(). The one change to your proposal I would ask for is to use the rxcpp maybe instead of std::optional.
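
For illustration, the usage above might then look like the following; this assumes an rxcpp-style maybe type with empty()/get(), whose exact spelling and namespace may differ:

    // Sketch only: rxcpp's maybe in place of std::optional. The exact type
    // (e.g. rxcpp::util::detail::maybe) and its namespace may differ.
    rxcpp::util::detail::maybe<clock_type::time_point> next = rl.dispatch();
    if (!next.empty() && next.get() > rl.now()) {
        user_timer_schedule_at(next.get());  // hypothetical helper, as above
    }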

I would like to keep the current changes in this PR for now and add the proposed changes to dispatch in a different PR.

Thanks!

@kirkshoop merged commit 7c79c18 into ReactiveX:master on Mar 6, 2019
@iam (Contributor) commented Mar 6, 2019

Perhaps this might need to be reverted? There seem to be some potential issues:

  • this patch introduces deadlocks when calling peek or empty from notify_earlier_wakeup (see Potential deadlock in rx-runloop #487; a sketch follows below)
  • peek is not actually thread-safe, because the reference into the queue could be invalidated by another thread popping from the queue
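
A sketch of the deadlock scenario from the first bullet (illustrative names, not the actual rxcpp source):

    #include <functional>
    #include <mutex>

    // schedule() holds the run loop's non-recursive mutex while invoking the
    // notify_earlier_wakeup callback; if that callback calls empty(), which
    // now takes the same mutex, the thread deadlocks on itself.
    struct run_loop_sketch {
        std::mutex lock;
        std::function<void()> notify_earlier_wakeup;

        bool empty() {
            std::unique_lock<std::mutex> guard(lock);  // blocks forever if this
            return true;                               // thread already holds lock
        }

        void schedule() {
            std::unique_lock<std::mutex> guard(lock);
            // ... enqueue the new item ...
            if (notify_earlier_wakeup)
                notify_earlier_wakeup();  // user code runs while the lock is held;
                                          // calling empty() here self-deadlocks
        }
    };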

@kirkshoop (Member) commented

@iam Yes. This is a breaking change. It will create deadlocks if a notify calls empty or peek. The reference returned from peek is indeed unsafe. These methods were only intended to be called from the same thread that calls dispatch; only dispatch would invalidate the head of the queue.
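
For illustration, the validation mentioned earlier might be a debug assertion like this (hypothetical sketch, not rxcpp code):

    #include <cassert>
    #include <thread>

    // Record the dispatching thread and assert that empty()/peek() are only
    // called from it; names here are illustrative.
    struct run_loop_sketch {
        std::thread::id dispatcher;  // set on each call to dispatch()

        void dispatch() {
            dispatcher = std::this_thread::get_id();
            // ... pop and run due items under the lock ...
        }

        bool empty() const {
            assert(dispatcher == std::thread::id() ||
                   dispatcher == std::this_thread::get_id());
            // ... unsynchronized read, acceptable on the dispatch thread ...
            return true;  // placeholder result
        }
    };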

@lebdron deleted the fix/run-loop-data-race branch on March 14, 2019 at 14:27