Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition under load #76

Closed
mrzasa opened this issue Jan 7, 2019 · 5 comments
Closed

Race condition under load #76

mrzasa opened this issue Jan 7, 2019 · 5 comments
Labels

Comments

@mrzasa
Copy link

mrzasa commented Jan 7, 2019

We noticed that under load, mongoid locker allows more than one thread to enter a critical section. The issue can be reproduced using this repository that contains full description of the scenario.

During the investigation we removed mongoid-locker and mongoid and reproduced the issues. It means that the bug is not in the mongoid-locker itself. It breaks the mongoid-locker though. It's reported in mongo ruby driver tracker, I'm writing to make you aware of the issue.

I also provided a simpler lock implementation based on expiring and unique indices. It passes our test and cause no issues on production.

@dblock dblock added the bug? label Jan 7, 2019
@dblock
Copy link
Collaborator

dblock commented Jan 7, 2019

I also provided a simpler lock implementation based on expiring and unique indices. It passes our test and cause no issues on production.

I don't see this part. Is that something we can inherit here?

@mrzasa
Copy link
Author

mrzasa commented Jan 7, 2019

It is in the Lock class.

@dks17 dks17 mentioned this issue Jan 24, 2019
@dks17 dks17 mentioned this issue Mar 22, 2019
@dks17
Copy link
Collaborator

dks17 commented Apr 3, 2019

Nice investigation. Thanks that shared this.

Checked current Mongoid-Locker implementation against your repository and this fork.

THREAD_COUNT=100 PROCESS_COUNT=2 bundle exec ruby test/mongoid_locker_internal_test.rb
THREAD_COUNT=100 PROCESS_COUNT=2 bundle exec ruby test/mongoid_locker_external_test.rb

This configuration was used:

module Mongoid
  module Locker
    def self.custom_backoff(_doc, _opts)
      rand
    end
  end
end

Mongoid::Locker.configure do |config|
  config.backoff_algorithm = :custom_backoff
end

# by default retries is equal to INFINITY
document.with_lock(reload: false) do
  yield
end

Sometimes the tests passed and sometimes not (each time it is very close to 100%).
I am not sure that current implementation of Mongoid-Locker is 100% reliable, but with current implementation is possible achieve better results than with implementation based on time checking.

Perhaps, the test failures are dependent on some MRI side effect (not sure about this).

Looked at your Lock class implementation and could understand how it works.

class Lock
  include Mongoid::Document
  CannotObtainLockError = Class.new(StandardError)

  store_in collection: 'locks'
  field :_id, type: String, default: ->{ SecureRandom.uuid }, overwrite: true
  field :ts, type: Time

  def self.with_lock(stream)
    begin
      lock = self.create(_id: stream, ts: Time.now.utc)
    rescue ::Mongo::Error::OperationFailure
      raise CannotObtainLockError
    end
    yield if block_given?
  ensure
    lock.delete if lock.present?
  end

  index({ ts: 1 }, { expire_after_seconds: 1 })
end

You create lock object and override _id with stream value (it is BSON:ObjectId by a document from another collection):

lock = self.create(_id: stream, ts: Time.now.utc)

This is equivalent to default _id declaration provided by include Mongoid::Document. We can comment this line and it impacts nothing.

field :_id, type: String, default: ->{ SecureRandom.uuid }, overwrite: true
Lock.create(_id: BSON:ObjectId.new, ts: Time.now.utc)
#<Runners::MongoidLocks::Lock _id: 5ca4c6e2bb23be22e68a42ad, ts: 2019-04-03 14:44:50 UTC>

Lock.create(ts: Time.now.utc)
=> #<Runners::MongoidLocks::Lock _id: 183feff3-21bc-49bf-846d-5d7d25116c8b, ts: 2019-04-03 14:46:01 UTC>

Thus, SecureRandom.uuid is never called and we using usual BSON::OjbectId (which is unique enough).

If we replace a document creation inside self.with_lock method like here:

# lock = self.create(_id: stream, ts: Time.now.utc)
lock = self.create(ts: Time.now.utc)

The tests will be failed:

THREAD_COUNT=100 PROCESS_COUNT=2 bundle exec ruby test/mongoid_expirable_test.rb

I can understand something wrong, but your approach (lock based on unique index) does not look cleared for me. What is the approach?

@SimonVillage
Copy link

@dblock
Copy link
Collaborator

dblock commented Oct 23, 2019

Yep with 2.0.

@dblock dblock closed this as completed Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants