Replace replicator filters by selector objects #11192
Comments
We are currently unable to start a fresh database replication from the global workqueue to the local workqueue_inbox, because the HTTP request times out at 5 min. The database itself does not have many deleted documents:
so I would not expect any problems going through the list of deleted documents within the 5 min timeout. Checking this on the node running the CouchDB workqueue database, I do see a very slow response for the relevant HTTP call, e.g.:
so we can likely rule out any effect from the reverse proxy/APS. After a brief chat with the CouchDB developers on Slack, they acknowledged that this is something to be improved in CouchDB, so that it can save checkpoints as it iterates through deleted documents as well. Their suggestion is to adopt the selector model, which is evaluated in Erlang and is therefore expected to be much faster.
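As a rough illustration of the selector model suggested above, the sketch below builds a `_replicator` document that carries a Mango selector instead of naming a JavaScript filter function. The database URLs, the flat `ChildQueueUrl` field and the helper name are assumptions made for this example and are not taken from the actual WMCore configuration. The `_deleted` clause is intended to let tombstones through so that deletions would still be replicated.

```python
import json


def build_replication_doc(source, target, child_queue_url):
    """Return a _replicator document that uses a selector.

    The field name "ChildQueueUrl" is a hypothetical, flattened stand-in
    for whatever field the real workqueue elements expose.
    """
    selector = {
        "$or": [
            {"_deleted": True},                  # keep deletions flowing
            {"ChildQueueUrl": child_queue_url},  # elements meant for this agent
        ]
    }
    return {
        "source": source,
        "target": target,
        "continuous": True,
        "selector": selector,
    }


# Placeholder URLs, not real endpoints.
doc = build_replication_doc(
    source="https://global.invalid/couchdb/workqueue",
    target="http://agent.invalid:5984/workqueue_inbox",
    child_queue_url="http://agent.invalid:5984/workqueue",
)
print(json.dumps(doc, indent=2))
```

A document of this shape would be written to the `_replicator` database; whether this selector is equivalent to the existing filter function would still need to be checked against real workqueue elements.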
It turns out the JSON file didn't make it into the WMAgent PyPI package, and this is the exception we get when we start a WMAgent:
and here are all the json files that we can find in the installation area of the agent (with pip):
For now, Dario and I grabbed https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/AgentStatusWatcher/replication_selector.json and placed it directly on vocms0192, so that we can move forward with this validation. I am reopening this issue to have this resolved and available in the next patch release (likely 2.3.7.1).
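A check along these lines could catch a missing data file before an agent is started. This is only a sketch: the helper names are hypothetical, and the temporary directory stands in for the real installation area, whose layout is assumed here to mirror the `WMComponent/AgentStatusWatcher` path from the repository.

```python
import tempfile
from pathlib import Path


def find_json_files(root):
    """List every .json file below root, as sorted POSIX-style relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))


def is_packaged(root, filename="replication_selector.json"):
    """Return True if a file with the given name exists anywhere below root."""
    return any(Path(rel).name == filename for rel in find_json_files(root))


# Demo on a throwaway directory that mimics an installation area.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "WMComponent" / "AgentStatusWatcher"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    before = is_packaged(tmp)   # JSON file not shipped yet
    (pkg / "replication_selector.json").write_text("{}")
    after = is_packaged(tmp)    # JSON file now present
    found = find_json_files(tmp)

print(before, after, found)
```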
As reported in our WM Team Mattermost channel, the ParentQueueUrl filter poses a problem when we have multiple instances of global workqueue running under different domains but talking to the same database. For instance, the integration CouchDB database is shared between cmsweb-testbed and cmsweb-preprod. Depending on which global workqueue creates the elements, they could have either:
or
If our filter says we want the cmsweb-testbed parent queue, then no preprod WQE would be replicated to the agent, even though the agent could start "negotiating" those elements. Having said that, I have updated the initial description here, and I think we should loosen the workqueue replication filter to stop passing this
And now I understand why we ended up with While trying to make the replication work, I decided to change the agent configuration from cmsweb to cmsweb-prod:
which is then used to negotiate WQEs between local and global, setting the With that said, it looks like my previous comment was wrong and we would not fail to acquire workflows when there are multiple domains/instances talking to the same database. Nonetheless, I still think it is better to stop using ParentQueueUrl, as the source/target database is already doing the job. In addition, this makes the replication filter cheaper.
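To make the effect of the ParentQueueUrl condition concrete, here is a minimal sketch with an equality-only stand-in for a selector. The URLs are placeholders, the `ChildQueueUrl` field and the flat document layout are assumptions, and real Mango selectors support far more than this matcher does.

```python
def matches(doc, selector):
    """Equality-only stand-in for a Mango selector (implicit AND over fields)."""
    return all(doc.get(field) == value for field, value in selector.items())


# Placeholder URLs for two front-end domains sharing one database.
TESTBED = "https://testbed.invalid/couchdb/workqueue"
PREPROD = "https://preprod.invalid/couchdb/workqueue"
AGENT = "http://agent.invalid:5984/workqueue"

elements = [
    {"_id": "wqe-1", "ParentQueueUrl": TESTBED, "ChildQueueUrl": AGENT},
    {"_id": "wqe-2", "ParentQueueUrl": PREPROD, "ChildQueueUrl": AGENT},
]

# With the parent condition, only elements created through one domain match.
strict = {"ParentQueueUrl": TESTBED, "ChildQueueUrl": AGENT}
# Without it, the source database alone decides the environment.
relaxed = {"ChildQueueUrl": AGENT}

strict_ids = [e["_id"] for e in elements if matches(e, strict)]
relaxed_ids = [e["_id"] for e in elements if matches(e, relaxed)]

print(strict_ids)   # ['wqe-1']
print(relaxed_ids)  # ['wqe-1', 'wqe-2']
```

The relaxed form also has one condition fewer to evaluate per document, which is the sense in which dropping ParentQueueUrl makes the filter cheaper.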
Impact of the new feature
CouchDB and WMAgent
Is your feature request related to a problem? Please describe.
Not a problem, but according to CouchDB experts, selector objects deliver better performance than JavaScript filter functions. Further information about selector objects can be found at:
https://docs.couchdb.org/en/latest/replication/replicator.html#selector-objects
Describe the solution you'd like
Evaluate the 3 standard WMAgent replications and see if a filter function can be replaced by an equivalent selector object.
UPDATE: note that the ParentQueueUrl filter can likely be removed, given that the source or target database will always be pointing to one specific environment.
Describe alternatives you've considered
Not do anything.
Additional context
None