-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace WorkQueue replication filter by selector function #12143
Conversation
Can one of the admins verify this patch? |
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
I think the current testbed rules are making the validation of this patch very hard (with wrong ingress rules), so I decided to apply it to the fresh new agent running in vocms0255 and, even though the replication managed to go through within the frontend 5min limit, I don't see any of the expected documents replicated to the agent, unfortunately! My suspicious is that the selector field
which of course won't ever match any documents. |
Jenkins results:
|
Jenkins results:
|
I finally managed to get it working and it has been deployed to vocms0255 for now (as condor pressure was really bad over the past few days). All commits have been squashed in only 1, but further work is still necessary for:
|
After verifying vocms0255, everything seems to be fine with this patch. So I went ahead and applied it to submit11 as well. |
I still have to converge on the wmstats replication, but any feedback on the already implemented feature for workqueue would be appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alan!
I trust your tests, so I will only comment on what I perceive the logic of this PR to be.
If I understood correctly, the new code achieves the same thing as before: the selector replaces src/couchapps/WorkQueue/filters/queueFilter.js
:
- excludes soft deleted documents (aka marked as deleted but not removed from the DB, yet?)
- selects the type "[...].WorkQueueElement"
- adds the properties ParentQueueURL and ChildQueueURL to the result
I have only a simple comment, but it is not a blocker, feel free to ignore it if you do not agree.
@@ -61,6 +61,9 @@ def __init__(self, config): | |||
self.topicAMQ = getattr(config.AgentStatusWatcher, "topicAMQ", None) | |||
self.hostPortAMQ = getattr(config.AgentStatusWatcher, "hostPortAMQ", [('cms-mb.cern.ch', 61313)]) | |||
|
|||
# WorkQueue replication | |||
self.workqueueSelector = getattr(config.AgentStatusWatcher, "workqueueSelector", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am inclined to think that we should not import the selector from the configuration but keep it in the code here. I do not expect the selector to be different across wmagent instances, and it was part of "code" and not "configuration" before, so i do not see any reason to move the selector logic away from where it will be used.
Am i missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the replication logic is not supposed to change and having it in the code is likely better. I have been debating this as well, so it's good to have a 2nd opinion. I am going to update this PR later with those changes. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mix Erlang code with Python, keep Erlang code apart since if you merge it here you'll be forced to change Python code every time you change a selector. While if you keep it separate you only may need to change it once and restart the Python code. From my experience, e.g. DBS using embedded SQL statements, it is much harder to maintain the code with embedded snippets than otherwise. I prefer to keep such stuff separate, e.g. for that separate area exists which can keep JS, images, templates, erlang, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we can call it an Erlang code, in the end it is just a JSON with rules. However, the values of the rule are defined during WMAgent runtime (deployment), so separating it can potentially add more confusion than what it is right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided my feedback for consideration. Given that the content is in JSON format, it might be tempting to incorporate it directly into Python code. However, doing so could blur the distinction, as Python does not differentiate between JSON and Python dictionaries, treating both the same way. This could create ambiguity—whether it's a standalone JSON structure or a Python map or object.
If maintaining clear separation between models (i.e., distinguishing what belongs to Python versus Erlang or other systems) is a priority, then I believe such separation is beneficial. However, I'll leave the final decision up to you. If you will find putting JSON in place I suggest to put appropriate comment that it is JSON (not python map) and where it belongs (Erlang).
etc/WMAgentConfig.py
Outdated
# NOTE that we need to escape the 'type' field to avoid JSON dot notation interpretation | ||
config.AgentStatusWatcher.workqueueSelector = {'_deleted': {"$exists": False}, | ||
'type': "WMCore.WorkQueue.DataStructs.WorkQueueElement.WorkQueueElement", | ||
'WMCore\.WorkQueue\.DataStructs\.WorkQueueElement\.WorkQueueElement': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few things here:
- why do you need to escape dots, is it CouchDB things (for my education)?
- why not to use as a key just
WorkQueueElement
, why full Python path? - lines below, are you sure that config will always provides queueParams dictionary with your keys? Is it better to check that first before assigning to selector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to escape dots, is it CouchDB things (for my education)?
Because if I don't escape it, CouchDB will interpret it as a JSON dot notation and will try to expand it to {'WMCore': {'Workqueue': {'DataStructs':...}}}
, which will never match against the field with dots in the string. Further details in: https://docs.couchdb.org/en/stable/api/database/find.html#subfields
why not to use as a key just WorkQueueElement, why full Python path?
I would love to change it! Or not to even put it inside that key, but that would require some decent amount of refactoring in WMCore.
lines below, are you sure that config will always provides queueParams dictionary with your keys? Is it better to check that first before assigning to selector?
It is a WMAgent configuration, so yes, queueParams will always be there. If it is not, everything broke during the agent deployment/execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am planning to delete this configuration and instead use a json file to load these selector filters in the component (provided in my 3rd and 4th commits in here). You might want to have a look and share your thoughts.
@@ -61,6 +61,9 @@ def __init__(self, config): | |||
self.topicAMQ = getattr(config.AgentStatusWatcher, "topicAMQ", None) | |||
self.hostPortAMQ = getattr(config.AgentStatusWatcher, "hostPortAMQ", [('cms-mb.cern.ch', 61313)]) | |||
|
|||
# WorkQueue replication | |||
self.workqueueSelector = getattr(config.AgentStatusWatcher, "workqueueSelector", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mix Erlang code with Python, keep Erlang code apart since if you merge it here you'll be forced to change Python code every time you change a selector. While if you keep it separate you only may need to change it once and restart the Python code. From my experience, e.g. DBS using embedded SQL statements, it is much harder to maintain the code with embedded snippets than otherwise. I prefer to keep such stuff separate, e.g. for that separate area exists which can keep JS, images, templates, erlang, etc.
'filter': wqfilter, 'query_params': query_params}) | ||
self.replicatorDocs.append({'source': localQInboxURL, 'target': parentQURL, | ||
'filter': wqfilter, 'query_params': query_params}) | ||
self.replicatorDocs.append({'source': parentQueueUrl, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if replicator docs already have such source or target? is it better to check it first. I know that CouchDB will create a revision, but may be you don't want that. It depends on logic of handling multiple versions of the same document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replication documents are deleted before new replications are defined (see the beginning of this method).
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Update parameter names to reflect WQE Nested workqueue structure Escape dots in the field name Escape dots in the right field Escape dots in AgentStatusPoller as well Load JSON file with replications in AgentStatusPoller Fix location of json filter; correct WQ filter reference Remove component configuration; perform safe print for replication
Escape backslash in json
I ran a bunch of DMWM tests in vocms0261 and I can now confirm that all the 3 replications are working well (wmagent_summary and the bidirectional workqueue one). @LinaresToine we are likely cutting a new release today, then I will ask you to also validate this new selector function for the agent (I think it is the tier0_request replication). |
Given that it had already been approved by Dario in the previous review, I will just move forward with this PR. One thing that I am not sure is where is the best place for the new json file. Under |
Fixes #11192
Status
ready
Description
As the title says, stop using a JavaScript filter in favor of a native Erlang selector function, for database replication between (T0)WMAgent and central services.
A new JSON file has been provided with the relevant selector configuration, using the same name as the filter javascript file in the
couchapps
package.Note that string field with dot separators has to be escaped, otherwise it gets expanded in the selector to a nested JSON, failing to match any documents.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
Both vocms0255 and submit11 have been running with an old first commit.