Replication issues "LDAP error 51 (Server is busy)" #6551

Open
gudtanha opened this issue Jan 28, 2025 · 0 comments
Labels
needs triage The issue will be triaged during scrum

gudtanha commented Jan 28, 2025

Issue Description
For no apparent reason, the supplier sometimes fails to initialize a consumer. This shows up in the supplier error log as follows:

[timestamp] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="consumer node 2" (consumer:636)".
[timestamp] - ERR - NSMMReplicationPlugin - perform_operation - agmt="cn=consumer node 2" (consumer:636): Failed to send extended operation: LDAP error 51 (Server is busy)

There's no error logged on the consumer. The consumer's access log shows 250 entries received and a clean connection close.
Sometimes the following error is logged on the consumer; I couldn't figure out what determines whether it appears or not:

[timestamp] - NOTICE - NSMMReplicationPlugin - multisupplier_be_state_change - Replica dc=gi-de,dc=com is going offline; disabling replication
[timestamp] - INFO - bdb_instance_start - Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[timestamp] - ERR - factory_destructor - ERROR bulk import abandoned
[timestamp] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23
...
[timestamp] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[timestamp] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[timestamp] - ERR - process_bulk_import_op - NULL target sdn

Repeating the initialization on the master will eventually work if you try often enough.

The problem also persists in regular (non-scheduled) replication sessions.
Again for no apparent reason, and across all consumers, sending updates frequently fails, as shown in the supplier error log:

[timestamp] - WARN - send_updates - %s: Failed to send update operation to receiver (uniqueid %s, CSN %s): %s. %s.
 - agmt="cn=anotherconsumer singlenode" (anotherconsumer:636)
[timestamp] - ERR - NSMMReplicationPlugin - perform_operation - agmt="cn=anotherconsumer singlenode" (anotherconsumer:636): Failed to send extended operation: LDAP error 51 (Server is busy)
[timestamp] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=anotherconsumer singlenode" (anotherconsumer:636): Unable to send endReplication extended operation (Server is busy)

Again, there's no corresponding log entry on the consumer, the machine isn't under load at the time of the event, and the network is fine.

Package Version and Platform:

  • Platform: RHEL9_5
  • Package and version: 2.5.3.202501211239git86dd51fd1 (supplier) / 2.5.2-2.el9_5 (consumer)
  • Browser: not applicable (a browser is not an adequate management application)

Steps to Reproduce
Steps to reproduce the behavior:

  1. Create a supplier/consumer replication agreement according to the docs using 'dsconf'
  2. Watch the supplier's error log
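Step 2 can be partly automated. The following is a minimal sketch, assuming the error-log line format quoted above; the function name `count_busy_errors` is hypothetical and not part of 389-ds. It counts "(Server is busy)" failures per replication agreement, which makes it easier to see whether every consumer is affected equally.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... perform_operation - agmt="cn=consumer node 2" (consumer:636): Failed to send
#   extended operation: LDAP error 51 (Server is busy)
# and the release_replica "(Server is busy)" variant quoted in this report.
BUSY_RE = re.compile(
    r'agmt="(?P<agmt>[^"]+)" \((?P<host>[^)]+)\):.*\(Server is busy\)'
)


def count_busy_errors(lines):
    """Count 'Server is busy' failures per (agreement, host:port) pair."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[(match.group("agmt"), match.group("host"))] += 1
    return counts
```

For example, open the supplier's errors file (typically under /var/log/dirsrv/slapd-INSTANCE/, where INSTANCE is a placeholder for the instance name), pass the file handle to `count_busy_errors`, and print `.most_common()` on the result.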

Expected results
Initialization always succeeds on the first attempt
No update errors during regular replication

Screenshots
Cockpit Monitoring showing:
Changes Sent: 1:15219845/0 0:18168/0
(The second counter shows up only if replication had issues)
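To compare these counters over time, a small parser can help. This sketch assumes, without confirmation from this report, that each space-separated token has the form `<replica-id>:<changes-sent>/<changes-skipped>`; `parse_changes_sent` is a hypothetical helper, not a Cockpit or 389-ds API.

```python
from typing import Dict, Tuple


def parse_changes_sent(value: str) -> Dict[int, Tuple[int, int]]:
    """Parse a 'Changes Sent' string such as '1:15219845/0 0:18168/0'.

    ASSUMPTION: each token is '<replica-id>:<sent>/<skipped>'.
    Returns a mapping of replica ID to (sent, skipped).
    """
    result: Dict[int, Tuple[int, int]] = {}
    for token in value.split():
        rid, counts = token.split(":", 1)
        sent, skipped = counts.split("/", 1)
        result[int(rid)] = (int(sent), int(skipped))
    return result
```

Under that assumption, the value above would parse to two entries, one per replica ID.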

Additional context
We've been running 1.3.10 for 3 years without a single issue.
I started by updating to 2.3, where the MemberOfPlugin suffers from soft lockups. Version 2.4 has the same plugin problems plus additional Cockpit problems and, as far as I remember, introduced these "Server is busy" errors.
After comparing branches, I found out that 1.4 builds on top of 1.3, and that 2.5 is closest to 1.4 with regard to the memberOfPlugin.
So once again I updated the complete landscape, this time to 2.5.2 for the consumers (RHEL9_5 repo) and 2.5.3.202501211239git86dd51fd1 for the supplier. Our simple single-supplier replication setup still isn't working error-free, unlike what we had with 1.3.10.
I can't even work out which branch to choose, or why all these branches exist in the first place, if commits are pulled in untested anyway.

I already checked what was mentioned in this reply, without any effect:
https://www.mail-archive.com/389-users@lists.fedoraproject.org/msg10204.html

Any help is highly appreciated.

gudtanha added the needs triage label on Jan 28, 2025
gudtanha changed the title from Replication issues " LDAP error 51 (Server is busy)" to Replication issues "LDAP error 51 (Server is busy)" on Jan 28, 2025