Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speeds up fate lock acquisition #5262

Merged
merged 12 commits into from
Feb 1, 2025
Merged

Conversation

keith-turner
Copy link
Contributor

Stores the lock data for fate locks in the zookeeper node name instead of the zookeeper data for the node. Ran some local performance test with hundreds of fate operations and saw lock times go from 750ms to 15ms.

fixes #5181

Stores the lock data for fate locks in the zookeeper node name instead
of the zookeeper data for the node.  Ran some local performance test
with hundreds of fate operations and saw lock times go from 750ms to
15ms.

fixes apache#5181
}
}

// TODO change data arg from byte[] to String.. in the rest of the code its always a String.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open a follow on issue for this and remove the TODO before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the TODO and #5264. In the TODO I was thinking of using String, but realized using more concrete types would be better.

@ctubbsii ctubbsii modified the milestones: 2.1.4, 4.0.0 Jan 15, 2025
List<FateLock.FateLockNode> lockNodes =
FateLock.validateAndWarn(fLockPath, zr.getChildren(fLockPath.toString()));

lockNodes.sort(Comparator.comparingLong(ln -> ln.sequence));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the return type be a sorted set instead of a list that you sort later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to a sorted set in 3f25ce0

keith-turner and others added 5 commits January 14, 2025 20:26
…eLock.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…tributedReadWriteLock.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>

List<TabletId> tabletIds;
// start a compaction on each tablet
try (var tablets = client.tableOperations().getTabletInformation(table1, new Range())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this test initially to check performance, but realized its a good test of many compaction on the same table w/ different config also.

I did not initially see a performance difference w/ this test. Looking into it the reason was the compaction were finishing too quickly, so only 8 or 9 compaction fate operations were active at a time. For the performance bug to be observed needed many more active fate operations. Locally I modified the test to kill the compactor processes and start 1K compaction fate operations. With those changes I could see orders of magnitude diffs for fate lock acquisition time.

Comment on lines 83 to 86
Preconditions.checkArgument(nodeName.startsWith(PREFIX) && nodeName.charAt(len - 11) == '#',
"Illegal node name %s", nodeName);
sequence = Long.parseLong(nodeName.substring(len - 10));
lockData = nodeName.substring(PREFIX.length(), len - 11);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, while this is probably unlikely to happen, the sequential numbering can overflow causing a negative to occur, which I think would cause a problem with this parsing instead of splitting on #.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added handing and test for negative seq numbers in 2309b9b

Comment on lines 83 to 84
Preconditions.checkArgument(nodeName.startsWith(PREFIX) && nodeName.charAt(len - 11) == '#',
"Illegal node name %s", nodeName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of an overflow, the character at (len - 11) be a minus sign and cause this to fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the number of digits in the negative number. If its a 10 digit negative number then %010d will produce a 11 char string. If its 9 digits or less then %010d will produce a 10 char string including the negative. For the 9 digit or less case the # check will not catch negative, but the later Long.parseUnsignedLong call will catch it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case Long.parseUnsignedLong catches the problem it throws a more specific exception, I updated the test to check for that.

locks = heldLocks;
} else {
locks = waitingLocks;
}

locks.computeIfAbsent(fateId, k -> new ArrayList<>()).add(lda[0].charAt(0) + ":" + id);
locks.computeIfAbsent(fateId, k -> new ArrayList<>())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the new FateLockEntry type could be used in this Admin class instead of the string things its doing here. Going to open a follow on issue.

Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM

System.arraycopy(fateIdBytes, 0, result, typeBytes.length + 1, fateIdBytes.length);
return result;
public String serialize() {
return lockType.name() + ":" + fateId.canonical();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is SO much nicer, not having to do all that manual effort to copy bytes around is great.

this.fateId =
FateId.from(new String(Arrays.copyOfRange(entry, split + 1, entry.length), UTF_8));
private FateLockEntry(String entry) {
var fields = entry.split(":", 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to verify the number of fields is 2? I suppose it will just fail a couple lines later if not correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That just grabs the first two splits even if there are more. The FateId also has a : so calling split w/o the 2 will return three fields.

// ignored
var parsed = new NodeName(name);
if (predicate.test(parsed.sequence, parsed.fateLockEntry)) {
// Use a supplier so we don't need to deserialize unless the calling code cares about
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to move this comment to NodeName where the supplier is created now, but it doesn't matter too much either way probably

@keith-turner keith-turner merged commit 3213646 into apache:main Feb 1, 2025
8 checks passed
@keith-turner keith-turner deleted the accumulo-5181 branch February 1, 2025 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of FateLock code for the case when there are lots of locks.
4 participants