
Counters for last persister successes #2167

Merged 6 commits into master from counters_for_successful_persister_runs on Jan 22, 2021

Conversation

rosalind210 (Contributor)

A counter to ensure that zk -> mysql persistence is happening. Every time one of the persisters moves its data to history, the counter is incremented and logged.

@ssalinas (Member)

Small comment. Can we move these so that they mark the end of a full successful run of the persister? Might have to do something like keep a boolean running through the for-loop, since we try/catch within the loop to cover as much as we can. We would want to know this since, even if one small piece succeeds and increments the timestamp, we could still be missing/erroring on a much larger and more problematic part and not know it.

@@ -184,8 +196,10 @@ protected boolean moveToHistory(SingularityTaskId object) {
  LOG.debug("Moving {} to history", object);
  try {
    historyManager.saveTaskHistory(taskHistory.get());
    persisterSuccess = true;
ssalinas (Member)

This ends up making things the same as before. Due to the fact that we catch and continue the loop, we can end up with a situation like the following:

item 1 succeeds -> persisterSuccess = true
item 2 fails -> persisterSuccess = false
item 3 succeeds -> persisterSuccess = true
end -> mark as success

So in that case we are losing the fact that item 2 failed. Might be able to do it easier with a variable that starts as true and is just turned to false if we ever enter one of the catch blocks.

Side question, why make this a class level variable vs just initializing a boolean on each run?
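
A minimal sketch of the pattern being suggested here (a hypothetical simplification: in the real code the try/catch lives inside moveToHistory, and taskIdsToPersist is an assumed name):

boolean persisterSuccess = true; // start true; only a catch block can flip it
for (SingularityTaskId taskId : taskIdsToPersist) {
  try {
    moveToHistory(taskId);
  } catch (Exception e) {
    LOG.error("Failed to move {} to history", taskId, e);
    persisterSuccess = false; // a single failure marks the whole run as failed
    // keep looping so the remaining items still get a chance to persist
  }
}
if (persisterSuccess) {
  // only a fully clean run counts as a success
  lastPersisterSuccess.incrementAndGet();
}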

rosalind210 (Contributor, author)

Oh yes, that makes much more sense; I'll update.

I can update it to initialize on each run; thinking about it now, that's probably better for GC. I was trying to reduce some code copying.

@ssalinas (Member) left a comment

🚢

if (persisterSuccess.get()) {
  LOG.info(
    "Request Persister: successful move to history ({} so far)",
    lastPersisterSuccess.incrementAndGet()
ssalinas (Member)

Probably most helpful to set this to something like System.currentTimeMillis(), that way we know the last timestamp it ran and can generate an effective 'lag' type of metric off of that. incrementAndGet will just add 1 to the existing value. The logging message would change then too, to something like 'finished run on persister at {}' or similar.
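
A sketch of that suggestion (assumed surrounding names, not the merged diff): record the wall-clock time of the last clean run and derive a lag metric from it elsewhere.

if (persisterSuccess.get()) {
  lastPersisterSuccess.set(System.currentTimeMillis());
  LOG.info("Request Persister: finished run on persister at {}", lastPersisterSuccess.get());
}

// Elsewhere (e.g. in a metrics gauge), lag falls out directly:
long persisterLagMillis = System.currentTimeMillis() - lastPersisterSuccess.get();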

rosalind210 (Contributor, author)

Ok sounds good, I'll make that update.

Should the lastPersisterSuccess be a timestamp rather than an AtomicLong for consumption from InternalSingularityService/sfx?

ssalinas (Member)

AtomicLong is nice in that we are never competing between threads to read/update it (and it still works if it's inside a lambda and has to be effectively final). It's also hard to update a raw long and still bind it in Guice as a singleton without something like AtomicLong to wrap it.
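
For illustration, a minimal sketch (hypothetical module and binding names, not the actual Singularity wiring) of why the AtomicLong wrapper is convenient with Guice: it can be bound once as a shared instance and mutated from any injection point without rebinding.

import com.google.inject.AbstractModule;
import com.google.inject.name.Names;
import java.util.concurrent.atomic.AtomicLong;

public class PersisterMetricsModule extends AbstractModule {
  @Override
  protected void configure() {
    // toInstance gives singleton semantics: every injection point shares this one AtomicLong,
    // and callers can set()/get() it without any extra synchronization.
    bind(AtomicLong.class)
      .annotatedWith(Names.named("lastPersisterSuccess"))
      .toInstance(new AtomicLong(0L));
  }
}

A consumer would then inject it as @Named("lastPersisterSuccess") AtomicLong lastPersisterSuccess and update it at the end of each run.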

@ssalinas (Member)

🚢

@rosalind210 rosalind210 merged commit 28de537 into master Jan 22, 2021
@rosalind210 rosalind210 deleted the counters_for_successful_persister_runs branch April 27, 2021 19:07
@ssalinas ssalinas added this to the 1.5.0 milestone May 4, 2022