
Prevent new host overloading #1822

Merged: 29 commits into master, Aug 16, 2018

Conversation

@pschoenfelder (Contributor):

No description provided.

@ssalinas (Member) left a comment:

I think the overall strategy of refreshing metrics during offer scoring is good. We definitely want to make sure we only do it when the metrics are too old (or maybe also after some number of additional tasks or percent of resources has been allocated since the last collection?), and that we determine what kind of impact this has on scheduling speed.
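For example, the staleness gate could be a small helper along these lines (a hedged sketch: getTimestamp() on the usage object appears later in this PR, while getMaxUsageAgeMillis is a hypothetical config accessor):

    // Hypothetical staleness check gating the metrics refresh
    private boolean isStale(SingularitySlaveUsage usage) {
      long ageMillis = System.currentTimeMillis() - usage.getTimestamp();
      return ageMillis > configuration.getMaxUsageAgeMillis(); // assumed config value
    }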

  utilizationPerRequestId.values().forEach(usageManager::saveRequestUtilization);

  if (configuration.isShuffleTasksForOverloadedSlaves()) {
    shuffleTasksOnOverloadedHosts(overLoadedHosts);
  }
}

public CompletableFuture<Void> getSlaveUsage(SingularitySlave slave) {
  return usageCollectionSemaphore.call(() ->
@ssalinas (Member):
For the individual method, I'm not sure we want to make it completely async like the larger ones. This method will likely not be called from the same context as the poller itself, so it should probably fall under a different semaphore (e.g. the offer scoring one) if we want it to be async.
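A rough sketch of that suggestion, assuming a hypothetical offerScoringSemaphore field analogous to the usageCollectionSemaphore above (usageExecutor and collectSlaveUsage are likewise names assumed from this discussion, not copied from the diff):

    public CompletableFuture<Void> getSlaveUsage(SingularitySlave slave) {
      // Offer-context callers go through the offer scoring semaphore instead
      return offerScoringSemaphore.call(() ->
          CompletableFuture.runAsync(() -> collectSlaveUsage(slave), usageExecutor));
    }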

        && t.getMesosTask().getSlaveId().getValue().equals(offerHolder.getSlaveId()))) {
  Optional<SingularitySlave> maybeSlave = slaveManager.getSlave(offerHolder.getSlaveId());
  if (maybeSlave.isPresent()) {
    usagePoller.getSlaveUsage(maybeSlave.get());
@ssalinas (Member):
We will probably want to put this in currentSlaveUsagesBySlaveId after it's calculated; calling this alone won't update the underlying values we pass to the scoring functions.
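Roughly, as a sketch (the map name comes from this comment; getId and the future's value type are assumptions):

    usagePoller.getSlaveUsage(maybeSlave.get())
        .thenAccept(newUsage -> currentSlaveUsagesBySlaveId.put(maybeSlave.get().getId(), newUsage));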

@ssalinas (Member):
Another note: this would be a good thing to stick inside the completable future below. It's a good candidate to make async, since we are now IO bound on an API call and CPU bound on the calculations.

@ssalinas (Member) left a comment:
Added a few comments on the async bits. The main point I'm not sure about is whether we are kicking off metric collection for use in a future offer scoring run, or gathering metrics synchronously for use in the current run. I could see the first being faster for scoring, but we need to be careful that we aren't kicking off a bunch of collections in a row for the same slave (i.e. because the first call hasn't finished yet). The second is more reliable in terms of making sure we have the most up-to-date metrics, but is slower, since we have to wait for them.
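One hedged sketch of guarding the fire-and-forget variant against duplicate kickoffs for the same slave (all names here are hypothetical; only CompletableFuture and ConcurrentHashMap are real APIs):

    private final ConcurrentHashMap<String, CompletableFuture<Void>> inFlightBySlaveId = new ConcurrentHashMap<>();

    CompletableFuture<Void> refreshSlaveUsage(SingularitySlave slave) {
      // computeIfAbsent keeps at most one collection in flight per slave id
      return inFlightBySlaveId.computeIfAbsent(slave.getId(), id ->
          CompletableFuture
              .runAsync(() -> collectSlaveUsage(slave), usageExecutor)
              .whenComplete((ignored, t) -> inFlightBySlaveId.remove(id)));
    }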

@@ -240,6 +238,59 @@ public SingularityMesosOfferScheduler(MesosConfiguration mesosConfiguration,
    return offerHolders.values();
  }

  private Void buildScoringFuture(
@ssalinas (Member):
Nit on naming: this method isn't actually building the future; it's a synchronous method that does the actual scoring. I'd either rename it to calculateScore (or something like that), or move the supplyAsync inside this method and have it return the actual CompletableFuture<Void>.
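The second option might look roughly like this (a sketch; scoreOfferHolder and offerScoringExecutor are stand-in names, not the PR's):

    private CompletableFuture<Void> buildScoringFuture(SingularityOfferHolder offerHolder) {
      // The async boundary now lives inside the method, so the name matches the behavior
      return CompletableFuture.runAsync(() -> scoreOfferHolder(offerHolder), offerScoringExecutor);
    }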

      }
    });
  }
  return null;
@ssalinas (Member):
So, I want to mention the choice here; we could go either way on this. Currently it looks like the newly updated slave metrics will not be taken into account, because we are returning here. Wouldn't we want to continue on to the scoring, since we've gathered new metrics and put them in the map that is fed to calculateScore?

@pschoenfelder (Contributor, Author):
Yes, I'll go with the latter option. For some reason I was thinking only the slave usage was necessary, but it's probably safer to update the score too.

@@ -121,6 +129,10 @@ SingularitySlaveUsage getSlaveUsage() {
    return diskInUseScore;
  }

  long getTimestamp() {
@ssalinas (Member):
You should be able to just do getSlaveUsage().getTimestamp() instead of having to store it in two places.
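i.e., a delegating getter rather than a second stored field:

    long getTimestamp() {
      return getSlaveUsage().getTimestamp(); // delegate instead of duplicating state
    }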

  utilizationPerRequestId.values().forEach(usageManager::saveRequestUtilization);

  if (configuration.isShuffleTasksForOverloadedSlaves()) {
    shuffleTasksOnOverloadedHosts(overLoadedHosts);
  }
}

public CompletableFuture<SingularitySlaveUsage> getSlaveUsage(SingularitySlave slave) {
@ssalinas (Member):
The only thing I find to be a code smell here is that the offer scoring flow will now rely on the usage semaphore and executor having enough permits/threads. Since within offer scoring we are already in a block that is executed async, it may be worth calling collectSlaveUsage directly when used from the offer context, to avoid an additional layer of async work.

} catch (Throwable t) {
  String message = String.format("Could not get slave usage for host %s", slave.getHost());
  LOG.error(message, t);
  exceptionNotifier.notify(message, t);
}
return null; // TODO: is this really okay?
@ssalinas (Member):
I don't think this method is called anywhere else that expects a return value. You could always wrap it in an Optional to make it more explicit that the result might not be there.
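A minimal sketch of the Optional variant, reusing the catch block above (usageExecutor is an assumed executor field, and collectSlaveUsage is the method named elsewhere in this review):

    public CompletableFuture<Optional<SingularitySlaveUsage>> getSlaveUsage(SingularitySlave slave) {
      return usageCollectionSemaphore.call(() ->
          CompletableFuture.supplyAsync(() -> {
            try {
              return Optional.of(collectSlaveUsage(slave));
            } catch (Throwable t) {
              String message = String.format("Could not get slave usage for host %s", slave.getHost());
              LOG.error(message, t);
              exceptionNotifier.notify(message, t);
              return Optional.<SingularitySlaveUsage>empty();
            }
          }, usageExecutor));
    }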

usage.get().getTimestamp()
));
} else {
throw new RuntimeException(throwable);
@ssalinas (Member):
Where is the handling for this runtime exception? We currently aren't calling a get or join on the future created here, which causes two issues for us (see the sketch below):

  • calculateScore below may end up being called before the metrics recollection has had a chance to run
  • If a RuntimeException is thrown here, it is lost to us, since it will not propagate out of the future and no catch block currently logs it
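A minimal, self-contained sketch of both failure modes and one way to surface them (none of this is the PR's actual code; it only uses the standard library):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CompletionException;

    public class LostExceptionSketch {
      public static void main(String[] args) {
        // Lost: no join/get and no handler, so this exception silently vanishes
        CompletableFuture.runAsync(() -> { throw new RuntimeException("recollection failed"); });

        CompletableFuture<Void> future = CompletableFuture
            .runAsync(() -> { throw new RuntimeException("recollection failed"); })
            .whenComplete((ignored, t) -> {
              if (t != null) {
                System.err.println("Usage recollection failed: " + t); // surfaced
              }
            });
        try {
          future.join(); // blocking here guarantees scoring runs only after recollection
        } catch (CompletionException e) {
          // join rethrows the wrapped failure; handle or log as appropriate
        }
      }
    }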

new ConcurrentHashMap<>(),
usageManager.getRequestUtilizations(),
new ConcurrentHashMap<>(),
new AtomicLong(),
@ssalinas (Member):
Given that these arguments will be the same each time, does it make sense to create an overloaded method in UsagePoller to handle those bits instead?
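Something like this hypothetical overload, with the always-identical arguments supplied in one place (parameter shapes are guesses based on the visible call site, not the actual signature):

    public CompletableFuture<SingularitySlaveUsage> getSlaveUsage(SingularitySlave slave) {
      return getSlaveUsage(
          slave,
          new ConcurrentHashMap<>(),
          usageManager.getRequestUtilizations(),
          new ConcurrentHashMap<>(),
          new AtomicLong());
    }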

@@ -180,7 +191,8 @@ public SingularityMesosOfferScheduler(MesosConfiguration mesosConfiguration,
  mesosConfiguration.getScoreUsingSystemLoad(),
  getMaxProbableUsageForSlave(activeTaskIds, requestUtilizations, offerHolders.get(usageWithId.getSlaveId()).getSanitizedHost()),
  mesosConfiguration.getLoad5OverloadedThreshold(),
- mesosConfiguration.getLoad1OverloadedThreshold()
+ mesosConfiguration.getLoad1OverloadedThreshold(),
+ usageWithId.getTimestamp()
@ssalinas (Member):
Second thoughts about the setup: would it make sense to instead collect the additional usages in this block? I'm realizing that the loop below will be called for each pending task. If we hit a case where collecting a particular slave's usage is throwing exceptions or timing out, we will continue to recheck it for every pending task. Whereas, if we check in this block instead, we can just omit that slave up front and leave the block below as it was previously.

If we move the usage collection here, we'll likely want to convert this from a parallelStream to a list of CompletableFutures, like below, to have better control over the concurrency.
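The parallelStream-to-futures conversion could look roughly like this (a sketch under assumed names; slavesToCheck, collectSlaveUsage, and usageExecutor are not from the diff):

    List<CompletableFuture<Void>> usageFutures = slavesToCheck.stream()
        .map(slave -> CompletableFuture.runAsync(() -> collectSlaveUsage(slave), usageExecutor))
        .collect(Collectors.toList());
    // Concurrency is now bounded by usageExecutor's pool rather than the common ForkJoinPool
    CompletableFuture.allOf(usageFutures.toArray(new CompletableFuture[0])).join();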

@ssalinas changed the title from "WIP: Prevent new host overloading" to "Prevent new host overloading" on Aug 6, 2018
@ssalinas (Member) commented on Aug 6, 2018:

🚢

@pschoenfelder (Contributor, Author) commented:

🚢

if (slaveMetricsSnapshot != null) {
  memoryMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveMemUsed();
  cpuReservedOnSlave = slaveMetricsSnapshot.getSlaveCpusUsed();
  diskMbReservedOnSlave = (long) slaveMetricsSnapshot.getSlaveDiskUsed();
@pschoenfelder (Contributor, Author):
Are these actually mapped correctly? I know we said /slave/* maps to "reserved", but it also says "used".

@pschoenfelder (Contributor, Author):
Also, we're casting doubles to longs so we don't have to change the usage POJOs? Seems smelly to me.

@ssalinas (Member):
Open to using doubles everywhere. I can't remember at which level the longs were required.

}

SingularityTaskUsage latestUsage = getUsage(taskUsage);
memoryBytesUsedOnSlave += latestUsage.getMemoryTotalBytes();
@ssalinas (Member):
Will these end up being any different than systemMemTotalBytes - systemMemFreeBytes? I'm wondering if there isn't actually a need to total everything up for these cases. getUsage is still a ZK call, and it'd be nice to eliminate it if we can. The totaled-up per-task values and the slave-reported totals seem fairly similar. I'm not certain why we have both in the POJO, TBH; I'd have to look through the commit history. (Or maybe @darcatron knows, since he wrote the original versions of usage collection?)

SingularitySlaveUsage slaveUsage = new SingularitySlaveUsage(
    cpuReservedOnSlave, cpuReservedOnSlave, cpusTotal,
    memoryMbReservedOnSlave, memoryMbReservedOnSlave, memoryMbTotal,
    diskMbReservedOnSlave, diskMbReservedOnSlave, diskMbTotal,
@pschoenfelder (Contributor, Author):
It seems pretty funky to pass in the same variables twice, but that's what reducing everything to use the metric snapshot resulted in (see the plain collectSlaveUsage below to compare).

@ssalinas (Member):
I guess the 'clean' way to do it would be to have a separate POJO with a more minimal set of fields, which gets used in scheduling. The fuller class, which would extend that one, has all of the fields and is used in the poller + API.

e.g. SingularitySlaveUsage extends SingularitySimpleSlaveUsage {}
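As a rough illustration of that shape (field names are assumptions modeled on the constructor arguments above, not the real classes):

    // Minimal usage view consumed by offer scoring (hypothetical field set)
    class SingularitySimpleSlaveUsage {
      private final double cpusUsed;
      private final double memoryMbUsed;
      private final double diskMbUsed;

      SingularitySimpleSlaveUsage(double cpusUsed, double memoryMbUsed, double diskMbUsed) {
        this.cpusUsed = cpusUsed;
        this.memoryMbUsed = memoryMbUsed;
        this.diskMbUsed = diskMbUsed;
      }

      double getCpusUsed() { return cpusUsed; }
      double getMemoryMbUsed() { return memoryMbUsed; }
      double getDiskMbUsed() { return diskMbUsed; }
    }

    // Fuller class used by the poller + API, extending the minimal view
    class SingularitySlaveUsage extends SingularitySimpleSlaveUsage {
      private final long timestamp; // plus the remaining existing fields

      SingularitySlaveUsage(double cpusUsed, double memoryMbUsed, double diskMbUsed, long timestamp) {
        super(cpusUsed, memoryMbUsed, diskMbUsed);
        this.timestamp = timestamp;
      }

      long getTimestamp() { return timestamp; }
    }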

@ssalinas added this to the 0.21.0 milestone on Aug 9, 2018
@baconmania (Contributor) commented:

🚢

@ssalinas merged commit 29c7199 into master on Aug 16, 2018
@ssalinas deleted the new-host-overloading branch on August 16, 2018 at 12:23