Misc changes for adaptive operation tracker #1218

Merged
merged 5 commits into from
Jul 17, 2019

Contributor

@jsjtzyy jsjtzyy commented Jul 16, 2019

  1. Add log info to record which type of adaptive tracker is used.
  2. Enforce a maximum number of inflight requests for the adaptive tracker.
  3. Make excluding timed-out requests configurable.
  4. Support periodically dumping resource-level histograms to the log file.

@jsjtzyy jsjtzyy requested review from cgtz and zzmao July 16, 2019 21:58
@jsjtzyy jsjtzyy self-assigned this Jul 16, 2019
Contributor Author

jsjtzyy commented Jul 16, 2019

Will keep adding Javadocs.

codecov-io commented Jul 16, 2019

Codecov Report

Merging #1218 into master will decrease coverage by 0.11%.
The diff coverage is 40%.

@@             Coverage Diff              @@
##             master    #1218      +/-   ##
============================================
- Coverage     69.73%   69.61%   -0.12%     
- Complexity     5480     5484       +4     
============================================
  Files           431      431              
  Lines         33554    33613      +59     
  Branches       4258     4267       +9     
============================================
+ Hits          23398    23399       +1     
- Misses         8990     9036      +46     
- Partials       1166     1178      +12
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ava/com.github.ambry.router/NonBlockingRouter.java | 78.32% <100%> (+0.06%) | 44 <0> (ø) ⬇️ |
| ....github.ambry.router/NonBlockingRouterMetrics.java | 87.15% <22.22%> (-6.4%) | 58 <8> (+3) |
| ....github.ambry.router/AdaptiveOperationTracker.java | 92.07% <77.77%> (-1.54%) | 29 <0> (+1) |
| ...ain/java/com.github.ambry/config/RouterConfig.java | 97.5% <80%> (-2.5%) | 1 <0> (ø) |
| ...java/com.github.ambry.store/CompactionManager.java | 87.33% <0%> (-2.67%) | 19% <0%> (ø) |
| ...github.ambry.rest/AsyncRequestResponseHandler.java | 88.59% <0%> (-2.29%) | 23% <0%> (ø) |
| ...b.ambry.network/BlockingChannelConnectionPool.java | 70.42% <0%> (-1.88%) | 8% <0%> (ø) |
| ...in/java/com.github.ambry.store/BlobStoreStats.java | 70.92% <0%> (-1.04%) | 103% <0%> (-1%) |
| ...ain/java/com.github.ambry.router/PutOperation.java | 90.6% <0%> (-0.54%) | 110% <0%> (-1%) |
| ...va/com.github.ambry.replication/ReplicaThread.java | 74.62% <0%> (-0.19%) | 66% <0%> (-1%) |

... and 2 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79b311b...45beb74. Read the comment docs.

Contributor

@cgtz cgtz left a comment

Thanks for making the changes quickly. It looks good once these comments are addressed.

@@ -84,6 +84,10 @@
localDcResourceToHistogram = getResourceToLatencyMap(routerOperation, true);
crossDcResourceToHistogram = getResourceToLatencyMap(routerOperation, false);
}
if (parallelism > routerConfig.routerAdaptiveTrackerMaxInflightRequests) {
Contributor

I think it would be good to also enforce this in RouterConfig constructor so that startup will fail if it is set improperly instead of failing continuously at runtime.

Contributor Author

Good point. Will make the change.
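
The startup-time check suggested above might look like the following minimal sketch. It is not the actual RouterConfig code: `TrackerConfigSketch`, the `sketch.parallelism` key, and the use of plain `java.util.Properties` are assumptions made for illustration, and the max-inflight key assumes the rename to the `router.operation.tracker` prefix requested later in this review.

```java
import java.util.Properties;

// Hypothetical stand-in for RouterConfig; the real class reads VerifiableProperties.
final class TrackerConfigSketch {
  final int parallelism;
  final int maxInflightRequests;

  TrackerConfigSketch(Properties props) {
    // "sketch.parallelism" is an invented key used only in this example.
    parallelism = Integer.parseInt(props.getProperty("sketch.parallelism", "2"));
    // Assumed final key name; the default of 2 and lower bound of 1 mirror the diff.
    maxInflightRequests =
        Integer.parseInt(props.getProperty("router.operation.tracker.max.inflight.requests", "2"));
    if (maxInflightRequests < 1) {
      throw new IllegalArgumentException("max inflight requests must be at least 1");
    }
    // Fail once at construction instead of repeatedly when trackers are created.
    if (parallelism > maxInflightRequests) {
      throw new IllegalArgumentException(
          "parallelism (" + parallelism + ") exceeds max inflight requests (" + maxInflightRequests + ")");
    }
  }
}
```

With this shape, a misconfigured deployment stops at startup with one clear error rather than throwing each time an operation tracker is built.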

@@ -298,6 +298,35 @@
@Default("1000")
public final long routerOperationTrackerMinDataPointsRequired;

/**
* The maximum number of inflight requests that allowed for adaptive tracker. If current number of inflight requests
* is larger than or equal to this threshold, tracker shouldn't send out any request even though the oldest is past due.
Contributor

minor: Seems like all the other adaptive configs start with router.operation.tracker, even if they are only relevant to AdaptiveOperationTracker. Could we keep this the same?

Contributor

Could you also document the difference between parallelism and max.inflight.requests?

Contributor Author

sure, will do

@@ -219,7 +224,8 @@ private Counter getWholeDcPastDueCounter(RouterOperation routerOperation) {

@Override
public boolean hasNext() {
return replicaIterator.hasNext() && (inflightCount < parallelism || isOldestRequestPastDue());
return replicaIterator.hasNext() && (inflightCount < parallelism || (
Contributor

It might be more readable to break this into multiple if statements:

      if (replicaIterator.hasNext()) {
        if (inflightCount < parallelism) {
          return true;
        }
        if (inflightCount < routerConfig.routerAdaptiveTrackerMaxInflightRequests && isOldestRequestPastDue()) {
          return true;
        }
      }
      return false;

Up to your personal preference, though

Contributor Author

True, your code is more readable. I will take your advice.
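
To make the difference between parallelism and the max-inflight limit concrete, here is a small self-contained sketch of the gating logic discussed in this thread. `InflightGateSketch` and its method are hypothetical names; the real tracker keeps this state internally and derives "past due" from latency histograms, which is reduced to a boolean argument here.

```java
// Hypothetical, simplified model of the hasNext() decision from the review suggestion.
final class InflightGateSketch {
  private final int parallelism;          // normal cap on concurrent requests
  private final int maxInflightRequests;  // hard cap, even when the oldest request is past due

  InflightGateSketch(int parallelism, int maxInflightRequests) {
    this.parallelism = parallelism;
    this.maxInflightRequests = maxInflightRequests;
  }

  boolean canSendNext(boolean hasMoreReplicas, int inflightCount, boolean oldestRequestPastDue) {
    if (!hasMoreReplicas) {
      return false;
    }
    // Below parallelism, a request may always be sent.
    if (inflightCount < parallelism) {
      return true;
    }
    // Between parallelism and the hard cap, only send when the oldest request is past due.
    return inflightCount < maxInflightRequests && oldestRequestPastDue;
  }
}
```

For example, with parallelism 2 and a max of 3, a third request goes out only if the oldest outstanding one is past due, and a fourth never does.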

@Override
public void run() {
double quantile = routerConfig.routerLatencyToleranceQuantile;
for (Map.Entry<Resource, Histogram> resourceToHistogram : getBlobLocalDcResourceToLatency.entrySet()) {
Contributor

I think all of these maps can be null.

Contributor Author

Good catch! Will add check here.

@@ -400,13 +406,17 @@ public RouterConfig(VerifiableProperties verifiableProperties) {
verifiableProperties.getDouble("router.operation.tracker.reservoir.decay.factor", 0.015);
routerOperationTrackerMinDataPointsRequired =
verifiableProperties.getLong("router.operation.tracker.min.data.points.required", 1000L);
routerAdaptiveTrackerMaxInflightRequests =
routerOperationTrackerMaxInflightRequests =
verifiableProperties.getIntInRange("router.adaptive.tracker.max.inflight.requests", 2, 1, Integer.MAX_VALUE);
Contributor

Change "adaptive" to "operation" in the config key here.

@@ -652,6 +701,8 @@ private RouterConfig createRouterConfig(boolean crossColoEnabled, int successTar
props.setProperty("router.get.replicas.required", Integer.toString(Integer.MAX_VALUE));
props.setProperty("router.latency.tolerance.quantile", Double.toString(QUANTILE));
props.setProperty("router.operation.tracker.metric.scope", trackerScope.toString());
props.setProperty("router.adaptive.tracker.max.inflight.requests", Integer.toString(maxInflightNum));
Contributor

Also change it here after renaming the config key.

@@ -555,6 +555,7 @@ private OperationTracker getOperationTracker(boolean crossColoEnabled, int succe
Boolean.toString(includeNonOriginatingDcReplicas));
props.setProperty("router.get.replicas.required", Integer.toString(replicasRequired));
props.setProperty("router.latency.tolerance.quantile", Double.toString(QUANTILE));
props.setProperty("router.adaptive.tracker.max.inflight.requests", Integer.toString(parallelism));
Contributor

and here

Contributor Author

sorry, will fix soon
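
One possible way to avoid chasing the same key string through the config class and several tests is to define it once. The sketch below is only an illustration and not part of this PR: `RouterConfigKeysSketch` and its helpers are hypothetical, and the key value assumes the requested rename to the `router.operation.tracker` prefix.

```java
import java.util.Properties;

// Hypothetical holder for a config key shared by production code and tests.
final class RouterConfigKeysSketch {
  // Assumed final key name after the rename requested in this review.
  static final String OPERATION_TRACKER_MAX_INFLIGHT_REQUESTS =
      "router.operation.tracker.max.inflight.requests";

  private RouterConfigKeysSketch() {
  }

  // Test helper: set the limit without repeating the literal key.
  static Properties withMaxInflight(Properties props, int maxInflight) {
    props.setProperty(OPERATION_TRACKER_MAX_INFLIGHT_REQUESTS, Integer.toString(maxInflight));
    return props;
  }

  // Reader side: the default of 2 mirrors the default shown in the diff.
  static int readMaxInflight(Properties props) {
    return Integer.parseInt(props.getProperty(OPERATION_TRACKER_MAX_INFLIGHT_REQUESTS, "2"));
  }
}
```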

quantile * 100, histogram.getSnapshot().getValue(quantile));
}
for (Map.Entry<Resource, Histogram> resourceToHistogram : getBlobInfoLocalDcResourceToLatency.entrySet()) {
Resource resource = resourceToHistogram.getKey();
Contributor

Create a function, method, or class to make a general histogram dumper?

Contributor Author

Sure but I need to think about this. Will make the change in future PR.
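
As a rough idea of what a general, null-safe dumper could look like, here is a minimal sketch. It is not the follow-up change itself: `HistogramDumperSketch` is a hypothetical name, the histogram type is left generic so the example needs no metrics library, and the method returns formatted lines instead of writing to a logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper that formats one line per resource for any resource-to-histogram map.
final class HistogramDumperSketch {
  private HistogramDumperSketch() {
  }

  static <R, H> List<String> dump(String label, Map<R, H> resourceToHistogram, double quantile,
      ToDoubleBiFunction<H, Double> quantileReader) {
    List<String> lines = new ArrayList<>();
    // The maps may be null (as noted in the review), so treat null as "nothing to dump".
    if (resourceToHistogram == null) {
      return lines;
    }
    for (Map.Entry<R, H> entry : resourceToHistogram.entrySet()) {
      // The caller supplies how to read a quantile from its histogram type.
      double value = quantileReader.applyAsDouble(entry.getValue(), quantile);
      lines.add(String.format(Locale.ROOT, "%s %s: %.1fth percentile latency = %.1f", label,
          entry.getKey(), quantile * 100, value));
    }
    return lines;
  }
}
```

With the histograms used in this PR, the reader could be `(h, q) -> h.getSnapshot().getValue(q)`, matching the call in the diff above, and the same method could then serve each of the per-operation maps.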

@cgtz cgtz merged commit 74364c7 into linkedin:master Jul 17, 2019