
[SPARK-18364][YARN] Expose metrics for YarnShuffleService #149

Merged (3 commits) on Mar 24, 2017


ash211 commented Mar 23, 2017

Produces metrics that look like this:

[user@host ~]$ curl -sk -XGET https://`hostname -f`:8042/jmx | jq . | grep 'shuffleservice' -B 1 -A 18
    {
      "name": "Hadoop:service=NodeManager,name=shuffleservice",
      "modelerType": "shuffleservice",
      "tag.Hostname": "<redacted>",
      "openBlockRequestLatencyMillis_count": 1,
      "openBlockRequestLatencyMillis_rate15": 0.0011080303990206543,
      "openBlockRequestLatencyMillis_rate5": 0.0033057092356765017,
      "openBlockRequestLatencyMillis_rate1": 0.015991117074135343,
      "openBlockRequestLatencyMillis_rateMean": 0.003843993699021382,
      "blockTransferRateBytes_count": 118,
      "blockTransferRateBytes_rate15": 0.1307475870844372,
      "blockTransferRateBytes_rate5": 0.39007368980982715,
      "blockTransferRateBytes_rate1": 1.8869518147479705,
      "blockTransferRateBytes_rateMean": 0.45359183094454836,
      "registeredExecutorsSize": 2,
      "registerExecutorRequestLatencyMillis_count": 2,
      "registerExecutorRequestLatencyMillis_rate15": 0.001697343764758814,
      "registerExecutorRequestLatencyMillis_rate5": 0.002970701813078509,
      "registerExecutorRequestLatencyMillis_rate1": 0.0005857750515146702,
      "registerExecutorRequestLatencyMillis_rateMean": 0.007687995987242345
    },
[user@host ~]$

I'd happily get rid of these two lines if anyone has suggestions for how to do that:

      "modelerType": "shuffleservice",
      "tag.Hostname": "<redacted>",

Registers the shuffle server's metrics with the Hadoop Node Manager's
DefaultMetricsSystem.
ash211 (Author) commented Mar 23, 2017

I'd like for other Palantirians to review this and then send it upstream.

try {
  MetricsSystemImpl metricsSystem = (MetricsSystemImpl) defaultMetricsSystem;

  Method registerSourceMethod = metricsSystem.getClass().getDeclaredMethod("registerSource",

Is it standard to use reflection in such a context?

ash211 (Author):

Unfortunately this method is package-private, so reflection is the only way I can call it. The only registration path the NodeManager exposes expects sources written for the Hadoop metrics system, which isn't compatible with the Dropwizard metrics library Spark uses. That's why YarnShuffleServiceMetrics has to do the ugly conversion between the two systems.


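We could also just name the package accordingly (putting the calling class in the same package as MetricsSystemImpl so it can call the package-private method directly), but neither approach is ideal, and I have no preference between the two.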
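A minimal, JDK-only sketch of the reflection approach discussed above. `FakeMetricsSystem` is a hypothetical stand-in for Hadoop's `MetricsSystemImpl` (the real `registerSource` takes a `MetricsSource` as its third argument, not an `Object`), and `ReflectiveRegistration` is an illustrative name; this is not the PR's code.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics system whose registerSource method is
// package-private (assumption: signature simplified to take an Object source).
class FakeMetricsSystem {
  private final Map<String, Object> sources = new LinkedHashMap<>();

  // Package-private: code in another package cannot call this directly.
  void registerSource(String name, String desc, Object source) {
    sources.put(name, source);
  }

  int sourceCount() {
    return sources.size();
  }

  boolean hasSource(String name) {
    return sources.containsKey(name);
  }
}

class ReflectiveRegistration {
  // Looks the method up by name and parameter types on the runtime class,
  // lifts the access check, and invokes it on the given instance.
  static void register(Object metricsSystem, String name, String desc, Object source)
      throws ReflectiveOperationException {
    Method m = metricsSystem.getClass().getDeclaredMethod(
        "registerSource", String.class, String.class, Object.class);
    m.setAccessible(true); // bypasses the package-private access check
    m.invoke(metricsSystem, name, desc, source);
  }
}
```

The cost of this approach is that a renamed method or changed signature only surfaces at runtime, as a `NoSuchMethodException`, rather than as a compile error.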

} else if (entry.getValue() instanceof Gauge) {
  Gauge m = (Gauge) entry.getValue();
  Object gaugeValue = m.getValue();
  if (gaugeValue instanceof Integer) {

Can values have other numeric types? Longs or doubles?
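One way to cover other numeric types, sketched with hypothetical stand-ins: `GaugeSink` mimics the `int`/`long`/`float`/`double` `addGauge` overloads of Hadoop's `MetricsRecordBuilder` (simplified here to take a `String` name instead of a `MetricsInfo`), and `GaugeBridge` and `RecordingSink` are illustrative names, not the PR's classes. The idea is to dispatch on the boxed type of the gauge value and fail loudly on anything unsupported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the addGauge overloads of a Hadoop record builder.
interface GaugeSink {
  void addGauge(String name, int value);
  void addGauge(String name, long value);
  void addGauge(String name, float value);
  void addGauge(String name, double value);
}

class GaugeBridge {
  // Routes a gauge reading to the overload matching its boxed type instead of
  // handling Integer alone; unknown types raise instead of being dropped.
  static void addGaugeValue(GaugeSink sink, String name, Object value) {
    if (value instanceof Integer) {
      sink.addGauge(name, ((Integer) value).intValue());
    } else if (value instanceof Long) {
      sink.addGauge(name, ((Long) value).longValue());
    } else if (value instanceof Float) {
      sink.addGauge(name, ((Float) value).floatValue());
    } else if (value instanceof Double) {
      sink.addGauge(name, ((Double) value).doubleValue());
    } else {
      throw new IllegalArgumentException(
          "Unsupported gauge value type for " + name + ": " + value);
    }
  }
}

// Records which overload was called, for illustration and testing.
class RecordingSink implements GaugeSink {
  final List<String> calls = new ArrayList<>();

  @Override public void addGauge(String name, int value) { calls.add(name + ":int=" + value); }
  @Override public void addGauge(String name, long value) { calls.add(name + ":long=" + value); }
  @Override public void addGauge(String name, float value) { calls.add(name + ":float=" + value); }
  @Override public void addGauge(String name, double value) { calls.add(name + ":double=" + value); }
}
```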

robert3005 left a comment

Checkstyle will complain. Reflection isn't great but I don't see a better way.

    t.getOneMinuteRate())
  .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
    t.getMeanRate())
;


Pull that semicolon up onto the previous line.


package org.apache.spark.network.yarn;

import com.codahale.metrics.*;


No star imports.

    m.getOneMinuteRate())
  .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of meter " + name),
    m.getMeanRate())
;


Pull it up here too.
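A sketch of the flattening that both the timer and meter snippets perform, with the terminating semicolon pulled up onto the last call. `MeteredView` is a hypothetical stand-in mirroring the rate getters of Dropwizard's `Metered` interface (which `Meter` and `Timer` implement), and `FlatRecord` stands in for Hadoop's chaining `MetricsRecordBuilder`; the suffixes match the `/jmx` output at the top of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the getters of Dropwizard's Metered interface.
interface MeteredView {
  long getCount();
  double getFifteenMinuteRate();
  double getFiveMinuteRate();
  double getOneMinuteRate();
  double getMeanRate();
}

// Minimal chaining builder standing in for a Hadoop record builder.
class FlatRecord {
  final Map<String, Number> values = new LinkedHashMap<>();

  FlatRecord addGauge(String name, Number value) {
    values.put(name, value);
    return this; // enables call chaining
  }
}

class MeteredFlattener {
  // Emits one flat entry per statistic using the _count, _rate15, _rate5,
  // _rate1 and _rateMean suffixes; the semicolon ends the last chained call.
  static void addMetered(FlatRecord record, String name, MeteredView m) {
    record.addGauge(name + "_count", m.getCount())
        .addGauge(name + "_rate15", m.getFifteenMinuteRate())
        .addGauge(name + "_rate5", m.getFiveMinuteRate())
        .addGauge(name + "_rate1", m.getOneMinuteRate())
        .addGauge(name + "_rateMean", m.getMeanRate());
  }
}
```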

ash211 (Author) commented Mar 23, 2017

Addressed the comments in new commits.

ash211 (Author) commented Mar 23, 2017

@robert3005 @mccheah good to send upstream now?

I'm also happy to merge once the build passes, so we can start rolling this out.

robert3005 commented

Yeah, I think this is as good as it can be

ash211 (Author) commented Mar 23, 2017

Upstreamed; see https://issues.apache.org/jira/browse/SPARK-18364

ash211 (Author) commented Mar 23, 2017

Tests flaked with a "GC overhead limit exceeded" error; restarted them.

ash211 merged commit 4db3e7f into master on Mar 24, 2017
ash211 deleted the feature/yarn-shuffle-metrics branch on March 24, 2017 at 00:33