-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18364][YARN] Expose metrics for YarnShuffleService #149
Conversation
Registers the shuffle server's metrics with the Hadoop Node Manager's DefaultMetricsSystem.
I'd like for other Palantirians to review this and then send it upstream. |
try { | ||
MetricsSystemImpl metricsSystem = (MetricsSystemImpl) defaultMetricsSystem; | ||
|
||
Method registerSourceMethod = metricsSystem.getClass().getDeclaredMethod("registerSource", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it standard to use reflection in such a context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this method is package private so that's the only way I can call it. The only way the Node Manager exposes to register sources is sources formatted in the Hadoop Metrics system, which isn't compatible with the dropwizard metrics system Spark uses. That's why I have to do the ugly conversion in YarnShuffleServiceMetrics between the systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just name the package accordingly - but either way is not ideal, and I have no preference between the two.
} else if (entry.getValue() instanceof Gauge) { | ||
Gauge m = (Gauge) entry.getValue(); | ||
Object gaugeValue = m.getValue(); | ||
if (gaugeValue instanceof Integer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can values have other numeric types? Longs or doubles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's only Gauge right now so I didn't put in the others: https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java#L194
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checkstyle will complain. Reflection isn't great but I don't see a better way.
t.getOneMinuteRate()) | ||
.addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name), | ||
t.getMeanRate()) | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull that up
|
||
package org.apache.spark.network.yarn; | ||
|
||
import com.codahale.metrics.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No start imports
m.getOneMinuteRate()) | ||
.addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of meter " + name), | ||
m.getMeanRate()) | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull it up
Addressed comments in new commits |
@robert3005 @mccheah good to send upstream now? Happy to merge once the build passes also so we can start rolling this out |
Yeah, I think this is as good as it can be |
Upstreamed with link at https://issues.apache.org/jira/browse/SPARK-18364 |
Tests flaked with GC limit exceeded, restarted.. |
Produces metrics that look like this:
I'd happily get rid of these two lines if anyone has suggestions for how to do that: