-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26285][CORE] accumulator metrics sources for LongAccumulator and Doub… #23242
Conversation
ok to test |
Test build #99831 has finished for PR 23242 at commit
|
/** | ||
* Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize] | ||
*/ | ||
object AccumulatorMetricsTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example named as Test is a bit confusing i think... thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@redsanket yes at first I thought that, but there are other many other "examples" here with the suffix Test.
Test build #99877 has finished for PR 23242 at commit
|
override def metricRegistry: MetricRegistry = registry | ||
} | ||
|
||
class LongAccumulatorSource extends AccumulatorSource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be good to mark all of these are Experimental for now, that gives us an out if we need to change something with api until we get some usage of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add small description for scala docs for Long and Double Accumulator Source
* the CollectionAccumulator, as that is a List of values (hard to report, | ||
* to a metrics system) | ||
*/ | ||
class AccumulatorSource extends Source { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make this private[spark] for now unless you have a use case that would need to public extend this
|
||
/** | ||
* Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize] | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a description on what this does
|
||
val spark = SparkSession | ||
.builder() | ||
.config("spark.broadcast.blockSize", blockSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this, assume copy from the BroadcastTest one
val observedSizes = sc.parallelize(1 to 10, slices).map(_ => { | ||
acc.add(1) | ||
acc2.add(1.1) | ||
barr1.value.length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could simplify the test by remove the broadcast stuff as that isn't really what we are testing, just leave the accumulator updates and do simple map. Print the accumulators at the end. Also perhaps we should put more information the register part and how someone running this might see those metrics output
…al flags and comments
…spark into SPARK-26285_accumulator_source
@tgravescs I made the suggested changes, and made the example more relevant. |
Test build #100292 has finished for PR 23242 at commit
|
Test build #100293 has finished for PR 23242 at commit
|
* | ||
* The only argument, numElem, sets the number elements in the collection to parallize. | ||
* | ||
* The result is output to stdout in the driver with the values of the accumulators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should expand this more saying using ConsoleSink so the metrics going to stdout show up as metrics like AccumulatorSource.my-double-metric
|
||
/** | ||
* :: Experimental :: | ||
* Metrics source specifically for LongAccumulators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should expand this and say user can register accumulators that show up in the spark driver metrics
Test build #100373 has finished for PR 23242 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
merged to master, thanks @abellina |
…nd Doub… …leAccumulator ## What changes were proposed in this pull request? This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace. ## How was this patch tested? Unit tests, and manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes apache#23242 from abellina/SPARK-26285_accumulator_source. Lead-authored-by: Alessandro Bellina <abellina@yahoo-inc.com> Co-authored-by: Alessandro Bellina <abellina@oath.com> Co-authored-by: Alessandro Bellina <abellina@gmail.com> Signed-off-by: Thomas Graves <tgraves@apache.org>
…nd Doub… …leAccumulator ## What changes were proposed in this pull request? This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace. ## How was this patch tested? Unit tests, and manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Closes apache#23242 from abellina/SPARK-26285_accumulator_source. Lead-authored-by: Alessandro Bellina <abellina@yahoo-inc.com> Co-authored-by: Alessandro Bellina <abellina@oath.com> Co-authored-by: Alessandro Bellina <abellina@gmail.com> Signed-off-by: Thomas Graves <tgraves@apache.org>
…leAccumulator
What changes were proposed in this pull request?
This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace.
How was this patch tested?
Unit tests, and manual tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.