Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26285][CORE] accumulator metrics sources for LongAccumulator and Doub… #23242

Closed

Conversation

abellina
Copy link
Contributor

@abellina abellina commented Dec 6, 2018

…leAccumulator

What changes were proposed in this pull request?

This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace.

How was this patch tested?

Unit tests, and manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@abellina abellina changed the title SPARK-26285: accumulator metrics sources for LongAccumulator and Doub… [SPARK-26285][CORE] accumulator metrics sources for LongAccumulator and Doub… Dec 6, 2018
@tgravescs
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Dec 7, 2018

Test build #99831 has finished for PR 23242 at commit 45cfada.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AccumulatorSource extends Source

/**
* Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize]
*/
object AccumulatorMetricsTest {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example named as Test is a bit confusing i think... thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@redsanket yes at first I thought that, but there are other many other "examples" here with the suffix Test.

@SparkQA
Copy link

SparkQA commented Dec 8, 2018

Test build #99877 has finished for PR 23242 at commit de531c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def metricRegistry: MetricRegistry = registry
}

class LongAccumulatorSource extends AccumulatorSource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be good to mark all of these are Experimental for now, that gives us an out if we need to change something with api until we get some usage of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add small description for scala docs for Long and Double Accumulator Source

* the CollectionAccumulator, as that is a List of values (hard to report,
* to a metrics system)
*/
class AccumulatorSource extends Source {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this private[spark] for now unless you have a use case that would need to public extend this


/**
* Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize]
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a description on what this does


val spark = SparkSession
.builder()
.config("spark.broadcast.blockSize", blockSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this, assume copy from the BroadcastTest one

val observedSizes = sc.parallelize(1 to 10, slices).map(_ => {
acc.add(1)
acc2.add(1.1)
barr1.value.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could simplify the test by remove the broadcast stuff as that isn't really what we are testing, just leave the accumulator updates and do simple map. Print the accumulators at the end. Also perhaps we should put more information the register part and how someone running this might see those metrics output

@abellina
Copy link
Contributor Author

@tgravescs I made the suggested changes, and made the example more relevant.

@SparkQA
Copy link

SparkQA commented Dec 18, 2018

Test build #100292 has finished for PR 23242 at commit cf015cd.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 19, 2018

Test build #100293 has finished for PR 23242 at commit 7f60730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*
* The only argument, numElem, sets the number elements in the collection to parallize.
*
* The result is output to stdout in the driver with the values of the accumulators.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should expand this more saying using ConsoleSink so the metrics going to stdout show up as metrics like AccumulatorSource.my-double-metric


/**
* :: Experimental ::
* Metrics source specifically for LongAccumulators.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should expand this and say user can register accumulators that show up in the spark driver metrics

@SparkQA
Copy link

SparkQA commented Dec 22, 2018

Test build #100373 has finished for PR 23242 at commit 682cfd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@tgravescs tgravescs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@tgravescs
Copy link
Contributor

merged to master, thanks @abellina

@asfgit asfgit closed this in 0a02d5c Dec 22, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…nd Doub…

…leAccumulator

## What changes were proposed in this pull request?

This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace.

## How was this patch tested?

Unit tests, and manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#23242 from abellina/SPARK-26285_accumulator_source.

Lead-authored-by: Alessandro Bellina <abellina@yahoo-inc.com>
Co-authored-by: Alessandro Bellina <abellina@oath.com>
Co-authored-by: Alessandro Bellina <abellina@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…nd Doub…

…leAccumulator

## What changes were proposed in this pull request?

This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace.

## How was this patch tested?

Unit tests, and manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#23242 from abellina/SPARK-26285_accumulator_source.

Lead-authored-by: Alessandro Bellina <abellina@yahoo-inc.com>
Co-authored-by: Alessandro Bellina <abellina@oath.com>
Co-authored-by: Alessandro Bellina <abellina@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants