protect against null counters #1726
@@ -63,9 +64,30 @@ private[scalding] final case class GenericFlowPCounterImpl(fp: FlowProcess[_], s
  override def increment(amount: Long): Unit = fp.increment(statKey.group, statKey.counter, amount)
}

private[scalding] object HadoopFlowPCounterImpl {
  @transient lazy val logger: Logger = LoggerFactory.getLogger(this.getClass)
this should be private (but not private[this])
import HadoopFlowPCounterImpl.logger

private[this] val cntr: Option[Counter] = {
  val reporter = fp.getReporter
If we can skip all the logging, maybe we can make it more idiomatic/shorter? (Is the logging useful, or going to be read?)
Option(fp.getReporter).flatMap { reporter =>
  // Option(...) already maps a null counter to None, so no extra flatMap is needed
  Option(reporter.getCounter(statKey.group, statKey.counter))
}
or even:
(for {
  reporter <- Option(fp.getReporter)
  cnt <- Option(reporter.getCounter(statKey.group, statKey.counter))
} yield cnt).fold {
  logger.warn(s"Cannot increment counter(${statKey.group}, ${statKey.counter}) because HadoopFlowProcess.getReporter returned null")
}(_ => ())
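For readers following the thread, here is a minimal, self-contained sketch of the `Option`-based lookup being discussed. The `Reporter` and `Counter` traits below are hypothetical stand-ins (not the real Hadoop types), and `lookup`, `nullReporter`, and `liveReporter` are names invented for illustration; this is not the code that was merged.

```scala
object NullSafeLookupSketch {
  // Hypothetical stand-ins so the sketch compiles without Hadoop on the classpath.
  trait Counter { def increment(amount: Long): Unit }
  trait Reporter { def getCounter(group: String, name: String): Counter }

  // Option(x) is None when x is null, so each nullable call is wrapped once.
  def lookup(reporter: Reporter, group: String, name: String): Option[Counter] =
    for {
      rep <- Option(reporter)
      cnt <- Option(rep.getCounter(group, name))
    } yield cnt

  // Mimics a reporter whose getCounter always returns null.
  val nullReporter: Reporter = new Reporter {
    def getCounter(group: String, name: String): Counter = null
  }

  // A reporter that always hands back a usable counter.
  val liveReporter: Reporter = new Reporter {
    private val c: Counter = new Counter { def increment(amount: Long): Unit = () }
    def getCounter(group: String, name: String): Counter = c
  }
}
```

With these stand-ins, `lookup` yields `None` both when the reporter itself is null and when `getCounter` returns null, and `Some(counter)` otherwise.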
I have more logging in my company's fork of Scalding in order to try and track down the issue. I didn't want to include that full logging in this PR, but that makes this logging not useful. I'll remove the logging.
Although a warning the first time would be useful so people can figure out why the counter was not incremented.
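One way to get a warning only the first time is sketched below. `WarnOnce` is a hypothetical helper, not something in Scalding, and the `warn` callback stands in for a call such as `logger.warn`; treat it as an illustration of the idea rather than the change made in this PR.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object WarnOnceSketch {
  // Hypothetical helper: invokes `warn` at most once, even when several
  // threads notice the missing counter at the same time.
  final class WarnOnce(warn: String => Unit) {
    private val warned = new AtomicBoolean(false)

    // `msg` is by-name, so the string is only built if the warning is emitted.
    def apply(msg: => String): Unit =
      if (warned.compareAndSet(false, true)) warn(msg)
  }
}
```

The `compareAndSet` call flips the flag atomically, so later calls return without logging and without building the message.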
The `getCounter` method of the `Reporter` returned from `HadoopFlowProcess` was returning null in some cases for a few jobs that we run in production. (It is unclear why these jobs were seeing null counters.) From looking at the Hadoop source code, `getCounter` does return null in some instances; in particular, the `Reporter.NULL` implementation unconditionally returns null from its `getCounter` implementation. Hadoop does this despite not documenting that null is a valid return value.

Solution: Null check the return value of `Reporter.getCounter` to work around the issue.

Fixes twitter#1716
268ef46 to dd89eb5
Logging sounds fine to me! Whatever you all want to do.
Also would it be acceptable to backport this to the 0.17.x branch so it comes with the next 0.17.x point release?
Yeah, we could make a new release of Scalding 0.17 with everything that is
binary compatible.
On Tue, Sep 26, 2017 at 19:06 Tom Dyas ***@***.***> wrote:
Also would it be acceptable to backport this to the 0.17.x branch so it
comes with the next 0.17.x point release?
--
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin
#1729 made a minor optimization of this.
The `getCounter` method of the `Reporter` returned from `HadoopFlowProcess` was returning null in some cases for a few jobs that we run in production. (It is unclear why these jobs were seeing null counters.)

From looking at the Hadoop source code, `getCounter` does return null in some instances; in particular, the `Reporter.NULL` implementation unconditionally returns null from its `getCounter` implementation. Hadoop does this despite not documenting that null is a valid return value.

Solution: Null check the return value of `Reporter.getCounter` to work around the issue and log a warning.

Fixes #1716
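As a rough illustration of the effect described above, the sketch below wraps a possibly-null counter so that `increment` becomes a no-op instead of throwing a `NullPointerException`. `Counter`, `SafeCounter`, and `RecordingCounter` are hypothetical names made up for this example; the actual change lives in `HadoopFlowPCounterImpl` and may differ in detail.

```scala
object NullSafeCounterSketch {
  // Hypothetical stand-in for a Hadoop counter.
  trait Counter { def increment(amount: Long): Unit }

  // `underlying` may be null, as the result of a reporter lookup can be.
  final class SafeCounter(underlying: Counter) {
    private val cntr: Option[Counter] = Option(underlying)

    def isBacked: Boolean = cntr.isDefined

    // With no backing counter this does nothing rather than dereferencing null.
    def increment(amount: Long): Unit = cntr.foreach(_.increment(amount))
  }

  // Test double that records how much it was incremented by.
  final class RecordingCounter extends Counter {
    var total: Long = 0L
    def increment(amount: Long): Unit = total += amount
  }
}
```

The trade-off is that increments are silently dropped when the counter is missing, which is why the thread also asks for a warning.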