
protect against null counters #1726

Merged 1 commit into twitter:develop on Sep 26, 2017
Conversation

tdyas (Contributor) commented Sep 25, 2017

The getCounter method of the Reporter returned from HadoopFlowProcess was returning null in some cases for a few jobs that we run in production. (It is unclear why these jobs were seeing null counters.)

From looking at the Hadoop source code, `getCounter` does return null in some instances; in particular, the `Reporter.NULL` implementation unconditionally returns null from its `getCounter` implementation. Hadoop does this despite not documenting that null is a valid return value.

Solution: null-check the return value of `Reporter.getCounter` to work around the issue, and log a warning.

Fixes #1716
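The pattern the PR describes can be sketched in isolation. The `Counter` and `Reporter` traits below are hypothetical minimal stand-ins (the real types come from Hadoop/cascading), used only to illustrate how wrapping both possibly-null values in `Option` avoids a later `NullPointerException`:

```scala
// Hypothetical stand-ins for the Hadoop Reporter/Counter types; these
// minimal traits are assumptions, used only to illustrate the null-check
// pattern described above.
trait Counter { def increment(amount: Long): Unit }
trait Reporter { def getCounter(group: String, name: String): Counter }

object NullCheckSketch {
  // Wrap both possibly-null values in Option: a null Reporter or a null
  // Counter degrades to None instead of throwing later on increment.
  def lookupCounter(reporter: Reporter, group: String, name: String): Option[Counter] =
    for {
      rep <- Option(reporter)
      cnt <- Option(rep.getCounter(group, name))
    } yield cnt
}
```

With this shape, the caller can log a warning on `None` and otherwise increment through the `Some`.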

@@ -63,9 +64,30 @@ private[scalding] final case class GenericFlowPCounterImpl(fp: FlowProcess[_], s
override def increment(amount: Long): Unit = fp.increment(statKey.group, statKey.counter, amount)
}

private[scalding] object HadoopFlowPCounterImpl {
@transient lazy val logger: Logger = LoggerFactory.getLogger(this.getClass)
Collaborator

This should be `private` (but not `private[this]`).
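The distinction the reviewer draws matters here because the companion class imports the logger from the object: `private` on a companion-object member is still visible to the companion class, while `private[this]` (object-private) would make that import fail to compile. A minimal sketch with hypothetical names:

```scala
object Holder {
  // `private` on a companion-object member: still accessible from the
  // companion class Holder below, e.g. via `import Holder.shared`.
  private val shared: Int = 42

  // `private[this]` would be object-private: the companion class could
  // NOT access it, so the import below would no longer compile.
  // private[this] val hidden: Int = 0
}

class Holder {
  import Holder.shared // legal: companions see each other's private members
  def value: Int = shared
}
```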

import HadoopFlowPCounterImpl.logger

private[this] val cntr: Option[Counter] = {
val reporter = fp.getReporter
Collaborator

If we can skip all the logging, maybe we can make it more idiomatic/shorter? (Is the logging useful / going to be read?)

Option(fp.getReporter).flatMap { reporter =>
  Option(reporter.getCounter(statKey.group, statKey.counter))
}

Collaborator

or even:

(for {
  rep <- Option(fp.getReporter)
  cnt <- Option(rep.getCounter(statKey.group, statKey.counter))
} yield cnt).fold {
  logger.warn(s"Cannot increment counter(${statKey.group}, ${statKey.counter}) because HadoopFlowProcess.getReporter returned null")
}(_ => ())

Contributor (Author)

I have more logging in my company's fork of Scalding to try to track down the issue. I didn't want to include that full logging in this PR, which makes the logging here not very useful on its own. I'll remove the logging.

Contributor (Author)

Although a warning the first time it happens would be useful, so people can figure out why the counter was not incremented.
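One common way to get the warn-only-once behavior suggested here (a sketch, not scalding's actual implementation; `WarnOnce` is a hypothetical helper) is an atomic compare-and-set guard:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical helper, not from scalding: fires the given side effect at
// most once, even under concurrent calls, via an atomic compare-and-set.
final class WarnOnce {
  private val warned = new AtomicBoolean(false)
  def apply(warn: () => Unit): Unit =
    if (warned.compareAndSet(false, true)) warn()
}
```

One instance would be held per counter (or per process), and the null branch would call it instead of logging unconditionally.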

tdyas force-pushed the null_check_counters branch from 268ef46 to dd89eb5 on September 25, 2017 21:25
ianoc (Collaborator) commented Sep 26, 2017

@tdyas Thanks for the changes. If you like, @johnynek had some logging suggestions to warn the first time this happens. I'm going to merge your changes here as is since they look good, but adding some logging back is fine with me too (I just don't want to leave your PR lingering).

ianoc merged commit 13e230d into twitter:develop on Sep 26, 2017
tdyas (Contributor, author) commented Sep 27, 2017

Logging sounds fine to me! Whatever you all want to do.

tdyas (Contributor, author) commented Sep 27, 2017

Also, would it be acceptable to backport this to the 0.17.x branch so it ships with the next 0.17.x point release?

johnynek (Collaborator) commented Sep 27, 2017 via email

johnynek (Collaborator):

#1729 made a minor optimization of this.
