-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add options for verbose logging in FileSource #1613
Add options for verbose logging in FileSource #1613
Conversation
@@ -119,6 +110,32 @@ object AcceptAllPathFilter extends PathFilter { | |||
} | |||
|
|||
object FileSource { | |||
val LOG = LoggerFactory.getLogger(this.getClass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you tested this on the cluster? Seems like serialization woes are possible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same worry. I'm in the middle of testing it out. It's also really hard to get these logs to show up in the tests. Something to do w/ slf4j configuration. So I also want to confirm these logs actually get printed on a real cluster.
I think this'll work, because I see it being done the same way in the reducer estimator and a few other places. Worse case I'll make it lazy transient.
|
||
val VerboseFileSourceLoggingKey = "com.twitter.scalding.filesource.verbose.logging" | ||
|
||
def verboseLogEnabled(conf: Configuration): Boolean = conf.getBoolean(VerboseFileSourceLoggingKey, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we try to use scalding.Config
where possible. Can we use that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that or make this private please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it private. Is it better to use scalding.Config even if it's private? It seems that what I'll have access to is a Configuration -- does it make sense to convert that to a scalding.Config (probably not?)
|
||
def verboseLogEnabled(conf: Configuration): Boolean = conf.getBoolean(VerboseFileSourceLoggingKey, false) | ||
|
||
def ifVerboseLog(conf: Configuration)(msgFn: => String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this take shouldLog: Boolean
instead? And maybe make a
private def ifVerboseConf(conf: Configuration)(msg: => String): Unit =
ifVerboreLog(verboseLogEnabled(conf))
to minimize exposing Configuration
on our APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can actually all be private, or maybe private[scalding] -- would that be enough?
@@ -7,7 +7,7 @@ import java.net.URI | |||
import java.nio.ByteBuffer | |||
|
|||
import org.apache.hadoop.conf.Configuration | |||
import org.apache.hadoop.mapreduce.filecache.{DistributedCache => HDistributedCache} | |||
import org.apache.hadoop.mapreduce.filecache.{ DistributedCache => HDistributedCache } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not commit style changes like this on the side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so this is what happens when you run the unit tests (scalafmt) -- I wasn't sure whether to keep this. If I don't commit this, next time someone runs the unit tests, it'll do this again. So I thought I'd just keep the changes. I can back them out though and send a separate PR that only does the style fixes.
@@ -59,7 +59,6 @@ object RichPipeSpecification extends Properties("RichPipe") { | |||
(newNum > basePipeNumber) && ap.getName.endsWith(s"-${lastGroup}") | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this from the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, what do you think?
@@ -119,6 +110,32 @@ object AcceptAllPathFilter extends PathFilter { | |||
} | |||
|
|||
object FileSource { | |||
val LOG = LoggerFactory.getLogger(this.getClass) | |||
|
|||
val VerboseFileSourceLoggingKey = "com.twitter.scalding.filesource.verbose.logging" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we declare this in Config
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes maybe, let me see how that works
|
||
def ifVerboseLog(conf: Configuration)(msgFn: => String): Unit = { | ||
if (verboseLogEnabled(conf)) { | ||
val stack = new RuntimeException() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we instead do Thread.currentThread().getStackTrace()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe -- I need to check if there's a difference. I think there might be but I'm not 100% sure.
@@ -119,6 +110,32 @@ object AcceptAllPathFilter extends PathFilter { | |||
} | |||
|
|||
object FileSource { | |||
val LOG = LoggerFactory.getLogger(this.getClass) | |||
|
|||
val VerboseFileSourceLoggingKey = "com.twitter.scalding.filesource.verbose.logging" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a short line describing the use of this param? Add if we have a sense of the overhead it causes, that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good idea, i'll document this
OK -- updated to have comments + to use scalding.Config (to some extent) and to make the helper methods private. Let me know what you think about the style changes / if the should be removed. |
👍 if you were able to verify that this works on the cluster |
OK I've tested this on the cluster as well. It's pretty noisy -- we could log this to a side file instead maybe. |
👍 |
Thanks! Any opinion on log-to-console vs log-to-side-file? |
I think we should just use normal terrible java logging. If you want to log I don't see myself running this option, so I am not too worried about it as On Thu, Oct 27, 2016 at 1:27 PM Alex Levenson notifications@github.com
|
Ya makes sense. I could have done all the turning on/off of this via terrible slf4j configs but I didn't want to do that because it never really works / often gets misconfigured outside of your control. So that's why I used the hadoop conf. I'll leave this as is. |
For a long time we've had issues with getting the data availability checks right in FileSource. It's often hard to debug reports of intermittent errors in FileSource's behavior, because by the time we get to take a look, the HDFS file system may have changed already.
This PR adds a hadoop config variable that can be enabled which adds some very verbose logging, which sort of shows "what the file system looked like" when a job was run and when it was doing its various validations / sanity checks.
It's a little bit messy to have this logging code inside of FileSource but I do think it will help us going forward. Right now we are trying to debug some issues with the new behavior that allows for empty directories that contain success files. It may be doing exactly what it's supposed to, or it may have a bug, but we can't really catch it as it happens. This way we'll at least have the option to know what the FS looked like at run time.