-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest-attachment error/stacktrace logging fix #93608
base: main
Are you sure you want to change the base?
Conversation
by default. This can, of course, be turned back up as usual via the logging layer and/or cluster settings.
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @joegallo, I've created a changelog YAML for you. |
Note to self that we need to use the |
Since we're not using that library anymore.
https://github.com/apache/logging-log4j2/tree/release-2.x/log4j-slf4j-impl for the actual implementation, and https://github.com/apache/logging-log4j2/tree/release-2.x for the LICENSE.txt and NOTICE.txt contents.
https://github.com/qos-ch/slf4j/tree/master/jcl-over-slf4j is where the implementation lives, there's a LICENSE there but no notice, nor do I see a notice in the directory above that.
The CI issues here will be resolved once #93611 has been merged into main (and then main into this branch). (edit: ✅) double edit: Okay, actually, that did resolve one set of CI issues, but it has opened up a new set of CI issues. Looking at those now. 😬 triple edit: With the superhuman wisdom and patience of @mark-vieira, I think CI is finally going to be happy here. |
Otherwise the LLRC brings it in and we end up in jar-hell.
smash-my-computer.gif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -149,6 +149,8 @@ static PermissionCollection getRestrictedPermissions() { | |||
} catch (IOException e) { | |||
throw new UncheckedIOException(e); | |||
} | |||
// log4j needs to be able to check the environment | |||
perms.add(new RuntimePermission("getenv.*")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there specific environment variables that log4j needs which we can give access to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see log4j.unbox.ringbuffer.size
, log4j2.StatusLogger.level
, log4j2.status.entries
, log4j2.StatusLogger.DateFormat
, and log4j2.StatusLogger.properties
, but I'm guessing there are more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell though, we can't do wildcards like "getenv.log4j*"
. Is there a lot of risk in just allowing "getenv.*"
? I'm worried we'd miss something if we listed them all out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, within the bowels of log4j2, they're calling System.getenv().containsKey(key);
. That System.getenv()
needs getenv.*
. So even if we did have the list of all the keys they're going to access, it still wouldn't be a workaround as an allowlist because of the way they're calling into it.
Or at least that's how it looks to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this doPrivileged block here? I'm not really familiar with why it exists. Can we just move the permissions (not including getenv.*
) into the module's plugin-security.policy file, and call this like an ordinary method? I tried that out and it seems to work. I'm not sure if this is just old code that predates the plugin-security.policy work, or if there is something really different it is trying to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of all the permissions here are to limit what Tika has access to. Parsing code is notorious for (unintentionally) allowing shady things to occur. Instead of running with permissions of the server (which all plugins get), we drop down to no permissions, and add the explicit ones Tika needs. I think we should try to avoid allowing access to all env vars. From the stack traces Joe showed me, it looks like we need some slf4j initialization such that calls can be interjected with doPrivileged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just slf4j though -- some parts of tika call log4j directly. Or maybe I misunderstood what you said?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our log4j has (or should have) the appropriate doPrivileged calls so that any logging calls will work. The issue here is that slf4j tries to do some privileged operations (in the bridge for log4j). My suggestion is to make those slf4j operations work in the access context of the ingest attachment plugin, instead of the reduced permission context of Tika.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something's not working as expected then. We're seeing this:
java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "getenv.*")
at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:411)
at java.base/java.lang.System.getenv(System.java:1198)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.util.EnvironmentPropertySource.containsProperty(EnvironmentPropertySource.java:99)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.util.PropertiesUtil$Environment.get(PropertiesUtil.java:513)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.util.PropertiesUtil$Environment.access$200(PropertiesUtil.java:434)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.util.PropertiesUtil.getStringProperty(PropertiesUtil.java:382)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.util.PropertiesUtil.getBooleanProperty(PropertiesUtil.java:169)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.status.StatusLogger.isDebugPropertyEnabled(StatusLogger.java:143)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.status.StatusLogger.isEnabled(StatusLogger.java:426)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.status.StatusLogger.isEnabled(StatusLogger.java:354)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1914)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.spi.AbstractLogger.debug(AbstractLogger.java:463)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.PatternProcessor.formatFileName(PatternProcessor.java:291)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.PatternProcessor.formatFileName(PatternProcessor.java:278)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.AbstractRolloverStrategy.getEligibleFiles(AbstractRolloverStrategy.java:94)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.AbstractRolloverStrategy.getEligibleFiles(AbstractRolloverStrategy.java:87)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.DefaultRolloverStrategy.rollover(DefaultRolloverStrategy.java:524)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.RollingFileManager.rollover(RollingFileManager.java:504)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.RollingFileManager.rollover(RollingFileManager.java:394)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:308)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.appender.RollingFileAppender.append(RollingFileAppender.java:300)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
at org.apache.logging.log4j.core@8.5.2/org.apache.logging.log4j.core.Logger.log(Logger.java:161)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2106)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.internal.DefaultLogBuilder.logMessage(DefaultLogBuilder.java:234)
at org.apache.logging.log4j@2.18.0/org.apache.logging.log4j.internal.DefaultLogBuilder.log(DefaultLogBuilder.java:162)
at org.apache.poi.hdgf.chunks.Chunk.processCommands(Chunk.java:184)
at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:207)
at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:66)
at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:70)
at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:77)
at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:77)
at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:89)
at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:232)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:180)
at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:71)
at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:108)
at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:406)
at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE(AbstractOOXMLExtractor.java:351)
at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:267)
at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:217)
at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:138)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:242)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:180)
at org.apache.tika.Tika.parseToString(Tika.java:525)
at org.elasticsearch.ingest.attachment.TikaImpl.lambda$parse$0(TikaImpl.java:97)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's coming from #91964 (and not what I saw before for this issue, which was slf4j related). We fixed that previously by forcing static init of the Unbox class, but it looks like it didn't stick.
I'm thinking now we do need another fix at a lower level in our log4j setup. I would like to avoid giving env access to all Tika processors. I created #93714 to track this.
Without this, slf4j says that "SLF4J: No SLF4J providers were found."
I've converted this back to a draft PR to indicate that although I might iterate here by adding commits, I'm explicitly not seeking a +1 from anybody at all anymore (at least right now!). There's still value here in this PR, some of which may come to a different PR new you sometime soon, but it might require more work to get to something mergeable than I originally expected. All of which is to say, feel free to unsubscribe or just ignore incoming notifications -- the appearance of new commits here is not a ping that "oh, perhaps with the addition of this new code now this is mergeable". Also I'd like to say how much I appreciate the reviews we're getting here and the work that people are putting in with me and @masseyke to get this issue resolved in the best way. 👍s all around. |
No change in status here -- we're still not doing this PR this way, but I merged main into this branch in order to get #93878 on board. The overall diff here remains "if we did this (this way) then here's what it would look like" but indeed we're still not going to do this (this way). |
Closes #91964
Rather than having some of Tika's dependencies logging over commons-logging, and some over an unconfigured slf4j (oops!), we've ripped all that out and set it up to route both commons-logging and slf4j over log4j.
Furthermore, in order for log4j to work correctly, it needs permission to access environment variables, so we've added that.
Finally, the default logging from some of these tika dependencies is too noisy, so we've quieted them down out of the box.