TracingRiverMarshallerFactory to diagnose StackOverflowError deserializing program.dat #239

Draft
wants to merge 2 commits into
base: master
@@ -57,6 +57,8 @@
import org.apache.commons.io.FileUtils;

import org.jboss.marshalling.ByteInput;
import org.jboss.marshalling.reflect.SerializableClassRegistry;
import org.jboss.marshalling.river.RiverUnmarshaller;
import org.jenkinsci.plugins.scriptsecurity.sandbox.Whitelist;
import org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox;
import org.kohsuke.accmod.Restricted;
@@ -151,7 +153,7 @@ public ListenableFuture<Unmarshaller> restorePickles(Collection<ListenableFuture
             config.setClassResolver(new SimpleClassResolver(classLoader));
             //config.setSerializabilityChecker(new SerializabilityCheckerImpl());
             config.setObjectResolver(combine(evr, ownerResolver));
-            Unmarshaller eu = new RiverMarshallerFactory().createUnmarshaller(config);
+            Unmarshaller eu = new TracingRiverMarshallerFactory().createUnmarshaller(config);
Member

The overridden methods will increase the stack depth by 1 for each nested object, so if issues like JENKINS-52966 are not due to cycles but just deeply nested objects because of the way Declarative is designed (which seems quite possible given user comments in that issue), this will make the problem worse. Could we somehow use the old code path by default, but then whenever we hit a StackOverflowError re-deserialize things with TracingRiverMarshallerFactory enabled? Or maybe only enable tracing if a system property is set?

Member Author
@jglick Sep 28, 2023

Exactly. If this PR does turn out to be capable of diagnosing an issue in this area, we would need to decide how to activate it. Besides the issue you mention, it will add some heap usage when loading normal programs.
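
A minimal sketch of the two activation options raised in this thread: trace only when a system property is set, or fall back to the tracing path after the plain path hits a StackOverflowError. The class name, the `read` method, and the property name `example.RiverReader.trace` are hypothetical and not part of the plugin; the two `Supplier` arguments stand in for deserializing with `RiverMarshallerFactory` and with `TracingRiverMarshallerFactory`. A real retry would presumably also need to reopen program.dat, since the first attempt has already consumed part of the stream.

```java
import java.util.function.Supplier;

final class TracingActivation {
    /** Hypothetical system property name; not defined by the plugin. */
    static final String TRACE_PROPERTY = "example.RiverReader.trace";

    /**
     * Runs the plain read by default. Uses the tracing read instead when the
     * property is set, or as a second attempt after a StackOverflowError.
     */
    static <T> T read(Supplier<T> plainRead, Supplier<T> tracingRead) {
        if (Boolean.getBoolean(TRACE_PROPERTY)) {
            return tracingRead.get();
        }
        try {
            return plainRead.get();
        } catch (StackOverflowError first) {
            // The stack has unwound by the time we get here, so the retry
            // starts from the caller's depth, not from the overflow point.
            try {
                return tracingRead.get();
            } catch (StackOverflowError second) {
                // Keep the original failure attached for diagnosis.
                second.addSuppressed(first);
                throw second;
            }
        }
    }
}
```

With this shape, normal loads pay neither the extra stack frames nor the extra heap for the trace list; only a load that has already failed is repeated with tracing enabled.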

            eu.start(Marshalling.createByteInput(din));

            final Unmarshaller sandboxed = new SandboxedUnmarshaller(eu);
@@ -373,4 +375,59 @@ private static <T> T sandbox(ReadSAM<T> lambda) throws ClassNotFoundException, I

}

    /**
     * Intercepts {@link StackOverflowError} and tries to record the chain of object classes which led to it.
     */
    private static final class TracingRiverMarshallerFactory extends RiverMarshallerFactory {
        @Override public Unmarshaller createUnmarshaller(MarshallingConfiguration configuration) throws IOException {
            return new RiverUnmarshaller(this, SerializableClassRegistry.getInstance(), configuration) {
                int depth = 0;
                StackOverflowError err;
                List<String> trace = new ArrayList<>();
                @Override protected Object doReadNewObject(int streamClassType, boolean unshared, boolean discardMissing) throws ClassNotFoundException, IOException {
                    depth++;
                    Object o;
                    try {
                        o = super.doReadNewObject(streamClassType, unshared, discardMissing);
                    } catch (StackOverflowError x) {
                        // Will cause a StreamCorruptedException eventually, but not before we capture the parent objects:
                        o = null;
                        err = x;
Comment on lines +392 to +395
Member
@dwnusbaum Sep 28, 2023

From some local testing, I think swallowing the StackOverflowError like this also has the side effect of making the standard tracing mechanism in JBoss Marshalling work, at least partially: its trace seems to have around 100 fewer entries than the custom trace added in this PR. For example, along with the new trace I also see the following, which does not appear without the PR:

Caused by: an exception which occurred:
	in field com.cloudbees.groovy.cps.impl.BlockScopeEnv.types
	in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@2ffb8577
	in object of type com.cloudbees.groovy.cps.impl.BlockScopeEnv
	in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
	in object com.cloudbees.groovy.cps.impl.TryBlockEnv@64d0aa4f
	in object of type com.cloudbees.groovy.cps.impl.TryBlockEnv
	in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
	in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@17735b9f
	in object of type com.cloudbees.groovy.cps.impl.BlockScopeEnv
	in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
        ...

At one point I tried to augment the CPS Env objects to carry a SourceLocation that we could then use to augment JBoss Marshalling trace information to make it easier to understand where the objects are coming from, but I never managed to get things to work in a way that was useful.

Member Author

I think that trace is due to the insertion of null into a place where an Object is expected, so it is a secondary error that we intentionally generate as the price of being able to continue walking back up the stack and collecting return values from readObject. AFAICT it tells you nothing about the primary StackOverflowError.

The basic problem is that given the protected methods available in RiverUnmarshaller (readFields does not count because one of its args is package-scope!) all we can do is record what readObject successfully returns, but we cannot intercept what class it is starting to load. So the only thing we can do is attempt to force hundreds of readObject stack frames to return normally, noting the class name being loaded in each, before going back and marking the whole deserialization as a failure. This is clearly inferior to what you would be able to get from, say, a parser event listener with some kind of startElement method.
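
The unwind-and-record approach described above can be illustrated with a toy reader that does not use JBoss Marshalling at all. Here `readNested` is a hypothetical stand-in for `super.doReadNewObject`, the overflow is simulated by throwing a StackOverflowError at a fixed nesting level, and the `Outer` and `Inner` classes are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class UnwindTraceDemo {
    static final class Outer {}
    static final class Inner {}

    final List<String> trace = new ArrayList<>();
    StackOverflowError err;
    int depth = 0;

    /** Stand-in for the library's recursive read: nests until a simulated overflow. */
    private Object readNested(int level, int limit) {
        if (level >= limit) {
            throw new StackOverflowError("simulated at level " + level);
        }
        readTraced(level + 1, limit); // read the nested field; its value is ignored in this toy
        return level == 0 ? new Outer() : new Inner();
    }

    /** Mirrors the override: swallow the error, return null, record the class on the way out. */
    Object readTraced(int level, int limit) {
        depth++;
        Object o;
        try {
            o = readNested(level, limit);
        } catch (StackOverflowError x) {
            o = null; // the caller sees a null where an object was expected
            err = x;  // remembered so the top-level read can rethrow it
        } finally {
            depth--;
        }
        if (o != null) {
            // Only known after the read has returned, hence deepest-first order.
            trace.add(" ".repeat(depth) + o.getClass().getSimpleName());
        }
        return o;
    }

    public static void main(String[] args) {
        UnwindTraceDemo demo = new UnwindTraceDemo();
        demo.readTraced(0, 3);
        demo.trace.forEach(System.out::println);
    }
}
```

With a limit of 3, the innermost frame returns null and the three enclosing frames record `Inner`, `Inner`, and `Outer`, deepest first, which is why the trace can only be assembled on the way back up the stack.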

                    } finally {
                        depth--;
                    }
                    if (o != null) {
                        trace.add(" ".repeat(depth) + o.getClass().getName());
                    }
                    return o;
                }

                @Override protected Object doReadObject(boolean unshared) throws ClassNotFoundException, IOException {
                    Object o;
                    try {
                        o = super.doReadObject(unshared);
                    } catch (ClassNotFoundException | IOException | RuntimeException | Error x) {
                        if (err != null) {
                            dumpTrace();
                            x.addSuppressed(err);
                        }
                        throw x;
                    }
                    if (err != null) {
                        dumpTrace();
                        throw err;
                    }
                    trace.clear();
                    return o;
                }

                void dumpTrace() {
                    for (String line : trace) {
                        LOGGER.log(Level.WARNING, "StackOverflowError trace: {0}", line);
                    }
                }
            };
        }
    }

}
@@ -0,0 +1,65 @@
/*
* The MIT License
*
* Copyright 2023 CloudBees, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/

package org.jenkinsci.plugins.workflow.support.pickles.serialization;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.hasItem;

import hudson.model.Result;
import java.util.logging.Level;
import org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition;
import org.jenkinsci.plugins.workflow.job.WorkflowJob;
import org.jenkinsci.plugins.workflow.job.WorkflowRun;
import org.jenkinsci.plugins.workflow.test.steps.SemaphoreStep;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.BuildWatcher;
import org.jvnet.hudson.test.JenkinsSessionRule;
import org.jvnet.hudson.test.LoggerRule;

public final class RiverReaderTest {

    @Rule public final JenkinsSessionRule rr = new JenkinsSessionRule();
    @Rule public final LoggerRule logging = new LoggerRule().record(RiverReader.class, Level.FINE).capture(2000);
    @ClassRule public static final BuildWatcher bw = new BuildWatcher();

    @Test public void stackOverflow() throws Throwable {
        rr.then(r -> {
            WorkflowJob p = r.createProject(WorkflowJob.class, "p");
            p.setDefinition(new CpsFlowDefinition("class R {Object f}; def x = new R(); for (int i = 0; i < 1000; i++) {def y = new R(); y.f = x; x = y}; semaphore 'wait'", true));
Member

The test fails for me locally because there is no StackOverflowError. It seems the minimum number of iterations needed in my environment is around 1774, so maybe bump this?

Suggested change
-            p.setDefinition(new CpsFlowDefinition("class R {Object f}; def x = new R(); for (int i = 0; i < 1000; i++) {def y = new R(); y.f = x; x = y}; semaphore 'wait'", true));
+            p.setDefinition(new CpsFlowDefinition("class R {Object f}; def x = new R(); for (int i = 0; i < 2000; i++) {def y = new R(); y.f = x; x = y}; semaphore 'wait'", true));

There's kind of a fine line though, because if you go much farther then I start to get a StackOverflowError during serialization.

Member Author

Right, I had to tune the number to get a SOE during deser but not ser. Not sure how to make a test of this reliable; maybe it should just be @Ignored and left there for use when experimenting.
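
To get a feel for how environment-dependent the threshold is, here is a sketch that uses plain `java.io` serialization rather than JBoss Marshalling or CPS program state, so the numbers it produces will not match the 1000 to 2000 range discussed above. The `R` class mirrors the one in the test script; the `DeepChainDemo` class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class DeepChainDemo {
    /** Same shape as the Groovy class in the test: one field pointing at the next object. */
    static final class R implements Serializable {
        private static final long serialVersionUID = 1L;
        Object f;
    }

    /** Builds a chain of length + 1 objects, each nested inside the next. */
    static R chain(int length) {
        R x = new R();
        for (int i = 0; i < length; i++) {
            R y = new R();
            y.f = x;
            x = y;
        }
        return x;
    }

    /** Returns false if serializing the chain overflowed the stack. */
    static boolean serializes(int length) {
        R head = chain(length);
        try (ObjectOutputStream oos = new ObjectOutputStream(OutputStream.nullOutputStream())) {
            oos.writeObject(head); // recurses once per nested object
            return true;
        } catch (StackOverflowError e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The length at which the result flips depends on the JVM, its flags, and the thread stack size.
        for (int length = 1_000; length <= 1_000_000; length *= 10) {
            System.out.println(length + " nested objects: " + (serializes(length) ? "ok" : "StackOverflowError"));
        }
    }
}
```

Running this with different `-Xss` values shifts the point where the result flips, which is the same sensitivity that makes a fixed iteration count in the test fragile.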

            SemaphoreStep.waitForStart("wait/1", p.scheduleBuild2(0).getStartCondition().get());
        });
        rr.then(r -> {
            WorkflowRun b = r.jenkins.getItemByFullName("p", WorkflowJob.class).getBuildByNumber(1);
            SemaphoreStep.success("wait/1", null);
            r.assertBuildStatus(Result.FAILURE, r.waitForCompletion(b));
            r.assertLogContains("java.lang.StackOverflowError", b);
            r.assertLogContains("at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject", b);
            assertThat(logging.getMessages(), hasItem("StackOverflowError trace: R"));
        });
    }

}