Whitelists are not loaded correctly in driver based tests #5914

ulrikrasmussen · 2020-01-31T11:49:24Z

The problem

After an upgrade to Corda 4.3 from 4.0, my driver based tests started to fail because the nodes were unable to find classes to be serialized on the whitelist. However, they are present on a whitelist published as a META-INF.service in the CorDapp.

An investigation into the problem revealed that, when running a node via the driver based test framework, all CorDapps are loaded with an empty list of whitelists. The problem is in the function findWhitelists()in the file JarScanningCordappLoader.kt. It extends the classpath with the jar of the CorDapp that has been deployed on the node and then proceeds to create a list of instances of all classes implementing SerializationWhitelist that can be found on the extended classpath, filtering out the instances whose class does not come from the deployed CorDapp jar.

The code seems to hinge on the assumption that all SerializationWhitelist services on the classpath will actually be loaded, but this is not true. The documentation of ServiceLoader includes the following paragraph:

If a particular concrete provider class is named in more than one configuration file, or is named in the same configuration file more than once, then the duplicates are ignored. The configuration file naming a particular provider need not be in the same jar file or other distribution unit as the provider itself. The provider must be accessible from the same class loader that was initially queried to locate the configuration file; note that this is not necessarily the class loader from which the file was actually loaded.

Since the driver framework starts nodes with a classpath that includes the build paths of the project, and because parent classloaders are visited first, ServiceLoader will instantiate the whitelists from the wrong location, and the instances in the deployed jar will never be created because they have the same package and class names.

This happens regardless of the value of startNodesInProcess in the driver parameters.

To reproduce in Corda 4.3

The bug is reproduced here: https://github.com/ulrikrasmussen/reproduce-whitelist-bug

Run the driver based test and observe that it fails. However, the whitelist class is actually instantiated, and I have inserted logging code in the constructor which tells you the class location of the instantiated whitelist and the stack trace. This is printed to stdout on the nodes:

[INFO] 12:39:35,356 [main] cordapp.JarScanningCordappLoader. - Looking for CorDapps in [/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps]
[INFO] 12:39:35,376 [main] cordapp.JarScanningCordappLoader. - Loading CorDapps from RestrictedURL(url=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps/workflows-0.1.jar, rootPackageName=null), RestrictedURL(url=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/no
de-driver/20200131-113853.736/BankA/cordapps/contracts-0.1.jar, rootPackageName=null)
[INFO] 12:39:35,411 [main] cordapp.JarScanningCordappLoader. - Scanning CorDapp in file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps/workflows-0.1.jar
[INFO] 12:39:35,556 [main] flows.MyWhitelist. - Whitelist instantiated.
        location=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/classes/kotlin/main/
        stacktrace:
                java.lang.Thread.getStackTrace(Thread.java:1559)
                com.template.flows.MyWhitelist.<init>(MyWhitelist.kt:13)
                sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                java.lang.reflect.Constructor.newInstance(Constructor.java:423)
                java.lang.Class.newInstance(Class.java:442)
                java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
                java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
                java.util.ServiceLoader$1.next(ServiceLoader.java:480)
                kotlin.collections.CollectionsKt___CollectionsKt.toCollection(_Collections.kt:1062)
                kotlin.collections.CollectionsKt___CollectionsKt.toMutableList(_Collections.kt:1095)
                kotlin.collections.CollectionsKt___CollectionsKt.toList(_Collections.kt:1086)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.findWhitelists(JarScanningCordappLoader.kt:272)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.toCordapp(JarScanningCordappLoader.kt:147)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.loadCordapps(JarScanningCordappLoader.kt:106)
                [... snip ...]

Note that the location of the instantiated class is from the build output, and not the CorDapp jar. Only this instance is created.

Further digging: Why did it work in Corda 4.0?

AMQPSerializationScheme used to also invoke the ServiceLoader to find whitelists, which would skip verification of the class location and load the whitelist from the build directory anyway. This has probably masked the issue, and is why the problem did not occur in Corda 4.0. The behavior was removed in #5112, although the description in this PR does not indicate that this change of behavior was intentional.

Is it fixed in Corda 4.4?

Only partially. Tested at version 1380779.

The easiest way to test this was to apply the patch in the bottom of this issue to the simm-valuation-demo and observe the location of the instantiated whitelist in stdout when running the driver based test. I ran the driver based test with startNodesInProcess equal to true and false, respectively.:

startNodesInProcess = false: The loaded whitelist is the correct one from the deployed jar, so this case seems to have been fixed.
startNodesInProcess = true: The incorrect whitelist from the build output is loaded. Also, custom serializers cannot be found, so the test fails for other reasons before it can get to the whitelist issue. This is probably a related bug that also has to be fixed.

Debugging Patch:

diff --git a/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt b/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
index 1e6abdfd0f..e4f7c2fd72 100644
--- a/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
+++ b/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
@@ -11,10 +11,13 @@ import com.opengamma.strata.market.param.CurrencyParameterSensitivities
 import com.opengamma.strata.market.param.CurrencyParameterSensitivity
 import com.opengamma.strata.market.param.TenorDateParameterMetadata
 import com.opengamma.strata.market.param.ParameterMetadata
+import net.corda.core.internal.location
 import net.corda.core.serialization.SerializationWhitelist
 import net.corda.vega.analytics.CordaMarketData
 import net.corda.vega.analytics.InitialMarginTriple
+import org.slf4j.LoggerFactory
 
+private val log = LoggerFactory.getLogger(SimmContractsPluginRegistry::class.java)
 /**
  * [SimmService] is the object that makes available the flows and services for the Simm agreement / evaluation flow.
  * It is loaded via discovery - see [SerializationWhitelist].
@@ -22,6 +25,11 @@ import net.corda.vega.analytics.InitialMarginTriple
  * It is loaded via discovery see [WebServerPluginRegistry].
  */
 class SimmContractsPluginRegistry : SerializationWhitelist {
+    init {
+        val cp = SimmContractsPluginRegistry::class.java.location
+        log.info("Whitelist instantiated.\n\tlocation=$cp\n\tstacktrace:\n\t\t${Thread.currentThread().stackTrace.joinToString("\n\t\t")}")
+    }
+
     override val whitelist = listOf(
             MultiCurrencyAmount::class.java,
             Ordering.natural<Comparable<Any>>().javaClass,

The text was updated successfully, but these errors were encountered:

rick-r3 · 2020-01-31T14:09:42Z

Filed as https://r3-cev.atlassian.net/browse/CORDA-3566.

Thanks for the excellent investigation / analysis.

ulrikrasmussen · 2020-02-05T12:11:36Z

Thanks for taking a look at this!

From Jira (which I can't seem to add comments to):

Chris Rankin
All very interesting. Basically, I fixed this for “out of process” nodes in 4.4 as part of my DJVM integration work. I deliberately did not fix this for “in-process” nodes, partially because I always wanted to test the DJVM with “out-of-process” nodes, but also because I was under the impression that “in-process” nodes are a very poor test of anything and that this mode needed to DIE.

Back-porting some of this work from 4.4 to 4.3 is (probably) easy enough. Do we care about in-process nodes though? If so, getting them working before 4.4 is released is presumably more
important from 4.3.

If there are good arguments for why in-process mode does not model reflect the environment of a real deployment, then I guess it can be removed. What are the main reasons that you think they don't do that?

I think that even if they don't provide a good way to run integration tests on CI, in-process tests are still valuable for us for the following reasons:

They use fewer resources and are faster to run. I prefer to run tests in-process on my development laptop for this reason.
They are far easier to debug, because I can just set a breakpoint anywhere and run the test in debugging mode, which will cause execution to halt for the first node that reaches the breakpoint. I am sure it is possible to set breakpoints in the separate process mode as well, but I currently have no idea how to do so.

If the in-process mode is removed, then I think the documentation should at least be updated with more details on how to debug failures.

chrisr3 · 2020-02-11T23:23:22Z

If there are good arguments for why in-process mode does not model reflect the environment of a real deployment, then I guess it can be removed. What are the main reasons that you think they don't do that?

As you have observed, when nodes are run "in-process" then classes do not always end up being sourced from the classloaders that you expect. But having said that, we do appear to have regressed Corda from 4.0 here, and that isn't good either.

You mentioned that your issue was at least fixed for "out-of-process" nodes in Corda 4.4. Can you try applying this diff for 4 4 to net.corda.testing.node.internal.DriverDSLImpl.kt please, to see if it fixes "in-process" nodes for you too?

@@ -142,11 +142,7 @@ class DriverDSLImpl(
    private lateinit var _notaries: CordaFuture<List<NotaryHandle>>
    override val notaryHandles: List<NotaryHandle> get() = _notaries.getOrThrow()

-    override val cordappsClassLoader: ClassLoader? = if (!startNodesInProcess) {
-        createCordappsClassLoader(cordappsForAllNodes)
-    } else {
-        null
-    }
+    override val cordappsClassLoader = createCordappsClassLoader(cordappsForAllNodes)

    interface Waitable {
        @Throws(InterruptedException::class)

ulrikrasmussen · 2020-02-13T07:28:59Z

@chrisr3
Sure! Unfortunately, it didn't seem to fix the problem. I ran the simm-valuation-demo test scenario that I described above, with startNodesInProcess=true, and tried it with and without your patch.

Before applying the patch, I see, as expected, that the Whitelist is instantiated several times, but each time from the wrong location (the build output rather than the node directory):

[INFO] 07:44:04,163 [driver-pool-thread-1] plugin.SimmContractsPluginRegistry. - Whitelist instantiated.
	location=file:/home/ulrik/deon/corda/samples/simm-valuation-demo/contracts-states/build/libs/contracts-states-4.4-SNAPSHOT-tiny-signed.jar

But after applying the pathc, I get the same result:

[INFO] 08:09:45,540 [Time-limited test] plugin.SimmContractsPluginRegistry. - Whitelist instantiated.
	location=file:/home/ulrik/deon/corda/samples/simm-valuation-demo/contracts-states/build/libs/contracts-states-4.4-SNAPSHOT-tiny-signed.jar

The reason is that even though a class loader is now created via DriverDSLImpl::createCordappsClassLoader, the URIs of the jars added to this classloader are derived from the current classpath, resulting in the same classpath. In DriverDSLImpl.kt:

        private fun createCordappsClassLoader(cordapps: Collection<TestCordappInternal>?): ClassLoader? {
            if (cordapps == null || cordapps.isEmpty()) {
                return null
            }
            return URLClassLoader(cordapps.map { it.jarFile.toUri().toURL() }.toTypedArray())
        }

each object in cordapps is an instance of TestCordappImpl which implements the jarFile getter by scanning the classpath.

chrisr3 · 2020-02-13T09:11:22Z

@ulrikrasmussen To be fair, I didn't expect that my diff would change where "in process" nodes would instantiate their whitelists from. Such a feat would probably require such a radical replumbing of how the driver uses ClassLoaders that it would ultimately be equivalent to using "out of process" nodes anyway.

What I was actually wondering was whether my diff would allow the failing driver-based tests from your original report to pass again.

ulrikrasmussen · 2020-02-14T13:57:43Z

Ah, sorry, I think I misunderstood you then. I'll try it out and will get back to you beginning of next week.

chrisr3 · 2020-02-19T14:08:40Z

This change has just been merged into the release/os/4.4 branch.

ulrikrasmussen · 2020-02-25T13:24:18Z

Sorry for taking so long to get back to you---I was suddenly sidetracked by something else at work. I wanted to test out your patch on my original problem, but I think we have been talking past each other. My original problem was that tests failed after an upgrade to Corda 4.3, but I can't apply your patch to that version of Corda (there is no cordappsClassLoader property on the InternalDriverDSL interface in that version).

chrisr3 · 2020-02-25T13:33:21Z

Yes, but you also reported that the problem was "partially fixed" in Corda 4.4. I was therefore wondering if the problem in Corda 4.4 is now fixed completely.

ulrikrasmussen · 2020-02-25T13:46:51Z

Yes, but that conclusion was based on just detecting the symptom of Whitelists not being instantiated from the correct jar, which I did by patching an existing test in the Corda repo instead of spending time upgrading my code to Corda 4.4. I would have to upgrade my own project to a release candidate of 4.4 to test if that makes my actual test pass. Is your patch included in the latest RC?

chrisr3 · 2020-02-27T08:40:30Z

Yes, this change has been included in tag release-os-4.4-RC03.

ulrikrasmussen · 2020-04-17T08:28:02Z

@chrisr3 I now finally got around to upgrading our code to Corda 4.4. It seems to be working for driver based tests on 4.4, but there is now a similar issue for MockNetwork tests. See corda/samples#105

chrisr3 · 2020-04-17T08:37:25Z

Hi @ulrikrasmussen, I think I fixed this in CORDA-3643, which should be included in Corda 4.5 😃 .

ulrikrasmussen mentioned this issue Feb 3, 2020

Add support for programmatic CorDapp deployment and serialization #5921

Closed

davidrapacchiale added Triaged bug labels Feb 4, 2020

MikkelThomsen mentioned this issue Apr 17, 2020

Serialization of whitelisted custom data classes without annotation in MockNetwork corda/samples#105

Open

davidrapacchiale closed this as completed May 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Whitelists are not loaded correctly in driver based tests #5914

Whitelists are not loaded correctly in driver based tests #5914

ulrikrasmussen commented Jan 31, 2020

rick-r3 commented Jan 31, 2020

ulrikrasmussen commented Feb 5, 2020

chrisr3 commented Feb 11, 2020

ulrikrasmussen commented Feb 13, 2020

chrisr3 commented Feb 13, 2020 •

edited

Loading

ulrikrasmussen commented Feb 14, 2020

chrisr3 commented Feb 19, 2020

ulrikrasmussen commented Feb 25, 2020

chrisr3 commented Feb 25, 2020

ulrikrasmussen commented Feb 25, 2020 •

edited

Loading

chrisr3 commented Feb 27, 2020

ulrikrasmussen commented Apr 17, 2020

chrisr3 commented Apr 17, 2020

Whitelists are not loaded correctly in driver based tests #5914

Whitelists are not loaded correctly in driver based tests #5914

Comments

ulrikrasmussen commented Jan 31, 2020

The problem

To reproduce in Corda 4.3

Further digging: Why did it work in Corda 4.0?

Is it fixed in Corda 4.4?

rick-r3 commented Jan 31, 2020

ulrikrasmussen commented Feb 5, 2020

chrisr3 commented Feb 11, 2020

ulrikrasmussen commented Feb 13, 2020

chrisr3 commented Feb 13, 2020 • edited Loading

ulrikrasmussen commented Feb 14, 2020

chrisr3 commented Feb 19, 2020

ulrikrasmussen commented Feb 25, 2020

chrisr3 commented Feb 25, 2020

ulrikrasmussen commented Feb 25, 2020 • edited Loading

chrisr3 commented Feb 27, 2020

ulrikrasmussen commented Apr 17, 2020

chrisr3 commented Apr 17, 2020

chrisr3 commented Feb 13, 2020 •

edited

Loading

ulrikrasmussen commented Feb 25, 2020 •

edited

Loading