Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whitelists are not loaded correctly in driver based tests #5914

Closed
ulrikrasmussen opened this issue Jan 31, 2020 · 13 comments
Closed

Whitelists are not loaded correctly in driver based tests #5914

ulrikrasmussen opened this issue Jan 31, 2020 · 13 comments

Comments

@ulrikrasmussen
Copy link
Contributor

The problem

After an upgrade to Corda 4.3 from 4.0, my driver based tests started to fail because the nodes were unable to find classes to be serialized on the whitelist. However, they are present on a whitelist published as a META-INF.service in the CorDapp.

An investigation into the problem revealed that, when running a node via the driver based test framework, all CorDapps are loaded with an empty list of whitelists. The problem is in the function findWhitelists()in the file JarScanningCordappLoader.kt. It extends the classpath with the jar of the CorDapp that has been deployed on the node and then proceeds to create a list of instances of all classes implementing SerializationWhitelist that can be found on the extended classpath, filtering out the instances whose class does not come from the deployed CorDapp jar.

The code seems to hinge on the assumption that all SerializationWhitelist services on the classpath will actually be loaded, but this is not true. The documentation of ServiceLoader includes the following paragraph:

If a particular concrete provider class is named in more than one configuration file, or is named in the same configuration file more than once, then the duplicates are ignored. The configuration file naming a particular provider need not be in the same jar file or other distribution unit as the provider itself. The provider must be accessible from the same class loader that was initially queried to locate the configuration file; note that this is not necessarily the class loader from which the file was actually loaded.

Since the driver framework starts nodes with a classpath that includes the build paths of the project, and because parent classloaders are visited first, ServiceLoader will instantiate the whitelists from the wrong location, and the instances in the deployed jar will never be created because they have the same package and class names.

This happens regardless of the value of startNodesInProcess in the driver parameters.

To reproduce in Corda 4.3

The bug is reproduced here: https://github.com/ulrikrasmussen/reproduce-whitelist-bug

Run the driver based test and observe that it fails. However, the whitelist class is actually instantiated, and I have inserted logging code in the constructor which tells you the class location of the instantiated whitelist and the stack trace. This is printed to stdout on the nodes:

[INFO] 12:39:35,356 [main] cordapp.JarScanningCordappLoader. - Looking for CorDapps in [/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps]
[INFO] 12:39:35,376 [main] cordapp.JarScanningCordappLoader. - Loading CorDapps from RestrictedURL(url=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps/workflows-0.1.jar, rootPackageName=null), RestrictedURL(url=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/no
de-driver/20200131-113853.736/BankA/cordapps/contracts-0.1.jar, rootPackageName=null)
[INFO] 12:39:35,411 [main] cordapp.JarScanningCordappLoader. - Scanning CorDapp in file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/node-driver/20200131-113853.736/BankA/cordapps/workflows-0.1.jar
[INFO] 12:39:35,556 [main] flows.MyWhitelist. - Whitelist instantiated.
        location=file:/home/ulrik/deon/dev/corda-whitelist-bug/cordapp-template-kotlin/workflows/build/classes/kotlin/main/
        stacktrace:
                java.lang.Thread.getStackTrace(Thread.java:1559)
                com.template.flows.MyWhitelist.<init>(MyWhitelist.kt:13)
                sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                java.lang.reflect.Constructor.newInstance(Constructor.java:423)
                java.lang.Class.newInstance(Class.java:442)
                java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
                java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
                java.util.ServiceLoader$1.next(ServiceLoader.java:480)
                kotlin.collections.CollectionsKt___CollectionsKt.toCollection(_Collections.kt:1062)
                kotlin.collections.CollectionsKt___CollectionsKt.toMutableList(_Collections.kt:1095)
                kotlin.collections.CollectionsKt___CollectionsKt.toList(_Collections.kt:1086)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.findWhitelists(JarScanningCordappLoader.kt:272)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.toCordapp(JarScanningCordappLoader.kt:147)
                net.corda.node.internal.cordapp.JarScanningCordappLoader.loadCordapps(JarScanningCordappLoader.kt:106)
                [... snip ...]

Note that the location of the instantiated class is from the build output, and not the CorDapp jar. Only this instance is created.

Further digging: Why did it work in Corda 4.0?

AMQPSerializationScheme used to also invoke the ServiceLoader to find whitelists, which would skip verification of the class location and load the whitelist from the build directory anyway. This has probably masked the issue, and is why the problem did not occur in Corda 4.0. The behavior was removed in #5112, although the description in this PR does not indicate that this change of behavior was intentional.

Is it fixed in Corda 4.4?

Only partially. Tested at version 1380779.

The easiest way to test this was to apply the patch in the bottom of this issue to the simm-valuation-demo and observe the location of the instantiated whitelist in stdout when running the driver based test. I ran the driver based test with startNodesInProcess equal to true and false, respectively.:

  • startNodesInProcess = false: The loaded whitelist is the correct one from the deployed jar, so this case seems to have been fixed.
  • startNodesInProcess = true: The incorrect whitelist from the build output is loaded. Also, custom serializers cannot be found, so the test fails for other reasons before it can get to the whitelist issue. This is probably a related bug that also has to be fixed.

Debugging Patch:

diff --git a/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt b/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
index 1e6abdfd0f..e4f7c2fd72 100644
--- a/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
+++ b/samples/simm-valuation-demo/contracts-states/src/main/kotlin/net/corda/vega/plugin/SimmContractsPluginRegistry.kt
@@ -11,10 +11,13 @@ import com.opengamma.strata.market.param.CurrencyParameterSensitivities
 import com.opengamma.strata.market.param.CurrencyParameterSensitivity
 import com.opengamma.strata.market.param.TenorDateParameterMetadata
 import com.opengamma.strata.market.param.ParameterMetadata
+import net.corda.core.internal.location
 import net.corda.core.serialization.SerializationWhitelist
 import net.corda.vega.analytics.CordaMarketData
 import net.corda.vega.analytics.InitialMarginTriple
+import org.slf4j.LoggerFactory
 
+private val log = LoggerFactory.getLogger(SimmContractsPluginRegistry::class.java)
 /**
  * [SimmService] is the object that makes available the flows and services for the Simm agreement / evaluation flow.
  * It is loaded via discovery - see [SerializationWhitelist].
@@ -22,6 +25,11 @@ import net.corda.vega.analytics.InitialMarginTriple
  * It is loaded via discovery see [WebServerPluginRegistry].
  */
 class SimmContractsPluginRegistry : SerializationWhitelist {
+    init {
+        val cp = SimmContractsPluginRegistry::class.java.location
+        log.info("Whitelist instantiated.\n\tlocation=$cp\n\tstacktrace:\n\t\t${Thread.currentThread().stackTrace.joinToString("\n\t\t")}")
+    }
+
     override val whitelist = listOf(
             MultiCurrencyAmount::class.java,
             Ordering.natural<Comparable<Any>>().javaClass,
@rick-r3
Copy link
Contributor

rick-r3 commented Jan 31, 2020

Filed as https://r3-cev.atlassian.net/browse/CORDA-3566.

Thanks for the excellent investigation / analysis.

@ulrikrasmussen
Copy link
Contributor Author

Thanks for taking a look at this!

From Jira (which I can't seem to add comments to):

Chris Rankin
All very interesting. Basically, I fixed this for “out of process” nodes in 4.4 as part of my DJVM integration work. I deliberately did not fix this for “in-process” nodes, partially because I always wanted to test the DJVM with “out-of-process” nodes, but also because I was under the impression that “in-process” nodes are a very poor test of anything and that this mode needed to DIE.

Back-porting some of this work from 4.4 to 4.3 is (probably) easy enough. Do we care about in-process nodes though? If so, getting them working before 4.4 is released is presumably more
important from 4.3.

If there are good arguments for why in-process mode does not model reflect the environment of a real deployment, then I guess it can be removed. What are the main reasons that you think they don't do that?

I think that even if they don't provide a good way to run integration tests on CI, in-process tests are still valuable for us for the following reasons:

  • They use fewer resources and are faster to run. I prefer to run tests in-process on my development laptop for this reason.
  • They are far easier to debug, because I can just set a breakpoint anywhere and run the test in debugging mode, which will cause execution to halt for the first node that reaches the breakpoint. I am sure it is possible to set breakpoints in the separate process mode as well, but I currently have no idea how to do so.

If the in-process mode is removed, then I think the documentation should at least be updated with more details on how to debug failures.

@chrisr3
Copy link
Contributor

chrisr3 commented Feb 11, 2020

If there are good arguments for why in-process mode does not model reflect the environment of a real deployment, then I guess it can be removed. What are the main reasons that you think they don't do that?

As you have observed, when nodes are run "in-process" then classes do not always end up being sourced from the classloaders that you expect. But having said that, we do appear to have regressed Corda from 4.0 here, and that isn't good either.

You mentioned that your issue was at least fixed for "out-of-process" nodes in Corda 4.4. Can you try applying this diff for 4 4 to net.corda.testing.node.internal.DriverDSLImpl.kt please, to see if it fixes "in-process" nodes for you too?

@@ -142,11 +142,7 @@ class DriverDSLImpl(
    private lateinit var _notaries: CordaFuture<List<NotaryHandle>>
    override val notaryHandles: List<NotaryHandle> get() = _notaries.getOrThrow()

-    override val cordappsClassLoader: ClassLoader? = if (!startNodesInProcess) {
-        createCordappsClassLoader(cordappsForAllNodes)
-    } else {
-        null
-    }
+    override val cordappsClassLoader = createCordappsClassLoader(cordappsForAllNodes)

    interface Waitable {
        @Throws(InterruptedException::class)

@ulrikrasmussen
Copy link
Contributor Author

@chrisr3
Sure! Unfortunately, it didn't seem to fix the problem. I ran the simm-valuation-demo test scenario that I described above, with startNodesInProcess=true, and tried it with and without your patch.

Before applying the patch, I see, as expected, that the Whitelist is instantiated several times, but each time from the wrong location (the build output rather than the node directory):

[INFO] 07:44:04,163 [driver-pool-thread-1] plugin.SimmContractsPluginRegistry. - Whitelist instantiated.
	location=file:/home/ulrik/deon/corda/samples/simm-valuation-demo/contracts-states/build/libs/contracts-states-4.4-SNAPSHOT-tiny-signed.jar

But after applying the pathc, I get the same result:

[INFO] 08:09:45,540 [Time-limited test] plugin.SimmContractsPluginRegistry. - Whitelist instantiated.
	location=file:/home/ulrik/deon/corda/samples/simm-valuation-demo/contracts-states/build/libs/contracts-states-4.4-SNAPSHOT-tiny-signed.jar

The reason is that even though a class loader is now created via DriverDSLImpl::createCordappsClassLoader, the URIs of the jars added to this classloader are derived from the current classpath, resulting in the same classpath. In DriverDSLImpl.kt:

        private fun createCordappsClassLoader(cordapps: Collection<TestCordappInternal>?): ClassLoader? {
            if (cordapps == null || cordapps.isEmpty()) {
                return null
            }
            return URLClassLoader(cordapps.map { it.jarFile.toUri().toURL() }.toTypedArray())
        }

each object in cordapps is an instance of TestCordappImpl which implements the jarFile getter by scanning the classpath.

@chrisr3
Copy link
Contributor

chrisr3 commented Feb 13, 2020

@ulrikrasmussen To be fair, I didn't expect that my diff would change where "in process" nodes would instantiate their whitelists from. Such a feat would probably require such a radical replumbing of how the driver uses ClassLoaders that it would ultimately be equivalent to using "out of process" nodes anyway.

What I was actually wondering was whether my diff would allow the failing driver-based tests from your original report to pass again.

@ulrikrasmussen
Copy link
Contributor Author

Ah, sorry, I think I misunderstood you then. I'll try it out and will get back to you beginning of next week.

@chrisr3
Copy link
Contributor

chrisr3 commented Feb 19, 2020

This change has just been merged into the release/os/4.4 branch.

@ulrikrasmussen
Copy link
Contributor Author

Sorry for taking so long to get back to you---I was suddenly sidetracked by something else at work. I wanted to test out your patch on my original problem, but I think we have been talking past each other. My original problem was that tests failed after an upgrade to Corda 4.3, but I can't apply your patch to that version of Corda (there is no cordappsClassLoader property on the InternalDriverDSL interface in that version).

@chrisr3
Copy link
Contributor

chrisr3 commented Feb 25, 2020

Yes, but you also reported that the problem was "partially fixed" in Corda 4.4. I was therefore wondering if the problem in Corda 4.4 is now fixed completely.

@ulrikrasmussen
Copy link
Contributor Author

ulrikrasmussen commented Feb 25, 2020

Yes, but that conclusion was based on just detecting the symptom of Whitelists not being instantiated from the correct jar, which I did by patching an existing test in the Corda repo instead of spending time upgrading my code to Corda 4.4. I would have to upgrade my own project to a release candidate of 4.4 to test if that makes my actual test pass. Is your patch included in the latest RC?

@chrisr3
Copy link
Contributor

chrisr3 commented Feb 27, 2020

Yes, this change has been included in tag release-os-4.4-RC03.

@ulrikrasmussen
Copy link
Contributor Author

@chrisr3 I now finally got around to upgrading our code to Corda 4.4. It seems to be working for driver based tests on 4.4, but there is now a similar issue for MockNetwork tests. See corda/samples#105

@chrisr3
Copy link
Contributor

chrisr3 commented Apr 17, 2020

Hi @ulrikrasmussen, I think I fixed this in CORDA-3643, which should be included in Corda 4.5 😃 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants