Skip to content
This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Overhead of SecurityManager #134

Open
justinas-dabravolskas opened this issue Apr 11, 2018 · 17 comments · May be fixed by #149
Open

Overhead of SecurityManager #134

justinas-dabravolskas opened this issue Apr 11, 2018 · 17 comments · May be fixed by #149

Comments

@justinas-dabravolskas
Copy link

Running javac(not limited to) inside nailgun process has very high contention caused by security checks, in our case mostly:

java.io.UnixFileSystem.canonicalize0(Native Method)
java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
java.io.File.getCanonicalPath(File.java:618)
java.io.FilePermission$1.run(FilePermission.java:215)
java.io.FilePermission$1.run(FilePermission.java:203)
java.security.AccessController.doPrivileged(Native Method)
java.io.FilePermission.init(FilePermission.java:203)
java.io.FilePermission.<init>(FilePermission.java:277)
sun.net.www.protocol.file.FileURLConnection.getPermission(FileURLConnection.java:225)
sun.misc.URLClassPath.check(URLClassPath.java:604)

We iterated over several solutions to this and settled on a custom java agent that ‘disables’ security manager. This keeps nailgun protocol intact, but in our case improves compilation speed within pantsbuild from 75 to 12 minutes.

package com.r9.nailgun;

import java.io.IOException;
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.net.JarURLConnection;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

/**
 * This class disables security manager without breaking nailgun protocol that depends on it.
 * Implementation notes:
 * Can't use transformer cause classes are already loaded.
 * Can't add fields and methods or mark fields public with redefinition.
 * SecurityManager is not accessible with reflection.
 * Tested with nailgun 0.9.1 and java 8, should be compatible with nailgun 0.9.3 (and java 9 ???)
 * Warning: This agent disables security manager. Use on your own risk.
 * 
 * @author Justinas Dabravolskas
 * @author Darius Prakaitis
 *
 */
public class ExitAgent {

    // This is a storage for securityManager that is used in Runtime.exit
    public static volatile java.lang.SecurityManager exitSecurity;

    public static void premain(String agentArgs, Instrumentation inst) {
        try {

            // make ExitAgent accessible to application, we probably load the second instance in
            // different class loader but who cares
            JarURLConnection connection = (JarURLConnection) ExitAgent.class.getClassLoader().getResource("com/r9/nailgun/ExitAgent.class").openConnection();
            inst.appendToBootstrapClassLoaderSearch(connection.getJarFile());

            inst.redefineClasses(new ClassDefinition(Class.forName("java.lang.System"), getSystemByteCode()));

            inst.redefineClasses(new ClassDefinition(Class.forName("java.lang.Runtime"), getRuntimeByteCode()));

        } catch (ClassNotFoundException e1) {
            e1.printStackTrace();
            throw new RuntimeException(e1);
        } catch (UnmodifiableClassException e1) {
            e1.printStackTrace();
            throw new RuntimeException(e1);
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    private static byte[] getSystemByteCode() {
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass cc = cp.get("java.lang.System");
            // don't access volatile field at all
            CtMethod getSecurityManager = cc.getDeclaredMethod("getSecurityManager", new CtClass[] {});
            getSecurityManager.setBody("{return null;}");

            CtMethod setSecurityManager = cc.getDeclaredMethod("setSecurityManager", new CtClass[] { cp.get("java.lang.SecurityManager") });
            // make security manager available to Runtime.exit only
            setSecurityManager.setBody("{com.r9.nailgun.ExitAgent.exitSecurity=$1;}");
            byte[] byteCode = cc.toBytecode();
            cc.detach();
            return byteCode;
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new RuntimeException(ex);
        }
    }

    private static byte[] getRuntimeByteCode() {
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass cc = cp.get("java.lang.Runtime");
            CtMethod m = cc.getDeclaredMethod("exit", new CtClass[] { CtClass.intType });
            m.setBody("{SecurityManager security = com.r9.nailgun.ExitAgent.exitSecurity; if(security != null){security.checkExit($1);} Shutdown.exit($1);}");

            byte[] byteCode = cc.toBytecode();
            cc.detach();
            return byteCode;
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new RuntimeException(ex);
        }
    }

}
@sbalabanov-zz
Copy link
Contributor

That's interesting. Is it only Nailgun related and if such then why? Permission checks are done for all file operations under Oracle's HotSpot implementation of a Path/File provider, it has to do nothing with nailgun.

Also, if you use HotSpot then this approach probably violates Oracle's Java license as system code is essentially altered.

To come along with this problems on projects we use with Nailgun, we are considering following options:

  • Implementing a custom FileSystem/FileSystemProvider
  • Running on forked OpenJDK which has those checks off

@justinas-dabravolskas
Copy link
Author

Security checks are triggered just in case securityManager!=null http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/misc/URLClassPath.java#URLClassPath.checkURL%28java.net.URL%29
Overhead is added even with NOOP security manager. Usually you don't run javac under security manager, but Nailgun sets one for exit trap, so this overhead is triggered specifically by Nailgun https://github.com/facebook/nailgun/blob/master/nailgun-server/src/main/java/com/martiansoftware/nailgun/NGServer.java

@sbalabanov-zz
Copy link
Contributor

SecurityManager is the only way to intercept termination that I know of. Without altering bytecode, the only way I see is to pass a parameter to Nailgun whether or not to intercept termination. But, with interception switched off, you can't use System.exit in nail's code as it would do what supposed - terminate jvm instead of terminating a nail. Thoughts?

@justinas-dabravolskas
Copy link
Author

The only option 'without side effects' I know is byte code manipulation. I've tried just disabling exit interception and it broke Nailgun protocol, I guess it can be done in more elegant way, but terminating jvm after nail's System.exit doesn't sound like an option.

@sbalabanov-zz
Copy link
Contributor

What I mean is that instead of calling System.exit() the nail should call some other function explicitly (like NGSession.exit() which yet to be implemented) which will make sure the protocol is not broken and terminate a nail but not jvm. A linter should validate what no System.exit() code sneaks in for nail's codebase.
We can consider doing that for Buck.
Are you saying the win is mostly for in-proc javac? How about regular class loader?

@jvican
Copy link
Contributor

jvican commented Apr 13, 2018

the nail should call some other function explicitly (like NGSession.exit())

Are nails not supposed to run ngSession.exit() instead of System.exit() directly? If that's the case and the security manager is disabled completely, would it break the invariants of the Nailgun protocol?

jvican added a commit to scalacenter/nailgun that referenced this issue Apr 13, 2018
The security manager installed by Nailgun seems to add a noticeable
overhead in IO operations.  facebookarchive#134

As a result, we don't install a security manager. Nailgun only uses it
to trap exit, and in our case we don't need to do that since Bloop
doesn't define any `nailShutdown`.
@sbalabanov-zz
Copy link
Contributor

ngSession.exit() does not really interrupt running nail and in fact does not work correctly; the problem is client terminates the socket and the server still listens to that in NGCommunicator which cause listening thread to throw and log an error. This needs to be processed gracefully.

@justinas-dabravolskas do you mind to share numbers about the optimization win, like how many classes were sent to javac to achieve above mentioned improvement?

@justinas-dabravolskas
Copy link
Author

javac for ~50k files takes ~75 minutes without agent and 10-15minutes with one (Xmx set for minimal gc overhead, tested on macOS). Anyways overhead of security manager should be linear and not limited to javac. Offtopic suggestion for running javac 1.8 (should not be an issue with java 9) inside nailgun: use -XDuseUnsharedTable. In our case it saves ~2G of heap without noticeable time overhead.

jvican added a commit to scalacenter/nailgun that referenced this issue May 7, 2018
The security manager installed by Nailgun seems to add a noticeable
overhead in IO operations.  facebookarchive#134

As a result, we don't install a security manager. Nailgun only uses it
to trap exit, and in our case we don't need to do that since Bloop
doesn't define any `nailShutdown`.
jvican added a commit to scalacenter/nailgun that referenced this issue Jul 3, 2018
The security manager installed by Nailgun seems to add a noticeable
overhead in IO operations.  facebookarchive#134

As a result, we don't install a security manager. Nailgun only uses it
to trap exit, and in our case we don't need to do that since Bloop
doesn't define any `nailShutdown`.
@retronym
Copy link

retronym commented Aug 28, 2018

An alternative to simply avoiding use of the security manager would change NGSecurityManager to override more methods from SecurityManager to route around the slow parts.

For instance, the default checkWrite implementation:

    public void checkWrite(String file) {
        checkPermission(new FilePermission(file,
            SecurityConstants.FILE_WRITE_ACTION));
    }

Calls new File(file).canonicalize during construction of FilePermission.

(I just came across this performance problem in the context of IntelliJ IDEA's use of Nailgun in its compile server for Scala)


I see now that #11 already implemented this idea for checkRead. I'll submit a PR to make this more comprehensive.

retronym added a commit to retronym/nailgun that referenced this issue Aug 28, 2018
Continues the work started in facebookarchive#11 by systematically overriding
all the other methods of SecurityManager for a fast path when
the base security manager is null.

Fixes facebookarchive#134
@retronym retronym linked a pull request Aug 28, 2018 that will close this issue
niktrop pushed a commit to niktrop/nailgun that referenced this issue Aug 30, 2018
Continues the work started in facebookarchive#11 by systematically overriding
all the other methods of SecurityManager for a fast path when
the base security manager is null.

Fixes facebookarchive#134
@RobertDeRose
Copy link

This might be unrelated, as my knowledge of the SecurityManager protocol is about ZERO. However, while trying to embed Nailgun into a project, the fact that Nailgun overtakes the SecurityManager makes it so the parent application is unable to exit via System.exit as it throws an NGExitException.

My only workaround thus far has been to reset the SecurityManager back to the base after starting the NailgunServer. Unknown to me at this time what side-effects that has.

There is another issue about a lack of documentation, if this issue could result in some more documentation for the purpose of the NGSecurityManager and how to properly embed it, that would be awesome.

tbak pushed a commit to tbak/titus-control-plane that referenced this issue Feb 28, 2020
+ do not run in the security manager which add 30sec delay on startup due to slowness in
java.io.UnixFileSystem.canonicalize0
(more detailed analysis available here: facebookarchive/nailgun#134)
tbak pushed a commit to tbak/titus-control-plane that referenced this issue Feb 28, 2020
+ do not run in the security manager which add 30sec delay on startup due to slowness in
java.io.UnixFileSystem.canonicalize0
(more detailed analysis available here: facebookarchive/nailgun#134)
tbak pushed a commit to Netflix/titus-control-plane that referenced this issue Mar 5, 2020
+ do not run in the security manager which add 30sec delay on startup due to slowness in
java.io.UnixFileSystem.canonicalize0
(more detailed analysis available here: facebookarchive/nailgun#134)
@unkarjedy
Copy link

unkarjedy commented Nov 15, 2021

Note that SecurityManager is deprecated since Java 17 and will be removed in future releases:
https://openjdk.java.net/jeps/411

To move the Java Platform forward, we will deprecate the legacy Security Manager technology for removal from the JDK. We plan to deprecate and attenuate the capabilities of the Security Manager over a number of releases, simultaneously creating alternative APIs for such tasks as blocking System::exit and other use cases considered important enough to have replacements.

Currently, NailgunServer produces a warning in stderr:

WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by %s
WARNING: Please consider reporting this to the maintainers of %s
WARNING: System::setSecurityManager will be removed in a future release

@kolotyluk
Copy link

kolotyluk commented Nov 28, 2021

So, this is a blocking issue for me because I am trying to run Scala 3 code under JDK18 with Project Loom.

I look forward to a fix so that I can use Scala 3...

https://users.scala-lang.org/t/scala-compile-server-exception/7972

@sbalabanov
Copy link
Contributor

@kolotyluk you are welcome to come up with the fix :)
so far no one in this thread suggested a working solution to intercept System.exit() calls without using SecurityManager.

@pjfanning
Copy link

I haven't seen a great deal of discussion online about how to replace the SecurityManager. This is one the best to read - https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/ .

It seems like ASM or byte-buddy might be the best bets for changing the System.exit behaviour.

@unkarjedy
Copy link

Note that in Scala Plugin 2021.3 we added a cosmetic workaround:
We ignore this warning output and do not show this output as a notification to the user.

(IJ Scala Plugin Compile Server uses Nailgun under the hood)

@unkarjedy
Copy link

@kolotyluk

So, this is a blocking issue for me because I am trying to run Scala 3 code under JDK18 with Project Loom.
I look forward to a fix so that I can use Scala 3...

Could you please clarify in which way it "blocks" you?
Scala Compile Server can be used as before.
The only difference is that during startup users saw an error notification, which might frighten them.
The notification shouldn't affect Scala 3 (or Scala 2) experience in any way.

@stewert72
Copy link

With JDK 18 I get following Exception while starting the nailgun server. Switching back to JDK17 and it works again. Maybe that's related to this topic.

[root@xxx]# java -Dlog4j.properties=/opt/xxx/cfg/log4j.properties -cp /opt/xxx/lib/nailgun-server.jar:some.more.jar com.facebook.nailgun.NGServer
Exception in thread "NGServer(all addresses, port 2113)" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
at java.base/java.lang.System.setSecurityManager(System.java:416)
at com.facebook.nailgun.NGServer.run(NGServer.java:318)
at java.base/java.lang.Thread.run(Thread.java:833)
NGServer shut down.

[root@xxx]# java -version
openjdk version "18.0.1" 2022-04-19
OpenJDK Runtime Environment 22.3 (build 18.0.1+10)
OpenJDK 64-Bit Server VM 22.3 (build 18.0.1+10, mixed mode, sharing)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants