[RFC] Replace Java Security Manager (JSM) #1687

Closed
1 task done
reta opened this issue Dec 9, 2021 · 109 comments
Labels: enhancement, Roadmap:Security, security, v2.19.0, v3.0.0

Comments

@reta
Collaborator

reta commented Dec 9, 2021

Is your feature request related to a problem? Please describe.
It was announced a while ago that SecurityManager is going to be phased out of the JDK. The first step, the deprecation of the SecurityManager (JEP 411), landed in JDK 17 and produces the following warning during OpenSearch builds and at server startup:

WARNING: System::setSecurityManager will be removed in a future release

JDK 18 goes further and now fails on startup (see https://bugs.openjdk.java.net/browse/JDK-8270380). Running OpenSearch builds or the server on JDK 18 EA fails with:

Caused by: java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
	at java.base/java.lang.System.setSecurityManager(System.java:416)

Installing it now requires enabling it explicitly with a JVM command line option (see [1]):

-Djava.security.manager=allow 

Describe the solution you'd like
There is no alternative or replacement for the SecurityManager (to understand why, Project Loom is to "blame"; see [2]). One option is to just drop it. That sounds risky, but combined with the Plugin Sandbox (see [3], [4]) it may be a viable option. Other options include (but are not limited to) bytecode instrumentation, a Java agent, and a custom classloader.

Describe alternatives you've considered
We could keep it as long as we can, but once removed from the JDK, it will be a problem.

Additional context
The upcoming JDK-24 release disables SecurityManager permanently [6].
See the links below.

[1] https://inside.java/2021/12/06/quality-heads-up/
[2] https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/
[3] #1572
[4] #1422
[5] A possible JEP to replace SecurityManager after JEP 411
[6] openjdk/jdk#21498

@reta reta added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 9, 2021
@dblock
Member

dblock commented Dec 9, 2021

@nknize suggested we remove security manager in 2.0, labelling issue as such - once we have agreed here on what to do for this issue let's open a campaign parent issue in https://github.com/opensearch-project/opensearch-plugins/

@dblock dblock added v2.0.0 Version 2.0.0 and removed untriaged labels Dec 9, 2021
@reta
Collaborator Author

reta commented Dec 9, 2021

@dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you

PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.

@dblock
Member

dblock commented Dec 9, 2021

> @dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you
>
> PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.

I'm A-OK with anything non-breaking on 1.x.

@nknize
Collaborator

nknize commented Dec 9, 2021

You mean something like adding support to disable the security manager via -Djava.security.manager=disable? (EDIT: I should've read past the first line :) )

I suspect tests will blow up since the test infrastructure leverages a custom SecurityManager via SecureSM. That's going to be more impactful. I'd love some thoughts from @rmuir or @uschindler on this as they are much closer to the JDK security bits than I am.

@rmuir
Contributor

rmuir commented Dec 10, 2021

I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.

Lucene uses a custom security manager too, with no issues on JDK 18. We just initialize it differently than OpenSearch, right at JVM startup time: -Djava.security.manager=org.apache.lucene.util.TestSecurityManager.

But in your case here, it is a little different because the system starts up with no security manager, then parses some config files and maybe does a few evil things on startup, then installs the security manager via System.setSecurityManager(). That's the difference: the deferred initialization. So now for JDK 18 you have to set the "allow" property for that call not to fail.
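
The deferred initialization described above can be sketched as follows. This is only an illustration, not OpenSearch's bootstrap code: the class name and the permissive manager are hypothetical stand-ins. It assumes the documented JDK behaviour that System.setSecurityManager() throws UnsupportedOperationException when the JVM does not allow a manager to be installed at run time.

```java
/**
 * Sketch of deferred SecurityManager installation: startup work runs first,
 * then the manager is installed, and a refusal by the JVM is reported.
 * All names here are hypothetical.
 */
final class DeferredSecurityManagerDemo {

    /** Hypothetical manager that permits everything; a real one would enforce a policy. */
    @SuppressWarnings("removal")
    static final class PermissiveManager extends SecurityManager {
        @Override
        public void checkPermission(java.security.Permission perm) {
            // intentionally empty: allow every operation
        }

        @Override
        public void checkPermission(java.security.Permission perm, Object context) {
            // intentionally empty: allow every operation
        }
    }

    /** Returns true if the manager was installed, false if the JVM refused. */
    @SuppressWarnings("removal")
    static boolean tryInstall() {
        try {
            System.setSecurityManager(new PermissiveManager());
            return true;
        } catch (UnsupportedOperationException e) {
            // JDK 18 to 23 without -Djava.security.manager=allow,
            // or JDK 24+ where the Security Manager is permanently disabled.
            return false;
        }
    }

    /** Reports whether any SecurityManager is currently active. */
    @SuppressWarnings("removal")
    static boolean isInstalled() {
        return System.getSecurityManager() != null;
    }

    public static void main(String[] args) {
        // ... parse configuration and do other startup work without a manager ...
        boolean installed = tryInstall();
        System.out.println("SecurityManager installed: " + installed);
    }
}
```

Whether tryInstall() succeeds depends on the JDK version and on the java.security.manager system property, so the result should be treated as environment-specific.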

@rmuir
Contributor

rmuir commented Dec 10, 2021

Separately, as far as alternatives, I can suggest a few things:

  1. Keep the SystemCallFilter. This is unrelated to security manager and will stop RCE dead in its tracks, as it disables fork()/exec() etc completely in an irreversible way.
  2. Look into enhancing the systemd unit to compensate. You can do a lot here, such as allow/block lists of filesystem paths, and more. Recommended introduction. Especially file paths would be great, if you have a directory traversal vulnerability, it is way better to fail with a filesystem error than to transfer some private files. But in addition to file paths, you can also do fancy stuff such as system-call filtering (except for fork/exec which is why you still need to keep part 1), capability drops, etc.
  3. Consider hardening the Docker environment too. The current entrypoint just runs the shell script; maybe it could instead use the systemd unit, to also benefit from the work already done above.
  4. Adjust the existing classloader filtering: example. The filtering-classloader currently integrates with security manager, just as a convenient way to provide a list of allowable classes, but it doesn't have to work this way. It can be changed to get its list of allowed classes some other way, and then things like scripting languages at least keep that protection.

I don't recommend directly going the LSM route (AppArmor, SELinux, etc). There's a lot of complexity to those, and which of them, if any, are even available is very system-specific. I'd start with systemd, which is basically universal now on Linux systems, and it gets you the biggest wins anyway (e.g. filtering filesystem and so on).
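
Item 4 of the list above (a filtering classloader that gets its allow-list from somewhere other than the security manager) could look roughly like this. It is a minimal sketch, not the existing OpenSearch filtering classloader: the class name and the idea of passing the list through the constructor are assumptions made for illustration.

```java
import java.util.Set;

/**
 * Sketch of a class loader that only resolves classes on an explicit allow-list.
 * The list is a constructor argument rather than something derived from a
 * SecurityManager policy. Names are hypothetical.
 */
final class AllowListClassLoader extends ClassLoader {
    private final Set<String> allowed;

    AllowListClassLoader(ClassLoader parent, Set<String> allowed) {
        super(parent);
        this.allowed = Set.copyOf(allowed);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Refuse anything that is not explicitly listed before delegating to the parent.
        if (!allowed.contains(name)) {
            throw new ClassNotFoundException(name + " is not on the allow-list");
        }
        return super.loadClass(name, resolve);
    }

    /** Convenience helper: true if the loader is willing to load the named class. */
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = new AllowListClassLoader(
                ClassLoader.getSystemClassLoader(), Set.of("java.lang.String"));
        System.out.println("String allowed:  " + canLoad(loader, "java.lang.String"));
        System.out.println("Runtime allowed: " + canLoad(loader, "java.lang.Runtime"));
    }
}
```

A loader like this only limits which classes code can look up through it; it is not a substitute for the process-level measures in items 1 to 3.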

@rmuir
Contributor

rmuir commented Dec 10, 2021

Another win for stuff like ingest-attachment would be to just run the tika server (separate service/container) and have this plugin call out to it with a REST call. IMO it would be better security for using tika and they provide such a server these days. Then the tika could run in its own stricter separate sandbox.

But that strategy won't work for all the code: there's no one-size-fits-all solution. For example, things like analysis modules/plugins are extremely performance sensitive, and really need to just be passed to IndexWriter. At the same time, these plugins have less security risk (compared to e.g. Tika or scripting languages), so it's not a huge deal: they are just exposing Lucene analyzers :)
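
To make the "call out to a separate Tika server" idea concrete, here is a rough sketch using the JDK's java.net.http client. It assumes a Tika server reachable at a base URL such as http://localhost:9998 and uses its PUT /tika endpoint to request plain text; the class and method names are hypothetical and this is not how ingest-attachment is implemented today.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of a client that delegates text extraction to an out-of-process Tika server. */
final class TikaServerClient {

    /** Builds a PUT request asking the server to return the extracted plain text. */
    static HttpRequest buildExtractRequest(URI serverBase, byte[] document) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /** Sends the document and returns the extracted text; needs a running server. */
    static String extractText(HttpClient client, URI serverBase, byte[] document)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                buildExtractRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Tika server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the request, so this runs without a server being available.
        URI base = URI.create("http://localhost:9998"); // assumed server address
        HttpRequest request = buildExtractRequest(base, "hello".getBytes());
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With this split, the parser runs in its own process or container, which can then be sandboxed more strictly than the search node itself.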

@reta
Collaborator Author

reta commented Dec 10, 2021

Thank you very much, @rmuir

> I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.

That is right.

@rmuir
Contributor

rmuir commented Dec 14, 2021

I've also made my opinion loudly clear on twitter that removing SecurityManager without replacement is a bad idea for java right now. At least providing a "replacement" first (ideally enabled by default), to help protect server-side apps against the worst vulnerabilities, is really needed. Java is filled with security landmines.

Doubt anything will change on the java side, but I tried. I don't have the resources/energy to write up JEP proposals or anything to try to make real change here though, sorry.

@reta
Collaborator Author

reta commented Dec 14, 2021

Thanks @rmuir , I think the large part with respect to "what the replacement should be" is still unknown, as it is dictated by Project Loom that is not there yet. But I do 💯 agree on the point: removing SecurityManager without replacement is a bad idea.

@rmuir
Contributor

rmuir commented Dec 15, 2021

If you think of the entire internet (not just OpenSearch), I really do feel that something similar to the OpenBSD pledge() API would be at least a minimal replacement. Process-wide: drop permissions to fork/exec (RCE), maybe drop network connect() permissions to hosts you don't need, maybe drop permissions to file paths you don't need. In many cases, perhaps the OS can enforce the functionality; in other cases, maybe Java needs to do it.
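
The key property of a pledge()-style API is that privileges can only ever be narrowed. The sketch below illustrates just that one-way semantics inside the JVM; no such API exists in the JDK, every name here is hypothetical, and real enforcement would have to come from the OS or the runtime rather than from bookkeeping like this.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, process-wide "pledge": capabilities can be dropped but never regained. */
final class Pledge {
    enum Capability { EXEC, NET_CONNECT, FILE_READ, FILE_WRITE }

    // Starts with everything granted; only ever shrinks.
    private static EnumSet<Capability> granted = EnumSet.allOf(Capability.class);

    private Pledge() {}

    /** Keeps only the intersection of the current set and the requested set. */
    static synchronized void pledge(Set<Capability> keep) {
        EnumSet<Capability> next = EnumSet.copyOf(granted);
        next.retainAll(keep);
        granted = next;
    }

    static synchronized boolean allows(Capability capability) {
        return granted.contains(capability);
    }

    /** Guard that sensitive code paths would call before acting. */
    static void check(Capability capability) {
        if (!allows(capability)) {
            throw new SecurityException(capability + " was dropped by pledge()");
        }
    }

    public static void main(String[] args) {
        pledge(EnumSet.of(Capability.FILE_READ, Capability.NET_CONNECT));
        // Asking for everything again has no effect: EXEC stays dropped.
        pledge(EnumSet.allOf(Capability.class));
        System.out.println("EXEC allowed: " + allows(Capability.EXEC));
        System.out.println("FILE_READ allowed: " + allows(Capability.FILE_READ));
    }
}
```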

But there's also the separate problem that Java includes insecure functionality like JNDI ("landmines") by default. Besides sandboxing, we need to get good secure defaults here and disable dangerous crap by default. It is a multi-pronged approach.

@anasalkouz anasalkouz added v3.0.0 Issues and PRs related to version 3.0.0 and removed v2.0.0 Version 2.0.0 labels Apr 12, 2022
@Pallavi-AWS
Member

Pallavi-AWS commented Apr 20, 2022

Do we have a decision on whether OpenSearch will deprecate SecurityManager in a future release or will command line option be used? If it will be deprecated, will there be a replacement? @dblock @nknize @rmuir. thanks,

@reta
Collaborator Author

reta commented Apr 20, 2022

@Pallavi-AWS the recent (one of many) discussions on OpenJDK mailing list hint there won't be replacements for SecurityManager (very likely, at least) as well as there won't be suitable mechanisms provided for implementing your own. For JDK-18, we explicitly allow SecurityManager but there is no official decision being made on deprecation since no replacement is available.

[1] https://mail.openjdk.java.net/pipermail/security-dev/2022-April/029643.html

@rmuir
Contributor

rmuir commented Apr 20, 2022

I recommend continuing to use it until it completely stops working. Why would you voluntarily disable a security feature unless you have to?

@nknize
Collaborator

nknize commented Apr 20, 2022

> Do we have a decision on whether OpenSearch will deprecate SecurityManager

It's already deprecated in the jdk and can be found in the build logs: WARNING: System::setSecurityManager will be removed in a future release.

> will there be a replacement?

This is still being worked on, and there are already some great suggestions on this issue. In the meantime, we plan to keep using it until it stops working and will converge on a plan before upgrading to a JDK that removes it completely.

peternied added a commit to peternied/security that referenced this issue May 6, 2022
Use of the SecurityManager and AccessController have been deprecated and
will be removed in java versions after 17.  While this is an issue it's
also one that will take a concerted effort to resolve.  These warning
messages make discovering build errors and other warnings more
difficult; hence adding this suppression logic.

For tracking the effort to replace these components look into opensearch-project/OpenSearch#1687

Signed-off-by: Peter Nied <petern@amazon.com>
@dblock dblock changed the title [RFC] Consider alternatives to SecurityManager moving forward [RFC] Consider alternatives to SecurityManager (JSM) moving forward Nov 30, 2022
@dblock dblock changed the title [RFC] Consider alternatives to SecurityManager (JSM) moving forward [RFC] Remove Java Security Manager (JSM) Nov 30, 2022
@dblock dblock changed the title [RFC] Remove Java Security Manager (JSM) [RFC] Replace Java Security Manager (JSM) Nov 30, 2022
@reta
Collaborator Author

reta commented Dec 29, 2024

Thanks @kumargu , I think the Java agent is also on the table, right? [1] Or was it excluded on purpose?

[1] #16731

@pfirmstone

pfirmstone commented Dec 30, 2024

A few thoughts / questions:

Is there a way to avoid needing SecurityManager in the Graal guest environment?
If the guest environment is process isolated and that process can be restricted by systemd, then each plugin can be isolated within its own process. The problem then becomes one of establishing communications between the Host and Guest processes. I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?

In JGDMS there's a declared @AtomicSerial API for serialization / deserialization, for use with any protocol. I was working on support for ASN.1 but halted work after JEP 411, until a solution was found for SM. This API is hardened against gadget attacks by failure atomicity and provides utility methods for input validation.

JGDMS also has JERI (Jini Extensible Remote Invocation), which was designed by the people who designed RMI to address the pitfalls with RMI.

If someone wanted, these features could be copied from JGDMS (AL2.0 license), and stripped down to their bare minimum, to use for communications between Host and Guests. I can provide guidance on how it works.

As an aside, the fork of OpenJDK I'm currently maintaining with SM contains significant performance enhancements and security improvements. If people would like to test it and provide performance comparisons and feedback, that would be greatly appreciated. The maintenance cost has been less than expected and I've been able to make significant SM improvements in a short space of time. Whether I continue to maintain a fork depends on community interest and the viability of other possible solutions.

Recent build artifacts based on fork of OpenJDK 25, master branch:

Linux x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362229379
MacOS x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362228554
Windows x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362245599

There's also an OpenJDK 24 fork branch here:
https://github.com/pfirmstone/jdk-with-authorization/tree/jdk24-with-authorization-trunk

The use of a hybrid Graal / systemd solution is compelling. If the guest is to use encryption over network connections, I think that might need to be performed by the host, for the guest, as it's not safe for the guest to have access to encryption keys, etc. On second thoughts, maybe independent truststores / keystores could be provided for each guest?

@pfirmstone

pfirmstone commented Dec 30, 2024

Just documenting my forking strategy here in case it has been misunderstood:

  1. Weekly merge of OpenJDK master, into jdk-with-authorization master that contains reversions of SM removal, tests are run manually following merging. Only minor changes are made to master copy, to address any merge conflicts or test failures. This is not intended for release. Weekends are quiet, not many commits occur over the weekend, I've found this is a good time to merge.
  2. Weekly merge of master copy into trunk.
  3. Trunk is the development branch.

There were a large number of merge conflicts during JEP 486, not unexpected.
Now that JEP 486 has completed, merge conflicts have been rare.
All merge conflicts are dealt with in the merge between OpenJDK master and master copy.
There are no merge conflicts from master copy into trunk.
Interestingly the OpenJDK team were maintaining permission checks right up until JEP 486.
Additional Permissions have been added to trunk.

Release branches follow the same strategy, so that all upstream fixes and patches are included with weekly merges.

Permission checks were like shotgun surgery, as they were spread throughout OpenJDK, it was a big job to remove them.

We have a discord channel if anyone wants to become involved, let me know.

The largest maintenance task isn't merging from upstream; it's looking at new JEP features and determining how they need to be protected by new permission checks.

Some recent fixes:
pfirmstone/jdk-with-authorization#44
pfirmstone/jdk-with-authorization#40
pfirmstone/jdk-with-authorization#41
pfirmstone/jdk-with-authorization#28
pfirmstone/jdk-with-authorization#22
pfirmstone/jdk-with-authorization#32
pfirmstone/jdk-with-authorization#5

@kumargu
Contributor

kumargu commented Dec 30, 2024

> Thanks @kumargu , I think the Java agent is also on the table, right? [1] Or was it excluded on purpose?
>
> [1] #16731

I wanted to sync with you on the outcome of the PoC before including it here. I was not clear if the PoC was finally working end-to-end. Secondly, I wanted an opinion if we'd need it if we had the Graal integration.

@kumargu
Contributor

kumargu commented Dec 30, 2024

@pfirmstone (going to answer some of your comments and will come back to others later)

> Is there a way to avoid needing SecurityManager in the Graal guest environment?

This is a temporary hack. It won't be needed once oracle/graal#10239 is addressed.

> I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?

that's the biggest concern for in-proc communication between plugins and core (discussed as con in Option 3).

> Just documenting my forking strategy here in case it has been misunderstood:

I don't think we/I misunderstood the intentions here. We understand the dedication and amount of work you have put in to get this working. The challenge with a fork is not only maintainability. A. It is not a long-term solution; if we have a long-term solution (GraalVM), we would like to pursue it. B. Cloud providers (such as AWS) or other organizations consuming a fork have to be convinced to use a forked JDK, given that OpenJDK states that the security manager is not the right tool for securing Java applications (although we know how useful the security manager is).

In general, we want to move away from what is deprecated and use more modern tools (if available). If an alternative is not available, we will stick with it. Using GraalVM with the security manager is a small step to help us migrate to JDK 24. When Java sandboxing is available in GraalVM, we will remove the usage of the security manager. That's the long-term goal. That step is risky too, because GraalVM is very new, so we also don't want to overcommit and will take baby steps.

@pfirmstone

> > I'm concerned that Serialization might be a requirement of communications between processes, or is this concern unfounded?
>
> that's the biggest concern for in-proc communication between plugins and core (discussed as con in Option 3).

I think I may know a solution for that, but it requires modification to suit your use case. Currently it depends on SecurityManager for authentication and authorization, but I don't think you need encryption, authorization and authentication for inter-process communications. It implements a subset of Java serialization (using a common constructor signature), without support for circular object graphs (million laugh attacks). It has defensive mechanisms that expect periodic stream resets and enforce array and stream size limits. It doesn't serialize collections; instead it uses serializers that serialize an unmodifiable copy (not entirely true, as it is array based and so could be modified in the stream). It also has API tooling to assist developers with type and input validation, such as checking that collections contain the correct types before copying their contents to a new collection. The API also allows invariant checks between subclasses and superclasses prior to calling a superclass constructor, and each class in an object has its own namespace for constructor arguments.

https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-jeri
https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/io
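
Two of the defensive ideas mentioned above, size limits on the stream and instantiation through a validating constructor, can be shown without any serialization framework. The sketch below is not the JGDMS @AtomicSerial API; the message type, its fields and its limits are invented purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical host/guest message decoded with bounds checks and constructor validation. */
final class PluginMessage {
    static final int MAX_NAME_BYTES = 256;

    final String name;
    final int priority;

    /** All invariants are checked here, so no instance exists if validation fails. */
    PluginMessage(String name, int priority) {
        if (name == null || name.isEmpty()
                || name.getBytes(StandardCharsets.UTF_8).length > MAX_NAME_BYTES) {
            throw new IllegalArgumentException("name must be 1.." + MAX_NAME_BYTES + " bytes");
        }
        if (priority < 0 || priority > 10) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        this.name = name;
        this.priority = priority;
    }

    byte[] encode() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(nameBytes.length);
            out.write(nameBytes);
            out.writeInt(priority);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static PluginMessage decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int length = in.readInt();
            // Reject hostile lengths before allocating anything.
            if (length < 0 || length > MAX_NAME_BYTES) {
                throw new InvalidObjectException("name length out of bounds: " + length);
            }
            byte[] nameBytes = new byte[length];
            in.readFully(nameBytes);
            int priority = in.readInt();
            try {
                // Instantiate via the constructor so its invariant checks always run.
                return new PluginMessage(new String(nameBytes, StandardCharsets.UTF_8), priority);
            } catch (IllegalArgumentException e) {
                throw new InvalidObjectException(e.getMessage());
            }
        }
    }

    /** Non-throwing variant: empty if the bytes are malformed or violate an invariant. */
    static Optional<PluginMessage> tryDecode(byte[] data) {
        try {
            return Optional.of(decode(data));
        } catch (IOException e) {
            return Optional.empty();
        }
    }
}
```

Because decoding never bypasses the constructor, a crafted byte stream cannot produce an object that ordinary code could not have created.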

IMHO Java serialization vulnerabilities destroyed the client Java market. A lot more could have been done sooner to address it, but I think timing and limited resources had a lot to do with it.

SM is battle hardened, so I'm basically just leveraging that and addressing well documented issues published by security researchers (low hanging fruit). I have made some breaking changes: Permissions are no longer Serializable, it's no longer possible to set SM to null (usually the last trick in a gadget attack), static permissions granted by code have been removed (which prevents URL injection attacks), and the trusted platform has been reduced to the java.base module. But it's also possibly an interim measure until something better comes along. It's also possible nothing better will come along, as security needs to be designed in at the language level, so it could become a long term interim measure.

OpenJDK was very fast moving from deprecation to removal. It seems they've bet the farm on virtual threads. The asynchronous concurrency features hide valuable debugging information, so it makes sense they want to address that. However, these aren't needed for high scalability: immutability, thread confinement, garbage collection, safe publication and NIO are more than sufficient for most. I suspect virtual threads will be a fizzer. I could be wrong, but I think they're trying to find a solution for a non-problem. Then again, there are some very promising features, like the foreign function API, and future possibilities such as reified generics. I still use primitive types, bit-shift operations etc. when I need performance and nothing else will cut it. One of the tricks used when pooling threads in the past was to reduce their assigned memory. Smaller object headers, too: there's plenty of good stuff in the pipeline.

@pfirmstone

@kumargu I would like to see your efforts succeed.

@reta
Collaborator Author

reta commented Dec 30, 2024

> I wanted to sync with you on the outcome of the PoC before including it here. I was not clear if the PoC was finally working end-to-end. Secondly, I wanted an opinion if we'd need it if we had the Graal integration.

Yes, it is working end-to-end (for the socket connection as PoC), thanks @kumargu

@pfirmstone

@kumargu It appears Graal doesn't use marshalling, it appears to be using memory access to java object structures...

@kumargu
Contributor

kumargu commented Jan 3, 2025

> @kumargu It appears Graal doesn't use marshalling, it appears to be using memory access to java object structures...

I think that is true only if you use GraalVM to build a native image. We are not going to use the native image; we just leverage sandboxing.

@pfirmstone

It seems Graal makes it possible to access host methods between the JVM with the host and the JVM with the guest, but how it does so isn't clear to me yet. It does appear to be using InputStream and OutputStream in communications, but I haven't found any evidence of it using RMI or serialization.

https://github.com/oracle/graal/blob/3888b6934eca539fb7d1c4132d2140cba28e21a7/espresso/docs/how-espresso-works.md

Snips from https://www.graalvm.org/latest/security-guide/sandboxing/

Further restricts host access to ensure there are no implicit entry points to host code. This means that guest-code access to host arrays, lists, maps, buffers, iterables and iterators is disallowed. The reason is that there may be various implementations of these APIs on the host side, resulting in implicit entry points. In addition, direct mappings of guest implementations to host interfaces via HostAccess.Builder#allowImplementationsAnnotatedBy are disallowed. The HostAccess.UNTRUSTED host access policy is preconfigured to fulfill the requirements for the UNTRUSTED sandboxing policy.

Shared runtime: With the Java Security Manager, untrusted code executes in the same JVM environment as trusted code, sharing JDK classes and runtime services such as the garbage collector or the compiler. In the GraalVM sandbox, untrusted code runs in dedicated VM instances (GraalVM isolates), separating services and JDK classes of host and guest by design.

https://github.com/oracle/graal/blob/3888b6934eca539fb7d1c4132d2140cba28e21a7/truffle/src/com.oracle.truffle.polyglot/src/com/oracle/truffle/polyglot/PolyglotImpl.java
https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.polyglot/src/com/oracle/truffle/polyglot/DefaultPolyglotHostService.java
https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.api/src/com/oracle/truffle/api/impl/DispatchOutputStream.java
https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.polyglot/src/com/oracle/truffle/polyglot/PolyglotWrapper.java

It looks like Graal is using Proxy's to call methods on host code in the host vm from the client vm, and vice versa. I could be wrong, I haven't looked at it for long.

@reta
Collaborator Author

reta commented Jan 4, 2025

> It looks like Graal is using Proxy's to call methods on host code in the host vm from the client vm, and vice versa. I could be wrong, I haven't looked at it for long.

Thanks @pfirmstone , yes, it does use Proxy (the list of supported interfaces is supplied as PolyglotInterfaceMappings property [1]), the host access has to be allowed in order for guest to use those.

[1] https://github.com/opensearch-project/OpenSearch/pull/16863/files#diff-cb10b082944a45237eaa8245e42f2ff3423ba42e68d52fde3099da101d5b0a5bR77
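
As a rough illustration of "proxies over an explicit list of interfaces", the sketch below uses plain java.lang.reflect.Proxy. It is not GraalVM's polyglot implementation and does not use its API; the HostBridge class, the Greeter interface and the allow-list are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

/** Hypothetical bridge that hands out proxies only for explicitly exposed interfaces. */
final class HostBridge {
    /** Example host-side interface that a guest would be allowed to call. */
    interface Greeter {
        String greet(String who);
    }

    private final Set<Class<?>> exposed;

    HostBridge(Set<Class<?>> exposed) {
        this.exposed = Set.copyOf(exposed);
    }

    /** Wraps a host object in a proxy, refusing interfaces that are not on the list. */
    <T> T expose(Class<T> iface, T target) {
        if (!iface.isInterface() || !exposed.contains(iface)) {
            throw new SecurityException(iface.getName() + " is not exposed to guest code");
        }
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the host method's own exception
            }
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    public static void main(String[] args) {
        HostBridge bridge = new HostBridge(Set.of(Greeter.class));
        Greeter greeter = bridge.expose(Greeter.class, who -> "hello " + who);
        System.out.println(greeter.greet("guest"));
    }
}
```

The caller only ever sees the interface, so anything the host object can do beyond that interface stays out of reach through the proxy.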

@kumargu
Contributor

kumargu commented Jan 7, 2025

@reta I have included the Java agent idea in the proposal. Thank you for your offline feedback.

I have also put up my take on the preferences and what we should pick for the 3.0 release. I am fine with you editing the preferences if you feel otherwise. Once the GraalVM PoC is completed (assuming the patch works :) ), I think we should make a final call on the proposals.

@pfirmstone

pfirmstone commented Jan 12, 2025

> > It looks like Graal is using Proxy's to call methods on host code in the host vm from the client vm, and vice versa. I could be wrong, I haven't looked at it for long.
>
> Thanks @pfirmstone , yes, it does use Proxy (the list of supported interfaces is supplied as PolyglotInterfaceMappings property [1]), the host access has to be allowed in order for guest to use those.
>
> [1] https://github.com/opensearch-project/OpenSearch/pull/16863/files#diff-cb10b082944a45237eaa8245e42f2ff3423ba42e68d52fde3099da101d5b0a5bR77

Thanks @reta makes sense why SM is still necessary. I haven't had time to investigate how Graal is making inter process calls, but I did determine it wasn't using Java Serialization or RMI. If you find it, please let me know.

Thoughts... SM is an authorization layer not a sandbox, OpenJDK hasn't had a sandbox for a good decade or more. Graal has a Sandbox, potentially immune to speculative execution attacks, but hasn't developed an authorization layer yet. A sandbox requires an authorization layer.

Prior to Java 1.2, Java had a simple authorization layer: trusted code and untrusted code. What Li Gong's team learned was that a fine-grained authorization layer was required. My observation is that authorization layer complexity arises from the way OpenJDK / Java used SM; with proper tooling to generate policy files, and with the concept of "trusted code" replaced by "principles of least privilege", it is much simpler.

It will be interesting to see how a simpler authorization layer in Graal develops.

@pfirmstone

SecurityManager & AccessController support for privileges and access control with VirtualThread's

pfirmstone/jdk-with-authorization#46

Just in case it's of interest ;)

@reta
Collaborator Author

reta commented Jan 21, 2025

> Thanks @reta makes sense why SM is still necessary. I haven't had time to investigate how Graal is making inter process calls, but I did determine it wasn't using Java Serialization or RMI. If you find it, please let me know.

Thanks @pfirmstone , I was also curious and looked into it but got lost in Value / InteropLibrary / @TruffleBoundary / ... , at least I think that it does not look like Java Serialization or RMI, I agree with you here.

@pfirmstone

@reta I think it was using Byte Channels; I'll have another look when I get time. Basically we just need to check that bytes can't be crafted to select any class or object across isolation boundaries, that some form of authorization checking is made in the trusted VM, and that constructors are used for object instantiation following unmarshalling, so that the developer's intended invariant checks are called.

Graal looks very promising, it appears to be what SM needed to secure the JVM against untrusted code.

@pfirmstone

I think I found the reason OpenJDK didn't implement support for SM in Virtual threads:

pfirmstone/jdk-with-authorization#50

I hadn't seen this code until recently, when refactoring AccessControlContext for immutability. Over 15 years ago, I was authoring a new Policy implementation for scalability and had a thorough understanding of AccessControlContext from that time. But yikes, the new implementation was really messed up, just to add a convenience method. It would have been much simpler to implement by adding a ProtectionDomain with static permissions and a null codesource to the stack with minimal change or complexity. I think it highlights the minimal efforts directed towards maintenance of SM code.

I'm currently working on fixing AccessControlContext, implementing a weakly referenced ConcurrentHashMap cache to avoid duplicating AccessControlContext instances. If there are millions of virtual threads, we can't have a two to one ratio of AccessControlContext : VirtualThread. The majority of concurrent code has a limited number of AccessControlContext instances, common to many tasks / threads.
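
The caching idea can be sketched as a generic weak interner: equal values collapse to one canonical instance, and entries disappear once nothing else references them. This is only an illustration of the technique, not the code in the fork; it assumes the cached type is immutable with consistent equals/hashCode, and String stands in for AccessControlContext in the usage example.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch of a weakly referenced, concurrent interning cache. */
final class WeakInterner<T> {
    private final ConcurrentMap<Key<T>, Key<T>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();

    /** Weak reference that compares by referent equality and remembers its hash. */
    private static final class Key<T> extends WeakReference<T> {
        private final int hash;

        Key(T referent, ReferenceQueue<T> queue) {
            super(referent, queue);
            this.hash = referent.hashCode();
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Key)) {
                return false;
            }
            Object mine = get();
            Object theirs = ((Key<?>) other).get();
            return mine != null && mine.equals(theirs);
        }
    }

    /** Returns the canonical instance equal to the candidate, registering it if new. */
    T intern(T candidate) {
        expungeStale();
        Key<T> key = new Key<>(candidate, queue);
        while (true) {
            Key<T> existing = map.putIfAbsent(key, key);
            if (existing == null) {
                return candidate; // first time this value is seen
            }
            T canonical = existing.get();
            if (canonical != null) {
                return canonical;
            }
            map.remove(existing, existing); // referent was collected; retry
        }
    }

    /** Drops entries whose referents have been garbage collected. */
    private void expungeStale() {
        for (Object stale; (stale = queue.poll()) != null; ) {
            map.remove(stale);
        }
    }

    public static void main(String[] args) {
        WeakInterner<String> interner = new WeakInterner<>();
        String first = new String("ctx");
        String second = new String("ctx");
        System.out.println(interner.intern(first) == interner.intern(second));
    }
}
```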

@pfirmstone

pfirmstone commented Jan 29, 2025

My current thoughts are that Graal could be used to provide a Java compatibility layer, while the JVM that runs OpenSearch platform code performs authorization decisions using an OpenJDK fork.

I'm currently progressing through refactoring SM classes, AccessControlContext, AccessController, ProtectionDomain and SubjectDomainCombiner to support virtual threads. I implemented a stack walk in AccessController using ScopedValue and StackWalker, however this caused some issues with ScopedValue. AccessController and AccessControlContext are loaded very early during VM initialization. For now I have removed the new stack walk method. It's also worth noting that ScopedValues are found using the same C++ methods as the existing C++ stack walk, so it's unlikely that these methods will go away. It may be less hassle to just continue using the C++ stack walk implementation; it's definitely cleaner and smaller, at only 53 lines of code.
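
For readers unfamiliar with StackWalker, the sketch below shows the basic building block: collecting the distinct classes on the current call stack using the public API. It is not the stack walk from the fork, and the class name is hypothetical; an access-control implementation would go on to look at each class's protection domain.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: list the distinct classes on the current thread's stack, innermost first. */
final class CallerClasses {
    // RETAIN_CLASS_REFERENCE is required for StackFrame.getDeclaringClass().
    private static final StackWalker WALKER =
            StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    private CallerClasses() {}

    static List<Class<?>> onStack() {
        return WALKER.walk(frames -> frames
                .map(StackWalker.StackFrame::getDeclaringClass)
                .distinct()
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        onStack().forEach(c -> System.out.println(c.getName()));
    }
}
```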

I cleaned up the mess that was AccessControlContext and it's now immutable with a cache, so there can be millions of virtual threads, all sharing the same context and it won't create an explosion of AccessControlContext objects. AccessControlContext now has builder methods called by VM code, allowing fields to be final, where previously an object was created and then fields were initialized. The cache is injected into AccessControlContext, just prior to instantiating SecurityManager, this was necessary as AccessControlContext is loaded by the VM's primordial class loader, but the cache is loaded by the platform ClassLoader. Note that I haven't completed refactoring, I also intend to remove the boolean field "isAuthorized", to reduce the number of cached AccessControlContext instances further. I've removed the synchronized weak cache from SubjectDomainCombiner, as AccessControlContext's cache will provide similar benefits without hugely impacting scalability.

There are some failing tests, pertaining to missing permissions, these will be fixed in the near future. Note that these changes make a huge performance difference to existing code utilising SM, especially regarding scalability, which may expose latent race conditions and concurrency bugs, but this shouldn't be of concern, as switching off SM will do the same ;)

A Linux x64 build for testing can be found here:
https://github.com/pfirmstone/jdk-with-authorization/actions/runs/13030236032/artifacts/2503644694

I asked Copilot AI to compare the latest AccessControlContext with Java 16:

Here are the key differences between AccessControlContext.java in pfirmstone/jdk-with-authorization and OpenJDK 16+36:

  1. Copyright Year:

    • pfirmstone/jdk-with-authorization: 1997, 2023
    • OpenJDK 16+36: 1997, 2019
  2. File Import Statements:

    • pfirmstone/jdk-with-authorization: Imports additional classes like java.util.Arrays, java.util.HashSet, java.util.concurrent.ConcurrentMap, jdk.internal.misc.VM.
    • OpenJDK 16+36: Imports sun.security.util.FilePermCompat.
  3. Class Description:

    • pfirmstone/jdk-with-authorization: Includes details about deprecated status since Java 17 and removal in Java 24, retained for authorization.
    • OpenJDK 16+36: Does not include this information.
  4. Fields:

    • pfirmstone/jdk-with-authorization:
      • private final ProtectionDomain[] context;
      • private final boolean isPrivileged;
      • private final boolean isAuthorized;
      • private final AccessControlContext privilegedContext;
      • private final DomainCombiner combiner;
    • OpenJDK 16+36:
      • private ProtectionDomain[] context;
      • private boolean isPrivileged;
      • private boolean isAuthorized = false;
      • private AccessControlContext privilegedContext;
      • private DomainCombiner combiner = null;
      • private Permission[] permissions;
      • private AccessControlContext parent;
      • private boolean isWrapped;
      • private boolean isLimited;
      • private ProtectionDomain[] limitedContext;
  5. Methods:

    • pfirmstone/jdk-with-authorization: Contains several build methods for creating AccessControlContext with various parameters.
    • OpenJDK 16+36: Includes additional methods and fields related to limited privilege scopes and more complex permission checks.

The pfirmstone/jdk-with-authorization version retains more functionality and details related to privileged actions and authorization checks, whereas OpenJDK 16+36 has a more complex structure with additional fields and methods for handling different contexts and permissions.

@reta
Collaborator Author

reta commented Jan 29, 2025

My current thoughts are that Graal could be used to provide a Java compatibility layer, while the JVM that runs OpenSearch platform code performs authorization decisions using an OpenJDK fork.

Thanks @pfirmstone, yes, and we POCed a possible implementation path (#16861). The issue is the immense amount of work it would require, primarily because OpenSearch's public API exposure is so large.

@pfirmstone

@reta , Are there other possible variations on (#16861)?

Does the whole OpenSearch API need to be exposed, or might OpenSearch be broken up into modules, with those that don't need privileges running in the isolated VM with client code, while sections that require privileges run in the host JVM using SM?

@reta
Collaborator Author

reta commented Jan 29, 2025

@reta , Are there other possible variations on (#16861)?

I think it is difficult to find a meaningful isolation model that works, since the plugin APIs expose a whole lot by default.

Does the whole OpenSearch API need to be exposed, or might OpenSearch be broken up into modules, with those that don't need privileges running in the isolated VM with client code, while sections that require privileges run in the host JVM using SM?

That's technically possible and we are working towards it (please see #8110), but that is an immense effort as well.

@pfirmstone

That's technically possible and we are working towards it (please see #8110), but that is an immense effort as well.

@reta I was involved in the modularization of a large monolith, Apache River. We thought it was an insurmountable effort, until one day one of the developers contributed a script (Groovy, from memory). I used that script to modularize Apache River, an Ant build, into JGDMS, a Maven build.

I would suggest going over the River development list archives; there's also Apache River's SVN commit history in JGDMS on GitHub. I forget how many lines of code it has, but it's a big codebase. It's been so much easier working with a modular build, so any investment will pay dividends later.

@reta
Collaborator Author

reta commented Jan 29, 2025

It would have been much simpler to implement by adding a ProtectionDomain with static permissions and a null codesource to the stack with minimal change or complexity. I think it highlights the minimal efforts directed towards maintenance of SM code.

100% agree. I am pretty sure that the difficulty of doing so (in the current OpenJDK codebase) was the reason the simple "let's drop it" path was chosen. Thanks for digging in!
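
To make the quoted suggestion concrete, this is roughly what constructing a ProtectionDomain with static permissions and a null CodeSource looks like with the public java.security API. It only shows building and querying such a domain. How it would be placed on the stack inside AccessController is not covered, and the class name and the chosen permission are illustrative assumptions.

```java
import java.security.Permissions;
import java.security.ProtectionDomain;
import java.util.PropertyPermission;

/** Illustrative only: a domain whose permissions are fixed at construction time. */
final class StaticDomainSketch {

    /** Builds a domain that may read "user.*" system properties and nothing else. */
    static ProtectionDomain readUserPropertiesDomain() {
        Permissions permissions = new Permissions();
        permissions.add(new PropertyPermission("user.*", "read")); // example permission (assumption)
        // The two-argument constructor makes the permissions static: only what is passed
        // in here is granted. The first argument is the (null) CodeSource.
        return new ProtectionDomain(null, permissions);
    }

    public static void main(String[] args) {
        ProtectionDomain domain = readUserPropertiesDomain();
        System.out.println(domain.implies(new PropertyPermission("user.home", "read")));  // true
        System.out.println(domain.implies(new PropertyPermission("user.home", "write"))); // false
        System.out.println(domain.getCodeSource() == null);                               // true
    }
}
```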

My current thoughts are that Graal could be used to provide a Java compatibility layer, while the JVM that runs OpenSearch platform code performs authorization decisions using an OpenJDK fork.

I am wondering if it is feasible to follow the GraalVM / TornadoVM / ... development model here and provide a hardened JDK variant as a community? It seems like there is a lot on your plate ...

@pfirmstone

pfirmstone commented Jan 30, 2025

I am wondering if it is feasible to follow the GraalVM / TornadoVM / ... development model here and provide a hardened JDK variant as a community? It seems like there is a lot on your plate ...

@reta That's definitely feasible. If privileged Java APIs are exported, then networks, file systems, etc. can be controlled based on the identity of the isolated JVM. It's important to isolate at the process layer; I don't think it's feasible to perform access control from within the same JVM the untrusted code is running in.

Edited...

I'm operating a business, so I don't have much time. I witnessed a significant moment in history unfolding and decided to take action. Java is the only language with security designed in from the beginning. It's not ideal and it needs improvement, but once the API is removed completely, it's over: without some system of authorization, security will be forever compromised. The implementation was a problem; it accumulated years of maintenance debt, and that probably did more harm than good. It might have been better to provide only an API rather than an incomplete implementation (which it was), but I think applets put a lot of pressure on developers to provide a solution back in the late 1990s, and that solidified it.

The investigation into GraalVM was very useful; it highlights the importance of authorization. If an existing development base can be retained and demonstrate effective defence against future security vulnerabilities, then the remaining APIs could stay in place for compatibility while improvements are made / proposed. I'm not convinced Agents are a good solution for security hooks: while finalizers remain, Agents used in constructors aren't secure. Disabling finalizers is a necessity, but I think they're just too much work to maintain. During discussions with OpenJDK, Allan confirmed that the maintenance of hooks (permission checks) was a major burden, more so than other components, but these are the parts of the implementation that matter most.

It needs to be a community effort...

@kumargu
Contributor

kumargu commented Jan 31, 2025

@reta should we close this RFC and track the proposal in #17181?
