[RFC] Replace Java Security Manager (JSM) #1687
@nknize suggested we remove security manager in 2.0, labelling issue as such - once we have agreed here on what to do for this issue let's open a campaign parent issue in https://github.com/opensearch-project/opensearch-plugins/ |
@dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking. |
I'm A-OK with anything non-breaking on 1.x. |
I suspect tests will blow up since the test infrastructure leverages a custom SecurityManager via |
I think the issue is written up correctly. You'll want to set Lucene uses a custom security manager too, no issues on JDK18. we just initialize it differently than opensearch, right at JVM startup time: But in your case here, it is a little different because system starts up with no security manager, then parses some config files and maybe does a few evil things on startup, then it installs security manager via |
Separately, as far as alternatives, I can suggest a few things:
I don't recommend directly going the LSM route (AppArmor, SELinux, etc). There's a lot of complexity to those, and it's so system-specific which, if any, are even available. I'd start with systemd, which is basically universal now on Linux systems, and it gets you the biggest wins anyway (e.g. filtering filesystem and so on). |
Another win for stuff like but that strategy won't work for all the code: There's no one-size/fits-all solution. For example, things like analysis modules/plugins are extremely performance sensitive, and really need to just be passed to IndexWriter. At the same time, these plugins have less security risk (compared to e.g. Tika or scripting languages), so it's not a huge deal: they are just exposing lucene analyzers :) |
Thank you very much, @rmuir
That is right. |
I've also made my opinion loudly clear on twitter that removing SecurityManager without replacement is a bad idea for java right now. At least providing a "replacement" first (ideally enabled by default), to help protect server-side apps against the worst vulnerabilities, is really needed. Java is filled with security landmines. Doubt anything will change on the java side, but I tried. I don't have the resources/energy to write up JEP proposals or anything to try to make real change here though, sorry. |
Thanks @rmuir , I think the large part with respect to "what the replacement should be" is still unknown, as it is dictated by Project Loom that is not there yet. But I do 💯 agree on the point: removing |
if you think of the entire internet (not just opensearch), i really do feel that something similar to the openbsd but there's also the separate problem that java includes insecure functionality like JNDI ("landmines"), by default. Besides sandboxing, we need to get good secure defaults here and disable dangerous crap by default. it is a multi-pronged approach. |
@Pallavi-AWS the recent (one of many) discussions on OpenJDK mailing list hint there won't be replacements for [1] https://mail.openjdk.java.net/pipermail/security-dev/2022-April/029643.html |
i recommend to keep using it until it completely stops working. why would you voluntarily disable a security feature unless you have to? |
It's already deprecated in the jdk and can be found in the build logs:
This is still being worked on and there are already some great suggestions on this issue. In the meantime, we plan to keep using it until it stops working and will converge on a plan before upgrading to a JDK that removes it completely. |
Use of the SecurityManager and AccessController has been deprecated and will be removed in Java versions after 17. While this is an issue, it's also one that will take a concerted effort to resolve. These warning messages make discovering build errors and other warnings more difficult; hence adding this suppression logic. For tracking the effort to replace these components look into opensearch-project/OpenSearch#1687 Signed-off-by: Peter Nied <petern@amazon.com>
A few thoughts / questions: Is there a way to avoid needing SecurityManager in the Graal guest environment? In JGDMS there's a declared @AtomicSerial API for serialization / deserialization, for use with any protocol. I was working on support for ASN.1, but halted work after JEP 411 until a solution was found for SM. This API is hardened against gadget attacks by failure atomicity and provides utility methods for input validation. JGDMS also has JERI (Jini Extensible Remote Invocation), which was designed by the people who designed RMI to address the pitfalls with RMI. If someone wanted, these features could be copied from JGDMS (AL2.0 license), and stripped down to their bare minimum, to use for communications between Host and Guests. I can provide guidance on how it works. As an aside, the fork of OpenJDK I'm currently maintaining with SM contains significant performance enhancements and security improvements. If people would like to test and provide performance comparisons and feedback, that would be greatly appreciated. The maintenance cost has been less than expected and I've been able to make significant SM improvements in a short space of time. Whether I continue to maintain a fork is dependent on community interest and viability of other possible solutions. Recent build artifacts based on fork of OpenJDK 25, master branch: Linux x64: https://github.com/pfirmstone/jdk-with-authorization/actions/runs/12497991476/artifacts/2362229379 There's also an OpenJDK 24 fork branch here: The use of a hybrid Graal Systemd solution is compelling. If the guest is to use encryption over network connections, I think that might need to be performed by the host, for the guest, as it's not safe for the guest to have access to encryption keys, etc. On second thoughts, maybe independent truststores / keystores could be provided for each guest? |
Just documenting my forking strategy here in case it has been misunderstood:
There were a large number of merge conflicts during JEP 486, not unexpected. Release branches follow the same strategy, so that all upstream fixes and patches are included with weekly merges. Permission checks were like shotgun surgery, as they were spread throughout OpenJDK, it was a big job to remove them. We have a discord channel if anyone wants to become involved, let me know. The largest maintenance task isn't merging from upstream; it's looking at new JEP features and determining how they need to be protected by new permission checks. Some recent fixes: |
I wanted to sync with you on the outcome of the PoC before including it here. I was not clear if the PoC was finally working end-to-end. Secondly, I wanted an opinion if we'd need it if we had the Graal integration. |
@pfirmstone (going to answer some of your comments and will come back to others later)
this is a temporary hack. It won't be needed once oracle/graal#10239 is addressed.
that's the biggest concern for in-proc communication between plugins and core (discussed as con in Option 3).
I don't think we/I misunderstood the intentions here. We understand the dedication and amount of work you have put in to get this working. The challenge with a fork is not only maintainability. A. This is not a long term solution; if we have a long term solution (GraalVM), we would like to pursue it. B. Cloud providers (such as AWS) or other organizations consuming a fork have to be convinced to use a forked JDK, given that OpenJDK states that the security manager is not the right tooling for securing Java applications (although we know how useful the security manager is). In general, we want to move away from what is deprecated and use more modern tools (if available). If an alternative is not available, we will stick with it. GraalVM usage with the security manager is a small step to help us migrate to JDK-24. When Java sandboxing is available in GraalVM, we will remove usage of the security manager. That's the long term goal. That step is risky too, because GraalVM is very new, so we also don't want to overcommit and would rather take baby steps. |
I think I may know a solution for that, but it requires modification to suit your use case. Currently it depends on SecurityManager for authentication and authorization, but I don't think you need encryption, authorization and authentication for inter-process communications. It implements a subset of Java serialization (using a common constructor signature), without support for circular object graphs (million laugh attacks). It has defensive mechanisms that expect periodical stream resets, plus array and stream size limits. It doesn't serialize collections; instead it uses serializers that serialize an unmodifiable copy (not entirely true as it is array based, so could be modified in stream), and it has API tooling to assist developers to perform type and input validation, such as checking that collections contain the correct types before copying their contents to a new collection. The API also allows invariant checks between subclass and superclasses, prior to calling a superclass, and each class in an object has its own namespace for constructor arguments. https://github.com/pfirmstone/JGDMS/tree/trunk/JGDMS/jgdms-jeri IMHO Java serialization vulnerabilities destroyed the client Java market. A lot more could have been done sooner to address it, but I think timing and limited resources had a lot to do with it. SM is battle hardened, so I'm just basically leveraging that and addressing well documented published issues by security researchers (low hanging fruit). I have made some breaking changes: Permissions are no longer Serializable and it's no longer possible to set SM null (usually the last trick in a gadget attack), I removed static permissions granted by code (prevents URL injection attacks) and reduced the size of the trusted platform to the java.base module. But it's also possibly an interim measure until something better comes along.
It's also possible nothing better will come along, as security needs to be designed in at a language level, so it could become a long term interim measure. OpenJDK was very fast moving from deprecation to removal. It seems they've bet the farm on virtual threads. The asynchronous concurrency features hide valuable debugging information, so it makes sense they want to address that; however, virtual threads aren't needed for high scalability: immutability, thread confinement, garbage collection, safe publication and NIO are more than sufficient for most. I suspect virtual threads will be a fizzer. I could be wrong, but I think they're trying to find a solution for a non-problem. Then again, there are some very promising features, like the foreign function API, and future possibilities such as reified generics. I still use primitive types, bit-shift operations etc. when I need performance and nothing else will cut it. One of the tricks used when pooling threads in the past was to reduce their assigned memory. With smaller object headers and more, there's plenty of good stuff in the pipeline. |
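The validation step described in the comment above (check that a collection contains only the expected types, then copy it into a new unmodifiable collection) can be sketched roughly as follows. This is a minimal illustration, not the JGDMS API: the class `ValidatedCopy`, the method `copyOf` and the `maxSize` limit are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not taken from JGDMS: validates untrusted input
// before it is stored in an object under construction.
final class ValidatedCopy {

    private ValidatedCopy() { }

    /**
     * Copies {@code input} into a new unmodifiable list, rejecting
     * oversized input and any element that is not an instance of {@code type}.
     */
    static <T> List<T> copyOf(Collection<?> input, Class<T> type, int maxSize) {
        if (input == null) {
            throw new IllegalArgumentException("input must not be null");
        }
        if (input.size() > maxSize) {
            throw new IllegalArgumentException("input exceeds size limit " + maxSize);
        }
        List<T> copy = new ArrayList<>(input.size());
        for (Object element : input) {
            // isInstance(null) is false, so null elements are rejected too.
            if (!type.isInstance(element)) {
                throw new ClassCastException("unexpected element type: "
                        + (element == null ? "null" : element.getClass().getName()));
            }
            copy.add(type.cast(element));
        }
        // The caller never sees the attacker-supplied collection instance.
        return Collections.unmodifiableList(copy);
    }
}
```

Calling such a helper from a constructor means a malformed stream fails before an object exists, which is the failure-atomicity property the comment refers to.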
@kumargu I would like to see your efforts succeed. |
Yes, it is working end-to-end (for the socket connection as PoC), thanks @kumargu |
@kumargu It appears Graal doesn't use marshalling, it appears to be using memory access to java object structures... |
I think that is true only if you use GraalVM to build a native image. We are not going to use the native image, we just leverage sandboxing. |
It seems Graal makes it possible to allow access to host methods between the jvm with the host and jvm with guest, but how it does so isn't that clear to me yet, it does appear to be using InputStream and OutputStream in communications, but I haven't found any evidence of it using RMI or serialization. Snips from https://www.graalvm.org/latest/security-guide/sandboxing/
https://github.com/oracle/graal/blob/3888b6934eca539fb7d1c4132d2140cba28e21a7/truffle/src/com.oracle.truffle.polyglot/src/com/oracle/truffle/polyglot/PolyglotImpl.java It looks like Graal is using Proxy's to call methods on host code in the host vm from the client vm, and vice versa. I could be wrong, I haven't looked at it for long. |
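To make the "Proxy" idea concrete: a minimal sketch using the standard `java.lang.reflect.Proxy`, in which only interfaces on an allowlist can be exposed and only methods declared on the exposed interface are forwarded. This is not Graal's implementation; `HostAccessProxy`, `expose` and `Greeter` are hypothetical names used for illustration only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

// Illustrative only: a JDK dynamic proxy that forwards calls for one
// allowlisted interface. Not how GraalVM implements host access.
final class HostAccessProxy {

    /** Hypothetical host-side service interface. */
    interface Greeter {
        String greet(String name);
    }

    private HostAccessProxy() { }

    static <T> T expose(Class<T> iface, T target, Set<Class<?>> allowed) {
        if (!iface.isInterface() || !allowed.contains(iface)) {
            throw new IllegalArgumentException("interface not allowlisted: " + iface.getName());
        }
        InvocationHandler handler = (proxy, method, args) -> {
            // Refuse anything not declared on the exposed interface
            // (this includes Object methods such as toString()).
            if (method.getDeclaringClass() != iface) {
                throw new UnsupportedOperationException(method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

A caller holding the proxy can invoke `greet` but cannot reach the target object itself, which is the property of interest when calls cross an isolation boundary.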
Thanks @pfirmstone , yes, it does use Proxy (the list of supported interfaces is supplied as |
@reta I have included the Java agent idea in the proposal. Thank you for your offline feedback. I have also put up my take on the preferences and what we should be picking for the 3.0 release. I am fine if you want to edit the preferences if you see it differently. Once the GraalVM POC is completed (assuming the patch works :) ), I think we should take a final call on the proposals. |
Thanks @reta, makes sense why SM is still necessary. I haven't had time to investigate how Graal is making inter process calls, but I did determine it wasn't using Java Serialization or RMI. If you find it, please let me know. Thoughts... SM is an authorization layer, not a sandbox; OpenJDK hasn't had a sandbox for a good decade or more. Graal has a Sandbox, potentially immune to speculative execution attacks, but hasn't developed an authorization layer yet. A sandbox requires an authorization layer. Prior to Java 1.2, Java had a simple authorization layer: trusted code and untrusted code. Li Gong's team's learnings were that a fine-grained authorization layer was required. My observation is that authorization layer complexity occurs due to the way that OpenJDK / Java used SM; with proper tooling to generate policy files and replacement of the concept of "trusted code" with "principles of least privilege", it is much simpler. It will be interesting to see how a simpler authorization layer in Graal develops. |
SecurityManager & AccessController support for privileges and access control with VirtualThread's pfirmstone/jdk-with-authorization#46 Just in case it's of interest ;) |
Thanks @pfirmstone , I was also curious and looked into it but got lost in |
@reta I think it was using Byte Channels, I'll have another look when I get time. Basically we just need to check that bytes can't be crafted to select any class or object across isolation boundaries, that some form of authorization checking is made in the trusted VM, and that constructors are used for object instantiation following unmarshalling, so developers' intended invariant checks are called. Graal looks very promising, it appears to be what SM needed to secure the JVM against untrusted code. |
I think I found the reason OpenJDK didn't implement support for SM in Virtual threads: pfirmstone/jdk-with-authorization#50 I hadn't seen this code until recently, when refactoring AccessControlContext for immutability. Over 15 years ago, I was authoring a new Policy implementation for scalability and had a thorough understanding of AccessControlContext from that time. But yikes, the new implementation was really messed up, just to add a convenience method. It would have been much simpler to implement by adding a ProtectionDomain with static permissions and a null codesource to the stack with minimal change or complexity. I think it highlights the minimal efforts directed towards maintenance of SM code. I'm currently working on fixing AccessControlContext, implementing a Weakly referenced ConcurrentHashMap cache to avoid duplicating AccessControlContext. If there are millions of virtual threads, we can't have a two to one ratio of AccessControlContext : VirtualThread. The majority of concurrent code has a limited number of AccessControlContext's, common in many tasks / threads. |
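The caching idea in the comment above (many threads sharing one canonical, immutable context instead of each holding a duplicate) can be illustrated with a small weak interning cache. This is a sketch under stated assumptions, not the code in the fork: `WeakInterner` and `WeakKey` are hypothetical names, and the cached values are assumed to be immutable with stable `equals`/`hashCode`.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative weak interning cache: equal values map to one canonical
// instance, and entries vanish once nothing else references the value.
final class WeakInterner<T> {

    /** Weak key that compares by the referent's equals/hashCode. */
    private static final class WeakKey<T> extends WeakReference<T> {
        private final int hash;

        WeakKey(T referent, ReferenceQueue<T> queue) {
            super(referent, queue);
            this.hash = referent.hashCode(); // cached: referent may be cleared later
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof WeakKey)) return false;
            Object mine = get();
            return mine != null && mine.equals(((WeakKey<?>) other).get());
        }
    }

    private final ConcurrentHashMap<WeakKey<T>, WeakKey<T>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();

    /** Returns the canonical instance equal to {@code candidate}. */
    T intern(T candidate) {
        // Drop entries whose referents have been garbage collected.
        Reference<? extends T> stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
        WeakKey<T> key = new WeakKey<>(candidate, queue);
        while (true) {
            WeakKey<T> existing = map.putIfAbsent(key, key);
            if (existing == null) {
                return candidate;          // candidate became the canonical instance
            }
            T cached = existing.get();
            if (cached != null) {
                return cached;             // share the already cached instance
            }
            map.remove(existing, existing); // cleared concurrently; retry
        }
    }
}
```

With a structure like this, the number of retained context objects tracks the number of distinct contexts rather than the number of threads.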
My current thoughts are that Graal could be used to provide a Java compatibility layer, while the JVM that runs OpenSearch platform code performs authorization decisions using an OpenJDK fork. I'm currently progressing through refactoring SM classes, AccessControlContext, AccessController, ProtectionDomain and SubjectDomainCombiner to support virtual threads. I implemented a stack walk in AccessController using ScopedValue and StackWalker, however this caused some issues with ScopedValue. AccessController and AccessControlContext are loaded very early during VM initialization. For now I have removed the new stack walk method. It's also worth noting that ScopedValue's are found using the same c++ methods as the existing c++ stack walk, so it's unlikely that these methods will go away, it may be less hassle to just continue using the c++ stack walk implementation, it's definitely cleaner and smaller, it's only 53 lines of code. I cleaned up the mess that was AccessControlContext and it's now immutable with a cache, so there can be millions of virtual threads, all sharing the same context and it won't create an explosion of AccessControlContext objects. AccessControlContext now has builder methods called by VM code, allowing fields to be final, where previously an object was created and then fields were initialized. The cache is injected into AccessControlContext, just prior to instantiating SecurityManager, this was necessary as AccessControlContext is loaded by the VM's primordial class loader, but the cache is loaded by the platform ClassLoader. Note that I haven't completed refactoring, I also intend to remove the boolean field "isAuthorized", to reduce the number of cached AccessControlContext instances further. I've removed the synchronized weak cache from SubjectDomainCombiner, as AccessControlContext's cache will provide similar benefits without hugely impacting scalability. 
There are some failing tests, pertaining to missing permissions, these will be fixed in the near future. Note that these changes make a huge performance difference to existing code utilising SM, especially regarding scalability, which may expose latent race conditions and concurrency bugs, but this shouldn't be of concern, as switching off SM will do the same ;) A Linux x64 build for testing can be found here: I asked Copilot AI to compare the latest AccessControlContext with Java 16: Here are the key differences between
The |
Thanks @pfirmstone , yes, and we POCed the possible implementation path (#16861). The issue here is that an immense amount of work would have to go into it, primarily due to the fact that the OpenSearch public API exposure is so huge. |
@reta , Are there other possible variations on (#16861)? Does the whole OpenSearch API need to be exposed, or might OpenSearch be broken up in to modules, those that don't need privileges running in the isolated vm with client code, while sections that require privileges run in the host jvm using SM? |
I think it is difficult to find the meaningful isolated model that works, since the plugin APIs expose a whole lot by default.
That's technically possible and we are working towards it (see please #8110), but that is immense effort as well. |
@reta I was involved in the modularization of a large monolith, Apache River. We thought it was an insurmountable effort, until one day one of the developers contributed a script (Groovy, from memory). I used that script to modularize JGDMS into a Maven build from Apache River, an Ant build. I would suggest going over the River development list archives; there's also Apache River's SVN commit history in JGDMS on GitHub. I forget how many lines of code, but it's a big codebase. It's been so much easier working with a modular build, so any investment will pay dividends later. |
💯 agree, I am pretty sure that the difficulty of doing so (in the current OpenJDK codebase) was the reason the simple "let's drop it" path was chosen, thanks for digging in!
I am wondering if it is feasible to follow GraalVM / TornadoVM / ... development model here and provide hardened JDK variant as a community? It seems like there is a lot on your plate ... |
@reta That's definitely feasible, if privileged Java API's are exported, networks, file systems etc can be controlled based on the identity of the isolated jvm. It's important to isolate at the process layer, I don't think it's feasible to perform access control from within the same jvm the untrusted code is running in. Edited... I'm operating a business, so I don't have much time, I witnessed a significant moment in history unfolding and decided to take action, Java is the only language with security designed in from the beginning, it's not ideal and it needs improvement, but once the API is removed completely, then it's over, without some system of authorization, security will be forever compromised, the implementation was a problem, it accumulated years of maintenance debt, and that probably did more harm than good, it might have been better not to provide an incomplete implementation (which it was), and only provide an API, but I think applets put a lot of pressure on developers to provide a solution back in the late 1990's and that solidified it. The investigation into GraalVM was very useful, it highlights the importance of authorization, if an existing development base can be retained and demonstrate effective defence against future security vulnerabilities, then remaining API's could remain in place for compatibility and improvements made / proposed. I'm not convinced Agents are a good solution for security hooks, while finalizers remain, Agents used in constructors aren't secure, disabling finalizers is a necessity, but I think they're just too much work to maintain. During discussions with OpenJDK, Allan confirmed the maintenance of hooks (permission checks) was a major burden, more so than other components, but these are the parts of the implementation that matter most. It needs to be a community effort... |
Is your feature request related to a problem? Please describe.
It has been announced a while ago that SecurityManager is going to be phased out from the JDK. The first step, the deprecation of the SecurityManager (JEP-411), landed in JDK 17 and issues the following warnings on OpenSearch builds or server startup:
JDK 18 pushes it even further and now fails on startup (see please https://bugs.openjdk.java.net/browse/JDK-8270380); running OpenSearch builds or server on JDK 18 EA fails with:
It now requires a JVM command line option to enable it explicitly using (see please [1]):
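A minimal sketch for checking at runtime whether the running JVM still permits installing a security manager. It assumes the JEP 411 behaviour in which `System.setSecurityManager` throws `UnsupportedOperationException` unless the JVM was started with the `java.security.manager` system property set to `allow` (for example `-Djava.security.manager=allow`), and always throws on JDK 24 and later. `SecurityManagerProbe` and `canInstall` are hypothetical names and not part of OpenSearch.

```java
// Hypothetical probe, not OpenSearch code. SecurityManager is deprecated
// for removal, hence the suppression of "removal" warnings.
@SuppressWarnings("removal")
final class SecurityManagerProbe {

    /** Permits everything, so a successful install cannot restrict the JVM. */
    private static final class AllowAll extends SecurityManager {
        @Override public void checkPermission(java.security.Permission perm) { }
        @Override public void checkPermission(java.security.Permission perm, Object context) { }
    }

    private SecurityManagerProbe() { }

    /** Returns true if a security manager is, or can be, installed. */
    static boolean canInstall() {
        if (System.getSecurityManager() != null) {
            return true; // one is already active
        }
        try {
            System.setSecurityManager(new AllowAll());
            System.setSecurityManager(null); // restore the previous state
            return true;
        } catch (UnsupportedOperationException e) {
            // Dynamic installation is disallowed by this JVM.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("security manager installable: " + canInstall());
    }
}
```

Running the probe with and without the property on a given JDK shows which of the behaviours described above applies to that runtime.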
Describe the solution you'd like
There is no alternative or replacement for the SecurityManager (to understand why, Project Loom is to "blame"), see please [2]. One of the options is to just drop it; it sounds risky, but combined with Plugin Sandbox (see please [3], [4]) it may sound like a viable option. Other options include (but are not limited to): bytecode instrumentation, java agent, custom classloader.
Describe alternatives you've considered
We could keep it as long as we can, but once removed from the JDK, it will be a problem.
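Of the options listed under the proposed solution, the custom classloader one is the easiest to sketch. The example below is only an illustration of the idea, with hypothetical names (`FilteringClassLoader`, `deniedPrefixes`): it refuses to load classes whose names start with a denied prefix. It only affects code that resolves classes through this loader, so on its own it is far weaker than what SecurityManager provided.

```java
import java.util.List;

// Illustrative only: a class loader that denies selected packages to the
// code it loads. Classes already visible via other loaders are unaffected.
final class FilteringClassLoader extends ClassLoader {

    private final List<String> deniedPrefixes;

    FilteringClassLoader(ClassLoader parent, List<String> deniedPrefixes) {
        super(parent);
        this.deniedPrefixes = List.copyOf(deniedPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        for (String prefix : deniedPrefixes) {
            if (name.startsWith(prefix)) {
                // Fail before delegating, so the parent is never consulted.
                throw new ClassNotFoundException("denied by policy: " + name);
            }
        }
        return super.loadClass(name, resolve);
    }
}
```

A plugin loaded through `new FilteringClassLoader(parent, List.of("javax.naming."))` would, under these assumptions, fail to link against JNDI classes while still seeing the rest of the platform.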
Additional context
The upcoming JDK-24 release disables SecurityManager permanently [6].
See please links.
[1] https://inside.java/2021/12/06/quality-heads-up/
[2] https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/
[3] #1572
[4] #1422
[5] A possible JEP to replace SecurityManager after JEP 411
[6] openjdk/jdk#21498