Allow bundle persistence in cloud native environments #4782
I believe one could simply replace the call here with this implementation instead.
Do you mean something like this: Joffref@cf2f652? I didn't try it, but give me some quick feedback if you have time to do so! 💯
Thanks for the detailed feature request. OPA really isn't designed to share a file system with other instances. So while it might work to introduce locking, what about other things like disk persistence, or some future file-based decision logger plugin? Also, having multiple instances share a directory isn't really tested at all. I think your best bet is to avoid this. Give each OPA instance its own directory. What is it that you've been trying to achieve here? I don't suppose it's about disk usage, is it? I'm afraid OPA wouldn't watch the persistence dir for updates anyway, so this wouldn't work as a synchronization method... 🤔
Maybe this tool is suited to your use case.
Initially, I thought that could be a solution, yes, but then I looked a bit closer and realized that it will not be enough, since the current persistence mechanism tries to atomically update the persisted bundle file using a fixed temporary file that is then renamed over the destination.

I believe this could lead to the destination file being both truncated and actively written to, making the atomicity of the final `rename()` moot. As I see it, there are two possible solutions to this with some different pros and cons:

Solution 1

Avoid using any file locks but change the implementation so that each instance operates on its own unique temporary file. This means that each instance can write to its own file without the risk of truncation or corruption, and then we rely upon the fact that `rename()` is atomic.

Pros: no new dependencies and no locking overhead.

Cons: each instance briefly keeps its own temporary copy, so slightly more storage is used.

Solution 2

Introduce a file lock around the temporary file for the entirety of the process of opening, truncating, writing to and renaming the file.

Pros: the instances can keep sharing a single temporary file.

Cons: Go lacks advisory file locking in its standard library, so this introduces a new dependency, as well as the overhead of the locks themselves.
Thanks for the suggestions on some alternative solutions! The goal here is to make OPA more robust so that it can actually utilize the bundle persistence feature in a cloud environment on its own without any other supporting services. I know that this can be solved by building tools around OPA, rather than having this supported inside OPA itself (which is actually what we are doing today already). However, this particular feature is something that I believe can easily be supported by OPA with some minimal changes. I also think it is in line with OPA being touted as cloud native. I mean, OPA is a CNCF graduated project and the tagline is literally: "Policy-based control for cloud native environments".
Since storage is usually pretty cheap, I would actually opt for solution 1 that I outlined above. If you agree with my reasoning for this change, I can send a PR your way.
So, whether a project may be called "cloud native" or not basically comes down to their willingness to support shared instance state using a distributed file system protocol from 1984?
@anderseknert: I am not trying to say anything of the sort. I am merely trying to contribute to OPA and make it more robust in cloud environments. If you try to deploy OPA the way it is currently implemented in Kubernetes with a common cloud storage driver and try to use the bundle persistence feature, then the end result may be that you get corrupt bundles in your persistent storage. I have already offered to help fix this and can send you a PR if you are interested.
@srenatus: Perhaps I should elaborate that this is not intended to be used as some sort of way to synchronize the bundles across the instances, but rather as a disaster recovery mechanism, as I believe the feature originally was intended.
In order to use the feature to persist activated bundles to disk in a cloud environment with shared storage, e.g. Kubernetes with the Amazon EFS storage driver, each instance of OPA needs to either synchronize its access to the temporary file using advisory file locks, or use unique temporary files. If not, then the following situation may occur:

p1: open and trunc tmp file
p1: write to tmp file
p2: open and trunc tmp file
p1: rename tmp file to dst
p2: write to tmp file
p2: rename tmp file to dst

This may then lead to the persisted bundle being truncated or corrupted. Here the approach of using unique temporary files is chosen because it avoids the overhead of introducing file locks, as well as an additional dependency, since Go lacks any such mechanism in its standard library. This solution should avoid truncated or corrupt bundles as `rename()` is guaranteed to be atomic, even in file systems like NFS.

Fixes: open-policy-agent#4782

Signed-off-by: Fredrik Appelros <fredrik.appelros@sinch.com>
Thanks, all context is good. Could you elaborate on how you'd like to leverage the DR mechanism? I think bundle persistence was intended to allow a restarting OPA instance to quickly pick up where it left off before, without having to refetch bundles. Would this not be possible without sharing a file system? 🤔
@srenatus: I am thinking it would be leveraged as designed today, just in a cloud environment where there are multiple instances deployed to make the application resilient to failures. It is not so much that it is not possible without sharing a file system, but rather how persistent storage works in most cloud environments.

If you take Kubernetes as an example, you provide access to persistent storage through PersistentVolumeClaims (PVCs), which in turn use an underlying storage driver to provision the actual persistent volume. These drivers typically provision these volumes using some form of shared storage solution, e.g. Amazon EBS or EFS. Since any pods running in the cluster are seen as disposable, they may generally move between nodes freely (within their affinities), and hence shared storage is the only solution that makes sense in this type of deployment. One could argue that a local storage class could be used for this; however, this would decrease durability and make management more difficult, as this method does not support dynamic provisioning.

Imagine that you have a large cluster where nodes are swapped in and out dynamically, making it impossible to manage this statically, and you want to provide durability to the application so that OPA is able to start up and activate the last known bundle without any dependent services. That is what this fix is trying to address.
@FredrikAppelros you mentioned the below as a potential solution and that is what your PR seems to do as well.
So if the intent is that each OPA instance uses a unique temporary file to avoid corruption, do you think that if we configure the persistence directory that each OPA uses to persist bundles to be unique, this would be enough to resolve any conflicts? Currently when you start OPA, you can set the `persistence_directory` configuration field to control where OPA persists and loads bundles from.
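For reference, the directory in question is set through OPA's top-level `persistence_directory` configuration key, alongside `persist: true` on the bundle. A minimal sketch of such a configuration might look like the following (the service URL, bundle name, and path are illustrative, and each instance would need its own unique directory for this suggestion to work):

```yaml
services:
  bundle-service:
    url: https://bundles.example.com

bundles:
  authz:
    service: bundle-service
    resource: bundles/authz.tar.gz
    persist: true

# Would have to differ per instance, e.g. derived from the pod name.
persistence_directory: /var/opa/instance-0
```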
@ashutosh-narkar: In theory, yes, that should solve the issue, but again, in practice, given how Kubernetes works it would not be possible to manage. You do not configure each pod on its own, as they are managed transparently by a ReplicaSet, which in turn may be affected by e.g. your auto-scaling policies. Given the dynamic nature of a cloud environment, this needs to be done without manual configuration.
💭 I think what I don't understand is the assumption that the persistence directory is something that needs to be taken care of at all. I would think that keeping it in the local ephemeral file system is fine. When an OPA instance goes away, so does its persistence directory; when a new one is spun up, it'll have a new persistence directory. The mechanism for doing that in an automatic way is bundles, since the instances will fetch their bundles on startup. But I'm no cloud architect, maybe this isn't a feasible approach for your setup. Why is it not?
@srenatus: So the issue with keeping it in the node-local ephemeral storage is the fact that pods are disposable, as I previously mentioned. There is no guarantee that the new pod will be spun up on the same node and hence it might not be able to access the persisted bundle. If this happens, we have a dependency on an external bundle service in order to start up new instances, which is exactly what I believe the bundle persistence feature aims to avoid. As I have tried to explain before, the goal here is making OPA more robust in a cloud environment, ensuring that it can continue operating on cached data in the event of a failure.
In your PR, you're using a unique name for the temp bundle file. If you could dynamically set the persistence directory for each instance instead, wouldn't that resolve the conflicts as well?
I think the point is to keep the deterministic persistence directory, to be able to start new instances with a pre-filled persistence directory. To keep the orchestration needs down for this, they're using a shared file system. What goes wrong with the current code is concurrent writes to that file system.
I'd think the bundle service is critical anyway, for proper functioning... But yeah, I think that's the crux here, isn't it?
@srenatus: Exactly, right now the bundle service is critical, but it doesn't have to be. One can usually live with some stale authorization data rather than no data at all. Have I misunderstood what the bundle persistence feature is about? I would imagine this is the exact use case it tries to prevent, only that it was not designed to function with shared storage. In your documentation it says the following about this feature:
I'd like to let @ashutosh-narkar take this one. I think your rename approach in the PR is a decent way to go. I'm still somewhat hesitant about sharing the persistence directory... it's used by the OCI bundle downloader, too, and I don't know if those code paths are happy with having multiple instances share a directory. (I don't know if they use renames when writing to that dir, or if they lock something; or if they even have to.) -- @carabasdaniel do you know if the OCI downloader is safe to use with multiple instances sharing the same persistence directory?
I think the associated PR for this issue is becoming a bit stale. Is there anything I can do to speed up the process? |
I think it boils down to this: there's an unsupported deployment scenario, having multiple OPA instances share a persistence dir. If that's done, the fix you've proposed is what's necessary to ensure smooth sailing when using bundles. I'm OK merging your PR, since it clearly unblocks you at a not-too-high cost. What I'm hesitant about is signing off on the deployment scenario. I'm wondering if merging the PR would be understood as supporting it, increasing headaches all round 🤔 @ashutosh-narkar what's your take on this?
I think the PR on its own is good to go. OPA tries to load and activate persisted bundles on a best-effort basis, and OPA would ideally always look to get the bundles from the server. The deployment scenario is something that the OPA operator decides, so this change in itself (i.e. using a unique name for the temp file) can be separated from the discussion about how OPA gets deployed, and should only be looked at from the point of view of how OPA names temp files going forward. Merging the PR while adhering to the best-effort persistence guarantees, and not implying that we support some advanced or numerous deployment models, would be the way to go imo.
👏 thank you for your contribution @FredrikAppelros |
What part of OPA would you like to see improved?
Make the feature of persisting activated bundles to disk usable in cloud environments.
Describe the ideal solution
While a great feature, the ability to persist activated bundles to disk is not usable in a cloud environment where multiple instances of OPA may be trying to write to the same shared storage as the current implementation is not synchronized. This can be solved by using an advisory file lock on the destination file. This would then be propagated through file systems like NFS to ensure that only one client is able to write to the file.
Describe a "Good Enough" solution
The ideal solution is trivial enough as is.
Additional Context
In order to fix this, one would need to first choose a file locking library, as there is no such facility in the Go standard library. One candidate is to use this module, which exposes some higher-level functions based on the underlying internal implementation in Go.
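As a sketch of what such a lock could look like, here is a minimal illustration using the raw `flock(2)` syscall directly rather than any particular library (Unix-like systems only; the function name and lock-file path are made up for the example):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// withFileLock runs fn while holding an exclusive advisory lock on
// lockPath. Other processes taking the same lock with LOCK_EX block
// until the lock is released, serializing the critical section.
func withFileLock(lockPath string, fn func() error) error {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	// Block until we hold the exclusive lock.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	return fn()
}

func main() {
	err := withFileLock("/tmp/bundle.lock", func() error {
		// Open, truncate, write, and rename the bundle file here,
		// serialized against any other lock holders.
		fmt.Println("holding lock")
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

Note that whether such advisory locks actually propagate between machines depends on the file system; on NFS they are mediated by the lock daemon rather than plain `flock(2)` semantics, which is part of why the unique-temp-file approach was ultimately preferred in this thread.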