feat: Adds a key-value store endpoint for Superset #17337
I left a few comments. The most important one is to return SIP-40-compliant errors, especially since this is a new API.
I'm concerned about some of the proposed usages - particularly access tokens, refresh tokens, and public key storage. These items should all be held in an encrypted system, not a plain-text field. That's a big security no-no.
I would also recommend adding a user_id to the key-value store and only allowing retrieval of items by the same user. Otherwise this system potentially allows any user to read any stored key, which for most of the uses you recommend would constitute a security hole.
The original key-value store was deprecated due to similar security concerns. Keep in mind that the obscurity of a long key is not the same thing as security: even if the keys are hard to guess, we should have security controls on the individual keys.
Hi @willbarrett. Thanks so much for helping with the review!
You're right about this. There are more secure structures for this type of information. I removed them from possible use cases in the PR description.
The key-value table has a
Thanks for the response @michael-s-molina - I think as Superset moves into more organizations we should default to closed in all cases, so I would support the closed-sharing model. The open-sharing model is pretty dangerous - it becomes easy to create a link that's accessible to everyone in a company, which I believe most larger organizations would not want as a default behavior. In Superset, URL parameters, application state, and cache can contain highly sensitive information, so I think we should shy away from the open-sharing model in all cases.
@willbarrett These are good points. I agree that changing the default to the restricted model is more appropriate. I also think we should support the "Anyone with the key" model, because some resources, like public dashboards, can benefit from it without the user having to go through the whole security configuration. I'll ping @dpgaspar to discuss this and update the PR with these requirements. Thank you so much again!
Thanks @michael-s-molina - if we do implement the "anyone with the key" model, we should put some restrictions or a confirmation step around it so it's very clear to the user that they're about to share very widely. Something to think about on the UI side of the house.
Definitely! We should be really clear about the security implications. I was just checking the Google Drive interface, and we can follow the same pattern.
Left some comments. We should also add more tests.
```python
__tablename__ = "key_value"
key = Column(String(256), primary_key=True)
value = Column(Text, nullable=False)
```
If `value` is potentially sensitive, we should encrypt this field.
I would definitely default to restricting access to the owner of the key, but the goal of the K/V store is not clear yet, or it's just too broad. Session management and caching are sensitive: caching values could potentially defeat dataset ownership and RBAC permissions. We can make the ownership restriction optional, and on by default, behind a config key, or discuss this further as part of a secure sharing model.
Thanks for the review @dpgaspar! I'll address all your comments.
@dpgaspar @willbarrett @betodealmeida I updated the PR description with the results of our last interactions about security and suggested use cases. I also added examples of the endpoint operations and their configurations. @john-bodley I would really like your review too. @dpgaspar even though the names in the API are editor and viewer, I'm still going to use the owner nomenclature in the database to match our existing schema 😉 You'll notice that the code does not yet reflect the current description. I'm looking for more reviews on the security model before implementing it. If you can, just give a thumbs-up on this comment so I know you're OK with it, or add a comment otherwise. I'm really excited about my first Python PR, so thank you all for the reviews. It's good to be back to full-stack development 🙌🏼
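The sharing rules in the PR description (creator-only by default, `anyone_with_key`, the `editor_*`/`viewer_*` lists, and "the more restrictive rule wins") can be expressed as two permission checks. This is a minimal sketch, not the PR's code: `Sharing`, `can_edit`, `can_view`, and the fields `editor_roles`, `viewer_users`, and `viewer_roles` are assumptions modelled on the `editor_*`/`viewer_*` wildcards, and the reading that an explicit list overrides `anyone_with_key` only at its own level is an interpretation of the description's example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

EDITOR, VIEWER = "editor", "viewer"


@dataclass(frozen=True)
class Sharing:
    owner_id: int
    anyone_with_key: Optional[str] = None  # None, "viewer" or "editor"
    editor_users: FrozenSet[int] = frozenset()
    editor_roles: FrozenSet[str] = frozenset()
    viewer_users: FrozenSet[int] = frozenset()
    viewer_roles: FrozenSet[str] = frozenset()


def _listed(
    user_id: int,
    roles: FrozenSet[str],
    users: FrozenSet[int],
    allowed_roles: FrozenSet[str],
) -> bool:
    """True if the user is named directly or holds one of the allowed roles."""
    return user_id in users or bool(roles & allowed_roles)


def can_edit(s: Sharing, user_id: int, roles: FrozenSet[str]) -> bool:
    if user_id == s.owner_id:
        return True  # the creator always keeps full access
    if s.editor_users or s.editor_roles:
        # An explicit editor list is more restrictive than anyone_with_key, so it wins.
        return _listed(user_id, roles, s.editor_users, s.editor_roles)
    return s.anyone_with_key == EDITOR


def can_view(s: Sharing, user_id: int, roles: FrozenSet[str]) -> bool:
    if can_edit(s, user_id, roles):
        return True  # editors can also view
    if s.viewer_users or s.viewer_roles:
        return _listed(user_id, roles, s.viewer_users, s.viewer_roles)
    return s.anyone_with_key in (EDITOR, VIEWER)
```

With `Sharing(owner_id=9, anyone_with_key="editor", editor_users=frozenset({1, 2, 3}))`, users 1, 2, and 3 can edit, while any other user holding the key can only view under this interpretation.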
❗ Please consider rebasing your branch to avoid db migration conflicts.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #17337      +/-   ##
==========================================
- Coverage   76.94%   76.86%   -0.09%
==========================================
  Files        1042     1052      +10
  Lines       56248    56533     +285
  Branches     7784     7784
==========================================
+ Hits        43282    43454     +172
- Misses      12707    12820     +113
  Partials      259      259
```

Flags with carried forward coverage won't be shown.
Awesome work @michael-s-molina!
We had a meeting about the proposed solution in this PR and decided to take a slightly different approach to address the use cases. Exposing a generic key-value interface had the potential to induce users to store any type of information in the store, diminishing the benefits of the current endpoints, where each type of resource is represented by a specific path. We could also introduce conflicts with the current security model in terms of access grants. One example would be storing information related to a dashboard in the key-value store: the dashboard already has configured access grants, and redefining those in each key-value entry had the potential for security problems. We decided to offer the same type of functionality under each currently defined endpoint, to preserve the more typed nature of our endpoints and to leverage the existing security model. I'm closing this PR to preserve all the discussions and conclusions for historical reasons, and I'll open a new one with the new approach. A big thanks to all the reviewers who are helping to shape this new feature.
SUMMARY
This PR adds a key-value store endpoint for Superset with secure key generation and management of store entries.
Keys are generated on the server side using the `secrets.token_urlsafe` function (https://docs.python.org/3/library/secrets.html#secrets.token_urlsafe), and the value can be any valid JSON text.
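Server-side key generation and JSON validation can be sketched with the standard library alone. This is a minimal sketch, not the PR's implementation: `generate_key`, `validate_value`, `put`, the in-memory `store` dict (standing in for the `key_value` table), and the 32-byte key size are assumptions. Random keys avoid the enumeration problem of serial ids, although, as noted in review, an unguessable key is not an access control by itself.

```python
import json
import secrets
from typing import Dict

# Assumed key size: token_urlsafe(32) draws 32 random bytes and
# base64url-encodes them into a 43-character key (well under String(256)).
KEY_NBYTES = 32

# Hypothetical in-memory stand-in for the key_value table.
store: Dict[str, str] = {}


def generate_key() -> str:
    """Generate an unguessable, URL-safe key on the server side."""
    return secrets.token_urlsafe(KEY_NBYTES)


def validate_value(value: str) -> str:
    """Accept only valid JSON text; raise ValueError otherwise."""
    try:
        json.loads(value)
    except json.JSONDecodeError as exc:
        raise ValueError(f"value must be valid JSON: {exc}") from exc
    return value


def put(value: str) -> str:
    """Store a JSON value under a freshly generated key and return the key."""
    key = generate_key()
    while key in store:  # astronomically unlikely, but never overwrite an entry
        key = generate_key()
    store[key] = validate_value(value)
    return key
```

The client never chooses the key; it sends a value such as `'{"filters": []}'` and receives the generated key in the response.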
Each value can have an associated duration. A Celery cleanup task is responsible for managing expired entries in the key-value store.
The user can also set a configuration to reset the duration upon retrieval to preserve actively accessed entries.
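Entry durations with an optional reset on retrieval behave like a sliding expiration. The following hypothetical in-memory model is not the PR's code: `Entry`, `ExpiringStore`, `reset_on_retrieval`, and the explicit `now` arguments (which keep the logic deterministic) are assumptions, and `cleanup` is only the body that a scheduled job, such as a Celery periodic task, might run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Entry:
    value: str
    duration: Optional[timedelta]  # None: the entry never expires
    reset_on_retrieval: bool
    expires_on: Optional[datetime] = None


class ExpiringStore:
    """In-memory stand-in for the key_value table (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, key: str, entry: Entry, now: datetime) -> None:
        if entry.duration is not None:
            entry.expires_on = now + entry.duration
        self._entries[key] = entry

    @staticmethod
    def _expired(entry: Entry, now: datetime) -> bool:
        return entry.expires_on is not None and entry.expires_on <= now

    def get(self, key: str, now: datetime) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or self._expired(entry, now):
            # Expired entries are treated as missing even before cleanup runs.
            return None
        if entry.reset_on_retrieval and entry.duration is not None:
            # Sliding expiration: each read pushes the deadline forward.
            entry.expires_on = now + entry.duration
        return entry.value

    def cleanup(self, now: datetime) -> int:
        """Body of a periodic cleanup job; returns how many entries were deleted."""
        expired = [k for k, e in self._entries.items() if self._expired(e, now)]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

Since reads already ignore expired entries, the cleanup schedule in this sketch only affects how long stale rows occupy storage, not what clients can read.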
This endpoint can be used to solve a number of use cases for Superset. Each client of this endpoint should carefully consider the security, normalization, and ownership implications before deciding to store a given type of information in the key-value store. Some possible use cases for this endpoint are listed below:
We currently have an endpoint at `/kv/store`, but it uses serial (monotonically increasing) ids, which allow a user to iterate through other people's saved states. That endpoint also uses the deprecated API system in `superset/views`, so it would need to be refactored anyway. In addition, the `/kv/store` endpoint has no management capabilities for entry duration, ownership, or creation timestamps, which makes automatic sanitization harder.

The security model of the endpoint is inspired by the Google Drive sharing model and is intended to support similar use cases. By default, any value added to the key-value store is accessible only to the user who created it. This can be changed using the `anyone_with_key`, `editor_*`, and `viewer_*` configurations. The `anyone_with_key` option accepts the values `editor` and `viewer`, indicating that anyone with the key can edit or view the stored value. This is useful for non-sensitive information that needs to be shared among users. The `editor_*` and `viewer_*` configurations grant read or write access to specific users or roles. When access configurations conflict, the more restrictive rule always applies. For example, with `anyone_with_key: 'editor'` and `editor_users: [1, 2, 3]`, only users 1, 2, and 3 have editor access.

Here's pseudo-code for the endpoint operations and their configurations. For the complete list of status codes and possible responses, check the Swagger documentation:
TESTING INSTRUCTIONS
1 - Execute Python tests
2 - All tests should pass
ADDITIONAL INFORMATION