fix: Address regression introduced in #24789 #25008

john-bodley · 2023-08-16T19:49:22Z

SUMMARY

In #24996, @jfrag1 identified an issue with #24789 as there were no guarantees that form-data provided was backed by a trusted source.

Per #24996 (comment), I looked into seeing whether the access request could be more context aware, though regrettably the caller is of type QueryContext which is dashboard agnostic. The proposed fix, per @nytai's suggestion, was to add an additional check to verify that the dashboard chart (per the dashboardId and slice_id form-data fields) is associated with the datasource in question.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

jfrag1 · 2023-08-16T20:02:18Z

superset/security/manager.py

                    form_data
                    and (dashboard_id := form_data.get("dashboardId"))
                    and (dashboard := DashboardDAO.find_by_id(dashboard_id))
+                    and any(slc.datasource == datasource for slc in dashboard.slices)
                    and self.can_access_dashboard(dashboard)


Another thing that occurred to me recently, shouldn't we only be doing this check if dashboard rbac is enabled (or if the user is a guest user for embedded purposes).

For example before #24789 there was the following logic:

should_check_dashboard_access = ( feature_flag_manager.is_feature_enabled("DASHBOARD_RBAC") or self.is_guest_user() ) if not ( self.can_access_schema(datasource) or self.can_access("datasource_access", datasource.perm or "") or self.is_owner(datasource) or ( should_check_dashboard_access and self.can_access_based_on_dashboard(datasource)

but now, we're checking if if the user can access the dataset via the dashboard no matter what

@jfrag1 I updated the PR description.

My concern is that we're applying security rules intended for DASHBOARD_RBAC even when the flag isn't enabled. With the flag disabled, a user has access to any published dashboard containing a chart they have access to. With this logic, the user would then automatically get access to all other charts/datasets on the dashboard (albeit only with the dashboard context included), which I don't think should happen.

I think we need to bring back should_check_dashboard_access

@jfrag1 the self.can_access_dashboard(dashboard) check is still being invoked and will always raise if a user doesn't have access to the dashboard regardless of what the context is.

The other formulation I can think of is to use the referrer to identify the dashboard in question, however I guess per here one can also spoof the referrer.

I understand that, here's a more concrete example to illustrate what I'm trying to get at:

DASHBOARD_RBAC is disabled. There's a published dashboard with 2 charts, powered by datasets A and B. A user has access to dataset A, but not dataset B. The user has access to the dashboard because they have access to dataset A. The question I'd then like to pose is, what is the expected behavior when the user goes to view the dashboard?

My understanding is that the expected behavior is that the user can view the chart powered by dataset A, but the chart powered by dataset B shows an error because they don't have access to the dataset.

However, after #24789 removed the should_check_dashboard_access check, the user would be able to view both charts on the dashboard. The user's access to dataset A grants them access to the dashboard, which then grants them access to all other charts on the dashboard, which I'd argue is another regression from #24789

@jfrag1 I think the problem we're all struggling with is what the expected behavior should be. Superset's security manager was already complex and difficult to grok—which doesn't bode well from an understanding perspective—which we made worse with the introduction of the dashboard RBAC feature and embedded dashboards (which potentially have access patterns which conflict with the core manager). As the rules become more complex and/or conflicting we end up playing the game of Whac-A-Mole—where closing one attack vector likely opens another.

Fundamentally I think this—per your problem statement—comes down to "context". Let's say your two charts are A' and B' which are powered by datasets A and B respectively. With dashboard RBAC enabled the user can view chart A' in the "context" of the dashboard, but not in standalone/explorer mode. The challenge is the request is coming from the ChartDataCommand command which is dashboard agnostic.

I here the point you're making. There's an issue where the access to a dashboard is circumventing the need to check whether the user has access to the datasource. So in addition to checking whether the user can access the dashboard we also need to check (if the check is at the datasource, query-context, or viz level) whether the dashboard access check is specific to RBAC, i.e.,

and self.can_access_dashboard(dashboard)

should likely be,

and self.can_access_dashboard(dashboard) and is_feature_enabled("DASHBOARD_RBAC") and dashboard.roles

Again we're adding complex logic to try to model a complex security manager which likely exposes more corner cases/issues.

There's a few scenarios we need to enumerate and document, this is what makes the most sense to me.

DASHBOARDS_RBAC disabled: superset should behave the same way it did before this feature was introduced and permissions are applied at the dataset level via "datasource access on x"

DASHBOARD_RBAC enabled:

The dashboard has roles: If the user has rbac access via their role, all charts/fitlers/etc in the dashboard should render and the user has access to all of them.

The dashboard doesn't have any attached roles: the access perms should work the same way they do as if the feature is disabled: charts only render if the user has dataset access, users see the dashboard in their list if they have the "can read Dashboard" permission

@nytai I think that makes a lot of sense! Just a quick question, if DASHBOARD_RBAC is enabled and the dashboard has roles, the regular/default access validation should still work, right? For example:

If the FF is enabled and the dashboard has roles, both someone with role-based access and a user with regular access (granted via datasource access on x/schema access on x) should work.

Another thing to consider is if we still want to handle guest users (embedded) and DASHBOARD_RBAC access similarly. Guest users can't access explore, they are only granted access to dashboards during the token generation, and should be able to access the dashboard entirely (including filters and etc). On the other hand, DASHBOARD_RBAC users are authenticated users that can access explore/other parts of the app, so here the context is really important.

betodealmeida · 2023-08-16T20:51:13Z

superset/security/manager.py

@@ -1858,9 +1858,13 @@ def raise_for_access(
                or self.can_access("datasource_access", datasource.perm or "")
                or self.is_owner(datasource)
                or (
+                    # Check whether the datasource is associated with a dashboard in a
+                    # trustworthy manner, i.e., we need to validate that the specified
+                    # dashboard is actually associated with said datasource.


I don't think I understand this logic completely, but if dashboardId can be spoofed in the form what's preventing users from reading any datasource?

There's validation in the lines below that:
a. The user has access the the dashboard, and
b. The datasource being requested is present on the dashboard

john-bodley · 2023-08-16T21:22:19Z

@betodealmeida and @jfrag1 the caller for this check is coming from here, which mentions,

Take a query context constructed in the client and return payload data response for the given query

where access is checked here. The challenge here is given that the query context is constructed in the client then isn't any self._query_context.raise_for_access() check is subject to being spoofed?

john-bodley · 2023-08-17T19:39:22Z

superset/security/manager.py

-                    and (dashboard := DashboardDAO.find_by_id(dashboard_id))
-                    and self.can_access_dashboard(dashboard)
+                    and (
+                        dashboard_ := self.get_session.query(Dashboard)


The DashboardDAO and ChartDAO have additional filtering and/or security checks when fetching objects and thus I had to revert to using the raw SQLAlchemy ORM for fetching the dashboard and chart associated with the specified IDs.

john-bodley · 2023-08-17T19:47:13Z

Thanks—drum roll please—to @betodealmeida, @eschutho, @jfrag1 , @michael-s-molina, @nytai , and @villebro for jumping on the call earlier to discuss an interim solution to this regression which should (🤞) allow us to roll forward.

I've updated the PR to extend the previous logic (stemming from @nytai's suggestion) I had drafted to ensure that:

Per @villebro's suggestion we also leverage the chart information and verify that i) the chart is associated with the dashboard, and ii) the chart is associated with the said datasource. This is a tighter handshake than what we had previously, i.e., ensuring that the datasource was associated with said dashboard.
Per @nytai's comment I added a check that made sure the dashboard fallback is only invoked when the DASHBOARD_RBAC feature is enabled and the dashboard has associated roles.

Note this PR does not address granting access for native filters when the DASHBOARD_RBAC feature is enabled. That issue seems orthogonal to the regression this PR is trying to address.

jfrag1

This updated logic looks good to me - I think there's just one more loose end other than native filters, which is ensuring embedded guest users are granted access to datasources (for charts and native filters) with the proper dashboard context. I'm happy to work on this as a follow-up once this PR is merged

(cherry picked from commit 3f93755)

pull-request-size bot added the size/S label Aug 16, 2023

john-bodley requested review from eschutho, nytai and michael-s-molina August 16, 2023 19:49

john-bodley added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Aug 16, 2023

john-bodley mentioned this pull request Aug 16, 2023

fix: Revert "fix: Dashboard aware RBAC dataset permission (#24789)" #24996

Closed

9 tasks

john-bodley marked this pull request as ready for review August 16, 2023 19:53

jfrag1 reviewed Aug 16, 2023

View reviewed changes

betodealmeida reviewed Aug 16, 2023

View reviewed changes

john-bodley force-pushed the john-bodley--fix-24789 branch from f2e6fb7 to c3e19d2 Compare August 17, 2023 17:31

pull-request-size bot added size/M and removed size/S labels Aug 17, 2023

john-bodley force-pushed the john-bodley--fix-24789 branch from c3e19d2 to 84ffd94 Compare August 17, 2023 19:37

pull-request-size bot added size/L and removed size/M labels Aug 17, 2023

john-bodley commented Aug 17, 2023

View reviewed changes

john-bodley requested review from betodealmeida, Vitor-Avila and jfrag1 August 17, 2023 19:47

fix: Address regression introduced in apache#24789

c465fc5

john-bodley force-pushed the john-bodley--fix-24789 branch from 84ffd94 to c465fc5 Compare August 17, 2023 19:49

jfrag1 approved these changes Aug 17, 2023

View reviewed changes

eschutho approved these changes Aug 17, 2023

View reviewed changes

john-bodley merged commit 3f93755 into apache:master Aug 18, 2023
29 checks passed

john-bodley mentioned this pull request Aug 18, 2023

fix: Native filter dashboard RBAC aware dataset permission #25029

Merged

9 tasks

michael-s-molina pushed a commit that referenced this pull request Aug 21, 2023

fix: Address regression introduced in #24789 (#25008)

fad872f

(cherry picked from commit 3f93755)

mistercrunch added 🍒 3.0.0 🍒 3.0.1 labels Mar 8, 2024

mistercrunch added 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Address regression introduced in #24789 #25008

fix: Address regression introduced in #24789 #25008

john-bodley commented Aug 16, 2023 •

edited

Loading

jfrag1 Aug 16, 2023

john-bodley Aug 16, 2023

jfrag1 Aug 16, 2023 •

edited

Loading

jfrag1 Aug 16, 2023

john-bodley Aug 16, 2023 •

edited

Loading

jfrag1 Aug 16, 2023

john-bodley Aug 16, 2023 •

edited

Loading

nytai Aug 17, 2023 •

edited

Loading

Vitor-Avila Aug 17, 2023

betodealmeida Aug 16, 2023

jfrag1 Aug 16, 2023

john-bodley commented Aug 16, 2023

john-bodley Aug 17, 2023

john-bodley commented Aug 17, 2023 •

edited

Loading

jfrag1 left a comment

fix: Address regression introduced in #24789 #25008

fix: Address regression introduced in #24789 #25008

Conversation

john-bodley commented Aug 16, 2023 • edited Loading

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jfrag1 Aug 16, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

john-bodley Aug 16, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

john-bodley Aug 16, 2023 • edited Loading

Choose a reason for hiding this comment

nytai Aug 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

john-bodley commented Aug 16, 2023

Choose a reason for hiding this comment

john-bodley commented Aug 17, 2023 • edited Loading

jfrag1 left a comment

Choose a reason for hiding this comment

john-bodley commented Aug 16, 2023 •

edited

Loading

jfrag1 Aug 16, 2023 •

edited

Loading

john-bodley Aug 16, 2023 •

edited

Loading

john-bodley Aug 16, 2023 •

edited

Loading

nytai Aug 17, 2023 •

edited

Loading

john-bodley commented Aug 17, 2023 •

edited

Loading