Fix task cancellation authz on fulfilling cluster #109357

albertzaharovits · 2024-06-04T16:36:39Z

This fixes task cancellation actions (i.e. internal:admin/tasks/cancel_child
and internal:admin/tasks/ban) not being authorized by the fulfilling cluster.
This can result in orphaned tasks on the fulfilling cluster.

elasticsearchmachine · 2024-06-04T16:37:07Z

Pinging @elastic/es-security (Team:Security)

elasticsearchmachine · 2024-06-04T16:37:30Z

Hi @albertzaharovits, I've created a changelog YAML for you.

albertzaharovits · 2024-06-04T16:45:49Z

@n1v0lg The root cause here is that internal: actions should be authorized by the local system internal user in

elasticsearch/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/AuthorizationService.java

Lines 315 to 317 in 6955427

    
    if (SystemUser.is(authentication.getEffectiveSubject().getUser())) {
        // this never goes async so no need to wrap the listener
        authorizeSystemUser(authentication, action, auditId, unwrappedRequest, listener);

, because they are otherwise rejected in

elasticsearch/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/AuthorizationService.java

Lines 535 to 539 in be502cc

    
    } else {
        logger.warn("denying access as action [{}] is not an index or cluster action", action);
        auditTrail.accessDenied(requestId, authentication, action, request, authzInfo);
        listener.onFailure(actionDenied(authentication, authzInfo, action, request));
    }

But internal: actions from a remote cluster are not perceived as originating from the local system user (they originate from the remote cluster's system user). They therefore appear as internal: actions from a non-system user, and such requests are rejected.
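
To make the failure mode above concrete, here is a minimal sketch of the pre-fix decision flow. It is a simplified, hypothetical model: `InternalActionAuthzSketch`, `Caller`, `Decision`, and the prefix check are assumptions made for illustration and are not the real `AuthorizationService` types. It does not show how the fix itself is implemented.

```java
// Simplified, hypothetical model of the pre-fix decision flow described above.
// None of these types are the real Elasticsearch classes.
final class InternalActionAuthzSketch {

    enum Decision { SYSTEM_USER_PATH, ROLE_BASED_PATH, DENIED }

    // localSystemUser is true only when the effective subject is this cluster's
    // own _system user; a remote cluster's _system user arrives with it set to false.
    record Caller(String username, boolean localSystemUser) {}

    static Decision authorize(Caller caller, String action) {
        if (caller.localSystemUser()) {
            // Corresponds to the SystemUser.is(...) branch quoted above.
            return Decision.SYSTEM_USER_PATH;
        }
        if (action.startsWith("indices:") || action.startsWith("cluster:")) {
            // Index and cluster actions go through role-based checks (omitted here).
            return Decision.ROLE_BASED_PATH;
        }
        // Anything else, including internal: actions, falls through to the
        // "is not an index or cluster action" rejection.
        return Decision.DENIED;
    }

    public static void main(String[] args) {
        String ban = "internal:admin/tasks/ban";
        System.out.println(authorize(new Caller("_system", true), ban));  // local system user
        System.out.println(authorize(new Caller("_system", false), ban)); // remote system user
    }
}
```

In this model the same `_system` username gets two different outcomes depending only on whether the subject is local, which is the gap the comment describes.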

n1v0lg

LGTM 👍 We also discussed on Slack:

1. We should backport this to 8.14.1.
2. This "breaks" in a mixed cluster setting (e.g., in an 8.13.x and 8.15.0 cluster). However, if the failure mode is simply a new exception (instead of the current authz failed one) it's not worth addressing this, esp. since RCS 2.0 is in beta before 8.13.
3. An assertion/log for when a "remote" system user fails authz on an internal action would be nice (either in this PR or a follow-up) to make this easier to detect in the future.

Happy to re-review if point 2 above ends up requiring some additional changes, but as it stands this looks ready to me. Thanks for tracking this down!

n1v0lg · 2024-06-05T12:17:31Z

...src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/RemoteClusterSecurityRestIT.java

+            String asyncSearchId = (String) submitAsyncSearchResponseMap.get("id");
+            assertThat(asyncSearchId, notNullValue());
+            // wait for the tasks to show up on the querying cluster
+            assertTrue(waitUntil(() -> {


Nit: based on Javadoc and this issue, assertBusy is apparently more canonical.

I'm slightly in favor of following that convention (and using assertBusy with an inner assertTrue) but I'm not pushy here. To be controversial, I think assertTrue(waitUntil(...)) is actually more readable... Still, I think since there is a convention and it's easy to follow, it's probably the right move to follow it (i.e., use assertBusy).

Nice, thanks for the pointer! Pushed bb508f5
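
For readers unfamiliar with the pattern being discussed, here is a minimal standalone sketch of an assertBusy-style helper: it retries an assertion until it passes or a deadline expires, and rethrows the last assertion failure on timeout. This is not the test framework's actual `assertBusy`; `AssertBusySketch`, `CheckedAssertion`, and the back-off values are assumptions chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an assertBusy-style helper; not the real test framework code.
final class AssertBusySketch {

    @FunctionalInterface
    interface CheckedAssertion {
        void run() throws Exception;
    }

    // Re-runs the assertion until it stops throwing AssertionError or the deadline passes.
    static void assertBusySketch(CheckedAssertion assertion, long maxWaitMillis) throws Exception {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                assertion.run();
                return; // assertion passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // surface the last failure, message included
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential back-off
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger polls = new AtomicInteger();
        // Simulates a task that only becomes visible on the third poll.
        assertBusySketch(() -> {
            if (polls.incrementAndGet() < 3) {
                throw new AssertionError("task not visible yet");
            }
        }, 5_000);
        System.out.println("passed after " + polls.get() + " polls");
    }
}
```

One practical difference from `assertTrue(waitUntil(...))` is that a timeout here reports the message of the inner assertion rather than a bare "expected true".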

n1v0lg · 2024-06-05T12:19:44Z

...src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/RemoteClusterSecurityRestIT.java

+                        {
+                          "name": "*:*",
+                          "error_type": "exception",
+                          "stall_time_seconds": 30


Given how slow everything is I wonder if we want to start out more generous and go for a whole minute (or even longer). This is a pretty arbitrary suggestion and I don't have concrete data to back it, but I have a hunch that 30s will be exceeded in a slow CI run shortly after we merge this...

Agreed, pushed 03807fb

albertzaharovits · 2024-06-05T16:11:20Z

@n1v0lg

This "breaks" in a mixed cluster setting (e.g., in an 8.13.x and 8.15.0 cluster). However, if the failure mode is simply a new exception (instead of the current authz failed one) it's not worth addressing this, esp. since RCS 2.0 is in beta before 8.13

As discussed, I did a manual check for this.
On the querying cluster, if the cancellation action is either missing or is unauthorized, the log messages are almost identical:

elasticsearch/server/src/main/java/org/elasticsearch/tasks/TaskCancellationService.java

Line 200 in 17c6230

final Throwable cause = ExceptionsHelper.unwrapCause(exp);

On the fulfilling cluster, if the action is unauthorized, we see a log entry

elasticsearch/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authz/AuthorizationService.java

Line 536 in 17c6230

    
           logger.warn("denying access as action [{}] is not an index or cluster action", action);

but if the action is missing there's no corresponding log entry

elasticsearch/server/src/main/java/org/elasticsearch/transport/InboundAggregator.java

Line 47 in 17c6230

throw new ActionNotFoundTransportException(actionName);

That means that, given the fix as it currently stands, communicating with clusters that don't have the fix (e.g. 8.14.0) won't be a worse experience. Only the log message will be slightly different.

albertzaharovits · 2024-06-05T16:33:04Z

@n1v0lg

An assertion/log for when a "remote" system user fails authz on an internal action would be nice (either in this PR or a follow-up) to make this easier to detect in the future.

I pushed 1f700ab such that the log message on the fulfilling cluster includes the authentication. From that, a keen eye can spot if it's a remote system user or not. For example, this is a sample log error:

denying access for [Authentication[effectiveSubject=Subject{version=8676000, user=User[username=test_user,roles=[],fullName=null,email=null,metadata={}], realm={Realm[_es_cross_cluster_access._es_cross_cluster_access] on Node[fulfilling-cluster-0]}, type=CROSS_CLUSTER_ACCESS, metadata={_security_api_key_creator_realm_name=default_file, _security_api_key_limited_by_role_descriptors=org.elasticsearch.common.bytes.BytesArray@1323, _security_api_key_id=0wYw6Y8BRt8c6GrAors2, _security_api_key_type=cross_cluster, _security_cross_cluster_access_authentication=Authentication[effectiveSubject=Subject{version=8676000, user=User[username=_system,roles=[],fullName=null,email=null,metadata={}], realm={Realm[__attach.__attach] on Node[query-cluster-0]}, type=USER, metadata={}},type=INTERNAL], _security_api_key_creator_realm_type=file, _security_api_key_name=cross_cluster_access_key, _security_api_key_role_descriptors=org.elasticsearch.common.bytes.BytesArray@8def201, _security_cross_cluster_access_role_descriptors=[]}},type=API_KEY]] as action [internal:admin/tasks/ban] is not an index or cluster action

It's verbose AF, but _security_cross_cluster_access_authentication=Authentication[effectiveSubject=Subject{version=8676000, user=User[username=_system tells one that this is a remote system user.

I think this should aid debugging and covers the case of a remote system user getting rejected. WDYT?
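
As a rough illustration of the "keen eye" check described above, the sketch below looks for the nested `_security_cross_cluster_access_authentication=` entry and tests whether the first username after it is `_system`. It assumes the log line keeps the textual shape of the sample above, which is a `toString` rendering and not a stable format; `RemoteSystemUserLogCheck` and its method are hypothetical names.

```java
// Hypothetical helper; relies on the textual shape of the sample log line above,
// which is not a stable or documented format.
final class RemoteSystemUserLogCheck {

    static final String NESTED_AUTH_MARKER = "_security_cross_cluster_access_authentication=";
    static final String USERNAME_MARKER = "user=User[username=";

    // Returns true when the nested cross-cluster authentication belongs to _system.
    static boolean looksLikeRemoteSystemUser(String logLine) {
        int nestedStart = logLine.indexOf(NESTED_AUTH_MARKER);
        if (nestedStart < 0) {
            return false; // not a cross-cluster access authentication
        }
        String nested = logLine.substring(nestedStart + NESTED_AUTH_MARKER.length());
        int userStart = nested.indexOf(USERNAME_MARKER);
        if (userStart < 0) {
            return false;
        }
        // The trailing comma avoids matching usernames that merely start with "_system".
        return nested.startsWith("_system,", userStart + USERNAME_MARKER.length());
    }

    public static void main(String[] args) {
        String sample = "denying access for [Authentication[effectiveSubject=Subject{version=8676000, "
            + "user=User[username=test_user,roles=[]], metadata={"
            + "_security_cross_cluster_access_authentication=Authentication[effectiveSubject=Subject{"
            + "version=8676000, user=User[username=_system,roles=[]]}}]] "
            + "as action [internal:admin/tasks/ban] is not an index or cluster action";
        System.out.println(looksLikeRemoteSystemUser(sample));
    }
}
```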

n1v0lg · 2024-06-06T08:28:05Z

I think this should aid debugging and covers the case of a remote system user getting rejected. WDYT?

@albertzaharovits sounds good!

…109422) This fixes task cancellation actions (i.e. internal:admin/tasks/cancel_child and internal:admin/tasks/ban) not being authorized by the fulfilling cluster. This can result in orphaned tasks on the fulfilling cluster. Backport of #109357

albertzaharovits added 6 commits June 2, 2024 15:21

WIP [fulfilling-cluster-0] denying access as action [internal:admin/t…

48bae1d

…asks/ban] is not an index or cluster action

Polished test

1f8a96c

Merge branch 'main' into test-for-cancelling-tasks

f030d63

remove dummy test.txt

645c39c

Nit

a7ef715

Spotless

e6777f3

albertzaharovits added >bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC labels Jun 4, 2024

albertzaharovits requested review from original-brownbear and n1v0lg June 4, 2024 16:36

albertzaharovits self-assigned this Jun 4, 2024

albertzaharovits marked this pull request as ready for review June 4, 2024 16:36

albertzaharovits requested a review from a team as a code owner June 4, 2024 16:36

elasticsearchmachine added the v8.15.0 label Jun 4, 2024

elasticsearchmachine added the Team:Security Meta label for security team label Jun 4, 2024

Update docs/changelog/109357.yaml

49c83b3

Merge branch 'main' into test-for-cancelling-tasks

f89510b

n1v0lg approved these changes Jun 5, 2024

albertzaharovits added 3 commits June 5, 2024 19:21

log msg includes authn

1f700ab

assertBusy

bb508f5

Stall time 60

03807fb

albertzaharovits added the v8.14.1 label Jun 5, 2024

Merge branch 'main' into test-for-cancelling-tasks

a2eb500

albertzaharovits merged commit ed0febb into elastic:main Jun 6, 2024
20 checks passed

albertzaharovits mentioned this pull request Jun 6, 2024

[8.14] Fix task cancellation authz on fulfilling cluster (#109357) #109422

Merged

albertzaharovits deleted the test-for-cancelling-tasks branch June 6, 2024 14:59
