Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extensions.service for all subgraph errors #6191

Merged
merged 10 commits into from
Nov 20, 2024

Conversation

IvanGoncharov
Copy link
Member

@IvanGoncharov IvanGoncharov commented Oct 24, 2024

If include_subgraph_errors is true for a particular subgraph, all errors originating in this subgraph will have the subgraph's name exposed as a service extension.

For example, if subgraph errors are enabled, like so:

include_subgraph_errors:
  all: true # Propagate errors from all subgraphs

Note: This option is enabled by default in the dev mode.

And this products subgraph returns an error, it will have a service extension:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid product ID",
      "path": [],
      "extensions": {
        "service": "products"
      }
    }
  ]
}

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Oct 24, 2024

✅ Docs Preview Ready

No new or changed pages found.

This comment has been minimized.

@router-perf
Copy link

router-perf bot commented Oct 24, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@IvanGoncharov IvanGoncharov force-pushed the i1g/error_extensions_service branch from 421c87c to bc618f9 Compare November 5, 2024 22:23
@IvanGoncharov IvanGoncharov marked this pull request as ready for review November 5, 2024 22:24
@IvanGoncharov IvanGoncharov requested review from a team as code owners November 5, 2024 22:24
@IvanGoncharov IvanGoncharov force-pushed the i1g/error_extensions_service branch from 3dddc1d to c3eb6f4 Compare November 6, 2024 22:50
@IvanGoncharov IvanGoncharov requested a review from a team as a code owner November 6, 2024 22:50
@@ -42,36 +42,47 @@ impl Plugin for IncludeSubgraphErrors {
}

fn subgraph_service(&self, name: &str, service: subgraph::BoxService) -> subgraph::BoxService {
Copy link
Member Author

@IvanGoncharov IvanGoncharov Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include_subgraph_errors.rs is the best place I could find to add this functionality:

  1. It's guaranteed that if include_subgraph_errors: false nothing will leak. So, we can't add this code later in the pipeline.
  2. Also, we need to add it as late as possible to maximize the number of errors related to the subgraph handled by this code.

@Samjin
Copy link

Samjin commented Nov 16, 2024

It maybe vulnerable to expose backend service name to public clients for some users. Is it possible to have error extensions allowlist/blacklist to control what properties should expose to clients?

The allowlist/blacklist would also help if subgraph wants to expose non-vulnerable data to clients via errors. There maybe more than expected data they want to expose to client. In this case, a blacklist maybe more proper to have. We have few services whose clients depend on some data in the extensions. While they shouldn't, correcting it would take time. Meanwhile, their error message should be redacted.

nit: I noticed service is used instead of subgraph_name or subgraph. Is this consistent naming from Apollo? If so we'll use service as the name across telemetry and also in resp.

@IvanGoncharov
Copy link
Member Author

It maybe vulnerable to expose backend service name to public clients for some users. Is it possible to have error extensions allowlist/blacklist to control what properties should expose to clients?

@Samjin Errors exposed only if include_subgraph_errors is set.
Nothing is leaked by default, see docs: https://www.apollographql.com/docs/graphos/routing/observability/subgraph-error-inclusion

nit: I noticed service is used instead of subgraph_name or subgraph. Is this consistent naming from Apollo? If so we'll use service as the name across telemetry and also in resp.

Yes, it is consistent with existing subgraphs errors, for example:

extensions
.entry("service")
.or_insert_with(|| service.clone().into());

This PR just make this field consistently present for all errors related to particular subgraph.

@IvanGoncharov IvanGoncharov merged commit 24c4998 into dev Nov 20, 2024
14 checks passed
@IvanGoncharov IvanGoncharov deleted the i1g/error_extensions_service branch November 20, 2024 16:36
@abernix abernix mentioned this pull request Nov 26, 2024
@yanns
Copy link
Contributor

yanns commented Nov 28, 2024

When subgraphs return a user-friendly errors that we want to propagate, then we need to set include_subgraph_errors to true. But now we're exposing the service name that is an internal information.
Is there a way to use the user-friendly errors from subgraphs without exposing internal information?

@GwlCtovic
Copy link

But now we're exposing the service name that is an internal information.

@yanns The service name propagation through subgraph errors was the default behavior in@apollo/gateway if I'm not mistaken.

As a matter of fact, I'm watching this issue because migrating from @apollo/gateway to the router broke some of our monitoring tools, as they rely on subgraph automatically reporting their service

@yanns
Copy link
Contributor

yanns commented Nov 29, 2024

But now we're exposing the service name that is an internal information.

@yanns The service name propagation through subgraph errors was the default behavior in@apollo/gateway if I'm not mistaken.

I've never used @apollo/gateway so I cannot tell.
I've raised #6358 for better visibility.

@Samjin
Copy link

Samjin commented Dec 4, 2024

Relate to proposal PR #6344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants