-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move QP compatibility checks into constructor and add metric #5811
Conversation
@SimonSapin, please consider creating a changeset entry in |
CI performance tests
|
Edit: metrics can be a follow-up PR |
field | ||
.directives | ||
.iter() | ||
.filter(|d| d.name.as_str() == JOIN_FIELD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we look up the correct name of the directive on the supergraph's .metadata()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On one hand, probably yes. On the other hand, we have enough occurrences of "join__
in the code base that I doubt anything works if a supregraph uses import renames.
Maybe composition supports renames when reading subgraphs be never emits them in a supergraph?
This adds a function to capture metrics for rust qp intialisation. We want to capture whether or not initialisation was a success. We also want to capture which unsupported feature triggered a failure - `@context`, progressive `@overrides` or a fed1 supergraph. There is an addition of "internal init error" to capture the remaining possible match arms. In order to classify unsupported features, this commit also adds an `UnsupportedFeatureKind` enum to apollo-federation errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the single attribute!
One thing we should keep in mind: does this give users a scary warning on startup if they're using default configuration (both_best_effort
) and new federation features (like prog. override)? It should be clear that they don't need to worry about anything except fed 1 issues, I think.
} | ||
} | ||
|
||
metric_rust_qp_init(None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in an else or we get duplicate metics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this made me realise we are missing the other two kinds of errors in this match - AggregateFederationError and MultipleFederationErrors. They would have been counted as a "success" 🙈
Co-authored-by: Renée <renee.kooi@apollographql.com>
additionally adds integration tests with reloading a schema with a broken one for `new`, `both_best_effort` and `both` query planner modes.
@goto-bus-stop at the moment we get a 2024-08-16T13:45:45.605895Z WARN Failed to initialize the new query planner, falling back to legacy: The supergraph schema failed to produce a valid API schema: `experimental_query_planner_mode: new` or `both` cannot yet be used with progressive overrides. Remove uses of progressive overrides to try the experimental query planner, otherwise switch back to `legacy` or `both_best_effort`.", Followed by a log saying the reload is complete: 2024-08-16T13:45:45.777923Z INFO reload complete Should we soften the wording a bit more in the |
} | ||
|
||
#[tokio::test(flavor = "multi_thread")] | ||
async fn fed1_schema_with_both_best_effort_qp() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added reload to both_best_effort integration tests for all 3 features (fed1, context, progressive overrides) so we can also check that the log differs between setting the config to new and both_best_effort
use crate::integration::common::graph_os_enabled; | ||
use crate::integration::IntegrationTest; | ||
|
||
const PROMETHEUS_METRICS_CONFIG: &str = include_str!("telemetry/fixtures/prometheus.router.yaml"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added prometheus to the config so we can also check that the metrics are emitted for the reload integration tests.
This is ready for a re-review on Monday @goto-bus-stop. The failing CI is due to a transitive dependency that requires a rustc upgrade. |
if let Some(init_error_kind) = init_error_kind { | ||
u64_counter!( | ||
"apollo.router.lifecycle.query_planner.init", | ||
"Rust query planner initialization", | ||
1, | ||
"init.error_kind" = init_error_kind, | ||
"init.is_success" = false | ||
); | ||
} else { | ||
u64_counter!( | ||
"apollo.router.lifecycle.query_planner.init", | ||
"Rust query planner initialization", | ||
1, | ||
"init.is_success" = true | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok for a metric attribute to be sometimes missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. but it could maybe be collapsed to a single attribute (init.error_kind = "none"
). Though it's the same cardinality either way
We can make the warning less scary by downgrading it to |
router | ||
.assert_log_contains( | ||
"could not create router: \ | ||
The supergraph schema failed to produce a valid API schema: \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mention of API schema here is not quite correct. It comes from apollo-router/src/error.rs
:
impl From<FederationError> for ServiceBuildError {
fn from(err: FederationError) -> Self {
ServiceBuildError::ApiSchemaError(err)
}
}
QP initialization is the only place where this enum variant is used (since API schema generation has moved to apollo_router::spec::Schema
creation), so it should be renamed to QueryPlannerInit
or something, and the message reworded
This aligns better with
experimental_query_planner_mode: both_best_effort
falling back tolegacy
when theQueryPlanner::new
constructor returns an error.Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩