Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapter: introspection subscribes (coordinator-managed) #27795

Merged
merged 5 commits into from
Jun 26, 2024

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Jun 21, 2024

This PR introduces the overall infrastructure for introspection subscribes, implemented by a new coord::introspection module that extends the coordinator with methods to install introspection subscribes and process updates for them.

The results of introspection subscribes are still discarded here. Follow-up PRs will add the final pieces that writes them to storage.

The relevant design doc is #27548, except that the design proposes to let the controller install the subscribes, to cut down on implementation effort. An implementation of the controller-managed approach is provided in #27709. This PR is an attempt to implement the coordinator-managed approach after all, which has two major benefits over the controller-managed one:

  • Introspection subscribe queries can be written in SQL rather than in LIR, making the addition of new subscribes significantly easier.
  • Introspection subscribes can read from all builtin sources, not just per-replica introspection sources.

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/7898

Tips for reviewer

This seems to work, but I don't understand the coordinator code well enough to say whether it's a reasonable implementation.

I initially tried to re-use the existing SUBSCRIBE sequencing code, but couldn't get it to work. This code implicitly assumes the existence of a client connection and a transaction, and I wasn't able to work around that. So instead I introduced dedicated sequencing stages for the introspection subscribes, which take inspiration from the SUBSCRIBE ones but are kept simpler.

Checklist

@teskje teskje force-pushed the introspection-subscribes-coord branch 5 times, most recently from 1c4b77e to 07ab2a0 Compare June 23, 2024 18:59
@teskje teskje marked this pull request as ready for review June 24, 2024 08:07
@teskje teskje requested review from a team as code owners June 24, 2024 08:07
@teskje teskje requested a review from ParkMyCar June 24, 2024 08:07
Copy link

shepherdlybot bot commented Jun 24, 2024

Risk Score:83 / 100 Bug Hotspots:8 Resilience Coverage:100%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability 🔍 Detected
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk associated with this pull request is high, with a score of 83. It is important to note that historically, pull requests with similar characteristics to the predictors used here—namely, average line count in files and executable lines within files—are 132% more likely to introduce bugs compared to the repository's baseline. Additionally, while the repository's observed bug trend is decreasing, the predicted bug trend is on the rise. There are also 8 modified files in this pull request that have recently had a high number of bug fixes, indicating potential risk areas.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../controller/instance.rs 99
../inner/create_index.rs 93
../inner/peek.rs 97
../src/coord.rs 98
../sequencer/inner.rs 100
../src/lib.rs 98
../inner/create_view.rs 94
../inner/create_materialized_view.rs 95

Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this LGTM, but we need to do more thinking about how to handle session-less execution of sequence_staged. To me, doing that in a considered way should be a prerequisite for this PR because we haven't thought through the edge cases of that, and I'm concerned about unexpected errors popping up due to using a dummy session and the coordinator panicing (as you discovered and commented about in this PR).

One initial brainstorm idea: sequence_staged gets its first argument changed to like

enum StagedContext {
    Execute(ExecuteContext),
    System(ClientTransmitter<ExecuteResponse>),
}

then the stage() fns of each impl can match on that and complain if they require an Execute variant (i.e., require a Session), or merely a ClientTransmitter which both variants have. Probably:

impl StagedContext {
    fn session(&self) -> Option<&Session> {
        match self {
            StagedContext::Execute(ctx) => Some(ctx.session()),
            StagedContext::System(_) => None,
        }
    }

    fn tx(&self) -> &ClientTransmitter<ExecuteResponse> {
        match self {
            StagedContext::Execute(ctx) => ctx.tx(),
            StagedContext::System(tx) => tx,
        }
    }
}

with _mut variations too? Will almost certainly need other stuff, but it's one idea that would make this much more typesafe and avoid any assumption about a session existing.

src/adapter/src/coord/message_handler.rs Show resolved Hide resolved
replica_id,
spec,
};
self.introspection_subscribes.insert(id, subscribe);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should assert it returns None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we just minted the GlobalId at the beginning of the method, so this assert would be equivalent to one that checks that allocate_transient_id produces unique IDs. I'm not against adding it if you think it's valuable, but it might be more confusing than helpful.

src/adapter/src/coord/introspection.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/introspection.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/introspection.rs Outdated Show resolved Hide resolved

fn drop_introspection_subscribe(&mut self, id: GlobalId) {
let Some(subscribe) = self.introspection_subscribes.remove(&id) else {
soft_panic_or_log!("attempt to remove unknown introspection subscribe (id={id})");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to log in production instead of panic here? Panicing is bad, but so is continuing to run with a corrupted internal state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have become very careful of panicking in envd. In the past it happened a couple times that we got into a crashloop due to some bug in the code, and then could resolve this only be pushing a patch that changed the panicking assert to a soft one. Even if we don't crashloop, a panic now and then might make the environment unstable, causing a maybe harmless bug in the code to be elevated to a high-urgency one. There are of course still asserts that need to be hard because they protect correctness.

In this case I don't see how correctness could be impacted. Most likely we get here by trying to drop an introspection subscribe twice, in which case ignoring the second drop is probably what we want. Less likely, we have somehow mixed up subscribe IDs and forget to drop the subscribe we really wanted to drop, in which case some coordinator memory will be wasted and some entries will not be cleaned up from an introspection relation until the next restart. This seems preferable to panicking for me.

@maddyblue maddyblue force-pushed the introspection-subscribes-coord branch from 07ab2a0 to b22445e Compare June 25, 2024 06:22
@teskje teskje force-pushed the introspection-subscribes-coord branch from b22445e to f01cf37 Compare June 25, 2024 10:44
def-
def- previously requested changes Jun 25, 2024
test/testdrive/mzcompose.py Outdated Show resolved Hide resolved
This commit introduces the overall infrastructure for introspection
subscribes, implemented by a new `coord::introspection` module that
extends the coordinator with methods to install introspection subscribes
and process updates for them.

This commit also introduces a first introspection subscribe that
surfaces dataflow error counts, as the first test case for this
infrastructure.
@teskje teskje force-pushed the introspection-subscribes-coord branch from f01cf37 to 8e73854 Compare June 25, 2024 11:50
@teskje teskje force-pushed the introspection-subscribes-coord branch from 8e73854 to 359c84a Compare June 25, 2024 11:52
@teskje teskje requested a review from def- June 25, 2024 11:52
@teskje teskje force-pushed the introspection-subscribes-coord branch from 359c84a to 7359ff4 Compare June 25, 2024 12:01
@teskje teskje dismissed def-’s stale review June 26, 2024 08:15

Change request has been addressed

@teskje
Copy link
Contributor Author

teskje commented Jun 26, 2024

TFTRs!

@teskje teskje merged commit 9a1b5a8 into MaterializeInc:main Jun 26, 2024
190 of 194 checks passed
@teskje teskje deleted the introspection-subscribes-coord branch June 26, 2024 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants