Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(project): Treat invalid project states as pending and unify state types #3770

Merged
merged 52 commits into from
Jul 18, 2024

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jun 27, 2024

We currently deal with different envelope states inconsistently: Because the state of a project is encoded into many separate fields and types, it's hard to enforce that the config is validated correctly everywhere that it is used.

For example, invalid (i.e. unparsable) project configs are sometimes treated the same as pending, sometimes the same as disabled:

// if we recorded an invalid project state response from the upstream (i.e. parsing
// failed), discard the event with a state reason.
if self.invalid() {
return Err(DiscardReason::ProjectState);
}

// If the state is still invalid, return back the taken channel and schedule state update.
if state.invalid() {
self.state_channel = Some(channel);
let attempts = self.backoff.attempt() + 1;

New types

classDiagram

Project "1" *-- "1" ProjectFetchState

  ProjectState <|-- Enabled
  ProjectState <|-- Disabled
  ProjectState <|-- Pending

  Enabled "1" *-- "1" ProjectInfo

ProjectInfo "1" *-- "1" ProjectConfig

  ProjectFetchState *-- ProjectState
  ProjectFetchState: -last_fetch

  ProjectFetchState --> "computes" ExpiryState
  ExpiryState --> "Exposes" ProjectState
Loading

Future Work

  • Replace GetOrFetch by ProjectState.
  • Related to the above: Pending does not currently guarantee that another "fetch" is scheduled. We might be able to enforce this by typing Pending as Pending(StateReceiver).
  • Replace "tight" retry loop for pending with the same backoff mechanism we use for unparsable projects. This might require setting a custom intitial backoff.
  • Introduce an explicit ProjectState::Allowed state to special case proxy relays.

Fixes https://github.com/getsentry/team-ingest/issues/364.

@jjbayer jjbayer changed the title Ref/server state cleanup Unify project states Jul 9, 2024
.enter(
envelope,
state.outcome_aggregator().clone(),
state.test_store().clone(),
group,
)
.map_err(BadStoreRequest::QueueFailed)?;
envelope.scope(scoping);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the split-out envelope was scoped by a call to check_envelope in handle_processing. That function takes a valid ProjectInfo now, so we have to do it here.

#[serde(with = "LimitedProjectInfo")]
#[serde(flatten)]
info: ProjectInfo,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a helper struct ParsedProjectState to allow parsing the wire format in different places. Ideally this would be hidden away in the Serialize / Deserialize implementation, but there is one location (load_local_states) where we actually need to be able to parse a disabled state plus project keys.

)
})?;
self.dsc().map(|dsc| dsc.public_key)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed a queue_key() utility, so I decided to make sampling_key() a method as well, instead of a standalone function.

@@ -74,7 +74,7 @@ use crate::service::ServiceError;
use crate::services::global_config::GlobalConfigHandle;
use crate::services::outcome::{DiscardReason, Outcome, TrackOutcome};
use crate::services::processor::event::FiltersStatus;
use crate::services::project::ProjectState;
use crate::services::project::ProjectInfo;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProjectState is now an enum. ProjectInfo now contains most of the old fields of the old ProjectState struct, minus disabled.

@@ -279,7 +279,7 @@ mod tests {
use crate::services::processor::{
ProcessEnvelope, ProcessingExtractedMetrics, ProcessingGroup, SpanGroup,
};
use crate::services::project::ProjectState;
use crate::services::project::ProjectInfo;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProjectState is now an enum. ProjectInfo contains the fields that ProjectState used to have (project_id, config, etc.), but is only accessible if the state is Enabled.

Comment on lines +1195 to +1198
// Unspooled envelopes need to be checked, just like we do on the fast path.
if let Ok(CheckedEnvelope { envelope: Some(envelope), rate_limits: _ }) = metric!(timer(RelayTimers::ProjectCacheTaskDuration), task = "handle_check_envelope", {
broker.handle_check_envelope(CheckEnvelope::new(managed_envelope))
}) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new.

relay-server/src/services/project_cache.rs Outdated Show resolved Hide resolved
let ValidateEnvelope { envelope: context } = message;
// TODO: check_envelope here?

let ValidateEnvelope {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the project checks are now done in handle_validate_envelope. There is no more redundant check in handle_processing.

states.insert(key.public_key, Arc::new(sanitized.clone()));
let mut state = state.clone();
state.info.public_keys = smallvec::smallvec![key.clone()];
states.insert(key.public_key, ProjectState::from(state).sanitize());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it hard to make ParsedProjectState a private concern of the (de)serializer: We need to peek into the config to get the project key.

relay-server/src/services/project_upstream.rs Outdated Show resolved Hide resolved
@jjbayer jjbayer marked this pull request as ready for review July 16, 2024 15:42
@jjbayer jjbayer requested a review from a team as a code owner July 16, 2024 15:42
Copy link
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do we have an integration test for invalid?.
  • Don't forget to update the PR title.
  • I think we should have a changelog entry for this, quite a big change.

relay-server/src/envelope.rs Outdated Show resolved Hide resolved
relay-server/src/endpoints/project_configs.rs Outdated Show resolved Hide resolved
/// - The upstream returned "pending" for this project (see [`crate::services::project_upstream`]).
/// - The upstream returned an unparsable project so we have to try again.
/// - The project has expired and must be treated as "has not been fetched".
Pending,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if Missing would be a better name, e.g. when a project is expired you return Pending, but there is no gurantuee it's actually pending, it's just missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit hesitant to call it "missing" because when a projectconfig response does not contain a key, we treat the project as disabled and we previously called this state "missing", see

/// Project state for a missing project.
pub fn missing() -> Self {
ProjectState {
project_id: None,
last_change: None,
disabled: true,

self.project_key
);
let is_enabled = match self.current_state() {
ProjectState::Enabled(info) => info.has_feature(Feature::CustomMetrics),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to implement has_feature on the ProjectState. This match + check happens at least twice already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but how would you handle the Pending variant? In this file alone we have two different behaviors (treat as enabled vs. early return).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I didn't look that deep, I assumed pending = false. If that is just becomes a footgun, better not do it.

relay-server/src/services/project.rs Outdated Show resolved Hide resolved
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ParsedProjectState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private and have the (de)-serialize impl of ProjectState use this?

Possibly tricky with the limited versions though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it hidden in a mod that containted the (De)Serialize implementations, but unfortunately, project_local needs to look inside the config even for disabled states. See https://github.com/getsentry/relay/pull/3770/files/3452ab1d4af042327565421a8297cfa4afbb83e4#r1679462820

@@ -39,6 +39,15 @@ struct VersionQuery {
version: u16,
}

#[derive(Debug, Clone, Serialize)]
#[serde(rename_all = "camelCase", remote = "ParsedProjectState")]
struct LimitedParsedProjectState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this to where ParsedProjectState lives?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only required for serialization in the endpoint, so it should in theory be private. But it makes sense to keep the Limited* types next to their unlimited types to keep them in sync, so 👍.

/// This state is used for forwarding in Proxy mode.
pub fn allowed() -> Self {
Self::enabled(ProjectInfo {
project_id: None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to eliminate this in a follow-up, I think a ProjectInfo with a non optional project_id and organization_id could reduce some cruft and possibly also help solve our problem where feature flags in proxy mode are treated as disabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will note this as a follow-up.

}

/// Sanitizes the contained project state. See [`ProjectState::sanitize`].
pub fn sanitize(self) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn sanitize(self) -> Self {
pub fn sanitized(self) -> Self {

Seems more like a transition than a mutation

relay-server/src/services/project_upstream.rs Outdated Show resolved Hide resolved
@jjbayer jjbayer requested a review from Dav1dde July 18, 2024 09:10
@jjbayer
Copy link
Member Author

jjbayer commented Jul 18, 2024

Do we have an integration test for invalid?

@Dav1dde there is test_unparsable_project_config, but on closer look I don't understand why it passed before on master, will investigate.

Don't forget to update the PR title.

Any suggestions? Something like "Treat invalid project states as pending"? Or rather "ProjectState enum"?

Copy link
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Dav1dde
Copy link
Member

Dav1dde commented Jul 18, 2024

Any suggestions? Something like "Treat invalid project states as pending"? Or rather "ProjectState enum"?

Maybe something like: ref(project): Treat invalid project states as pending and unify project states. The functional change and the motivation.

}
} == data
data = request_config(relay, packed, signature, version="3").json()
assert data == {"configs": {}, "pending": [public_key]}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relay was already configured to buffer envelopes for unparsable projects, but downstream relays would get them as disabled and drop data anyway. This is now fixed.

@jjbayer jjbayer changed the title Unify project states ref(project): Treat invalid project states as pending and unify state types Jul 18, 2024
@jjbayer jjbayer merged commit 87ceb2c into master Jul 18, 2024
24 checks passed
@jjbayer jjbayer deleted the ref/server-state-cleanup branch July 18, 2024 12:01
jjbayer added a commit that referenced this pull request Jul 18, 2024
This started buffering all envelopes to memory on S4S, need to
investigate why.

Reverts #3770.
jjbayer added a commit that referenced this pull request Jul 23, 2024
… types (#3770)

We currently deal with different envelope states inconsistently: Because
the state of a project is encoded into many separate fields and types,
it's hard to enforce that the config is validated correctly everywhere
that it is used.

For example, invalid (i.e. unparsable) project configs are sometimes
treated the same as pending, sometimes the same as disabled.
jjbayer added a commit that referenced this pull request Jul 24, 2024
Fixes and reapplies #3770.

Above PR had a bug that created a tight spool-unspool loop for disabled
projects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants