Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[baggage] validate key and values during baggage injection/extraction #5647

Closed

Conversation

lachmatt
Copy link
Contributor

@lachmatt lachmatt commented May 23, 2024

Fixes #5479
Design discussion issue #

Changes

  • Baggage item key is no longer encoded and decoded - it is validated against token requirement from the spec instead
  • Baggage item value is validated during extraction
  • Only invalid baggage items are rejected during injection
  • All of the baggage items are rejected during extraction if any of them is invalid (more strict than injection)

NOTE:
Amount of allocations (e.g in extract part) can be reduced, I'd prefer to address it in a separate PR.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

Copy link

codecov bot commented May 23, 2024

Codecov Report

Attention: Patch coverage is 94.28571% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.01%. Comparing base (6250307) to head (6b88989).
Report is 251 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5647      +/-   ##
==========================================
+ Coverage   83.38%   86.01%   +2.63%     
==========================================
  Files         297      254      -43     
  Lines       12531    11114    -1417     
==========================================
- Hits        10449     9560     -889     
+ Misses       2082     1554     -528     
Flag Coverage Δ
unittests ?
unittests-Project-Experimental 86.01% <94.28%> (?)
unittests-Project-Stable 86.01% <94.28%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...emetry.Api/Internal/OpenTelemetryApiEventSource.cs 83.33% <100.00%> (+2.08%) ⬆️
...metry.Api/Context/Propagation/BaggagePropagator.cs 88.98% <93.93%> (+3.49%) ⬆️

... and 128 files with indirect coverage changes

}

[Event(13, Message = "Baggage item value is invalid, value = '{0}'", Level = EventLevel.Warning)]
public void BaggageItemValueIsInvalid(string value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Should we inform what was the key for the given value? It might be useful for debugging.

@lachmatt lachmatt marked this pull request as ready for review May 23, 2024 09:36
@lachmatt lachmatt requested a review from a team May 23, 2024 09:36
@vishweshbankwar
Copy link
Member

@lachmatt - Could you please clarify where these requirements are coming from?

  • Only invalid baggage items are rejected during injection
  • All of the baggage items are rejected during extraction if any of them is invalid (more strict than injection)

@lachmatt
Copy link
Contributor Author

@lachmatt - Could you please clarify where these requirements are coming from?

  • Only invalid baggage items are rejected during injection
  • All of the baggage items are rejected during extraction if any of them is invalid (more strict than injection)

This is not a requirement, but a choice made during implementation (I'm open for suggestions how to improve it).

For the injection part , Baggage API allows arbitrary strings as names. This is less restrictive than key requirements from the W3C Baggage specification.
What would be the preferred behavior for situation when one of the names previously accepted in Baggage API Set operation doesn't meet token requirement from the W3C Baggage specification?
I assumed it might be beneficial to propagate valid items in that case.

For the extraction part, current behavior is consistent with e.g. how invalid items are handled during tracestate extraction.

As far as I know, invalid values are handled differently by different language implementations, e.g.:

  • in Java, only invalid baggage items are ignored, during both injection and extraction (valid items are propagated)
  • in Go, all baggage items are ignored during extraction if invalid item is recognized

@vishweshbankwar
Copy link
Member

vishweshbankwar commented May 29, 2024

@lachmatt - Could you please clarify where these requirements are coming from?

  • Only invalid baggage items are rejected during injection
  • All of the baggage items are rejected during extraction if any of them is invalid (more strict than injection)

This is not a requirement, but a choice made during implementation (I'm open for suggestions how to improve it).

For the injection part , Baggage API allows arbitrary strings as names. This is less restrictive than key requirements from the W3C Baggage specification. What would be the preferred behavior for situation when one of the names previously accepted in Baggage API Set operation doesn't meet token requirement from the W3C Baggage specification? I assumed it might be beneficial to propagate valid items in that case.

For the extraction part, current behavior is consistent with e.g. how invalid items are handled during tracestate extraction.

As far as I know, invalid values are handled differently by different language implementations, e.g.:

  • in Java, only invalid baggage items are ignored, during both injection and extraction (valid items are propagated)
  • in Go, all baggage items are ignored during extraction if invalid item is recognized

@lachmatt - Shouldn't this be coming from spec?

Looking at the spec

However, the specific Propagators that are used to transmit baggage entries across component boundaries may impose their own restrictions on baggage names. For example, the W3C Baggage specification restricts the baggage keys to strings that satisfy the token definition from RFC7230, Section 3.2.6

This seems to suggest it is not a MUST?

Also, looking at this https://www.w3.org/TR/baggage/#mutating-baggage. Looks like there is already a suggested approach for values.

If a system determines that the value of a baggage entry is not in the format defined in this specification, it MAY remove that entry before propagating the baggage header as part of outgoing requests.

But a different approach is proposed here?

@lachmatt
Copy link
Contributor Author

lachmatt commented Jun 3, 2024

@vishweshbankwar I created an issue in the spec

@xiang17
Copy link
Contributor

xiang17 commented Jun 6, 2024

@vishweshbankwar I created an issue in the spec

I think the spec says to remove that (baggage) entry if conditions are met, rather than dropping the entire baggage.

If a system determines that the value of a baggage entry is not in the format defined in this specification, it MAY remove that entry before propagating the baggage header as part of outgoing requests.

}
while (e.MoveNext() && itemCount < MaxBaggageItems);

if (baggage is not null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the expected behavior if baggage is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baggage is null only if there are no valid entries in a baggage, in which case nothing will be injected (i.e setter won't be called). The same happens for empty baggage.

Comment on lines +134 to +135
baggage = null;
return false;
Copy link
Contributor

@xiang17 xiang17 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels very strict to ignore the entire baggage collection if any entry is invalid, or in this case an entry is empty. I'm not so familiar with the baggage spec, but the tracestate spec specifically calls out that empty list members are allowed:

Empty and whitespace-only list members are allowed. Vendors MUST accept empty tracestate headers but SHOULD avoid sending them. Empty list members are allowed in tracestate because it is difficult for a vendor to recognize the empty value when multiple tracestate headers are sent. Whitespace characters are allowed for a similar reason, as some vendors automatically inject whitespace after a comma separator, even in the case of an empty header.

Copy link
Contributor Author

@lachmatt lachmatt Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to W3C Baggage spec empty entries are invalid - entry needs to at least contain a key satisfying a token requirement (note at least one char from tchar range) followed by a =.

Also note that current implementation of tracestate extraction ignores all of the entries if any of them is invalid (i.e behaves similarly to what is proposed in this PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. They are indeed different on whether allowing list-member to be OWS.

{
continue;
}

baggage.Append(Uri.EscapeDataString(item.Key)).Append('=').Append(Uri.EscapeDataString(item.Value)).Append(',');
if (baggage == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to differentiate null vs empty? This would get checked in every iteration of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a need to differentiate between these, I will modify it.

// https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6
// token = 1*tchar
// tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
// / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the code logic is correct but tchar definition also contains / DIGIT / ALPHA which you split into another method.


if (c == '%')
{
if (!ValidatePercentEncoding(index, encodedValue))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the % character is part of the literal text, i.e. not used as the special character to indicate the following two characters are a percent-encoding?

Relevant spec:

Any code points outside of the baggage-octet range MUST be percent-encoded. The percent code point (U+0025) MUST be percent-encoded. Code points which are not required to be percent-encoded MAY be percent-encoded.

For example, %%, %OK, %3Q or % at the end of string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized this is assuming the string is encoded value. You can ignore above comment's case.

However, the next if branch checks ValidateValueCharInRange which is assuming the string is already decoded. The two different assumptions here smells not good. i.e. Could there be a char that's invalid to be a baggage-octet but won't be detected here by ValidateValueCharInRange?

@@ -131,33 +147,146 @@ internal static bool TryExtractBaggage(string[] baggageCollection, out Dictionar

if (pair.IndexOf('=') < 0)
{
continue;
baggage = null;
return false;
}

var parts = pair.Split(EqualSignSeparator, 2);
if (parts.Length != 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List-member could contain multiple = signs. For example,

  1. in property.

list-member = key OWS "=" OWS value *( OWS ";" OWS property )
property = key OWS "=" OWS value
property =/ key OWS

  1. A value can also contain equal signs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ad 1. Properties are not handled/recognized properly currently, this is something that needs to be addressed, but as it requires changes in both Baggage API and propagator, I'd prefer to address it in a separate PR.
Ad 2. This will be handled properly - see test case added in 6b88989

Copy link
Contributor Author

@lachmatt lachmatt Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created issue to track handling of properties.
Currently, properties attached to a baggage entry are treated as a part of a value.
With baggage entry value validation added in this PR (ValidateValue), such entries will be considered invalid due to the existence of ; char. This might need to be adjusted, based on how we decide to proceed with the fixes (e.g separate PR for handling properties)

[InlineData("key=%gg")]
[InlineData("key=val%2")]
[InlineData("key=val ue")]
[InlineData("key=val%IJK")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a case which has valid percent coding triplets, but invalid as value in list-member once URI escaped?

@Kielek Kielek added the pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package label Jun 10, 2024
@lachmatt lachmatt closed this Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:OpenTelemetry.Api Issues related to OpenTelemetry.Api NuGet package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BaggagePropagator] Do not encode and decode keys
4 participants