Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fisehara/decode uri before parsing #77

Closed
wants to merge 3 commits into from

Conversation

fisehara
Copy link
Contributor

@fisehara fisehara commented Jan 3, 2024

No description provided.

@fisehara fisehara marked this pull request as draft January 3, 2024 15:17
@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch from 4107312 to 3143f0c Compare January 4, 2024 09:32
@fisehara fisehara requested a review from Page- January 4, 2024 09:33
@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch 5 times, most recently from 7973d74 to b08feb7 Compare January 4, 2024 10:24
@fisehara fisehara marked this pull request as ready for review January 4, 2024 10:24
@flowzone-app flowzone-app bot enabled auto-merge January 4, 2024 10:26
@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch from b08feb7 to 86af243 Compare January 4, 2024 11:37
Copy link
Contributor

@thgreasi thgreasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but I would prefer Page to make the call since I'm not familiar w/ this codebase.
The only weird case I could think was whether it's fine to decode the url part before the ?.

@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch 3 times, most recently from 38816c1 to 721f272 Compare January 4, 2024 14:18
Copy link
Collaborator

@Page- Page- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want to be confident that the changed parsing matches what the odata spec expects, but at least with the change as-is there's a bunch of pegjs level stuff to improve

odata-parser.pegjs Show resolved Hide resolved
odata-parser.pegjs Outdated Show resolved Hide resolved
@@ -641,19 +642,17 @@ DurationNumber =
Text =
// This regex is equivalent to `(!ReservedUriComponent)`
text:$[^:/?#\[\]@!$*&()+,;=]*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
text:$[^:/?#\[\]@!$*&()+,;=]*
$[^:/?#\[\]@!$*&()+,;=]*

odata-parser.pegjs Outdated Show resolved Hide resolved
@@ -551,7 +552,7 @@ SubPathSegment =
ResourceName =
// This regex is equivalent to `!(ReservedUriComponent / [ %])`
resourceName:$[^:/?#\[\]@!$*&()+,;= %]+
{ return decodeURIComponent(resourceName) }
{ return resourceName }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{ return resourceName }

@@ -551,7 +552,7 @@ SubPathSegment =
ResourceName =
// This regex is equivalent to `!(ReservedUriComponent / [ %])`
resourceName:$[^:/?#\[\]@!$*&()+,;= %]+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resourceName:$[^:/?#\[\]@!$*&()+,;= %]+
$[^:/?#\[\]@!$*&()+,;= %]+

odata-parser.pegjs Outdated Show resolved Hide resolved

Sign =
'+'
/ '%2B'
{ return '+' }
/ '-'
/ ''

// TODO: This should really be done treating everything the same, but for now this hack should allow FF to work passably.
Apostrophe =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in-lined now that it only checks a single character and there (probably?) won't be any benefit from memoizing, and definitely no benefit from deduplicating

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still stands

Parser is not parsing for escaped characters like %27='(' or %28=')'

Change-type: patch
Signed-off-by: fisehara <harald@balena.io>
@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch 2 times, most recently from 5b7895c to 6f762e8 Compare January 5, 2024 10:25
@fisehara
Copy link
Contributor Author

fisehara commented Jan 5, 2024

@Page-
The OData spec defines that the percent-decode should happen only on:

  • path segment
  • query option name
  • query option value

Given this syntax:

QueryOptions =
options:QueryOption|1..,'&'|
{ return CollapseObjectArray(options) }
QueryOption =
Dollar
@( SelectOption
/ FilterByOption
/ ExpandOption
/ SortOption
/ TopOption
/ SkipOption
/ CountOption
/ InlineCountOption
/ FormatOption
)
/ OperationParam
/ ParameterAliasOption

The QueryOption should first split into QueryName and QueryValue and percent-decode both before parsing the data specific queryValue.
Should we implement it this way?

@fisehara fisehara disabled auto-merge January 5, 2024 10:28
@fisehara fisehara requested a review from Page- January 5, 2024 10:28
)

spaces =
space*
[\ +]*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to escape ?

Suggested change
[\ +]*
[ +]*

@@ -583,7 +583,7 @@ Duration =
'duration'
Apostrophe
// Sign must appear first if it appears
sign:Sign
sign:$[+\-]?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw the - doesn't actually need to be escaped if it's the first or last character in the character class

@@ -132,7 +134,6 @@ QueryOption =

Dollar '$ query options' =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this only checks a single character it makes sense to in-line imo


Sign =
'+'
/ '%2B'
{ return '+' }
/ '-'
/ ''

// TODO: This should really be done treating everything the same, but for now this hack should allow FF to work passably.
Apostrophe =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still stands


space =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw I don't mind leaving this as an alias to avoid issues where in future people may add an extra thing that wants to check for spaces but don't realize that in a url a + can be equivalent to a space and so only check ' ' instead of [ +]. Unless of course there's a significant performance benefit

@Page-
Copy link
Collaborator

Page- commented Jan 5, 2024

Yeah, I think we should match what the OData spec expects @fisehara because the better we match the more tooling/documentation/etc will be cross compatible

Signed-off-by: fisehara <harald@balena.io>
Signed-off-by: fisehara <harald@balena.io>
@fisehara fisehara force-pushed the fisehara/decode-uri-before-parsing branch from 6f762e8 to cdc901d Compare January 5, 2024 15:56
@flowzone-app flowzone-app bot enabled auto-merge January 5, 2024 15:58
@fisehara
Copy link
Contributor Author

fisehara commented Jan 9, 2024

Pre-percent decoding will violate OData parsing, case:

  1. odata separates on '&' and '='
  2. If a request percent encodes '&' to '%26' in a filter string and pre-decoding is applied a violating & is in the request

As of know this module is the most loaded module in pinejs stack and an OData / percent-encoded parity change would increase the load by >= 10% which is not desirable.
Postponing work on spec parity for now.

@fisehara fisehara closed this Jan 9, 2024
auto-merge was automatically disabled January 9, 2024 08:37

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support all RFC3986 § 2.2 reserved characters and § 2.1 percent-encoding uppercase and lowercase
3 participants