feat(pass-style): feature flag: only well-formed strings are passable #2002
Under compatibility considerations, this will cause any existing program that is transmitting unpaired surrogates to break. I do not think we need to treat this as a breaking change, but we should note in |
I think this is a good change but it is marked as draft. Is that intentional? This is in my review inbox. Is this ready for review?
It lacks only tests, filling out the PR comment and checklist, and
Should I remove the "!"?
It also lacks adequate comments in the code.
"!" removed
I removed the "!". Done.
Done. Given that I removed the "!", just confirming that I should not Includes
yes
Done
Done, with the first box unchecked.
Done
Done
How often? What's the wall-clock impact on the Agoric blockchain? I hope we don't land this until we have pretty clear data on that.
I don't understand why not. It's clearly a change that's observable from clients.
@kriskowal, I leave that to you. I'm willing to go back to "!" if you think I should.
While I agree, can someone else take on actually measuring this?
I can't predict impact on the Agoric blockchain, but the effects on `toCapData` (which includes `passStyleOf` on the full input data structure) in isolation seem to be negligible in V8 and significant in XS (but only for long strings). Measuring the time to smallcaps-encode `harden({ strings: ["", "foo", "A little bit longer now...", "A".repeat(1000), "𝌆".repeat(500)] })`, an array of 10 copies thereof, and a clone without the strings of length > 100:
Before
#### Moddable XS
encode: 10.00 ops/ms
encode10: 1.23 ops/ms
encodeShortStrings: 8.70 ops/ms
#### V8
encode: 28.57 ops/ms
encode10: 5.71 ops/ms
encodeShortStrings: 25.00 ops/ms
After
#### Moddable XS
encode: 0.98 ops/ms
encode10: 0.18 ops/ms
encodeShortStrings: 9.52 ops/ms
#### V8
encode: 20.00 ops/ms
encode10: 5.88 ops/ms
encodeShortStrings: 40.00 ops/ms
I'm in favor of this change... the hit in a narrow operation is noticeable only for big input, and is likely to be replaced by native code in the relatively near future.
@gibson042, thanks for measuring that!
At #2008 (review) we decided to add the "!" back in to this PR. Done. Now I need to add an appropriate |
I'm interested to learn whether the breaking change judgement is more of a math problem (does there exist a test by which I could observe it?) or a cost-benefit / policy sort of thing.
A guarantee would involve a closed-world assumption, yes? I.e., an assumption that we can somehow find all the clients. Is our working model that we can, in fact, do that? Thought experiment: have clients send a sort of "warranty registration card": if you let us know that you depend on our stuff, we'll make every effort to consider your stuff when judging breaking changes etc.
I'd like to see 2 issues:
Maybe those (or issues that subsume them) exist already?
@kriskowal @gibson042, I hid this new feature behind a feature flag, defaulting to disabled, so we could include it in the upcoming endo release and start experimenting with it from agoric-sdk. Even though you already approved, it is a big enough change that I'd appreciate it if you could PTAL before I merge. Thanks!
closes: #XXXX
refs: endojs/endo#2002 endojs/endo#1860

## Description

Now that agoric-sdk depends on a version of endo that switches on these environment variables, it is time for `env.md`, where we gather all such explanations, to explain these additional env vars.

### Security Considerations

none, since this PR only documents the existing situation. For the environment variable sensitive behavior itself, I think none as well. We should examine more closely in general whether env var sensitivity in our code opens up any opportunity for attackers. We think not though, because attackers should only ever execute in environments where these are not set to non-defensive settings. And attackers should never be in a position to set these.

### Scaling Considerations

none. But the new text does explain the existing scaling consideration around the `ONLY_WELL_FORMED_STRINGS_PASSABLE` environment variable.

### Documentation Considerations

the point. Since we gather all these explanations in agoric-sdk's env.md file, there is an inter-repo coordination problem after introducing new env vars into endo.

### Testing Considerations

none

### Upgrade Considerations

none
closes: #1739
refs: ocapn/ocapn#47 https://github.com/tc39/proposal-is-usv-string
Description
At an OCapN meeting, we resolved ocapn/ocapn#47 by deciding that only well-formed Unicode strings may be passed. A well-formed Unicode string does not contain any unpaired surrogates. It consists only of a sequence of Unicode scalar values, i.e., code points other than surrogates. A conforming OCapN implementation MUST emit only well-formed Unicode strings, MUST validate that incoming strings are well-formed, and MUST reject those that are not.
Within the Agoric implementation, the way to implement these restrictions is by having `passStyleOf(str)` only judge well-formed strings to be passable, rejecting all other strings as not passable.
If the underlying engine does not yet implement `isWellFormed`, we fall back to @gibson042's shim implementation at ocapn/ocapn#47 (comment).
Update: Because we do not yet know the performance impact, this PR hides this new feature behind a feature flag, `export ONLY_WELL_FORMED_STRINGS_PASSABLE=enabled`, that defaults to disabled. Unless enabled, there should be no observable difference for now, i.e., any JavaScript string would still be considered Passable. See this PR's NEWS.md update for more.
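To make the check described above concrete, here is a minimal sketch, not the code from this PR. The names `isWellFormedFallback`, `isWellFormedString`, and `isPassableString` are hypothetical, and the `onlyWellFormed` parameter merely stands in for the feature flag; how the flag is actually read from the environment is not shown.

```javascript
// Hypothetical helper names; this is an illustration, not the PR's code.

// Fallback scan over UTF-16 code units, O(n) in the string length.
const isWellFormedFallback = str => {
  for (let i = 0; i < str.length; i += 1) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        return false;
      }
      i += 1; // Skip the low half of the pair.
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
};

// Prefer the native method when the engine provides it.
const isWellFormedString =
  typeof String.prototype.isWellFormed === 'function'
    ? str => str.isWellFormed()
    : isWellFormedFallback;

// `onlyWellFormed` is an assumed stand-in for the feature flag.
// When it is false, every JavaScript string is accepted, as before.
const isPassableString = (str, onlyWellFormed = false) =>
  typeof str === 'string' && (!onlyWellFormed || isWellFormedString(str));
```

With the stand-in flag off, `isPassableString('\ud834')` accepts the lone surrogate; with it on, the same call is rejected, while a properly paired `'\ud834\udf06'` is accepted either way.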
Security Considerations
If enabled, this change improves integrity because it reduces the attack surface. By rejecting incoming strings that are not well-formed, a counterparty cannot use such a string to push internal algorithms into cases they may not have tested.
Scaling Considerations
Before this PR or with this feature disabled (currently the default), `passStyleOf` would judge a string to be Passable based simply on `typeof` being `'string'`, which is O(1). With this feature enabled, the check will often be O(n) in the length of the string, depending on whether and how the underlying engine implements `isWellFormed`. If the underlying engine does not yet implement it, our shim implementation takes O(n). In theory, a builtin implementation might remember that a string is well-formed, enabling an O(1) test, at least after the first time. However, we are not aware of any such engine optimizations.
When the argument to `passStyleOf` is an object that it judges Passable, `passStyleOf` memoizes its judgement so it need only make the expensive check once. However, because of the impossibility of having a user-level weak data structure weakly indexed by strings, it is impossible for user-level code to do such memoization for strings. We should measure to see if this is a problem in practice.
Documentation Considerations
https://github.com/Agoric/agoric-sdk/blob/master/docs/env.md should be updated to explain the feature flag and the resulting restrictions on passable strings.
Testing Considerations
The first time this PR ran through CI, with code enforcing the restriction but no code yet testing it, CI unfortunately came up green. This means that before this PR, when we were accepting non-well-formed strings by design, nothing tested that case, at least not in a way that caused a test to break because of the added enforcement.
Compatibility Considerations
Enabling this feature is in theory a compatibility break, in that previously non-well-formed strings were supposed to work. However, aside from possible tests specifically about non-well-formed strings, we do not expect any actual code to break. See @kriskowal 's note below at #2002 (comment)
With the feature disabled, which is currently the default, there should be no compat issue at all.
Upgrade Considerations
With the feature defaulting to disabled, there is not yet a breaking change or upgrade issue.
[ ] Includes `*BREAKING*:` in the commit message with migration instructions for any breaking change.
`NEWS.md` for user-facing changes.