-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: faster stream verification #76
Conversation
afba080
to
a70f2bb
Compare
lib/index.js
Outdated
this.algorithm = this.goodSri ? this.sri.pickAlgorithm(this.opts) : null | ||
this.goodSri = this.sri instanceof Integrity | ||
? !!Object.keys(this.sri).length | ||
: this.sri instanceof Hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What use case is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see from the NPM data, the integrity usually is a 256 hash, and it only is one hash.
So, instead of needing to instantiate an Integrity
class, we can just instantiate the Hash
class and save some operations when single=true
.
But thinking a little further, instead of supporting this behavior, maybe we can create a new API specifically to verify only one hash at a time like I put in the benchmark
(which gives us almost 2x perf), intended for cases like this.
What do you think? Keep or create a new API?
We can also keep both since checkStream today breaks when we pass the single=true
option, so it's not bad behavior at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'm following you now. The single: true
option is already the signal we use to inform ssri that we are only parsing a single hash. Having checkStream
not break with this param seems preferable to creating a whole new single hash API. Once that works in and of itself we can then optimize it as we would have new API.
Does this make sense? Feel free to advocate for your own opinion here, it is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, my initial idea with the new API was to avoid the creation of IntegrityStream
class and skip a bunch of verifications.
But actually, maybe we can do as you said and use single: true
to perform those optimizations.
The nested ternaries in this.algorithm = null
if (this.sri instanceof Integrity) {
this.goodSri = !!Object.keys(this.sri).length
if (this.goodSri) {
this.algorithm = this.sri.pickAlgorithm(this.opts)
}
} else if (this.sri instanceof Hash) {
this.goodSri = this.sri
this.algorithm = this.sri.algorithm
} |
a70f2bb
to
00ce92e
Compare
00ce92e
to
5aab2dc
Compare
I was reading the code, check if I'm not wrong, this piece of code: Lines 82 to 86 in 8e80eca
And then, the implementation of Lines 283 to 298 in 8e80eca
What this code is doing:
If the assumptions are:
So, the code is not implemented correctly. If the assumptions are:
So, the code is doing more than needed, we could just parse the sent digest, pick one, and then parse the stream for just that hash. In the final, I think we could remove entirely the function of |
Reusing it in As far as what Integrity can have one hash or multiple hashes, sri can have one hash or multiple hashes.
|
@wraithgar Okay, I'll look into this more over the weekend to see if I can get more optimizations out of this. But those optimizations will be along with the About this PR, the only thing missing is the tests, right? For the bug on |
Yes by all means let's fix |
Pinging "Always happy to be pinged" @jakebailey. Not sure if your interests are limited to semver or if they expand to other libs used by pnpm. Two PRs have already landed here and were published as If you are involved in updating ssri in pnpm please note that |
In my real-world DT case, it doesn't seem to really do much. If anything, 10.0.2 itself is a little slower? Hard to say; there is variance here.
What do you mean by "no longer returning identical strings"? That sort of thing will have an impact for lockfiles; IIRC pnpm's does contain this string, so it probably will be a problem because two different versions of pnpm which are supposed to use the same lockfile format will instead not agree on the integrety. So, hopefully that's not what you mean. |
@jakebailey The PR that changes the behavior of Essentially, if the hash was created from a string like The only time So I think it will not be a problem for PNPM, for other package managers, I don't know if could be a problem because usually the integrity of the packages is just stored using a single algorithm, this little breaking change only affects when we see more than one algorithm with |
Yes that's correct. As long as folks are using the literal stringified representation to test equality w/ existing integrities but are instead letting ssri parse it they will be fine. |
Sure, I'll defer to you all and @zkochan; my knowledge of this is limited to "I profiled the code and made it faster" 😄 |
5aab2dc
to
2acfc3a
Compare
match (integrity, opts) { | ||
const other = parse(integrity, opts) | ||
if (!other) { | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we need a test that hits this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will take more time to add, I will try to add more test tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! Really appreciate your contributions lately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I push the tests, I also changed a little bit the implementation because I was doing the wrong verification to compare the integrity, tests are always good ahhaaha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I still have some lines without coverage, but I'm not sure how I can properly test these lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to tackle the match
algorithm anyways. I'll fork from this branch to fix it and when it lands it'll land this also. Test coverage can be our problem.
b382e86
to
6b4237f
Compare
6b4237f
to
fbe3f6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved pending test coverage, to be done in a new PR that builds off of this.
This landed in #79! |
Faster integrity check when is stream
I also take a look at streams mode because PNPM also verify the integrity of the files using streams.
The initial version was already fast compare to the main:
I also saw that
checkStream
doesn't support the optionsingle
and almost all verifications that are done by PNPM only verify a single hash, so I see an opportunity to push the performance a little bit further.But I did an experiment, If we ignore all the checkStream codes and jump to the final verification, we can achieve this performance:
I put the code in the file above, the assumption is: if we verify only one hash, we can skip a lot of verifications.
So I think I could be good to
ssri
to export single hash verifications, what do you think?benchmark-stream.js
References
Related to #71