-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix out of memory in hash computation #550
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. One more test would be good and then 🚢 .
"f1b89f20de0d133:1", | ||
"c129715d7a2bc9a3:1", | ||
] | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs another test that includes a sarif file without line numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fingerprinting2.input.sarif
test file was already missing line numbers, so that's showing the new behaviour of omitting the fingerprints (see change in expected output). I'm not sure if that was testing any other behaviour that is now being missed, it didn't seem to be doing anything particularly different from the first test case to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...yes. Was re yon my phone and didn't see the whole file.
const readStream = fs.createReadStream(filepath, "utf8"); | ||
readStream.on("close", fulfill); | ||
readStream.on("end", () => { | ||
processCharacter(65535); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is 65535?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was being appended at the end of the files when computing their fingerprint in the existing code. I don't know quite what its purpose is, but in the interest of keeping the fingerprints the same I think it makes sense to keep it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it doesn't seem like a reasonable thing to change given that it would affect the hashes. It might be best to move it into a constant so it doesn't remain a magic number in that case.
await new Promise((fulfill) => { | ||
const readStream = fs.createReadStream(filepath, "utf8"); | ||
readStream.on("close", fulfill); | ||
readStream.on("end", () => { | ||
processCharacter(65535); | ||
}); | ||
readStream.on("data", (data) => { | ||
for (let i = 0; i < data.length; ++i) | ||
processCharacter(data.charCodeAt(i)); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful about mixing async-await and callback patterns. Error handling gets tricky because thrown errors don't necessarily correlate to promise rejections.
In this case, this could be rewritten entirely in async-await like so:
const readStream = fs.createReadStream(filepath, "utf8");
for await (const chunk of readStream) {
// Handle the chunk
}
processCharacter(65535);
Any potential errors would be caught as normal because synchronous functions would throw and asynchronous functions would be rejected (because of the await
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this appears to be used for iterating over lines, the Readline API could be useful for consuming the stream:
const reader = readline.createInterface({
input: fs.createReadStream(filepath, "utf8")
});
for await (const line of reader) {
// Handle the line
}
processCharacter(65535);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions! I've applied your first one in #550, it is indeed nicer than what I wrote. The second one doesn't quite fit our use case nicely because we actually want the new lines in the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[…] we actually want the new lines in the hash.
Good catch. I figured it would contain the line endings since you're able to redirect the stream directly to {stdout|stderr}.write but apparently it doesn't. 🤔
Closes #528 by making two changes:
Either change individually would fix the issue above, but having both seems the most robust solution.
Merge / deployment checklist