-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code extraction performance improvements #604
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This also uses 'to_a' to ensure the order of code is correct. Hashes are compared without regard for order because usually the order doesn't matter. I don't think the API cares about the order because we explicitly number the lines, but it's useful to validate our code is doing what is expected
We now open each file in a stacktrace once to save opening the same file multiple times if it is in the stacktrace multiple times. It's quite common to have the same file in multiple stacktrace lines, e.g. when a private method raises. The performance gain is especially noticeable when a a really long file is repeated in the stacktrace as we have to iterate through each line to find the ones we're interested in e.g. a trace may look like this: '/some/file line 5 in "foo"' '/some/file line 10 in "bar"' '/some/file line 15 in "baz"' Previously we would: 1. Open the file for frame 1 2. Read each line up to line 11 3. Trim lines down to lines 2-8 4. Open the file for frame 2 5. Read each line up to line 14 6. Trim lines down to lines 7-13 7. Open the file for frame 3 8. Read each line up to line 19 9. Trim lines down to lines 12-18 Now we: 1. Open the file once 2. Read from line 1 until line 19 3. Add the lines each frame cares about: a. Lines 1-11 for frame 1 b. Lines 3-14 for frame 2 c. Lines 8-19 for frame 3 3. Trim lines down: a. To lines 2-8 for frame 1 b. To lines 7-13 for frame 2 c. To lines 12-18 for frame 3 Trimming is necessary because we need to record more lines than we actually want to keep, so that we can always record 7 lines of context, even when the frame starts at line 1 or the last line in the file The new way is still quite inefficient and requires iterating over the stack frames multiple times, however this passes all tests and can be improved after profiling it against the old method
There's no Stacktrace.new anymore, so we get one less frame
These will end up being converted into unreadable paths, e.g. "spec/fixtures/crashes/file1.rb" is readable from the root of bugsnag-ruby but will be stripped to "file1.rb" by the project_root, which is unreadable as it's relative to a different directory This test ensures we read files based on the expanded file path, rather than the final path we send to the Bugsnag API
This exposed a bug in the library, which may have never been reported because it was never hit or because it's quite subtle (the stacktrace would be truncated, but that may not be obvious) Previously we were using 'return' which returns from the method, not the 'map' so we'd remove the entire stacktrace if one of the files ended up empty It's possible this condition is never hit and therefore no one has encountered the bug to report it, but it's easy enough to fix and now there's a test to ensure it does the right thing
Most other notifiers set 'code' to null/nil in this case and there's not really a reason not to. If we omit it entirely we _could_ break someone's code if they assume the key 'code' always exists
:s I think this used to make sense in one permutation, but: 11 / 2 == 5.5 ceil(5.5) == 6 6 + 1 == 7 So we were previously using MAXIMUM_LINES_TO_KEEP in a very obsfucated way...
This is largely covered by 'stacktrace_spec', but these tests can be a bit more specific
These were required when Stacktrace was a class and needed to be 'new'-ed, but now it returns an array already
kattrali
reviewed
Jul 17, 2020
This is no longer needed because, other than misconfiguration, there's no way for the filename to be empty here it was implemented in a479b6e, which also let users add their own filters for the stacktrace file name. This would have been necessary then, but seems like it's not anymore — the only way to trigger this in a real backtrace is by setting the project root to a filename, which isn't a realistic use-case See #604 (review)
kattrali
approved these changes
Jul 17, 2020
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Goal
This PR improves the performance of extracting code from files in a stacktrace
The performance improvements are especially relevant when a stacktrace has the same file multiple times (e.g. when chaining private methods) and when the files are long
Exactly how much better this is is impossible to say without looking at a specific stacktrace — it can be more than a 3x speedup if it mostly consists of the same few files and they are very long, but every stacktrace is different 🌈. I didn't observe any noticable slow down when a stackrace contained exclusively different files, but the new code does involve a bit more work in the worst case scenario
Changeset
Added
CodeExtractor
class. This exists to keep track of which files in the stacktrace we need to get code from (we will sometimes remove files from the processed stacktrace) and so that it can do a single pass per file, extracting all the regions of code that it needs in one goStacktrace
class is now a module as it was always basically one method and was immediately converted to an array when it was calledLinked issues
Related to #481
Review
For the submitter, initial self-review:
For the pull request reviewer(s), this changeset has been reviewed for: