Deduplicate strings in binlogs #6017
Merged
Commits on Jan 11, 2021
Deduplicate strings in binlogs
When writing out a binary log, we now deduplicate strings and dictionaries. This results in a significant performance increase and binlog size reduction: performance improves by about 2x on average, and size shrinks by about 4x on average and by up to 30x for large binlogs.

We add two new record kinds: String and NameValueList. A String record is written the first time we encounter a string we need to serialize. The next time we see that string, we only write its index in the overall list of strings. Similarly, a NameValueList is a list of key and value strings, used for Properties, environment variables and Item metadata. The first time we write out a list, we write a record; subsequent times, we write just its index. This keeps the binlog format streaming, so if logging is interrupted midway, we can still read everything up to that point.

We do not hold on to the strings we encounter. Instead, we hash them and preserve only the hash code. We rely on the hash function producing no collisions; otherwise, a different string could be silently substituted for another one in the binlog. The hash tables do not significantly add to MSBuild memory usage (20-30 MB for our largest binlogs). FNV-1a with a 64-bit hash was chosen for its simplicity, performance, and lack of collisions on all binlogs tested so far. A 32-bit hash (such as string.GetHashCode()) was not sufficient and resulted in ~800 collisions for our largest binlog, which has 2.7 million strings.

This change also includes other performance improvements, such as wrapping the stream we're reading from or writing to in a BufferedStream, which results in a significant performance improvement.

We introduce a new StringStorage data structure in the binlog reader that stores strings on disk instead of reading them all into memory. Strings are loaded into memory on demand. This prevents out-of-memory (OOM) errors in 32-bit MSBuild processes when playing back large binlogs and keeps memory usage relatively flat when reading.
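The following is a minimal Python sketch of the deduplication scheme described above; it is not MSBuild's actual C# writer or reader. It assumes a simplified format in which each record is a Python tuple appended to a list that stands in for the binlog stream. The record kinds String and NameValueList come from the commit message; the Event record kind, the names DedupWriter, write_event and read_events, and hashing a NameValueList via the string indices of its entries are hypothetical choices made for illustration.

```python
import struct
from typing import Dict, List, Sequence, Tuple

# 64-bit FNV-1a parameters (offset basis and prime from the FNV specification).
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = (1 << 64) - 1

# Record kinds for this sketch. "String" and "NameValueList" mirror the commit
# message; "Event" is a hypothetical stand-in for a build event record.
STRING = "String"
NAME_VALUE_LIST = "NameValueList"
EVENT = "Event"

Pairs = Sequence[Tuple[str, str]]


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


class DedupWriter:
    """Hypothetical writer: a list of tuples stands in for the binlog stream."""

    def __init__(self) -> None:
        self.records: List[tuple] = []
        # Only hashes are kept, never the strings themselves. A hash collision
        # would make two different strings share one index.
        self._string_ids: Dict[int, int] = {}
        self._list_ids: Dict[int, int] = {}

    def write_string(self, s: str) -> int:
        key = fnv1a_64(s.encode("utf-8"))
        index = self._string_ids.get(key)
        if index is None:
            # First occurrence: emit a String record; later uses reuse the index.
            index = len(self._string_ids)
            self._string_ids[key] = index
            self.records.append((STRING, s))
        return index

    def write_name_value_list(self, pairs: Pairs) -> int:
        # Deduplicate keys and values first, then hash the sequence of indices.
        ids = [(self.write_string(k), self.write_string(v)) for k, v in pairs]
        key = fnv1a_64(b"".join(struct.pack("<QQ", k, v) for k, v in ids))
        index = self._list_ids.get(key)
        if index is None:
            index = len(self._list_ids)
            self._list_ids[key] = index
            self.records.append((NAME_VALUE_LIST, ids))
        return index

    def write_event(self, message: str, properties: Pairs) -> None:
        # New String/NameValueList records are appended before the event that
        # references them, so any prefix of the stream is still readable.
        message_id = self.write_string(message)
        list_id = self.write_name_value_list(properties)
        self.records.append((EVENT, message_id, list_id))


def read_events(records: List[tuple]) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Rebuild events by replaying records in order; works on a truncated prefix."""
    strings: List[str] = []
    lists: List[List[Tuple[str, str]]] = []
    events = []
    for record in records:
        kind = record[0]
        if kind == STRING:
            strings.append(record[1])
        elif kind == NAME_VALUE_LIST:
            lists.append([(strings[k], strings[v]) for k, v in record[1]])
        elif kind == EVENT:
            events.append((strings[record[1]], lists[record[2]]))
    return events
```

For example, calling write_event("Build started", ...) twice with the same two properties produces five String records, one NameValueList record and two Event records that only carry indices. Because every definition precedes its first use, read_events on any prefix of the records recovers every event written up to that point.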
Commit aa6fbab
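The collision figures quoted in the commit message above are consistent with the standard birthday-bound estimate. Assuming a uniformly distributed $b$-bit hash over $n$ distinct strings, the expected number of colliding pairs is approximately:

$$
\mathbb{E}[\text{collisions}] = \binom{n}{2}\frac{1}{2^{b}} \approx \frac{n^{2}}{2^{b+1}}
$$

With $n = 2.7 \times 10^{6}$, $n^{2} \approx 7.29 \times 10^{12}$:

$$
b = 32:\quad \frac{7.29 \times 10^{12}}{2^{33} \approx 8.59 \times 10^{9}} \approx 849
\qquad
b = 64:\quad \frac{7.29 \times 10^{12}}{2^{65} \approx 3.69 \times 10^{19}} \approx 2.0 \times 10^{-7}
$$

The 32-bit estimate lines up with the ~800 collisions observed, while at 64 bits a collision among that many strings is expected with a probability on the order of one in five million.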
Commits on Jan 15, 2021
Commit b324bad
Commit e6ee9a5
Commits on Jan 17, 2021
Reduce the maximum amount of string data allocated in memory to 2 GB (1 billion chars).
Commit 9d8f857
Introduce a RedirectionScope in BuildEventArgsWriter
This avoids manually switching from currentRecordWriter to originalBinaryWriter in three different places. It also makes the places where the switch happens easier to find.
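A minimal Python sketch of the idea behind such a scope, using a context manager in place of the C# construct. It assumes, for illustration only, that an event record is buffered in memory while a String record must go straight to the underlying stream so that it precedes the event that references it. The class BufferedEventWriter and its methods are hypothetical; only the two writer names come from the commit message (written here in snake_case).

```python
import io
from contextlib import contextmanager


class BufferedEventWriter:
    """Hypothetical writer that buffers each event record before emitting it."""

    def __init__(self, output: io.BytesIO) -> None:
        self.original_binary_writer = output  # the real binlog stream
        self.current_record_writer = output   # where write() currently goes

    def write(self, data: bytes) -> None:
        self.current_record_writer.write(data)

    @contextmanager
    def redirect_to_original(self):
        # The one place that performs the switch; `finally` restores the
        # previous writer even if the body raises.
        saved = self.current_record_writer
        self.current_record_writer = self.original_binary_writer
        try:
            yield
        finally:
            self.current_record_writer = saved

    def write_event(self, payload: bytes, new_string_record: bytes) -> None:
        record = io.BytesIO()
        self.current_record_writer = record  # buffer the event record
        try:
            self.write(payload)
            with self.redirect_to_original():
                # The string definition bypasses the buffer and lands first.
                self.write(new_string_record)
            self.write(b"|end")
        finally:
            self.current_record_writer = self.original_binary_writer
        self.original_binary_writer.write(record.getvalue())
```

With this shape, a call such as write_event(b"event", b"S:hello;") leaves b"S:hello;event|end" in the output: the string record is emitted before the buffered event. Keeping the save/switch/restore logic in a single scope object is what removes the need to repeat it at each call site.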
Commit 9ba9c9e
Commit 77a4ff3