Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build interval tree from sorted nodes #73819

Merged
merged 17 commits into from
Jun 4, 2024
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -32,13 +32,15 @@ internal sealed partial class TagSpanIntervalTree<TTag>(SpanTrackingMode spanTra
public TagSpanIntervalTree(
ITextSnapshot textSnapshot,
SpanTrackingMode trackingMode,
IEnumerable<TagSpan<TTag>>? values1 = null,
IEnumerable<TagSpan<TTag>>? values2 = null)
SegmentedList<TagSpan<TTag>> values)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made nothing optional. made it so only one set of values is passed. made it so it has to be a SegmentedList (so we can sort it easily).

: this(trackingMode)
{
_tree = IntervalTree.Create(
new IntervalIntrospector(textSnapshot, trackingMode),
values1, values2);
// Sort the values by their start position. This is extremely fast (defer'ing to the runtime's sorting
// routines), and allows us to build the balanced tree directly without having to do any additional work.
values.Sort(static (t1, t2) => t1.Span.Start.Position - t2.Span.Start.Position);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t1.Span.Start.Position - t2.Span.Start.Position

Just so I'm clear here, spanEnd has no play in this, even if the spanStarts are the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The is to figure out left/right solely on start.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which you'll see is what Insert does as well.


_tree = IntervalTree<TagSpan<TTag>>.CreateFromSorted(
new IntervalIntrospector(textSnapshot, trackingMode), values);
}

private static SnapshotSpan GetTranslatedSpan(TagSpan<TTag> originalTagSpan, ITextSnapshot textSnapshot, SpanTrackingMode trackingMode)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,6 @@ private sealed partial class TagSource
/// </summary>
private const int CoalesceDifferenceCount = 10;

private static readonly ObjectPool<SegmentedList<TagSpan<TTag>>> s_tagSpanListPool = new(() => new(), trimOnFree: false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to an existing pool we already have for SegmentedLists (which already has the "do not trim on free" concept).

private readonly ObjectPool<HashSet<TagSpan<TTag>>> _tagSpanSetPool;

#region Fields that can be accessed from either thread
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -95,16 +95,21 @@ private void RemoveTagsThatIntersectEdit(TextContentChangedEventArgs e)
return;

using var _1 = SegmentedListPool.GetPooledList<TagSpan<TTag>>(out var tagsToRemove);
using var _2 = _tagSpanSetPool.GetPooledObject(out var allTags);
using var _2 = SegmentedListPool.GetPooledList<TagSpan<TTag>>(out var allTagsList);
using var _3 = _tagSpanSetPool.GetPooledObject(out var allTagsSet);

// Everything we're passing in here is synchronous. So we can assert that this must complete synchronously
// as well.
var (oldTagTrees, newTagTrees, _) = CompareAndSwapTagTreesAsync(
static (oldTagTrees, args, _) =>
{
var (@this, e, tagsToRemove, allTags) = args;
args.tagsToRemove.Clear();
args.allTags.Clear();
var (@this, e, tagsToRemove, allTagsSet, allTagsList) = args;

// Compre-and-swap loops until we can successfully update the tag trees. Clear out the collections
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compre

nit: typo

// so we're back in an initial state before performing any work in this lambda.
tagsToRemove.Clear();
allTagsSet.Clear();
allTagsList.Clear();

var snapshot = e.After;
var buffer = snapshot.TextBuffer;
Expand All @@ -116,22 +121,27 @@ private void RemoveTagsThatIntersectEdit(TextContentChangedEventArgs e)

if (tagsToRemove.Count > 0)
{
treeForBuffer.AddAllSpans(snapshot, allTags);
// Determine the final tags for the interval tree, using a set so that we can efficiently
// remove the intersecting tags.
treeForBuffer.AddAllSpans(snapshot, allTagsSet);
allTagsSet.RemoveAll(tagsToRemove);

allTags.RemoveAll(tagsToRemove);
// Then, copy into a list so we can efficiently sort them and create the interval tree from
// those sorted items.
allTagsList.AddRange(allTagsSet);

var newTagTree = new TagSpanIntervalTree<TTag>(
snapshot,
@this._dataSource.SpanTrackingMode,
allTags);
allTagsList);
return ValueTaskFactory.FromResult((oldTagTrees.SetItem(buffer, newTagTree), default(VoidResult)));
}
}

// return oldTagTrees to indicate nothing changed.
return ValueTaskFactory.FromResult((oldTagTrees, default(VoidResult)));
},
args: (this, e, tagsToRemove, allTags),
args: (this, e, tagsToRemove, allTagsSet, allTagsList),
_disposalTokenSource.Token).VerifyCompleted();

// Can happen if we were canceled. Just bail out immediate.
Expand Down Expand Up @@ -450,21 +460,21 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
foreach (var spanToTag in context.SpansToTag)
buffersToTag.Add(spanToTag.SnapshotSpan.Snapshot.TextBuffer);

using var _2 = s_tagSpanListPool.GetPooledObject(out var newTagsInBuffer);
using var _2 = SegmentedListPool.GetPooledList<TagSpan<TTag>>(out var newTagsInBuffer_safeToMutate);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to make it very clear that it's ok and expected for this to be mutated in the loops below (including past initially populating it).

using var _3 = ArrayBuilder<SnapshotSpan>.GetInstance(out var spansToInvalidateInBuffer);

var newTagTrees = ImmutableDictionary.CreateBuilder<ITextBuffer, TagSpanIntervalTree<TTag>>();
foreach (var buffer in buffersToTag)
{
newTagsInBuffer.Clear();
newTagsInBuffer_safeToMutate.Clear();
spansToInvalidateInBuffer.Clear();

// Ignore any tag spans reported for any buffers we weren't interested in.

foreach (var tagSpan in context.TagSpans)
{
if (tagSpan.Span.Snapshot.TextBuffer == buffer)
newTagsInBuffer.Add(tagSpan);
newTagsInBuffer_safeToMutate.Add(tagSpan);
}

// Invalidate all the spans that were actually tagged. If the context doesn't have any recorded spans
Expand All @@ -475,7 +485,10 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
spansToInvalidateInBuffer.Add(span);
}

var newTagTree = ComputeNewTagTree(oldTagTrees, buffer, newTagsInBuffer, spansToInvalidateInBuffer);
// Note: newTagsInBuffer_safeToMutate will be mutated by ComputeNewTagTree. This is fine as we don't
// use it after this and immediately clear it on the next iteration of the loop (or dispose of it once
// the loop finishes).
var newTagTree = ComputeNewTagTree(oldTagTrees, buffer, newTagsInBuffer_safeToMutate, spansToInvalidateInBuffer);
if (newTagTree != null)
newTagTrees.Add(buffer, newTagTree);
}
Expand All @@ -486,10 +499,10 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
private TagSpanIntervalTree<TTag>? ComputeNewTagTree(
ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> oldTagTrees,
ITextBuffer textBuffer,
SegmentedList<TagSpan<TTag>> newTags,
SegmentedList<TagSpan<TTag>> newTags_safeToMutate,
ArrayBuilder<SnapshotSpan> spansToInvalidate)
{
var noNewTags = newTags.Count == 0;
var noNewTags = newTags_safeToMutate.Count == 0;
var noSpansToInvalidate = spansToInvalidate.IsEmpty;
oldTagTrees.TryGetValue(textBuffer, out var oldTagTree);

Expand All @@ -500,7 +513,7 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
return null;

// If we don't have any old tags then we just need to return the new tags.
return new TagSpanIntervalTree<TTag>(newTags[0].Span.Snapshot, _dataSource.SpanTrackingMode, newTags);
return new TagSpanIntervalTree<TTag>(newTags_safeToMutate[0].Span.Snapshot, _dataSource.SpanTrackingMode, newTags_safeToMutate);
}

// If we don't have any new tags, and there was nothing to invalidate, then we can
Expand All @@ -511,14 +524,14 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
if (noSpansToInvalidate)
{
// If we have no spans to invalidate, then we can just keep the old tags and add the new tags.
var snapshot = newTags.First().Span.Snapshot;
var snapshot = newTags_safeToMutate.First().Span.Snapshot;

// For efficiency, just grab the old tags, remap them to the current snapshot, and place them in the
// newTags buffer. This is a safe mutation of this buffer as the caller doesn't use it after this point
// and instead immediately clears it.
oldTagTree.AddAllSpans(snapshot, newTags);
oldTagTree.AddAllSpans(snapshot, newTags_safeToMutate);
return new TagSpanIntervalTree<TTag>(
snapshot, _dataSource.SpanTrackingMode, newTags);
snapshot, _dataSource.SpanTrackingMode, newTags_safeToMutate);
}
else
{
Expand All @@ -536,8 +549,11 @@ private ImmutableDictionary<ITextBuffer, TagSpanIntervalTree<TTag>> ComputeNewTa
oldTagTree.RemoveIntersectingTagSpans(spansToInvalidate, nonIntersectingOldTags);
}

// For efficiency, add the non-intersecting old tags to the new tags buffer. This is a safe mutation of
// of that buffer as it is not used by us or our caller after this point.
newTags_safeToMutate.AddRange(nonIntersectingOldTags);
return new TagSpanIntervalTree<TTag>(
snapshot, _dataSource.SpanTrackingMode, nonIntersectingOldTags, newTags);
snapshot, _dataSource.SpanTrackingMode, newTags_safeToMutate);
}
}

Expand Down Expand Up @@ -613,8 +629,8 @@ private DiffResult ComputeDifference(
TagSpanIntervalTree<TTag> latestTree,
TagSpanIntervalTree<TTag> previousTree)
{
using var _1 = s_tagSpanListPool.GetPooledObject(out var latestSpans);
using var _2 = s_tagSpanListPool.GetPooledObject(out var previousSpans);
using var _1 = SegmentedListPool.GetPooledList<TagSpan<TTag>>(out var latestSpans);
using var _2 = SegmentedListPool.GetPooledList<TagSpan<TTag>>(out var previousSpans);

using var _3 = ArrayBuilder<SnapshotSpan>.GetInstance(out var added);
using var _4 = ArrayBuilder<SnapshotSpan>.GetInstance(out var removed);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,8 @@ private static (TagSpanIntervalTree<ITextMarkerTag>, ITextBuffer) CreateTree(str
{
var exportProvider = EditorTestCompositions.Editor.ExportProviderFactory.CreateExportProvider();
var buffer = EditorFactory.CreateBuffer(exportProvider, text);
var tags = spans.Select(s => new TagSpan<ITextMarkerTag>(new SnapshotSpan(buffer.CurrentSnapshot, s), new TextMarkerTag(string.Empty)));
var tags = new SegmentedList<TagSpan<ITextMarkerTag>>(
spans.Select(s => new TagSpan<ITextMarkerTag>(new SnapshotSpan(buffer.CurrentSnapshot, s), new TextMarkerTag(string.Empty))));
return (new TagSpanIntervalTree<ITextMarkerTag>(buffer.CurrentSnapshot, SpanTrackingMode.EdgeInclusive, tags), buffer);
}

Expand Down
22 changes: 0 additions & 22 deletions src/Workspaces/Core/Portable/Shared/Collections/IntervalTree.cs

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
using System.Collections.Immutable;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using Microsoft.CodeAnalysis.Collections;
using Microsoft.CodeAnalysis.PooledObjects;
using Microsoft.CodeAnalysis.Shared.Extensions;
using Roslyn.Utilities;
Expand Down Expand Up @@ -35,24 +36,71 @@ private delegate bool TestInterval<TIntrospector>(T value, int start, int length

protected Node? root;

public static IntervalTree<T> Create<TIntrospector>(in TIntrospector introspector, IEnumerable<T>? values1 = null, IEnumerable<T>? values2 = null)
public static IntervalTree<T> Create<TIntrospector>(in TIntrospector introspector, IEnumerable<T>? values = null)
where TIntrospector : struct, IIntervalIntrospector<T>
{
var result = new IntervalTree<T>();

AddAll(in introspector, values1);
AddAll(in introspector, values2);
if (values != null)
{
foreach (var value in values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foreach (var value in values)

Is sorting and calling into CreateFromSorted not possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to do that in follow-up (also where I get rid of trees)

result.root = Insert(result.root, new Node(value), in introspector);
}

return result;
}

void AddAll(in TIntrospector introspector, IEnumerable<T>? values)
/// <summary>
/// Creates an interval tree from a sorted list of values. This is more efficient than creating from an unsorted
/// list as building doesn't need to figure out where the nodes need to go n-log(n) and doesn't have to rebalance
/// anything (again, another n-log(n) operation). Rebalancing is particularly expensive as it involves tons of
/// pointer chasing operations, which is both slow, and which impacts the GC which has to track all those writes.
/// </summary>
/// <remarks>
/// The values must be sorted such that given any two elements 'a' and 'b' in the list, if 'a' comes before 'b' in
/// the list, then it's "start position" (as determined by the introspector) must be less than or equal to 'b's
/// start position. This is a requirement for the algorithm to work correctly.
/// </remarks>
public static IntervalTree<T> CreateFromSorted<TIntrospector>(in TIntrospector introspector, SegmentedList<T> values)
where TIntrospector : struct, IIntervalIntrospector<T>
{
#if DEBUG
if (values.Count >= 2)
CyrusNajmabadi marked this conversation as resolved.
Show resolved Hide resolved
{
if (values != null)
var current = values[0];
for (var i = 1; i < values.Count; i++)
{
foreach (var value in values)
result.root = Insert(result.root, new Node(value), in introspector);
var next = values[i];
Debug.Assert(introspector.GetSpan(current).Start <= introspector.GetSpan(next).Start);
current = next;
}
}
#endif

if (values.Count == 0)
return Empty;

return new IntervalTree<T>
{
root = CreateFromSortedWorker(values, 0, values.Count, in introspector),
};
}

private static Node? CreateFromSortedWorker<TIntrospector>(
SegmentedList<T> values, int startInclusive, int endExclusive, in TIntrospector introspector) where TIntrospector : struct, IIntervalIntrospector<T>
{
var length = endExclusive - startInclusive;
if (length <= 0)
return null;

var mid = startInclusive + (length >> 1);
var node = new Node(values[mid]);
node.SetLeftRight(
CreateFromSortedWorker(values, startInclusive, mid, in introspector),
CreateFromSortedWorker(values, mid + 1, endExclusive, in introspector),
in introspector);

return node;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for anyone reviewing this. To convince yourself of correctness, simply see how 'Insert' works:

    private static Node Insert<TIntrospector>(Node? root, Node newNode, int newNodeStart, in TIntrospector introspector)
        where TIntrospector : struct, IIntervalIntrospector<T>
    {
        if (root == null)
        {
            return newNode;
        }

        Node? newLeft, newRight;

        if (newNodeStart < introspector.GetSpan(root.Value).Start)
        {
            newLeft = Insert(root.Left, newNode, newNodeStart, in introspector);
            newRight = root.Right;
        }
        else
        {
            newLeft = root.Left;
            newRight = Insert(root.Right, newNode, newNodeStart, in introspector);
        }

        root.SetLeftRight(newLeft, newRight, in introspector);
        var newRoot = root;

        return Balance(newRoot, in introspector);
    }

The logic there is trivial. We simply examine the start of the element we're adding and the start of the node we have to determine down which path to place the new node. So that same 'sorting' is accomplished by an up front real sort of the items.

Then, once we've placed things down the correct path, 'set' the left/right elements (which picks the right height value for the node we're creating, as well as the right 'max end node'). That same 'setting' happens in the new code.

What chnages those is that we need no balancing. because we're always building from another sorted sub-node, so the tree is naturally balanced enough to efficiently search without any costly steps while building it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think this through a bit, but I'll ask before I reach understanding of this code: Why couldn't this just store an array and not need the actual tree structure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just had that conversation in person :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes I just nod and act like I'm listening to you :)

}

protected static bool Contains<TIntrospector>(T value, int start, int length, in TIntrospector introspector)
Expand Down
Loading