[Outlining] Remove overlapping sequences #7146
base: main
Compare 1a589b6 to 0d961b8
src/support/intervals.h
bool operator<(const Interval& other) const {
  return start < other.start && weight < other.weight;
}
This should take end into account as well. Otherwise the std::set<Interval> returned by IntervalProcessor::getOverlaps() will not be able to hold two intervals that differ only in their ends.
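To illustrate the concern, here is a minimal sketch assuming a stand-in Interval with start, end and weight fields (mirroring the Interval{start, end, weight} calls in this PR) and a hypothetical comparator, IgnoresEnd, that orders by start and then weight. std::set treats two elements as equivalent when neither is less than the other, so an interval that differs only in end is silently rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end;
  unsigned weight;
};

// Orders by (start, weight) only; `end` is ignored.
struct IgnoresEnd {
  bool operator()(const Interval& a, const Interval& b) const {
    if (a.start != b.start) {
      return a.start < b.start;
    }
    return a.weight < b.weight;
  }
};

// Inserts two intervals that differ only in `end` and reports how many the
// set actually holds.
std::size_t countDistinct() {
  std::set<Interval, IgnoresEnd> s;
  s.insert(Interval{0, 4, 2});
  s.insert(Interval{0, 8, 2}); // equivalent under IgnoresEnd, so not inserted
  return s.size();
}
```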
done
src/support/intervals.cpp
std::set<Interval>
IntervalProcessor::getOverlaps(std::vector<Interval>& intervals) {
  std::sort(intervals.begin(), intervals.end(), [](Interval a, Interval b) {
Suggested change:
- std::sort(intervals.begin(), intervals.end(), [](Interval a, Interval b) {
+ std::sort(intervals.begin(), intervals.end(), [](const Interval& a, const Interval& b) {
Just to avoid copying intervals around unnecessarily.
done
src/support/intervals.cpp
});

std::set<Interval> overlaps;
auto& firstInterval = intervals[0];
There should be an early return if the input vector is empty to avoid UB here.
done
src/passes/hash-stringify-walker.cpp
for (auto startIdx : substring.StartIndices) {
  auto interval =
    Interval(startIdx,
             startIdx + substring.Length - 1,
I'm surprised we're using intervals inclusive of their ends. Would this work without the - 1 as well?
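If end is exclusive (end = startIdx + substring.Length, without the - 1), intervals are half-open [start, end), and the overlap test becomes a strict comparison on both sides. A minimal sketch with a stand-in Interval and a hypothetical overlaps() helper; under this convention a repeat at [0, 4) and one at [4, 8) are adjacent, not overlapping.

```cpp
#include <cassert>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end; // one past the last covered index
  unsigned weight;
};

// Two half-open intervals overlap iff each one starts before the other ends.
bool overlaps(const Interval& a, const Interval& b) {
  return a.start < b.end && b.start < a.end;
}
```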
yes, done
src/passes/hash-stringify-walker.cpp
auto interval =
  Interval(startIdx,
           startIdx + substring.Length - 1,
           substring.Length * substring.StartIndices.size());
Probably worth a comment about why we are using this weight.
done
src/passes/hash-stringify-walker.cpp
std::set<Interval> overlaps = IntervalProcessor::getOverlaps(intervals);
std::set<unsigned> doNotInclude;
for (auto& interval : overlaps) {
  doNotInclude.insert(intervalMap[interval]);
}
We could simplify the code here and get away without any map or set lookups if IntervalProcessor returned a sequence of kept indices in its input vector rather than a set of removed intervals. With a sequence of kept indices, we could directly construct the list of kept substrings.
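A sketch of the index-based approach, assuming hypothetical stand-ins (Interval, and RepeatSubstring with the Length and StartIndices fields seen above) and helper names flatten() and keptStarts() that are not taken from the PR. Each occurrence becomes one interval (weighted as in the quoted code, Length times the number of occurrences), and a parallel vector records which substring produced it, so an index returned by the filter identifies both the occurrence and its substring without any map or set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; field names follow the snippets in this review.
struct Interval {
  unsigned start;
  unsigned end; // exclusive
  unsigned weight;
};

struct RepeatSubstring {
  unsigned Length;
  std::vector<unsigned> StartIndices;
};

// intervals[i] is one occurrence of substrings[substringIdxs[i]]; the two
// output vectors are always the same length (parallel).
void flatten(const std::vector<RepeatSubstring>& substrings,
             std::vector<Interval>& intervals,
             std::vector<unsigned>& substringIdxs) {
  for (unsigned i = 0; i < substrings.size(); i++) {
    const auto& s = substrings[i];
    for (auto start : s.StartIndices) {
      intervals.push_back(Interval{
        start, start + s.Length, unsigned(s.Length * s.StartIndices.size())});
      substringIdxs.push_back(i);
    }
  }
}

// `kept` holds indices into `intervals`. Each kept index maps straight back
// to its substring through the parallel vector. Returns, per substring, the
// start indices of its kept occurrences.
std::vector<std::vector<unsigned>>
keptStarts(std::size_t numSubstrings,
           const std::vector<Interval>& intervals,
           const std::vector<unsigned>& substringIdxs,
           const std::vector<int>& kept) {
  std::vector<std::vector<unsigned>> starts(numSubstrings);
  for (int k : kept) {
    starts[substringIdxs[k]].push_back(intervals[k].start);
  }
  return starts;
}
```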
great idea, thanks!
@@ -1006,3 +1006,57 @@
(loop (nop))
)
)

;; Test that no attempt is made to outline overlapping repeat substrings
It would be good to add comments about what the overlapping substrings are.
test/lit/passes/outlining.wast
(drop (i32.add
  (i32.const 0)
  (i32.const 1)
))
We could make the test more concise and easier to read by just using constants and drops, unless I'm missing some reason why this wouldn't work.
(drop (i32.const 0))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 3))
(drop (i32.const 0))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 3))
(drop (i32.const 1))
(drop (i32.const 2))
(drop (i32.const 1))
(drop (i32.const 2))
good call, thanks
src/support/intervals.h
};

struct IntervalProcessor {
  static std::set<Interval> getOverlaps(std::vector<Interval>&);
Add gtests for edge cases.
done
It would still be good to include gtest unit tests for the various kinds of overlaps.
src/passes/hash-stringify-walker.cpp
for (Index i = 0; i < substrings.size(); i++) {
  auto substring = substrings[i];
  for (auto startIdx : substring.StartIndices) {
    // TODO: This weight was picked with an assumption
What assumption?
done
src/passes/Outlining.cpp
@@ -280,6 +280,8 @@ struct Outlining : public Pass {
DBG(printHashString(stringify.hashString, stringify.exprs));
// Remove substrings that are substrings of longer repeat substrings.
substrings = StringifyProcessor::dedupe(substrings);
// Remove substrings with overlapping indices
Suggested change:
- // Remove substrings with overlapping indices
+ // Remove substrings with overlapping indices.
done
std::vector<Interval> intervals;
std::vector<int> substringIdxs;
It would be good to have a comment saying how these two vectors relate to each other.
src/passes/hash-stringify-walker.cpp
if (substringsIncluded.find(substringIdx) != substringsIncluded.end()) {
  continue;
}
substringsIncluded.insert(substringIdx);
Suggested change:
- if (substringsIncluded.find(substringIdx) != substringsIncluded.end()) {
-   continue;
- }
- substringsIncluded.insert(substringIdx);
+ if (!substringsIncluded.insert(substringIdx).second) {
+   continue;
+ }
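For reference, std::set::insert returns a std::pair<iterator, bool> whose .second is false when the value was already present, so one call both tests and records membership. A small sketch using a hypothetical firstOccurrences() helper (not from the PR):

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns the elements of `idxs` in first-seen order, skipping repeats.
// insert(...).second is true only the first time a value is inserted.
std::vector<unsigned> firstOccurrences(const std::vector<unsigned>& idxs) {
  std::set<unsigned> included;
  std::vector<unsigned> result;
  for (auto idx : idxs) {
    if (!included.insert(idx).second) {
      continue; // already included
    }
    result.push_back(idx);
  }
  return result;
}
```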
done
src/support/intervals.h
};

struct IntervalProcessor {
  // TODO: Given a vector of Interval, returns a vector of the indices, mapping
What's the TODO here?
It was a reminder for me to review the comment before submitting.
test/lit/passes/outlining.wast
;; CHECK-NEXT: (i32.add
;; CHECK-NEXT: (i32.const 0)
;; CHECK-NEXT: (i32.const 1)
;; CHECK-NEXT: (i32.sub
This test changed because the order in which the outlined functions are created changed. This is now possible because hash-stringify-walker's removeOverlaps() waits until every interval for a repeat substring has been seen before adding that substring to the result vector. For our purposes, the order of the substrings does not matter, because we create an OutliningSequence to represent each substring and ensure those are sorted by idx (line 373 in Outlining.cpp).
src/passes/hash-stringify-walker.cpp
if (seenCount[substringIdx] == substring.StartIndices.size() &&
    substringsIncluded.insert(substringIdx).second) {
  result.push_back(substring);
So we're only considering a substring for outlining at all if all of its occurrences survive overlap filtering? Could we keep the substring in consideration and just remove the particular occurrence of it that had the overlap instead?
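One way to keep a substring in consideration while dropping only its overlapping occurrences, sketched with hypothetical names (RepeatSubstring, keepSurvivingOccurrences) and an assumed rule that a substring is only worth outlining while at least two occurrences remain:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in; field names follow the snippets in this review.
struct RepeatSubstring {
  unsigned Length;
  std::vector<unsigned> StartIndices;
};

// keptOccurrence[i][j] says whether the j-th occurrence of substrings[i]
// survived overlap filtering. Only the overlapping occurrences are dropped;
// the substring itself survives while it still repeats (assumed: >= 2).
std::vector<RepeatSubstring>
keepSurvivingOccurrences(const std::vector<RepeatSubstring>& substrings,
                         const std::vector<std::vector<bool>>& keptOccurrence) {
  std::vector<RepeatSubstring> result;
  for (std::size_t i = 0; i < substrings.size(); i++) {
    RepeatSubstring pruned{substrings[i].Length, {}};
    for (std::size_t j = 0; j < substrings[i].StartIndices.size(); j++) {
      if (keptOccurrence[i][j]) {
        pruned.StartIndices.push_back(substrings[i].StartIndices[j]);
      }
    }
    if (pruned.StartIndices.size() >= 2) {
      result.push_back(pruned);
    }
  }
  return result;
}
```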
done
@@ -36,26 +37,24 @@ IntervalProcessor::filterOverlaps(std::vector<Interval>& intervals) {

std::sort(
    intIntervals.begin(), intIntervals.end(), [](const auto& a, const auto& b) {
      return a.first.start < b.first.end;
      return a.first.start < b.first.start;
No need for the lambda here if you fix operator< to be a total order (meaning that for any pair of intervals a and b, exactly one of a < b, b < a, or a == b is true).
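A minimal sketch of such a total order, assuming a stand-in Interval: comparing all three fields lexicographically with std::tie makes operator< total, and std::sort then works on (Interval, index) pairs like intIntervals with no lambda, because std::pair compares .first and then .second. The result is still ordered by start first, which is what the sweep relies on. sortWithIndices is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end;
  unsigned weight;

  // Lexicographic over every field, so exactly one of a < b, b < a, a == b
  // holds for any pair.
  bool operator<(const Interval& other) const {
    return std::tie(start, end, weight) <
           std::tie(other.start, other.end, other.weight);
  }
  bool operator==(const Interval& other) const {
    return std::tie(start, end, weight) ==
           std::tie(other.start, other.end, other.weight);
  }
};

// Sorts (interval, original index) pairs without a comparator lambda.
void sortWithIndices(std::vector<std::pair<Interval, int>>& intIntervals) {
  std::sort(intIntervals.begin(), intIntervals.end());
}
```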
src/support/intervals.cpp
});

std::vector<int> result;
auto& firstInterval = intIntervals[0];
auto& formerInterval = intIntervals[0];
Making this a reference means that when you do formerInterval = latterInterval below, it writes to the first element in intIntervals, which is a little odd. Intervals should be small enough that copying them is cheap, so let's just make this a non-reference. Alternatively, to avoid copying intervals, you could make this an index.
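A small sketch of the aliasing issue, with a stand-in Interval and two hypothetical functions: assigning through a reference bound to intervals[0] overwrites that element, while a local copy leaves the vector untouched.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end;
  unsigned weight;
};

// `former` aliases intervals[0], so this assignment copies intervals[1]
// INTO intervals[0] rather than rebinding the reference.
std::vector<Interval> assignThroughReference(std::vector<Interval> intervals) {
  auto& former = intervals[0];
  former = intervals[1];
  return intervals;
}

// A by-value copy: assigning to it changes only the local variable.
std::vector<Interval> assignThroughCopy(std::vector<Interval> intervals) {
  auto former = intervals[0];
  former = intervals[1];
  (void)former;
  return intervals;
}
```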
done
src/support/intervals.cpp
firstInterval = nextInterval;
} else {
result.push_back(firstInterval.second);
if (latterInterval.first.weight > formerInterval.first.weight) {
If the weights are equal, perhaps you could keep the interval that ends first, to reduce its potential to overlap with subsequent intervals.
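A sketch of the greedy sweep with that tie-break, under assumptions: a stand-in Interval with half-open [start, end), and a hypothetical filterOverlaps() that takes a const reference and sorts indices rather than (Interval, index) pairs. It also includes an early return for empty input. Like the code under review, it is greedy and not guaranteed to maximize total weight.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end; // exclusive
  unsigned weight;
};

// Returns indices (into `intervals`) of a non-overlapping subset, in start order.
std::vector<int> filterOverlaps(const std::vector<Interval>& intervals) {
  std::vector<int> result;
  if (intervals.empty()) {
    return result; // avoid reading element 0 of an empty vector
  }
  // Sort indices by start so the input vector itself is not reordered.
  std::vector<int> order(intervals.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
    return intervals[a].start < intervals[b].start;
  });

  int former = order[0]; // tracked by index, not by reference
  for (std::size_t i = 1; i < order.size(); i++) {
    int latter = order[i];
    const auto& f = intervals[former];
    const auto& l = intervals[latter];
    if (l.start >= f.end) {
      // No overlap: commit `former` and continue from `latter`.
      result.push_back(former);
      former = latter;
    } else if (l.weight > f.weight ||
               (l.weight == f.weight && l.end < f.end)) {
      // Overlap: keep the heavier interval; on a tie, keep the one that
      // ends first, leaving more room for later intervals.
      former = latter;
    }
  }
  result.push_back(former);
  return result;
}
```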
// back to the original input vector, of non-overlapping indices, ie, the
// intervals that overlap have already been removed.
// Given a vector of Interval, returns a vector of the indices that, mapping
// back to the original input vector, do not overlap with each other, ie: the
Suggested change:
- // back to the original input vector, do not overlap with each other, ie: the
+ // back to the original input vector, do not overlap with each other, i.e. the
done
std::vector<Interval> intervals;
intervals.emplace_back(Interval{0, 4, 2});
intervals.emplace_back(Interval{4, 8, 2});
ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals).size(), 2u);
It would be better to test the precise results rather than just the size of the results. You can still do it with a single ASSERT_EQ:
std::vector<int> expected{0, 1};
ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals), expected);
ASSERT_EQ(IntervalProcessor::filterOverlaps(intervals).size(), 2u);
}

TEST(IntervalsTest, TestOverlapFound) {
It would be good to add tests for different kinds of overlaps, different input orders, different weights. There are a lot of interesting combinations!
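For those tests, a property check could complement exact expected vectors: whatever the input order or weights, the returned indices should be distinct, in range, and pairwise non-overlapping. A sketch of a hypothetical helper, assuming half-open intervals; cases worth feeding it include adjacent, partially overlapping, nested and identical intervals, each in several input orders and with equal and unequal weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the PR's Interval; field names assumed.
struct Interval {
  unsigned start;
  unsigned end; // exclusive
  unsigned weight;
};

// Checks that `kept` names distinct, in-range indices whose intervals are
// pairwise disjoint.
bool isNonOverlappingSelection(const std::vector<Interval>& intervals,
                               const std::vector<int>& kept) {
  std::vector<bool> used(intervals.size(), false);
  for (int k : kept) {
    if (k < 0 || std::size_t(k) >= intervals.size() || used[k]) {
      return false;
    }
    used[k] = true;
  }
  for (std::size_t i = 0; i < kept.size(); i++) {
    for (std::size_t j = i + 1; j < kept.size(); j++) {
      const auto& a = intervals[kept[i]];
      const auto& b = intervals[kept[j]];
      if (a.start < b.end && b.start < a.end) {
        return false;
      }
    }
  }
  return true;
}
```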
struct IntervalProcessor {
  // Given a vector of Interval, returns a vector of the indices that, mapping
  // back to the original input vector, do not overlap with each other, ie: the
Adjust the punctuation around ie to ", i.e.,".
done
While determining whether repeat sequences of instructions are candidates for outlining, remove sequences that overlap, giving weight to sequences that are longer and appear more frequently.