Introduce til::rle - a run length encoded vector #10099

Merged: 4 commits merged into main from dev/lhecker/rle on May 20, 2021

@lhecker lhecker commented May 14, 2021

@lhecker lhecker requested review from DHowett and miniksa May 14, 2021 21:51
@ghost ghost added the Area-CodeHealth, Area-Output, Issue-Task, and Product-Conhost labels May 14, 2021

## Summary of the Pull Request

Introduces `til::rle`, a vector-like container which stores elements of
type T in a run length encoded format. This allows efficient compaction
of repeated elements within the vector.
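
As a rough illustration of the idea (not the actual `til::rle` code), here is a minimal sketch of a run length encoded container. The name `tiny_rle` and its members are hypothetical and exist only for this example; it stores `(value, length)` pairs and merges repeated values into the last run on append.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration only; not the til::rle implementation.
template<typename T>
class tiny_rle
{
public:
    // Appends one element, extending the last run if the value repeats.
    void push_back(const T& value)
    {
        if (!_runs.empty() && _runs.back().first == value)
        {
            ++_runs.back().second;
        }
        else
        {
            _runs.emplace_back(value, std::size_t{ 1 });
        }
        ++_size;
    }

    // Returns the element at a 0-based flat index by walking the runs.
    const T& at(std::size_t index) const
    {
        if (index >= _size)
        {
            throw std::out_of_range("tiny_rle::at");
        }
        auto it = _runs.begin();
        while (index >= it->second)
        {
            index -= it->second;
            ++it;
        }
        return it->first;
    }

    // Number of decoded elements.
    std::size_t size() const noexcept { return _size; }
    // Number of stored (value, length) runs.
    std::size_t run_count() const noexcept { return _runs.size(); }

private:
    std::vector<std::pair<T, std::size_t>> _runs;
    std::size_t _size = 0;
};
```

Appending the values 1, 3, 3, 2, 1, 1, 1, 5, 5 to this sketch stores nine elements in five runs, which is where the compaction comes from.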

## References

* #8000 - Supports buffer rewrite work. A re-use of `til::rle` will be
  useful as a column counter as we pursue NxM storage and presentation.
* #3075 - The new iterators allow skipping forward by multiple units,
  which wasn't possible under `TextBuffer-/OutputCellIterator`.
  Additionally, they allow bulk insertions.
* #8787 and #410 - High probability this should be `pmr`-ified
  like `bitmap` for things like `chafa` and `cacafire`
  which are changing the run length frequently.

## PR Checklist

* [x] Closes #8741
* [x] I work here.
* [x] Tests added.
* [x] Tests passed.

## Validation Steps Performed

* [x] Ran `cacafire` in `OpenConsole.exe` and it looked beautiful
* [x] Ran new suite of `RunLengthEncodingTests.cpp`

Co-authored-by: Michael Niksa <miniksa@microsoft.com>
@microsoft microsoft deleted a comment from github-actions bot May 14, 2021
@microsoft microsoft deleted a comment from github-actions bot May 14, 2021

DHowett commented May 18, 2021

@msftbot make sure @miniksa signs off on this

@ghost ghost added the AutoMerge label May 18, 2021

ghost commented May 18, 2021

Hello @DHowett!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @miniksa

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".


@DHowett DHowett left a comment


It's perfect, and I love it. I can only find fault in a comment, and I did attempt to understand the algorithm. I've kicked the tires with my buffer implementation, as well, and it looks great. Excellent work!

template<typename T, typename S = std::size_t, std::size_t N = 1>
using small_rle = basic_rle<T, S, boost::container::small_vector<rle_pair<T, S>, N>>;
#endif
};

If you wanted to add one more class template (for fun!) I would suggest til::pmr::rle that uses a std::pmr::vector. However! basic_rle can't take an Allocator type, and this seems like an unnecessary cost for something we do not currently need. 😁


Ah, I meant "type alias" instead of "class template"


@miniksa miniksa left a comment


I am very very pleased. You took my start and you made it shine. I love it. Excellent work.

Just a few little comments to clean up that I'd like to see the answers for before I sign for merge.

// rle_pair is a simple clone of std::pair, with one difference:
// copy and move constructors and operators are explicitly defaulted.
// This allows rle_pair to be std::is_trivially_copyable, if both T and S are.
// --> rle_pair can be used with memcpy(), unlike std::pair.

ooooh nifty.
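
To make the point of that code comment concrete, here is a small sketch under stated assumptions. `simple_pair` and `copy_runs` are hypothetical names for this example, not the real `rle_pair` or anything in `rle.h`. The pair explicitly defaults its copy and move operations, so it is trivially copyable whenever both member types are, and a block of them can be copied with `memcpy()`.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical pair in the spirit of the comment above; not the real rle_pair.
template<typename T, typename S>
struct simple_pair
{
    simple_pair() = default;
    simple_pair(const T& v, const S& l) : value(v), length(l) {}

    // Explicitly defaulted, so none of these are user-provided.
    simple_pair(const simple_pair&) = default;
    simple_pair& operator=(const simple_pair&) = default;
    simple_pair(simple_pair&&) = default;
    simple_pair& operator=(simple_pair&&) = default;

    T value{};
    S length{};
};

static_assert(std::is_trivially_copyable<simple_pair<int, std::size_t>>::value,
              "simple_pair<int, size_t> should be trivially copyable");

// Copies `count` pairs with memcpy, which is only well-defined
// for trivially copyable types; the static_assert enforces that.
template<typename P>
void copy_runs(P* dst, const P* src, std::size_t count)
{
    static_assert(std::is_trivially_copyable<P>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, src, count * sizeof(P));
}
```

Instantiating `copy_runs` with a type that is not trivially copyable fails at compile time instead of silently invoking undefined behavior.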

//
//
//
// MUST READ: How this function (mostly) works

I love this comment so much. It is exactly what I was hoping for.


VERIFY_ARE_EQUAL("1|3 3|2|1 1 1|5 5"sv, rle);
// empty
VERIFY_ARE_EQUAL(""sv, rle.slice(0, 0)); // begin

FYI, VERIFY_ARE_EQUAL takes a 3rd parameter which can be a string you want printed out in the log output so you can more easily tell which sub-test this is. (If you wanted to do that over the comments.)
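
For readers unfamiliar with the notation in the test excerpt, the string `1|3 3|2|1 1 1|5 5` appears to list each run's values separated by spaces, with `|` between runs. The sketch below reproduces that notation and a slice over flat indices. `run_t`, `format_runs`, and `slice_runs` are hypothetical helpers for this example, and treating the slice range as half-open `[begin, end)` is an assumption inferred from `slice(0, 0)` yielding an empty result.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical run type for this example.
struct run_t
{
    int value;
    std::size_t length;
};

// Renders runs as space-separated values, with '|' between runs.
std::string format_runs(const std::vector<run_t>& runs)
{
    std::string out;
    for (std::size_t r = 0; r < runs.size(); ++r)
    {
        if (r != 0) out += '|';
        for (std::size_t i = 0; i < runs[r].length; ++i)
        {
            if (i != 0) out += ' ';
            out += std::to_string(runs[r].value);
        }
    }
    return out;
}

// Returns the runs covering flat indices [begin, end) (assumed half-open),
// trimming the first and last run where the range cuts through them.
std::vector<run_t> slice_runs(const std::vector<run_t>& runs, std::size_t begin, std::size_t end)
{
    std::vector<run_t> out;
    std::size_t start = 0; // flat index at which the current run starts
    for (const auto& r : runs)
    {
        const std::size_t stop = start + r.length;
        const std::size_t lo = std::max(start, begin);
        const std::size_t hi = std::min(stop, end);
        if (lo < hi)
        {
            out.push_back({ r.value, hi - lo });
        }
        start = stop;
    }
    return out;
}
```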

using value_type = typename rle_vector::value_type;

public:
static bool AreEqual(const ::std::string_view& expected, const rle_vector& actual) noexcept

This is super elegant and I love it.

@ghost ghost added and removed the Needs-Author-Feedback label May 19, 2021

lhecker commented May 20, 2021

@miniksa @DHowett The most recent commit changes the way the iterator works: Instead of 1-based indices it's now based on 0-based ones. I felt that this fits better with the container class itself, which is also written with 0-based indices in mind. It also fixes an off-by-one error in operator-().


DHowett commented May 20, 2021

Ooh, you made the static analyzer mad!

operator-=(1);
if (_pos == 0)
{
--_it;

Will the vector [debug] iterator catch the out of bounds move if the user is at [0]?

Member Author

The std::vector one will. I'm not sure about the one in boost.
But this code is functionally equivalent to the operator+= implementation (just much simpler for inlining).
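
A minimal sketch of the idea discussed in this thread, assuming an iterator that tracks a run plus a 0-based offset inside that run. `run` and `run_cursor` are hypothetical names for this example and this is not the code from `rle.h`. Stepping back from offset 0 moves to the previous run, and neither direction is bounds-checked here, which is exactly the situation the debug-iterator question is about.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical run type for this example.
struct run
{
    int value;
    std::size_t length;
};

// Hypothetical cursor over runs; the vector must outlive the cursor.
class run_cursor
{
public:
    explicit run_cursor(const std::vector<run>& runs) :
        _it(runs.begin()), _end(runs.end()), _pos(0) {}

    int operator*() const { return _it->value; }
    std::size_t pos() const { return _pos; }
    bool at_end() const { return _it == _end; }

    // Skips forward by n elements, jumping over whole runs at once.
    run_cursor& operator+=(std::size_t n)
    {
        n += _pos;
        while (_it != _end && n >= _it->length)
        {
            n -= _it->length;
            ++_it;
        }
        _pos = n;
        return *this;
    }

    // Skips backward by n elements. Unchecked: moving before the
    // first element decrements the begin iterator, as in the question above.
    run_cursor& operator-=(std::size_t n)
    {
        while (n > _pos)
        {
            n -= _pos + 1; // consume the rest of this run plus one step into the previous
            --_it;
            _pos = _it->length - 1; // land on the last element of the previous run
        }
        _pos -= n;
        return *this;
    }

private:
    std::vector<run>::const_iterator _it;
    std::vector<run>::const_iterator _end;
    std::size_t _pos; // 0-based offset within *_it
};
```

With 0-based offsets, `operator-=(1)` at offset 0 is the only case that has to touch the underlying vector iterator, which keeps the decrement path short.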


miniksa commented May 20, 2021

> @miniksa @DHowett The most recent commit changes the way the iterator works: Instead of 1-based indices it's now based on 0-based ones. I felt that this fits better with the container class itself, which is also written with 0-based indices in mind. It also fixes an off-by-one error in operator-().

Yeah that's probably wise. I don't know what drugs I was on when I wrote it 1-based. Thanks for fixing it.

@miniksa miniksa removed the AutoMerge label May 20, 2021

@miniksa miniksa left a comment


I'm good. Thanks for taking this over the finish line. It looks great. Minorly sad that some of the comments got removed from the iterators, but you're probably right to do so as they really weren't describing the "why".

@lhecker lhecker added the AutoMerge label May 20, 2021

ghost commented May 20, 2021

Hello @lhecker!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit a8e4bed into main May 20, 2021
@ghost ghost deleted the dev/lhecker/rle branch May 20, 2021 17:27

ghost commented May 25, 2021

🎉 Windows Terminal Preview v1.9.1445.0 has been released which incorporates this pull request. 🎉

This pull request was closed.
Labels
Area-CodeHealth, Area-Output, AutoMerge, Issue-Task, Product-Conhost
Development

Successfully merging this pull request may close these issues.

Abstract AttrRow as til::rle<T>
3 participants