⚡ Better Faster Cleaner STATUS parsing #225
Merged: nevans merged 4 commits into ruby:master from nevans:parser/better-faster-cleaner-status on Nov 13, 2023
The SequenceSet class is only a placeholder for now, because the more complete implementation isn't ready yet. But we need `sequence-set` for `tagged-ext-val`, and we need `tagged-ext-val` for the RFC4466 extension grammar for `STATUS`, `ESEARCH`, `LIST`, etc. The more complete SequenceSet implementation is needed for `ESEARCH`.
Although this is currently unused, it should eventually be used for `StatusData`, `BodyStructure`, `ESEARCH`, `MailboxList`, etc.
Although this is currently unused, we need `tagged-ext-val` for the RFC4466 extension grammar for `STATUS`, `ESEARCH`, `LIST`, etc.
Although "number" is still the default `status-att-val`, this uses ExtensionData with RFC4466's `tagged_ext_val` for any unknown non-numeric `STATUS` attribute.

Running the benchmarks (on my phone, without YJIT) shows a 40% speedup!

invalid_status_response_trailing_space
  v0.4.4-16-g0be6b65b: 43956.1 i/s
  0.4.4: 31788.6 i/s - 1.38x slower

rfc3501_7.2.4_STATUS_response_example
  v0.4.4-16-g0be6b65b: 45436.2 i/s
  0.4.4: 32458.5 i/s - 1.40x slower

status_response_uidnext_uidvalidity
  v0.4.4-16-g0be6b65b: 45334.2 i/s
  0.4.4: 32709.1 i/s - 1.39x slower

Various changes:
* Add alias for `mailbox` to `astring`.
* Use char token matchers (faster than `match(T_#{name})`).
* Extract `status-att-list` and `status-att-val` methods, to mimic ABNF.
* Add a case statement to `status-att-val` and explicitly match all RFC3501 and RFC9051 status attributes.
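To illustrate the "case statement with a numeric default and an extension fallback" idea described above, here is a minimal standalone sketch. It is not the net-imap parser: `parse_status_att_list`, `parse_status_att_val`, `NUMERIC_STATUS_ATTRS`, and `RawExtension` are hypothetical names, it uses `StringScanner` rather than the library's token matchers, and it keeps unknown non-numeric values as raw text instead of implementing the full RFC4466 `tagged-ext-val` grammar.

```ruby
require "strscan"

# Hypothetical stand-in for an extension value; keeps only the raw text.
RawExtension = Struct.new(:raw)

# Status attributes defined by RFC3501 and RFC9051; all take a number.
NUMERIC_STATUS_ATTRS = %w[
  MESSAGES RECENT UIDNEXT UIDVALIDITY UNSEEN DELETED SIZE
].freeze

# Parses one value. Known attributes must be numbers. Unknown attributes
# default to a number, and otherwise fall back to a raw extension value.
def parse_status_att_val(scanner, name)
  case name
  when *NUMERIC_STATUS_ATTRS
    digits = scanner.scan(/\d+/)
    raise ArgumentError, "expected number for #{name}" unless digits
    digits.to_i
  else
    if (digits = scanner.scan(/\d+(?= |\z)/))
      digits.to_i
    else
      # Simplification: one atom-like token or one flat parenthesized list.
      raw = scanner.scan(/[^ ()]+|\([^()]*\)/)
      raise ArgumentError, "expected value for #{name}" unless raw
      RawExtension.new(raw)
    end
  end
end

# Parses the text inside the parentheses of a STATUS response,
# e.g. "MESSAGES 231 UIDNEXT 44292", into a Hash keyed by attribute name.
def parse_status_att_list(str)
  scanner = StringScanner.new(str)
  attrs = {}
  until scanner.eos?
    name = scanner.scan(/[A-Za-z][A-Za-z0-9_.-]*/)
    raise ArgumentError, "expected attribute name at #{scanner.pos}" unless name
    raise ArgumentError, "expected SP after #{name}" unless scanner.skip(/ /)
    attrs[name.upcase] = parse_status_att_val(scanner, name.upcase)
    break if scanner.eos?
    raise ArgumentError, "expected SP at #{scanner.pos}" unless scanner.skip(/ /)
  end
  attrs
end

p parse_status_att_list("MESSAGES 231 UIDNEXT 44292")
p parse_status_att_list("X-FOO 1:5,9 UNSEEN 3")
```

Splitting the list loop from the per-value method mirrors the ABNF split between `status-att-list` and `status-att-val`, which is the structure the commit message describes; everything else here is an assumption made for the sake of a runnable example.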
nevans force-pushed the parser/better-faster-cleaner-status branch from 6129112 to 8070925 (November 13, 2023 01:11)
nevans added a commit to nevans/net-imap that referenced this pull request (Dec 11, 2023):
The version of SequenceSet in net-imap prior to this commit was merely a placeholder, needed in order to complete `tagged-ext` for ruby#225. This updates it with a full API, inspired by Set, Range, and Array. This allows it to be more broadly useful, e.g. for storing and working with mailbox state.

In addition to Integer, Range, and enumerables, any object with `#to_sequence_set` can now be used to create a sequence set. For compatibility with MessageSet, `ThreadMember#to_sequence_set` collects all child seqno into a SequenceSet.

Because mailbox state can be _very_ large, inputs are stored in an internal sorted array of ranges. These are stored as `[start, stop]` tuples, not Range objects, for simpler manipulation. A future optimization could convert all tuples to a flat one-dimensional Array (to reduce object allocations).

Storing the data in sorted range tuples allows many of the important operations to be `O(lg n)`. Although updates do use `Array#insert` and `Array#slice!` (which are technically `O(n)`), they tend to be fast until the number of elements is very large. Count and index-based methods are also `O(n)`. A future optimization could cache the count and compose larger sets from a sorted tree of smaller sets, to preserve `O(lg n)` for most operations.

SequenceSet can be used to replace MessageSet (which is used internally to validate, format, and send certain command args). Some notable differences between the two:
* Most validation is done up-front, when initializing or adding values.
* A ThreadMember to `sequence-set` bug has been fixed.
* The generated string is sorted and adjacent ranges are combined.

TODO in future PRs:
* #index_lte => get the index of a number in the set, or if the number isn't in the set, the number before it.
* Replace or supplement the UID set implementation in UIDPlusData.
* fully replace MessageSet (probably not before v0.5.0)
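The storage strategy described in this commit message (a sorted array of `[start, stop]` tuples, binary search for lookups, `Array#insert` and `Array#slice!` for updates) can be sketched roughly as follows. `TinyRangeSet` is a hypothetical class written only for illustration; it is not `Net::IMAP::SequenceSet`, it handles positive integers only (no `*`), and its method names are assumptions.

```ruby
# Hypothetical sketch of a set stored as sorted [start, stop] tuples.
class TinyRangeSet
  def initialize
    # Invariant: sorted, non-overlapping, non-adjacent [start, stop] pairs.
    @tuples = []
  end

  # O(lg n) lookup: binary search for the first tuple ending at or after n.
  def include?(n)
    tuple = @tuples.bsearch { |_, stop| stop >= n }
    !tuple.nil? && tuple[0] <= n
  end

  # Adds start..stop, merging any tuples it overlaps or touches.
  # The search is O(lg n); the slice!/insert calls are O(n) in the worst case.
  def add(start, stop = start)
    raise ArgumentError, "invalid range" unless start.positive? && start <= stop
    lo = @tuples.bsearch_index { |_, s| s >= start - 1 } || @tuples.size
    hi = lo
    while hi < @tuples.size && @tuples[hi][0] <= stop + 1
      start = [start, @tuples[hi][0]].min
      stop  = [stop,  @tuples[hi][1]].max
      hi += 1
    end
    @tuples.slice!(lo, hi - lo)
    @tuples.insert(lo, [start, stop])
    self
  end

  # O(n) in the number of tuples, since no count is cached.
  def count
    @tuples.sum { |a, b| b - a + 1 }
  end

  # Sorted output with adjacent ranges already combined.
  def to_s
    @tuples.map { |a, b| a == b ? a.to_s : "#{a}:#{b}" }.join(",")
  end
end

set = TinyRangeSet.new
set.add(7, 9).add(1, 3).add(5)
puts set.to_s   # 1:3,5,7:9
set.add(4).add(6)
puts set.to_s   # 1:9
```

Because the invariant is restored on every `add`, the string form comes out sorted with adjacent ranges combined for free, which matches the behavior difference from MessageSet listed above.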