-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Feature multi range #1402
Conversation
… ranges, add tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't like much the saturating range. This would considerably slow down most use of fold like fold(0.., )
, thus I think we should keep specialization variant of many0 many1 fold_many0 and fold_many1. Cause it's more clear what they do, and would allow us to remove this saturation thing. I very concern with that affecting benchmark significantly.
Current state of candidate implementation.foldOpen questions: None
whileOpen questions: None
take_till*Open questions: It is not possible to just remove the number suffix as with
take_while*Open questions: Same problem as
take_until*Open questions: Same problem as
many_tillOpen questions: In contrast to all other parsers on this list
|
Im not sure if this would have a significant performance impact. The performance of parsers with both lower and upper bounds should remain identical since they also used ranges internally. And I dont think the impact of a relatively simple iterator in the unbounded cases is that severe.
I disagree. Ranges are a first class type in Rust (regardless of how unwieldy their structs can be) and they even have their own syntax. By using that syntax for all cases we can cut down on the amount of functions which makes the API easier to understand because now there are fewer functions whose documentation you have to look up. Im not sure what you mean with "that saturation thing".
|
Well, I think not
That could be used to justify use range anywhere cause you know they are first class thing... not a good argument. fold_many0 express better than Your idea of using Range for fold was my idea in the behind #1333 (comment), but use range imply we should use it naturally, in this PR you add strange check and add a saturate feature to bend range to what you want. That not what we should do, having a hard limit for overflow is the way range are suppose to be use. And that why we must keep a not using range fold to allow to express a not bounded fold. |
No, but being a first class type is a good argument to justify using it where it makes sense.
Ok. So how exactly does the former "express" better? A range tends to be more expressive because it makes clear that we are talking about a range of possibilities (reads as "fold that many times", in this case "fold 0 or more times"). In
And a "natural" range is what exactly? I am unfamiliar with any such concept (except natural numbers) and the docs make no mention of that as far as I know.
The check is absolutely necessary. Your comment on #1333 seems to imply that you think ranges somehow prevent the max > min condition. That is not the case.
I am not "bending" the range to anything. Saturation arithmetic is a perfectly valid concept and the terminology is even used within Rust. All it means is that instead of terminating after reaching the representable maximum it, it continues to return that maximum indefinitely. To quote the Rust docs for RangeFrom:
So saturating is valid even for normal unbounded ranges.
Untrue, see the doc snippet above. If anything, a "hard limit" for an unbounded range is the only thing it is not allowed to do.
You are contradicting yourself. Previously you tell me I shouldnt "bend the range to what you want" and I am adding "strange saturation behaviour" but here you tell me there is no globally defined behaviour and it should instead be interpreted based on the context? To reiterate, interpreting based on the context is exactly what we are doing right now:
|
You doesn't seem open to discussion at all, you read but don't listen. You don't look like you code in Rust from long. That not how you are suppose to implement or use unbound range. Your solution is not good. I don't have your level of english so everything I say look stupid so I will end talk and just say, I opposite to this PR. |
I am very much open to discussion. This is what discussions are: two people exchanging their (possibly opposing) viewpoints in hopes of coming to an understanding. And I do read and try very hard to listen. But I am having a tough time understanding what exactly your problems are.
Then how exactly would you implement this? Be aware that the restriction is that the parser must be able to perform all functions of the parsers that it replaces. If you can give me an alternative to how the ranges are supposed to be interpreted while staying within that restriction I would be happy to hear it. Because I dont really see any other option.
Ok. But its a solution, it works and it solves a rather annoying issue (the parser duplication). To me thats all a solution really needs.
Thats nice. Unfortunately I still have absolutely no idea why. So Im afraid I cant really take that vote into account as anything more than "some dude opposes this PR because reasons". |
see what I mean, I said why but you don't even acknowledge this, you are not open to talk, you talk that all. |
use |
do we have concrete evidence of the overhead of the saturating iterator? Like, is it actually slower, or is LLVM smart enough to see that it will loop indefinitely? Honestly I'm not a huge fan of the saturating iterator trick for fold, I prefer the approach in #1402 (comment) even if it results in some code duplication. But it can be a good tradeoff (especially since it gives some info at compile time). BTW the match could be done in the execution of the combinator instead of the parser closure that is returned. |
I have looked at the output of the bench job and compared them to the results on master. Im not sure how reliable those are but the difference seems to drown in the noise. So from that perspective the saturating iterator seems to remain competitive.
The solution in #1402 (comment) is incomplete. The match would need to be extended with two additional checks to see if the lower bound is 0 or 1. If its not then we still need a saturating counter to compare against the lower bound later. So I think that would need at least 5 arms. |
So I tried my implementation (or at least a good approximation without needing to import actual nom types) on godbolt. Looking at the assembler output, the compiler actually translates my implementation to the exact same binary as stars example:
So to me that looks like the compiler can easily optimize its way around that. At least for constant values, but dynamic ranges need to be handled via the Here are the two examples for comparison: |
Why you need to check 0 or 1 ? |
Assume the wanted range is Edit: In addition the ranges in your comment are wrong. |
What ? no ? That not clear.
There is nothing wrong to just use a simple mutable counter or you could use: let iter = std::iter::successors(Some(0usize), |n| n.checked_add(1));
for i in iter {
} If you really want to handle overflow. But I think I would do: match (range.start_bound(), range.end_bound) {
(Bound::Included(start), Bound::Included(end)) => for 0..=end {}, // need to check included start
(Bound::Included(start), Bound::Excluded(end)) => for 0..end {}, // need to check included start same code then ^
(Bound::Included(start), Bound::Unbounded) => { for 0..=start { } then loop { } } // the for loop is not allow to error
(Bound::Excluded(start), Bound::Included(end) => for 0..=end {}, // need to check excluded start
(Bound::Excluded(start), Bound::Excluded(end) => for 0..=end {}, // need to check excluded start same code then ^
(Bound::Excluded(start), Bound::Unbounded) => { for 0..start { } then loop { } } // the for loop is not allow to error same code then ^^^
(Bound::Unbounded, Bound::Included(end)) => for 0..=end {}, // no check
(Bound::Unbounded, Bound::Excluded(end)) => for 0..end {}, // no check
(Bound::Unbounded, Bound::Unbounded) => loop {}, // no check This will allow compiler to compile check the code, there is no error possible. And I see a lot of sharing code possible. |
That breaks the way ranges are interpreted in the old parsers. The range was always meant to represent the minimum and maximum amount of iterations the inner loop is allowed to do. So
That also breaks compatibility with existing parsers. And with checked arithmetic it still needs to check if an overflow has occured. So I dont really see how that is supposed to be much better compared to just saturating, which is also a check for overflow that just returns the max amount instead of
(warning: sarcasm) My implementation exposes all the necessary type information about each range as well. It just encapsulates it in the respective trait implementation so all the necessary edge case handling doesnt need to happen in the parser. I want to be careful that this doesnt end in another 20+ post argument because of the language barrier. |
That a new parser so I don't care of old one. Your interpretation is misleading with what user expect from range. I mean the fact that I didn't understand you ignore included from the code prove my point that your code is unclear.
What ?
choice whatever you want
Your code is unclear, use magic code like where does come from this 4 ? and hide logic far away from the code. Yes my idea is way less error prone in the long run. It's may look bad but it's very rusty. |
Please read this and the documentation for the individual types here. Then please understand why what you are proposing is fundamentally wrong not just from a programming perspective but also from a mathematical perspective.
Im going to assume this is once again the language barrier and/or you just flat out dont understand the problem. As such, this is not a good use of my time and Im going to call it quits. Your code doesnt work, breaks assumptions about the basic types and breaks compatibility with the old parsers. One of these points alone would be enough to disqualify it. My code only looks "magic" and unclear if you dont understand how traits and their implementations work. And no, your idea is many things, but definitely not "less error prone". My implementation has a single point of failure (the trait implementations). Your implementation has a point of failure for every parser that is implemented that way (or possibly even for every single match arm if you want to be exact). In addition to all of the above your implementation has yet another downside: dynamic data. The compiler is not guaranteed to infer anything about dynamic data and may (at worst) include the entire match statement in the final output. With my implementation the compiler will at the very least eliminate conditional checks since the type is known at compile time and guaranteed to be resolved to a concrete implementation. As for the "rustyness" of the code. I would say taking full advantage of rusts type and trait system makes my implementation fairly "rusty". Because apparently thats a solid, objective metric now. This is the last post I will make about this until you prove to me that you actually understand what Im trying to tell you as I am fairly convinced that this is just going to end in another dispute that solves nothing and just wastes time that I dont have. |
use std::ops::{Bound, RangeBounds};
struct RangeFromInclusive {
pub start: usize,
}
impl RangeBounds<usize> for RangeFromInclusive {
fn start_bound(&self) -> Bound<&usize> {
Bound::Included(&self.start)
}
fn end_bound(&self) -> Bound<&usize> {
Bound::Unbounded
}
}
fn main() {
let range = RangeFromInclusive { start: 42 };
assert_eq!(range.start_bound(), Bound::Included(&42));
} That not because core syntax didn't include this that user can't do it. You should really take into account that I'm a rust expert. I probably still have to learn but I pretty good in Rust. I code in Rust every day in my work, and have know Rust pre 1.0. My advice are not random, what I see is you probably a very good C++ dev or whatever OOP language, but sometime in Rust one should not use trait. My match reveal we handle 5 different behavior (we can probably share the code between fold, many, etc...) , there is no way to factorize the 5 behaviors without complicated code that would need an heavy optimization that will be hard to track down. We should prefer KISS. |
my example is stupid cause I should have done for excluded, anyway, same point. The user can also simply use a tuple |
if you don't want use range like range are mean to be use, maybe just take two parameters |
I will do that I think I code better than I write, but I would have prefer we work together. |
Im not sure what your standards for "complicated" are, but at least for me the following is pretty simple: fn bounded_iter(&self) -> impl Iterator<usize> {
0..core::usize::MAX
} All implementations are like this, they just return a single value which gets used as an iterator.
Again, not sure what "hard to track down" is for you. But its nothing different from a normal trait implementation which are used everywhere in Rust. Monomorphisation replaces the generic function call with one of the concrete implementations. for count in range.bounded_iter() { gets substituted by the compiler using the trait implementation for for count in 0..12 { which the exact same thing as the harcoded There is really nothing hard to track down in this chain. The functions are annotated with the trait generic and from there its trivial to find the respective implementation for whatever concrete type you pass to it. And if you look at the comparison between our two godbolt examples above you will see the compiler applies the same optimizations to the point that the binary output is literally identical. So we are not really loosing anything because of this abstraction. The only real argument for performance is the use of a saturating counter in the case of
This is KISS. At least not anymore complicated than having to deal with a dozen ranges and loops in a single function. I dont think looking up a trait implementation counts as "complicated". |
Hi, just checking in to ask about the current status here. From what I can see there has been a lot of discussion with few actual results - are there any specific open questions other than the naming of the |
The status can be summed up as "a working implementation exists". Not more, not less.
Geal has also raised the question of wether the saturating counter used in the trait implementations has any overhead over the explicit match with 3 different loops. Based on the benchmark tests I couldnt see any significant difference but I havent stress tested it in great detail or inspected the dissasembly. I personally havent really worked on this recently as I have a second PR open here that was supposed to be reviewed ~6 months ago. I assume Geal is busy but I cant really finish either until I get some proper feedback from a maintainer. |
This is now merged as part of #1608 🥳 |
This PR intends to fix #1393.
Description
It replaces existing parser variations that only differ by how many times a subparser may run (like the different
many_*
variants) with a single parser that takes a range.In addition to a range parsers can also take a single
usize
value as parameter which evaluates to a range containing that value only (value..=value
). This is primarily done for convenience.Migration remains fairly simple as the new parsers are a full replacement and just require the respective parameters to be rephrased as a single range.
Reasoning
Using ranges makes the API cleaner by removing unnecessary function duplication. It also allows nom to integrate with more of Rusts language features (specifically the range syntax).
Example
With the old parsers this would be expressed using the
many_m_n
parser:Special attention should be drawn to the fact that the other variations, such as
many0
are also possible by using open ended ranges (0..
,1..
).Notes
The current implementation allows individual parsers to take ranges as parameters but requires them to document how they interpret open ended ranges. This was done to maximize flexibility and allow multiple, potentially contradicting, interpretations to coexist within the codebase.
A good example of this in action is the difference between
many
andfold
:many
(a..
->a..=usize::MAX
)many
packages its results into aVec
. Therefore going beyond theusize
limit makes no sense and any open ended range is capped atusize::MAX
.fold
(a..
->a..=∞
)fold
has static memory requirements since there is only one accumulator. This accumulator can be used indefinitely and as such the amount of iterations can go beyond even theusize::MAX
limit (the current implementations offold_*
also support such cases). Because of this the parser interprets open ended ranges to mean "can run for an infinite number of times".TODOs
Candidates
The following parsers have been proposed for merging.
many_*
(many0, many1, many_m_n)Resolution: Done
fold_*
(fold_many0, fold_many1, fold_many_m_n)Resolution: Done
take_till*
(take_till, take_till1)take_while*
(take_while, take_while1, take_while_m_n)take_until*
(take_until, take_until1)many_till
Open questions
How should the obsolete parsers be handled?Currently the obsolete parsers (like
many_m_n
) are deprecated using#[deprecated=""]
with a message pointing the developer to the replacement. (Usually in the form of a simpleReplaced by <new_parser>
). This allows the changes to remain backwards compatible while steering developers towards using the replacement parsers.Tests for the deprecated parsers are left as is but are annotated with
#[allow(deprecated)]
to suppress the warnings. I would recommend keeping them in for as long as the parsers still exist to make sure that they arent broken accidentally.Resolution: No deprecations for this release. The old parsers and the new range based parsers will exist side-by-side.
See here