-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate parser variants using ranges (e.g. many0
, many_m_n
)
#1393
Comments
I quite like this idea. At the very least this makes the API much cleaner by unifying the very similar parsers. This could also be done for the However Rust Ranges are
As an afterthought, your example has a parameter order of many(1..5,
alt((
tag("+"),
tag("-"),
))
) A fold version would look similar: fold(1..5,
Vec::new,
tag("+"),
|acc: Vec<_>, s: &str| {
acc.push(s);
acc
}
) |
This comment has been minimized.
This comment has been minimized.
many0
, many_m_n
)
Yes, I assume the range would just be for the existing type of the type
Oh, I wasn't aware of that parser, so yes, we'd need to handle going beyond usize::MAX
My ordering was not meant to be prescriptive; I was just going off of a faulty memory |
To double check, do we feel its worth I go ahead and put together a PR? |
Im just a random contributor, but for what its worth I like this idea enough to want to make a PR myself. So I would say yes.
For |
My plan is to create a PR for one parser to get feedback on the design and approach and then do one PR for each additional parser. Candidates:
More questionable cases:
The main open question is on naming. Dropping the suffix would work in cases like
|
That problem runs much deeper than just this. Take
Are all covered by
Can all be covered by a universal
Maybe? The Naming in these cases is an issue, primarily because nom breaks its own naming convention (again) by naming it |
Probably not that one. It would be better to introduce the ranged parsers as completely new and then deprecate the old ones, which avoids breaking backwards compatibility. |
Could we make this |
It was less about signedness and more about size. Specifically the fact that
What kind of composition would that be? And what exactly is the type of the intermediary I quite frankly see no reason why that should be split into two types and as far as I know this pattern isnt used anywhere else in nom. |
So I have gone ahead and written a small proof of concept in this branch. It includes a Range trait implemented for all ranges of It looks terrible, is undocumented and untested, but it works with the few documentation tests from the original parsers. It also currently all lives in the Letting |
Actually, scratch all that nonsense about our own traits and stuff. Turns out the |
I was realizing the same thing with |
I realized that as well. But I think I made it work? My current implementation allows conversion from |
I think my branch is in a usable state. I migrated all the existing examples and tests to use the ranged parsers for The trait now lives under |
Maybe the iteration logic can be moved to a seperate struct and then have something like |
That works pretty well actually. I implemented two iterators for our Ranges |
for count in range.saturating_iter() {
let len = input.input_len();
match parse.parse(input.clone()) {
...
Err(Err::Error(err)) => {
if !range.contains(&count) {
...
}
Err(e) => return Err(e),
}
} This highlights only the changes made to the parser logic. Specifically replacing the explicit range in the From the parsers perspective this is actually a pretty minimal change and Im quite happy with how this looks (the iterators are a bit more involved but the parser doesnt need to care about that). It doesnt overload the parser with unnecessary information and doesnt make it more difficult to use than the old version (I would argue its easier to use since The iterator behaviour can probably be reused for the other parsers ( |
What are your thoughts on the trait being pub trait IntoRangeBounds<Idx>
{
/// Convert to a RangeBound
fn convert(self) -> (Bound<Idx>, Bound<Idx>);
} instead of pub trait IntoRangeBounds<T>
where
T: RangeBounds<usize>
{
/// Convert to a RangeBound
fn convert(self) -> T;
} I think it would remove the need for pub fn many<I, O, E, F, G, H>(
range: G,
mut parse: F,
) -> impl FnMut(I) -> IResult<I, Vec<O>, E>
where
I: Clone + InputLength,
F: Parser<I, O, E>,
E: ParseError<I>,
G: IntoRangeBounds<H>,
H: RangeBounds<usize>, I would also say it can help in reducing code bloat by reducing monomorphization, like what momo does, but the signatures are brittle enough that it is unlikely we can get any gains here. As a pattern to follow, this will help in |
The downside of that is that we cant use the I would defintely like to keep the logic in the parsers as simply and straightforward as possible. So Im against just unpacking to a Ill go ahead and experiment with a custom Range type (either a wrapper around the tuple or just a simple struct with two bounds). Then I could say with greater confidence how well that works from the parsers perspective (or not, I guess). |
|
I did not know that. That would work then. However trying to |
I always forget that rule and assume more won't work than is possible, so I was surprised when your original solution worked ;) |
Not sure I fully understand that, but I assume that means you dont have a good replacement as well? Im fine with leaving it as is if there are no alternatives. While perhaps a bit redundant I dont think the second generic is that terrible for as long as it makes the type system happy. That would just leave the new names for the other parser candidates. Im still drawing blanks for those however. |
I am fairly confident that this is something worth finishing so I have opened a draft to make it easier to track the progress towards a final implementation. |
Great! Really excited about this. I did some testing and it looks like the last version I looked at will break if we add more impls to |
For now its in Draft until I get all the open points done which will probably take some time. Geal is also seemingly absent recently so even once its open its up there when exactly it will be reviewed/merged. With all that I would probably not want to wait for the following reasons:
I will most likely not get this fully done this weekend so you probably still have some time. If anything substantial changes I can still extend its Draft status once Im done. |
Just want to confirm it wasn't missed since it wasn't acknowledged: adding the impl breaks builds for people using the trait added in the PR. I'm assuming what we are doing is not too common such we can move forward with it if nom doesn't move forward. If nom does move forward with the current PR, we'll see if the libs team has any magic for making this all work. If you want to take a look, feel free to look at the first and second commit of https://github.com/epage/range_bounds_example and play around with it.
It would be independent of nom's MSRV. We just use |
It was indeed missed.
Ok, you lost me. Maybe Im just too tired to follow all the "trait magic" right now. Which impl breaks what? Or more specific to my interest: Is there a problem with my current implementation that needs to be addressed or are you talking about something else? (maybe the implementation of
Unsure how I feel about that. For one, having With all that, heres my plan: Ill try to get the other open points (other parsers, naming schemes, etc) on my Draft done with the current implementation (which is probably going to take a bit longer than monday). When all those are closed and there still hasnt been any update from you Ill ping you directly to discuss the status before opening the Draft as a PR proper. If anything comes up between that we can still agree to keep the Draft open for longer if necessary. That sound good? |
The following crate combines both my Rust RFC with an iteration on your solution for this crate If you clone that repo and build it, you will get
Maybe there is some magic that can be done to make this smoother. I'm hoping the libs team can help with that, if they find interest in the idea.
Sounds good! Thanks for all of your help with the proposal and then work on this! |
I want clarification on the following requirement:
for me, if we take fold_many_m_n:
I think this follow rust logic, In short, I pretty sure for |
According to you the range |
Yes ? fn main() {
let foo: Vec<_> = (0..=0).collect();
assert_eq!(foo.len(), 1);
} Found out that use core::ops::Bound;
fn main() {
let foo = &[0, 1, 2, 3, 4][(Bound::Excluded(1), Bound::Included(2))];
assert_eq!(foo.len(), 1);
assert_eq!(foo, &[2]);
} So I think I will correct my code to follow the convention, I describe above. |
Got it. So The range passed to the parser is supposed be a constraint on the final result (in other words, the final result is supposed to be within the constraint passed to the parser). Its not an iterator for you to use. So the amount of elements inside the range are irrelevant. |
as I say so, if I follow what you want there is no good reason to use range, you should prefer |
Sorry, I've not been paying attention to this. I've played around with several designs for this and I feel like using Some cases I played with that caused me problems
|
I think I finally understand the inner logic of range and so why my interpretation is correct for Range:
It's now obvious but somehow I miss this simple way to represent range. I think this is clearly how every user should interpret range. And this make it way more clear in my eye way excluded for start bound was doing +1 but that was included for max bound was doing +1. I will follow this convention for my PR and add additional documentation to make it clear. |
I say Option for cenodis cause he seem to ignore |
I run some bench unfortunately the 3 different current PR have a cost. I think this could be a acceptable loose I didn't know yet exactly why the 3 implementation add a cost but I suspect the benches of nom use too small input, and that even a one time branch is enough to show 10% regression. If most parser will run into this behavior this could be a problem. This I think is a reason to not depreciate |
Prerequisites
Here are a few things you should provide to help me understand the issue:
Idea
Nom has
many1
,many0
, andmany_m_n
functions. Similar with other parts of the API.What if instead we had a
IntoRange
trait that took in the different range types and single numbers.Example:
The text was updated successfully, but these errors were encountered: