Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce syntactic noise #1408

Open
epage opened this issue Sep 27, 2021 · 4 comments · Fixed by winnow-rs/winnow#47
Open

Reduce syntactic noise #1408

epage opened this issue Sep 27, 2021 · 4 comments · Fixed by winnow-rs/winnow#47
Milestone

Comments

@epage
Copy link
Contributor

epage commented Sep 27, 2021

Prerequisites

Here are a few things you should provide to help me understand the issue:

  • Rust version : 1.55.0
  • nom version : 7.0.0
  • nom compilation features used: default

Test case

I wanted to experiment with porting toml_edit to nom. Papercuts I ran into

  • Deep module nesting makes it slow to walk around docs
  • Deep module nesting makes it annoying to fully qualify functions
  • use nom::...::*; led to conflicts between nom, my parserr, and std (especially char0
  • After a foray with combine, a lot of the syntactic noise of nom stands out (though I still prefer nom)

An example parser

use nom::{
    branch::*, bytes::complete::*, character::complete::*, combinator::*, error::context, multi::*,
    sequence::*, AsChar, IResult,
};

...

pub(crate) fn dec_int(input: &str) -> IResult<&str, &str> {
    recognize(tuple((
        opt(alt((char('+'), char('-')))),
        alt((
            char('0'),
            map(
                tuple((
                    satisfy(|c| ('1'..='9').contains(&c)),
                    take_while(is_dec_digit_with_sep),
                )),
                |t| t.0,
            ),
        )),
    )))(input)
}

Ideal:

pub(crate) fn dec_int(input: &str) -> nom::IResult<&str, &str> {
    ((
        nom::opt(nom::alt(('+', '-'))),
        nom::alt((
            '0',
            ((
                nom::satisfy(|c| ('1'..='9').contains(&c)),
                nom::take_while(is_dec_digit_with_sep),
            )).map(|t| t.0)
        ))
    )).recognize()(input)

EDIT: version based on https://github.com/epage/nom-experimental

pub(crate) fn dec_int(input: Input<'_>) -> IResult<Input<'_>, &str> {
    (
        opt(one_of((b'+', b'-'))),
        alt((
            (
                DIGIT1_9,
                many0_count(alt((digit.value(()), (one_of(b'_'), cut(digit)).value(())))),
            )
                .value(()),
            digit.value(()),
        )),
    )
        .recognize()
        .map(|b: &[u8]| unsafe { from_utf8_unchecked(b, "`digit` and `_` filter out non-ASCII") })
        .parse(input)
}  

Unsure how much is even possible or what downsides there might be, but ideas include

Figured I'd create this on issue for exploring the ideas and then create individual ones, rather than flood the issue list with every random idea

@Stargateur
Copy link
Contributor

My personal way to import nom module is as follow:

use nom::{
    branch, bytes::complete as bytes, character::complete as character, combinator, multi,
    sequence, AsChar, IResult as Outcome,
};

pub(crate) fn dec_int(input: &str) -> IResult<&str, &str> {
    combinator::recognize(sequence::tuple((
        combinator::opt(multi::alt((character::char('+'), character::char('-')))),
        multi::alt((
            character::char('0'),
            combinator::map(
                sequence::tuple((
                    character::satisfy(|c| ('1'..='9').contains(&c)),
                    multi::take_while(is_dec_digit_with_sep),
                )),
                |t| t.0,
            ),
        )),
    )))(input)
}

You get the idea. I think it's quite clear that way, that easy to read and write. I opposite to flattering module.

impl Parser for u8, &[u8], char, and &str, reducing the need for char and tag

That should be possible, for char yes, for u8 I'm not sure could be confusing and allow mistake, yes for &[u8] and &str.

@Stargateur
Copy link
Contributor

Stargateur commented Sep 27, 2021

I could see a world where we inverse foo::complete and foo::streaming for complete::foo and streaming::foo this make sense cause:

  • I don't think there is a case where in the same parser you mix complete and streaming
  • This allow to reduce the number of page in doc
  • This remove the need to write bytes::complete as bytes
  • Would allow people to import every complete or streaming modules in one line complete::*
  • Allow to more easily import module by reducing the fragmentation complete::{foo, bar, baz} vs {foo::complete as foo, bar::complete as bar, baz::complete as baz

@Geal
Copy link
Collaborator

Geal commented Oct 10, 2021

These look like interesting suggestions, and more methods in Parser would be nice. I'm not sure about putting too much emphasis on method calls though (well, for map it's natural). Personally I like that there are multiple styles available, and maybe that is what the docs should show.
Maybe a prelude module could help too? Like, bring in all the streaming style parsers with ll the combinators

@epage
Copy link
Contributor Author

epage commented Oct 11, 2021

For the module hierarchy and allowing impl Parser for &str / char. u8, it'd be nice if we had a way to make code polymorphic over the parser type (stream vs complete).

combine handles this by having streaming vs complete be different parse functions on the trait. This requires people implementing the parser trait to implement both methods. I assume we couldn't make one an inherent method in terms of the other because that would cause papercuts for one of the two workflows. I also appreciate the simplicity of the signature for function parsers and am unsure of a good way to preserve that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants