Reduce syntactic noise #1408

epage · 2021-09-27T19:58:21Z

Prerequisites

Here are a few things you should provide to help me understand the issue:

Rust version : 1.55.0
nom version : 7.0.0
nom compilation features used: default

Test case

I wanted to experiment with porting toml_edit to nom. Papercuts I ran into:

- Deep module nesting makes it slow to walk around the docs
- Deep module nesting makes it annoying to fully qualify functions
- use nom::...::*; led to conflicts between nom, my parser, and std (especially char)
- After a foray with combine, a lot of the syntactic noise of nom stands out (though I still prefer nom)

An example parser

use nom::{
    branch::*, bytes::complete::*, character::complete::*, combinator::*, error::context, multi::*,
    sequence::*, AsChar, IResult,
};

...

pub(crate) fn dec_int(input: &str) -> IResult<&str, &str> {
    recognize(tuple((
        opt(alt((char('+'), char('-')))),
        alt((
            char('0'),
            map(
                tuple((
                    satisfy(|c| ('1'..='9').contains(&c)),
                    take_while(is_dec_digit_with_sep),
                )),
                |t| t.0,
            ),
        )),
    )))(input)
}

Ideal:

pub(crate) fn dec_int(input: &str) -> nom::IResult<&str, &str> {
    ((
        nom::opt(nom::alt(('+', '-'))),
        nom::alt((
            '0',
            ((
                nom::satisfy(|c| ('1'..='9').contains(&c)),
                nom::take_while(is_dec_digit_with_sep),
            )).map(|t| t.0)
        ))
    )).recognize()(input)
}

EDIT: version based on https://github.com/epage/nom-experimental

pub(crate) fn dec_int(input: Input<'_>) -> IResult<Input<'_>, &str> {
    (
        opt(one_of((b'+', b'-'))),
        alt((
            (
                DIGIT1_9,
                many0_count(alt((digit.value(()), (one_of(b'_'), cut(digit)).value(())))),
            )
                .value(()),
            digit.value(()),
        )),
    )
        .recognize()
        .map(|b: &[u8]| unsafe { from_utf8_unchecked(b, "`digit` and `_` filter out non-ASCII") })
        .parse(input)
}

Unsure how much is even possible or what downsides there might be, but ideas include

- Flatten modules, where possible (work like Consolidate parser variants using ranges (e.g. many0, many_m_n) #1393 can help keep it navigable)
- ~~impl Parser for u8, &[u8], char, and &str, reducing the need for char and tag~~
  - ~~Maybe even implement parsers for ranges of u8 and char~~
  - The streaming vs complete divide prevents this
- impl Parser for tuples, reducing the need for tuple(()), impl Parser for tuples #1417
- Emphasize Parser::map in "choose your combinator" and docs (I didn't know it existed until I started writing this), Update docs to point to Parser::map over nom::combinator::map. #1415
- Add additional inherent functions to Parser, like
  - Parser::between(first, third)
  - Parser::terminated_by(second)
  - Parser::preceded_by(first)
  - Parser::recognize()
  - Parser::consumed()
  - Parser::verify(bool)
  - Parser::cut()
  - Parser::complete()
  - Parser::context(ctx)
  - Thought process:
    - Avoid yoda speak (foo.not(), granted preceded_by somewhat violates that)
    - self should be a parser that is returning its value
  - See also add map_res/map_opt combinators to Parser #1562
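To make a few of the ideas above concrete, here is a minimal, self-contained sketch of what "impl Parser for char", "impl Parser for tuples", and method-style combinators could look like. It deliberately does not use nom: the `Parser` trait, `PResult`, `Recognize`, `TerminatedBy`, `digits`, and `negative_int` are all hypothetical names invented for this illustration, and the error type is simplified to the unconsumed input.

```rust
/// Toy result type: on success, the remaining input plus a value;
/// on failure, the input at the point where parsing failed.
pub type PResult<'a, O> = Result<(&'a str, O), &'a str>;

/// Hypothetical stand-in for a parser trait; this is NOT nom's definition.
pub trait Parser<'a>: Sized {
    type Output;

    fn parse(&mut self, input: &'a str) -> PResult<'a, Self::Output>;

    /// Method form of `recognize(p)`: yield the consumed slice.
    fn recognize(self) -> Recognize<Self> {
        Recognize(self)
    }

    /// Method form of `terminated(p, end)`: keep `self`'s value, drop `end`'s.
    fn terminated_by<T: Parser<'a>>(self, end: T) -> TerminatedBy<Self, T> {
        TerminatedBy(self, end)
    }
}

/// Plain functions are parsers.
impl<'a, O, F> Parser<'a> for F
where
    F: FnMut(&'a str) -> PResult<'a, O>,
{
    type Output = O;
    fn parse(&mut self, input: &'a str) -> PResult<'a, O> {
        self(input)
    }
}

/// A `char` literal parses itself, so no `char('+')` wrapper is needed.
impl<'a> Parser<'a> for char {
    type Output = char;
    fn parse(&mut self, input: &'a str) -> PResult<'a, char> {
        match input.strip_prefix(*self) {
            Some(rest) => Ok((rest, *self)),
            None => Err(input),
        }
    }
}

/// A pair of parsers runs in sequence, so no `tuple((..))` wrapper is needed.
impl<'a, A: Parser<'a>, B: Parser<'a>> Parser<'a> for (A, B) {
    type Output = (A::Output, B::Output);
    fn parse(&mut self, input: &'a str) -> PResult<'a, Self::Output> {
        let (rest, a) = self.0.parse(input)?;
        let (rest, b) = self.1.parse(rest)?;
        Ok((rest, (a, b)))
    }
}

pub struct Recognize<P>(P);

impl<'a, P: Parser<'a>> Parser<'a> for Recognize<P> {
    type Output = &'a str;
    fn parse(&mut self, input: &'a str) -> PResult<'a, &'a str> {
        let (rest, _) = self.0.parse(input)?;
        Ok((rest, &input[..input.len() - rest.len()]))
    }
}

pub struct TerminatedBy<P, T>(P, T);

impl<'a, P: Parser<'a>, T: Parser<'a>> Parser<'a> for TerminatedBy<P, T> {
    type Output = P::Output;
    fn parse(&mut self, input: &'a str) -> PResult<'a, P::Output> {
        let (rest, value) = self.0.parse(input)?;
        let (rest, _) = self.1.parse(rest)?;
        Ok((rest, value))
    }
}

/// One or more ASCII digits (hypothetical helper for the example).
pub fn digits(input: &str) -> PResult<'_, &str> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 { Err(input) } else { Ok((&input[end..], &input[..end])) }
}

/// Reads left to right: a '-', then digits, recognized as one slice, then ';'.
pub fn negative_int(input: &str) -> PResult<'_, &str> {
    ('-', digits).recognize().terminated_by(';').parse(input)
}
```

With these definitions, `negative_int("-42;rest")` yields the slice `"-42"` and leaves `"rest"`. The sketch sidesteps the streaming vs complete question entirely, which is the part that makes literal parsers hard in nom itself.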

Figured I'd create this one issue for exploring the ideas and then create individual ones, rather than flood the issue list with every random idea.


Stargateur · 2021-09-27T20:15:27Z

My personal way to import nom modules is as follows:

use nom::{
    branch, bytes::complete as bytes, character::complete as character, combinator, multi,
    sequence, AsChar, IResult as Outcome,
};

pub(crate) fn dec_int(input: &str) -> Outcome<&str, &str> {
    combinator::recognize(sequence::tuple((
        combinator::opt(branch::alt((character::char('+'), character::char('-')))),
        branch::alt((
            character::char('0'),
            combinator::map(
                sequence::tuple((
                    character::satisfy(|c| ('1'..='9').contains(&c)),
                    bytes::take_while(is_dec_digit_with_sep),
                )),
                |t| t.0,
            ),
        )),
    )))(input)
}

You get the idea. I think it's quite clear that way, and easy to read and write. I'm opposed to flattening the modules.

impl Parser for u8, &[u8], char, and &str, reducing the need for char and tag

That should be possible. For char, yes. For u8, I'm not sure: it could be confusing and invite mistakes. Yes for &[u8] and &str.

Stargateur · 2021-09-27T23:44:37Z

I could see a world where we invert foo::complete and foo::streaming into complete::foo and streaming::foo. This makes sense because:

- I don't think there is a case where you mix complete and streaming in the same parser
- It reduces the number of pages in the docs
- It removes the need to write bytes::complete as bytes
- It would let people import every complete or streaming module in one line: complete::*
- It makes imports easier by reducing fragmentation: complete::{foo, bar, baz} vs {foo::complete as foo, bar::complete as bar, baz::complete as baz}
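The inverted hierarchy can be tried out with plain re-exports. The following is a toy layout, not nom's source: the module names mirror nom's, but `tag` and `digit1` here are simplified placeholder functions with made-up signatures, and `version` is a hypothetical caller.

```rust
// Existing shape: category first, then mode.
pub mod bytes {
    pub mod complete {
        /// Placeholder: split off a literal prefix, failing if it is absent.
        pub fn tag<'a>(t: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
            input.strip_prefix(t).map(|rest| (rest, &input[..t.len()]))
        }
    }
}

pub mod character {
    pub mod complete {
        /// Placeholder: one or more ASCII digits.
        pub fn digit1(input: &str) -> Option<(&str, &str)> {
            let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
            if end == 0 { None } else { Some((&input[end..], &input[..end])) }
        }
    }
}

// Inverted view: mode first, then category. A `streaming` module would
// mirror this one.
pub mod complete {
    // Keeps qualified paths short: `complete::bytes::tag`.
    pub use super::bytes::complete as bytes;
    pub use super::character::complete as character;
    // Flat re-exports so that `complete::*` pulls in every complete parser.
    pub use super::bytes::complete::*;
    pub use super::character::complete::*;
}

/// Hypothetical caller: one import line, no `as` renames.
pub fn version(input: &str) -> Option<&str> {
    use self::complete::{digit1, tag};
    let (rest, _) = tag("v", input)?;
    let (_, number) = digit1(rest)?;
    Some(number)
}
```

Because these are only re-exports, both spellings (`bytes::complete::tag` and `complete::bytes::tag`) resolve to the same function, so a layout like this could in principle coexist with the current one during a transition.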

Geal · 2021-10-10T20:19:03Z

These look like interesting suggestions, and more methods in Parser would be nice. I'm not sure about putting too much emphasis on method calls though (well, for map it's natural). Personally I like that there are multiple styles available, and maybe that is what the docs should show.
Maybe a prelude module could help too? Like, bring in all the streaming style parsers with all the combinators

epage · 2021-10-11T15:06:19Z

For the module hierarchy and allowing impl Parser for &str / char / u8, it'd be nice if we had a way to make code polymorphic over the parser type (streaming vs complete).

combine handles this by having streaming vs complete be different parse functions on the trait. This requires people implementing the parser trait to implement both methods. I assume we couldn't make one an inherent method in terms of the other because that would cause papercuts for one of the two workflows. I also appreciate the simplicity of the signature for function parsers and am unsure of a good way to preserve that.
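As a rough illustration of the "two parse functions on the trait" shape described above, here is a self-contained sketch. It is not combine's or nom's actual API: the trait, the method names `parse_complete` and `parse_streaming`, and the `Error` enum are assumptions made up for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    /// More input might turn this into a match (streaming mode only).
    Incomplete,
    /// The input definitely does not match.
    Mismatch,
}

pub type PResult<'a, O> = Result<(&'a str, O), Error>;

/// Hypothetical trait with one entry point per mode. Every implementer has
/// to provide both, which is the cost mentioned above.
pub trait Parser<'a> {
    type Output;
    fn parse_complete(&mut self, input: &'a str) -> PResult<'a, Self::Output>;
    fn parse_streaming(&mut self, input: &'a str) -> PResult<'a, Self::Output>;
}

/// A string literal can be a parser here because the caller, not the type,
/// decides which mode to run.
impl<'a, 'lit> Parser<'a> for &'lit str {
    type Output = &'a str;

    fn parse_complete(&mut self, input: &'a str) -> PResult<'a, &'a str> {
        match input.strip_prefix(*self) {
            Some(rest) => Ok((rest, &input[..self.len()])),
            None => Err(Error::Mismatch),
        }
    }

    fn parse_streaming(&mut self, input: &'a str) -> PResult<'a, &'a str> {
        // The input ran out while still agreeing with the literal: ask for more.
        if self.len() > input.len() && self.starts_with(input) {
            return Err(Error::Incomplete);
        }
        self.parse_complete(input)
    }
}
```

For example, with `let mut kw = "let";`, calling `kw.parse_complete("le")` is a mismatch while `kw.parse_streaming("le")` reports that more input is needed. Function parsers with the simple `fn(input) -> result` signature have no obvious place to carry that mode choice, which is the difficulty raised above.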

epage mentioned this issue Sep 27, 2021

Provide shortcut for streaming::char #1410

Closed

This was referenced Sep 28, 2021

Swap module hierarchy so streaming/complete is in the root #1414

Open

Update docs to point to Parser::map over nom::combinator::map. #1415

Closed

impl Parser for tuples #1417

Closed

epage mentioned this issue Oct 22, 2021

Possibly replace git-conventional with conventional-commit-parser orhun/git-cliff#28

Closed

Geal added this to the 8.0 milestone Mar 14, 2022

epage mentioned this issue Jul 6, 2022

Allow types to be both stream and complete parsers #1535

Open

epage mentioned this issue Jul 26, 2022

Challenges with using nom #1506

Open

This was referenced Dec 7, 2022

Too many character parsers, but not enough #1580

Open

Convenience syntax for many #1581

Open

docs(parser): Clean up Parser function docs winnow-rs/winnow#36

Merged

This was referenced Dec 16, 2022

feat: Move non-grammar combinators to Parser winnow-rs/winnow#40

Merged

feat(parser): impl Parsers for literals winnow-rs/winnow#47

Merged

epage closed this as completed in winnow-rs/winnow#47 Dec 19, 2022

epage reopened this Dec 20, 2022
