Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many0 and many1 with_capacity option #1495

Open
voronaam opened this issue Jan 27, 2022 · 2 comments
Open

many0 and many1 with_capacity option #1495

voronaam opened this issue Jan 27, 2022 · 2 comments
Milestone

Comments

@voronaam
Copy link

Currently many0 and many1 parsers pre-allocate a Vector with capacity just 4 for the result:

https://github.com/Geal/nom/blob/7.0.0/src/multi/mod.rs#L47

This hurts performance in our case where we need to parse thousands of identical items, but do not know their exact number. We can estimate the approximate number of items in the resulting vector from the size of the input, but not the exact count (so, we can not use count or many_m_n).

Would it be possible to add into nom a version of many0 with the user specified initial capacity for the vector-accumulator?

@Xiretza
Copy link
Contributor

Xiretza commented Jan 28, 2022

Sounds like it might be a good addition after #1402 lands. Right now, there are already too many parsers in this space, duplicating some of them to allow passing a size hint would make the situation even worse.

You can always implement such a combinator yourself, of course, either by adapting the existing implementations or using fold.

@voronaam
Copy link
Author

We have changed our implementation to use fold, thank you for the suggestion. For the benefit of anybody landing on this issue via a google search, it is going from

let (input, items) = many0(parse_line)(input)?;

to

let capacity = 4 + input.len() / 100; // Or whatever is a good enough guess
let (input, items) = fold_many0(
	parse_line,
	|| Vec::with_capacity(capacity),
	|mut acc: Vec<_>, item| {
		acc.push(item);
		acc
	},
)(input)?;

I think it is still a good idea to add a with_capacity version of the many0 and agree it would be best done after the PR you mentioned is merged.

@Geal Geal added this to the 8.0 milestone Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants