Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak while parsing improperly terminated inherit-expressions #31

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Sep 19, 2021

First of all, big thanks to @fufexan who helped me to reliably reproduce
this.

Originally discovered in rnix-lsp[1], but I confirmed that
nixpkgs-fmt is also affected.

Basically, when having an expression such as

let
  inherit

the parser would wait for a TOKEN_SEMICOLON indefinitely. The actual
problem however is that self.parse_val() always detects the SAME
syntax-error, i.e. "unexpected EOF". This will be written indefinetely
into self.errors. However, errors is of type Vec<ParseError> and a
vector in Rust grows in an amortized fashion[2] which means that if an
entry is pushed and the vector exceeds the currently allocated size, it
will be ~doubled (though the exact growth-factor isn't constant).

This essentially means that the buffer is growing exponentially pretty fast
and - according to KDE heaptrack - my system allocated ~9.5GB after 20s
while running some tests.

I added an exit-condition to the loop traversing through
inherit-subexpressions to avoid that. Checking for an "unexpected EOF"
is actually sufficient here:

  • There's either a ; later in the expression causing the loop to
    terminate and causing an actual "unexpected token" error then.

  • Otherwise, parse_val will go through the tokens until a matching
    semicolon is found (which is not the case) and then reach the end of
    the file. In that case, unexpected EOF is returned by parse_val.

[1] nix-community/rnix-lsp#33
[2] https://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec20-amortized/amortized.htm

…ions

First of all, big thanks to @fufexan who helped me to reliably reproduce
this.

Originally discovered in `rnix-lsp`[1], but I confirmed that
`nixpkgs-fmt` is also affected.

Basically, when having an expression such as

    let
      inherit

the parser would wait for a `TOKEN_SEMICOLON` indefinitely. The actual
problem however is that `self.parse_val()` always detects the SAME
syntax-error, i.e. "unexpected EOF". This will be written indefinetely
into `self.errors`. However, `errors` is of type `Vec<ParseError>` and a
vector in Rust grows in an amortized fashion[2] which means that if an
entry is pushed and the vector exceeds the currently allocated size, it
will be ~doubled (though the exact growth-factor isn't constant).

This essentially means that the buffer is growing exponentially pretty fast
and - according to KDE heaptrack - my system allocated ~9.5GB after 20s
while running some tests.

I added an exit-condition to the loop traversing through
`inherit`-subexpressions to avoid that. Checking for an "unexpected EOF"
is actually sufficient here:

* There's either a `;` later in the expression causing the loop to
  terminate and causing an actual "unexpected token" error then.

* Otherwise, `parse_val` will go through the tokens until a matching
  semicolon is found (which is not the case) and then reach the end of
  the file. In that case, `unexpected EOF` is returned by `parse_val`.

[1] nix-community/rnix-lsp#33
[2] https://www.cs.cornell.edu/courses/cs3110/2011sp/Lectures/lec20-amortized/amortized.htm
@Ma27
Copy link
Member Author

Ma27 commented Sep 19, 2021

@fufexan btw would you mind building your rnix-lsp with this branch of rnix-parser and check for a few days if the issue is really gone for you? :)

@fufexan
Copy link

fufexan commented Sep 19, 2021

@Ma27 I would, I'm just not sure how to build it, as I haven't worked with rust/cargo before, so I've got no clue how to replace the default rnix package with the branch.

@Ma27
Copy link
Member Author

Ma27 commented Sep 19, 2021

Oh right, sorry!

The thing is, naersk doesn't seem to support Git repositories as crate overrides, so I took a bit more manual way myself:

Copy link
Member

@aaronjanse aaronjanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! Thank you so much @Ma27 @fufexan!

Everything here LGTM, although I haven't yet had a chance to run the code. Tomorrow morning I'll try to write a flake to make it easy to build rnix-lsp with a custom rnix-parser

@aaronjanse
Copy link
Member

This should build rnix-parser, using this PR:

nix build git+https://gist.github.com/aaronjanse/0b6145784fcb1aa6ee1419248ae53872

@Ma27
Copy link
Member Author

Ma27 commented Sep 22, 2021

@fufexan did you have a chance to test this? :)

@fufexan
Copy link

fufexan commented Sep 22, 2021

@Ma27 yes, all works fine. Haven't had a memleak since switching :)

Copy link
Member

@aaronjanse aaronjanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, and the code looks good to me. Thank you @Ma27!

@Ma27 Ma27 merged commit 61f19a9 into master Sep 23, 2021
@Ma27 Ma27 deleted the fix-memleak branch September 23, 2021 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants