Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser gets stuck after trying to parse auto-generated file #13788

Closed
jedel1043 opened this issue Dec 17, 2022 · 14 comments · Fixed by #13794
Closed

Parser gets stuck after trying to parse auto-generated file #13788

jedel1043 opened this issue Dec 17, 2022 · 14 comments · Fixed by #13794
Labels
A-parser parser issues

Comments

@jedel1043
Copy link

jedel1043 commented Dec 17, 2022

Rust analyzer's parser thread worker gets stuck after trying to parse big, auto-generated files, making r-a start throwing a lot of errors.
I would expect r-a to skip parsing all files marked with // @generated, but it's apparently still trying to parse them anyways.

To reproduce

git clone https://github.com/jedel1043/ra-bug.git && cd ra-bug

After opening the repo with r-a, it should start throwing a lot of errors.

rust-analyzer version: 0.3.1317-standalone

rustc version: rustc 1.66.0 (69f9c33d7 2022-12-12)

cc @Manishearth

@jedel1043 jedel1043 changed the title Rust analyzer parser gets stuck after trying to parse auto-generated file Parser gets stuck after trying to parse auto-generated file Dec 17, 2022
@Manishearth
Copy link
Member

I don't think @generated is standard, fwiw, it's just what we use

but clearly databake crates are a pathological case for r-a and should be investigated, and also potentially there should be a standard solution for such pathological cases

we're happy to generate code that uses some kind of internal feature to expose the same API but lie to r-a about contents

@Veykril
Copy link
Member

Veykril commented Dec 17, 2022

Curious, what is in that generated output? I assume its not public facing API if it would be fine for r-a to ignore? Ideally we would have a tool attribute to skip out on specific modules for r-a #11556 I think

@Manishearth
Copy link
Member

Manishearth commented Dec 17, 2022

@jedel1043 you should check in the generated output into tree (both for the purposes of this issue and in the long run)

@Veykril Here's ICU4X's version of this. We work around this by using include!() on mod.rs which doesn't seem to work for @jedel1043 for whatever reason. Using mod instead there would fix it. Everything in this folder except for mod.rs and any.rs is a mess of generated static code.

https://github.com/unicode-org/icu4x/tree/main/provider/testdata/data/baked

All the generated modules are private.

"it would be fine for r-a to ignore" is missing some nuance, the code is used in implementations of some traits in mod.rs and any.rs, however we could write impls of those traits that simply panic or return Err or something, if there was a way to indicate to r-a that it should ignore these files.

I think there are two issues here:

  • r-a gets stuck/broken around this. It really shouldn't, it's fine if it slows down I guess but it shouldn't break
  • there should be a way to tell r-a to ignore something, perhaps a cfg or something

I suspect "ignore certain kinds of files" isn't actually feasible because you get a broken crate.

@Veykril
Copy link
Member

Veykril commented Dec 17, 2022

We work around this by using include!() on mod.rs

What exactly do you mean with this, does r-a fail to handle a specific scenario here that you are making use of to avoid the parsing bug?

@Veykril
Copy link
Member

Veykril commented Dec 17, 2022

And I see the generated file that is causing r-a to bail on parsing, data/baked/datetime/japanext/datesymbols_v1.rs with 290k lines is quite a beast 😓 The parser thinks its stuck because of the sheer amount of tokens (> 15_000_000) in the file which is a sanity check we have in place in case the parser doesn't terminate on an input.

I suspect "ignore certain kinds of files" isn't actually feasible because you get a broken crate.

Well in this case thats the only option aside from bumping the token limit for r-a, we can't "cfg" out things to prevent them from being parsed, we either parse a file or we don't

@Veykril Veykril added the A-parser parser issues label Dec 17, 2022
@Manishearth
Copy link
Member

we can't "cfg" out things to prevent them from being parsed, we either parse a file or we don't

can't there be a cfg option that gets set when r-a is running? there's a cfg(clippy) for this reason iirc

@Manishearth
Copy link
Member

What exactly do you mean with this, does r-a fail to handle a specific scenario here that you are making use of to avoid the parsing bug

You can see what we do there, it does not seem to break r-a. We are pinning to an old rustc there, I'm not sure if r-a respects pinned versions and is using an older version whereas our include hack stops working for @jedel1043 in his code.

I don't use r-a so I don't know the details but @robertbastian might.

@Veykril
Copy link
Member

Veykril commented Dec 17, 2022

can't there be a cfg option that gets set when r-a is running? there's a cfg(clippy) for this reason iirc

No what i meant it is even if an item has a cfg attached to it we will need to parse it since attribute/cfg resolution happens after parsing, so at best you can cfg out the mod declaration (with the cfg being on the declaration, not inside the module) to prevent r-a from parsing the file. Whether thats a cfg or a tool attribute doesn't really matter in that regard, alternatively we could make the token limit configurable in some way (via an attribute or something).

@jedel1043
Copy link
Author

Added the generated files to the repo example.

@robertbastian
Copy link

@Manishearth I'm not sure if r-a actually ignores our baked testdata. It is a lot smaller than the full data that @jedel1043 is working with, it could just be working.

r-a gets stuck/broken around this. It really shouldn't, it's fine if it slows down I guess but it shouldn't break

This could potentially be fixed by further breaking up the files into smaller ones, down to one static per file. Does r-a work per-module or per-file? I.e. can we use include or do we have to split the files into different mods?

@Veykril
Copy link
Member

Veykril commented Dec 19, 2022

either should work, macro expansions are handled as separate virtual files, so using includes doesn't cause bigger syntax trees in a file.

bors added a commit that referenced this issue Dec 19, 2022
…=jonas-schievink

fix: fix "parser seems stuck" panic when parsing colossal files

The parser step count is incremented every time the parser inspects a token. It's purpose is to ensure the parser doesn't get stuck in infinite loops. But since `self.pos` grows monotonically when parsing source code, it gives a better idea for whether the parser is stuck or not: if `self.pos` is changed, we know that the parser cannot be stuck, so it is safe to reset the step count to 0. This makes the limit check scale with the size of the file, and so should fix #13788.
@jonas-schievink
Copy link
Contributor

In #13794 I fixed the parser step limit to scale with the size of the file automatically. That should fix the panics you're seeing.

@jedel1043
Copy link
Author

So, I tried again with the latest nightly and the parser doesn't panic anymore... but now R-A OOMs while trying to analyze the files. Should I open another issue to track improvements on this?

@jonas-schievink
Copy link
Contributor

Probably, yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser parser issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants