-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement lexing and parsing for all WGSL number literal types #1184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very elaborate!
I just noticed that I broke the lexing of the width of the integer literal (i.e. the number after a type suffix, e.g. |
WGSL has types defined as "xW", where W is the number of bits. We definitely need to parse that. |
I think I might be missing something in the spec. I was looking at https://gpuweb.github.io/gpuweb/wgsl/#literals so far. Could you perhaps point me to the right section in the spec, that describes this part of the language? As far as I can tell (from reading parts of the naga code) the Edit2: The width returned in consume_number should probably always be Edit3: I think part of the misunderstanding may be, that the suffixes for number literals are not the type names (i.e. |
Ok, yeah, the suffixes are not type names, they just look the same. Parsing a literal can just return the width of 4 (in bytes). |
Currently the Lines 46 to 50 in 551b711
Should I just return "32" then? Since the width seems to be a string with the decimal digits for the number of bits, seeLines 1551 to 1553 in 551b711
Lines 1051 to 1058 in 551b711
|
Just return an empty string there? |
With 9ff29df the only tests that still fail, fail because of the parsing code currently not being capable of handling hexadecimal integers and floats. And with c5817f0 the code compiles with Rust 1.43 (at least on my machine). |
which code is this? |
The following are only needed if the state machine should also check for validity due to stuff like negative zeros (not allowed), floats with a uint suffix, etc.: Edit: Except for negative zeros (which the spec does allow for hexadecimal integers and all kinds of floats, just not for decimal integers), these issues would all also fail at the parsing stage later on. |
I don't have a clear idea on where to draw the line in responsibilities between the new state machine and the parser. Keeping this code around isn't bad, since it's fairly isolated. So it seems to me that we can proceed and then see if it can be simplified? |
There are still two TODO comments left with regards to the verification code. One where the verification code could be called and one suggesting that the verification code should probably use |
No need to remove. Keeping them is fine. |
I just took a quick look at the parsing side of things: |
How would you like the code to be simplified? I'm not sure the state machine part of the code (the giant match that is) can be simplified much further without accepting a broader or a narrower syntactical definition of the literals. One avenue of simplification could be to parse the number literals into Rust types in the lexer already (since we already have the structure of the literal available here: hex or not?, decimal digits or not?, etc.). Searching around, some lexers seem to do this (i.e. have the token contain the value, not the string), but that would require returning a Another option would be to just not care too much about the syntactical detail in the lexer and rely on the parser to do the validation of the syntax, but this would complicate the parser, as the syntax of WGSL is sufficiently different from that of Rust (and the parsing tools it offers). |
Anything block you on this? |
Well I guess I'd like to hear from you how you'd like to see the code structured (see previous comment from me). Personally I'd just keep the lexing code as-is and add some modifications to the parsing code to allow for hex integers. Hex float parsing could be done with something like hexf-parse, though v0.2.1 requires Rust ≥1.45, v0.1.0 is Rust ≥ 1.15. I haven't yet looked at the changes between these versions. Edit: regarding hex floats, I could also just add an error reporting that hex floats are a NYI literal format for now. |
There's also Lines 294 to 320 in 448ea65
These do their own parsing, unaware of hexadecimal representations and should probably replaced with functions in the parsing code. They only seem to be used for integer literals as parameters to attributes like location .
|
I think I'll just write up a commit with the changes I think this needs to allow for parsing & lexing of hex integers (hex float parsing can wait a bit). If something is off with that, you can just tell me then ^^ |
Yes, that sounds good, thank you! |
Mmm... might take me a bit to make a reasonably correct parsing impl for integer literals because of the fun with non-negative / positive / possibly uint or sint, but no larger than (Edit: this is an issue because it complicates replacing the lexer parsing methods that kinda break separation of concerns) |
c5817f0
to
e6efcfa
Compare
Ok, with e6efcfa I've implemented parsing for hexadecimal integers. Now there are only two open points for this PR:
|
35e0360
to
0355553
Compare
For hex parsing, the group hasn't figured out the full story yet - https://github.com/gpuweb/gpuweb/issues?q=is%3Aissue+is%3Aopen+hex+ |
First of all, thanks for bringing this to my attention, I was not aware of the fact that there is still bikeshedding happening at the time. I don't know any of the authors of the spec, I am in fact rather new to WGSL in general. I just wanted to use wgpu with WGSL shaders ported from my current GLSL shaders (since WGSL and SPIR-V are the two shader langs with first class support in wgpu). Does naga already make guarantees about the backwards compatibility with regards to parsing and lexing? Since the spec still seems to be in flux, especially with something important like gpuweb/gpuweb#775, this doesn't really seem possible at them moment. I think one thing to at least make people aware of the friction / possible changes would be to prominently display the current spec version that the implementation is based on in the Sadly, I don't think the spec has something like the Redline editions of the IEEE specs (i.e. a page where a fully rendered spec has changed sections marked up accordingly). The next best thing is probably looking at the changes to the sources in the git repo. I believe my implementation of the lexing is relatively easy to modify in terms of changes similar to accepting the exponent marker case-insensitively. Harder changes would be in the greediness of number parsing or the The best scenario for naga would probably be an author or somebody working closely with the authors of the spec writing issues / pinging contributors to naga in case of changes. But that person would likely have to know the naga implementation of the spec in order to know how the changes affect naga. I can't really offer much valuable advice as I'm still rather inexperienced in the world of specs, spec implementations and especially implementations of specs in the drafting stage, but I hope my thoughts above are at least somewhat helpful (and sorry for the giant block of text). |
0355553
to
1aa0d42
Compare
I've now implemented parsing of hex floats with the hexf-parse crate. Version 0.2.1 of hexf-parse is compatible with 1.43 and the |
I haven't updated the test output files in |
This should not happen. I recently found and fixed a case for GLSL outputs. If you rebase, the test outputs should be all matching. |
Suffixes are not type names and currently only a plain `u` is supported for uints. More specifically, `i` and `f` suffixes or suffixes with widths in bits like `u32` are not supported at the moment.
Ah yes, I've rebased onto master now and now the test outputs are up to date again. My changes do affect them at the moment in exactly one way: There's also the option of casting back down to an |
1aa0d42
to
d5820e3
Compare
d5820e3
to
1a4021c
Compare
Those |
Yeah, I mean when interpreted as an Other than that, I'd say the code is fine now. Do you have any points you'd like me to address still? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you! Just a few organizational things.
Should I squash some commits together or even squash them all into one after you give your stamp of approval? Some commits like the removal of Rust 1.43 incompatible code are prime candidates for squashing, including the cleanup commits just now (which I mostly did so you can get a better idea of what I changed). |
I tried to implement lexing for all different types of number literals
as described by the current working draft and editor's draft
(they are the same for the literal definitions at the time of writing).
This should bring naga closer to being able to parse all kinds of number literals,
partially resolving #866 for example.
The next step would be to also allow for parsing of these new kinds of number literals,
as the currently used
FromStr::parse
won't work for hexadecimal literals (float or integer), see rust-lang/rfcs#1098.The relevant code for this seems to be here:
naga/src/front/wgsl/mod.rs
Lines 1027 to 1060 in e384bce
I have added the test cases I had written up for #866 to
tests/in/operators.wgsl
since they seem to fit in there(though the file name might need some bikeshedding, should they stay there).
I have not yet adjusted any of the output files for the test, as I'm somewhat unsure about what they'd look like.
While I have tried my best to make the code easy to understand, it has become somewhat unwieldy due to the state machinery required to (hopefully) correctly lex the literals.
This state machine code also still needs to see some testing / fuzzing, which I haven't done myself so far.
I have tested how naga fails for
operators.wgsl
and my small tests seem to indicate that the literals are lexed correctly.For example, with the following WGSL code:
the modified code reports the following error:
wheras the old code reports the following:
An alternative to lexing the literals this way, with a manually written state machine, would be to use the regexes given in the WGSL spec directly (after correcting them, since they use
+
ambiguously for example) with some existing regex engine like https://github.com/rust-lang/regex, this would pull in additional dependencies, but might be worth it, to reduce maintenance efforts, should the spec change it's definitions.