librustc_lexer: Refactor the module #66015

popzxc · 2019-11-01T06:36:02Z

This PR introduces a refactoring of the librustc_lexer in order to improve readability.

All the changes performed are only cosmetic and do not introduce any changes to the lexer logic or performance.

Newly introduced modules literal, token and utils are just copy-pasted from lib.rs and do not contain even cosmetic changes (I decided to do so to make it easier to review the changes by looking only at the diff).

r? @petrochenkov

cc @Centril @matklad

popzxc · 2019-11-01T06:46:04Z

src/librustc_lexer/src/lib.rs

+        // string with single quotes).
+        if self.first() == '\'' {
+            self.bump();
+            let kind = Char { terminated: true };


By the way, I'm not sure why we consume the literal suffix above but do not consume it here.
As a result, we get different errors for single-quoted literals that are not a single character and have a suffix, depending on the first symbol:
Playground 1 / Playground 2

That's pretty esoteric, I know, but nevertheless it seems a bit inconsistent to me.

(I'm not sure if that can even be called a bug, since the code in the example is completely invalid.)

matklad

Don't have strong opinions on any changes suggested here :)

rustc_lexer is mostly a from-scratch implementation, so the current shape of the code is mostly what I'd consider the most readable implementation.

I also feel slightly uneasy because it's not trivially clear that this doesn't change behavior. I wish we had a lexer spec and a full-coverage test suite already :)

matklad · 2019-11-01T08:46:53Z

src/librustc_lexer/src/lib.rs

+        // string with single quotes).
+        if self.first() == '\'' {
+            self.bump();
+            let kind = Char { terminated: true };


I think if we detect an error in the char literal, it's better to recover the next token as an identifier rather than treat it as a suffix.

matklad · 2019-11-01T08:54:43Z

src/librustc_lexer/src/lib.rs

-    fn float_exponent(&mut self) -> Result<(), ()> {
+    /// Eats the float exponent. Returns true if at least one digit was met,
+    /// and returns false otherwise.
+    fn eat_float_exponent(&mut self) -> bool {


All other eat_x functions have a contract that, if they return false, they don't consume anything.

This function always consumed something, and, if it returns an Err, you must report it, hence this weird owl-result/bool. It definitely could use a comment though :)

Hm, do they? For example, eat_decimal_digits will consume _______ and return false.
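To make the point about eat_decimal_digits concrete, here is a minimal sketch of a function with that behavior. The Cursor type below is a hypothetical stand-in written for this example (including the rest helper and the use of '\0' for end of input), not the actual librustc_lexer code. It only illustrates that a run of underscores is consumed even though the function reports that no digit was seen.

```rust
use std::str::Chars;

/// Hypothetical, stripped-down cursor used only for this illustration.
struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars() }
    }

    /// Peeks at the next char without consuming it ('\0' at end of input).
    fn first(&self) -> char {
        self.chars.clone().next().unwrap_or('\0')
    }

    /// Consumes one char.
    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Returns the input that has not been consumed yet.
    fn rest(&self) -> &'a str {
        self.chars.as_str()
    }

    /// Eats digits and underscores; returns true only if a digit was seen.
    fn eat_decimal_digits(&mut self) -> bool {
        let mut has_digits = false;
        loop {
            match self.first() {
                '_' => {
                    self.bump();
                }
                '0'..='9' => {
                    has_digits = true;
                    self.bump();
                }
                _ => break,
            }
        }
        has_digits
    }
}
```

With this sketch, calling eat_decimal_digits on "___x" returns false but still leaves only "x" unconsumed, so a false result does not imply that nothing was eaten.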

petrochenkov · 2019-11-01T21:00:52Z

I don't think this split into multiple files is an improvement.
The lexer was only recently rewritten and, like matklad, I'm pretty happy with its current state and readability; the file is far from being large enough to require a split.

petrochenkov · 2019-11-01T21:02:26Z

Could you put the code into its old place? Then I'll be able to review the remaining diff.

popzxc · 2019-11-02T05:39:21Z

Sure, I'll bring everything back soon.

petrochenkov · 2019-11-03T07:59:05Z

+1 on eat_while and first/second.
Could you move them into separate commits so the rest of the PR can be reviewed more easily?
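For context, the helpers mentioned here can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: the Cursor struct, the EOF_CHAR value and the exact signatures are chosen for the example, with first and second as thin wrappers over nth_char and eat_while as a loop that consumes chars while a predicate holds.

```rust
use std::str::Chars;

/// Sentinel returned when peeking past the end of input (assumed for this sketch).
const EOF_CHAR: char = '\0';

/// Hypothetical, stripped-down cursor used only for this illustration.
struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars() }
    }

    /// Peeks at the n-th upcoming char without consuming anything.
    fn nth_char(&self, n: usize) -> char {
        self.chars.clone().nth(n).unwrap_or(EOF_CHAR)
    }

    /// Peeks at the next char.
    fn first(&self) -> char {
        self.nth_char(0)
    }

    /// Peeks at the char after the next one.
    fn second(&self) -> char {
        self.nth_char(1)
    }

    fn is_eof(&self) -> bool {
        self.chars.as_str().is_empty()
    }

    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Consumes chars while the predicate holds; returns how many were eaten.
    fn eat_while<F>(&mut self, mut predicate: F) -> usize
    where
        F: FnMut(char) -> bool,
    {
        let mut eaten = 0;
        while !self.is_eof() && predicate(self.first()) {
            eaten += 1;
            self.bump();
        }
        eaten
    }
}
```

Call sites can then read as self.first() == '\'' or self.eat_while(|c| c.is_ascii_digit()) instead of spelling out nth_char(0) and a hand-written loop each time.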

petrochenkov · 2019-11-03T16:09:31Z

src/librustc_lexer/src/lib.rs

-                // Newline without following '\'' means unclosed quote, stop parsing.
-                '\n' if self.nth_char(1) != '\'' => break,
-                // End of file, stop parsing.
-                EOF_CHAR if self.is_eof() => break,


Not sure why the order of match arms was changed here.

Well, I had two reasons here:

I ordered the match arms by how they terminate: first the arm that returns, then the exceptional cases that break, then the char-skipping arms (escaped char and any other char).

I thought it's a bit more readable to have the normal exit condition as the first match arm.

petrochenkov · 2019-11-03T16:12:07Z

r? @matklad

matklad

r=me with a comment about '0 expanded to mention error recovery, and with 13da2aa squashed into fdf74a3 (such that there's no back and forth with outlining and inlining methods).

Overall, I must say that some changes here are clearly wins, while others seem more like just a different, equivalent way to write the same code. In the future, I would advise splitting uncontroversial strict improvements from stylistic changes, so that it becomes easier to assess and merge both independently.

matklad · 2019-11-03T16:27:25Z

src/librustc_lexer/src/lib.rs

+            false
+        } else {
+            // If the first symbol is valid for identifier
+            // or it's a digit, it can be a lifetime.


It can't really be a lifetime if second is a digit. Rather, this is a special-cased error recovery.

matklad · 2019-11-03T16:33:44Z

src/librustc_lexer/src/lib.rs

@@ -682,15 +670,33 @@ impl Cursor<'_> {
        if self.eat_decimal_digits() { Ok(()) } else { Err(()) }
    }

-    // Eats the suffix if it's an identifier.
+    // Eats the suffix of the literal, e.g. "_u8".
    fn eat_literal_suffix(&mut self) {


seems like this method can be removed now?

I think it's better to have it for readability. It's obvious why we are calling "eat_literal_suffix" after parsing the literal, but it's not that obvious when we'll call "eat_identifier" instead.
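The readability argument can be illustrated with a small sketch in which eat_literal_suffix is only a named wrapper around eat_identifier. Everything below is assumed for the example: the Cursor is a hypothetical stand-in, and is_id_start / is_id_continue are ASCII-only simplifications rather than the identifier rules the real lexer uses.

```rust
use std::str::Chars;

/// Hypothetical, stripped-down cursor used only for this illustration.
struct Cursor<'a> {
    chars: Chars<'a>,
}

// ASCII-only simplifications of the identifier rules (an assumption of this sketch).
fn is_id_start(c: char) -> bool {
    c == '_' || c.is_ascii_alphabetic()
}

fn is_id_continue(c: char) -> bool {
    c == '_' || c.is_ascii_alphanumeric()
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars() }
    }

    /// Peeks at the next char without consuming it ('\0' at end of input).
    fn first(&self) -> char {
        self.chars.clone().next().unwrap_or('\0')
    }

    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Returns the input that has not been consumed yet.
    fn rest(&self) -> &'a str {
        self.chars.as_str()
    }

    /// Eats an identifier if one starts here; otherwise consumes nothing.
    fn eat_identifier(&mut self) {
        if !is_id_start(self.first()) {
            return;
        }
        self.bump();
        while is_id_continue(self.first()) {
            self.bump();
        }
    }

    /// Eats the suffix of a literal, e.g. "_u8". Same behavior as
    /// eat_identifier; the separate name documents intent at the call site.
    fn eat_literal_suffix(&mut self) {
        self.eat_identifier();
    }
}
```

The wrapper adds no behavior of its own. Its only purpose in this sketch is to let the code that follows a literal say what it expects to find there.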

matklad · 2019-11-03T16:50:19Z

src/librustc_lexer/src/lib.rs

        }
+
+        (n_hashes, started, finished)
    }


Not entirely sure if this is a simplification:

started/finished can be const-propagated so that there's no need to keep state in your head

mutable predicates are odd

To be honest, I don't completely like this either, but IMO it's better than the current approach.
With the current approach, instead of keeping the values of "started" and "finished" in your head, you had to remember what exactly "n_hashes, true, false" means (which, at least for me, was a bit harder).

Regarding the mutable predicate: that's the cost of not having an eat_at_most_while. However, the scope of this predicate is pretty small and the context is simple, so I don't find it confusing.

Nevertheless I can revert those changes if you insist :)

No need to revert this: I don't claim that the old version was better. These are really just different ways to work around the bad signature of the method.
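The "mutable predicate" under discussion can be sketched as a closure that counts down while it is being used as the eat_while condition. This is a hedged illustration only: the Cursor type and the eat_closing_hashes helper are hypothetical names invented for the example, and the sketch shows just the pattern of bounding eat_while with captured state in the absence of an eat_at_most_while.

```rust
use std::str::Chars;

/// Hypothetical, stripped-down cursor used only for this illustration.
struct Cursor<'a> {
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars() }
    }

    /// Peeks at the next char without consuming it ('\0' at end of input).
    fn first(&self) -> char {
        self.chars.clone().next().unwrap_or('\0')
    }

    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Returns the input that has not been consumed yet.
    fn rest(&self) -> &'a str {
        self.chars.as_str()
    }

    /// Consumes chars while the predicate holds; returns how many were eaten.
    fn eat_while<F>(&mut self, mut predicate: F) -> usize
    where
        F: FnMut(char) -> bool,
    {
        let mut eaten = 0;
        while !self.chars.as_str().is_empty() && predicate(self.first()) {
            eaten += 1;
            self.bump();
        }
        eaten
    }

    /// Hypothetical helper: eats at most `n_hashes` '#' chars and reports
    /// whether all of them were found.
    fn eat_closing_hashes(&mut self, n_hashes: usize) -> bool {
        let mut hashes_left = n_hashes;
        // The predicate mutates captured state, which is what makes it "odd":
        // it stops accepting '#' once the expected count has been reached.
        self.eat_while(|c| {
            if c == '#' && hashes_left != 0 {
                hashes_left -= 1;
                true
            } else {
                false
            }
        });
        hashes_left == 0
    }
}
```

A dedicated eat_at_most_while(n, predicate) would keep the counter inside the cursor and let the predicate stay pure. The sketch trades that for a small amount of local state.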

popzxc · 2019-11-03T17:21:20Z

Regarding splitting the changes into several PRs: sure, I've learned my lesson, and sorry for putting it all together this time.
I'll try to do better in the future :)

matklad · 2019-11-04T10:31:22Z

@bors r+

Thanks!

bors · 2019-11-04T10:31:23Z

📌 Commit 31735b0 has been approved by matklad

@ghost

Rollup of 9 pull requests Successful merges: - #65776 (Rename `LocalInternedString` and more) - #65973 (caller_location: point to macro invocation sites, like file!/line!, and use in core::panic!.) - #66015 (librustc_lexer: Refactor the module) - #66062 (Configure LLVM module PIC level) - #66086 (bump smallvec to 1.0) - #66092 (Use KERN_ARND syscall for random numbers on NetBSD, same as FreeBSD.) - #66103 (Add target thumbv7neon-unknown-linux-musleabihf) - #66133 (Update the bundled `wasi-libc` repository) - #66139 (use American spelling for `pluralize!`) Failed merges: r? @ghost

rust-highfive assigned petrochenkov Nov 1, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 1, 2019

matklad reviewed Nov 1, 2019

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 1, 2019

popzxc force-pushed the refactor-librustc_parser branch from f671200 to b93c988 Compare November 2, 2019 12:03

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 2, 2019

petrochenkov reviewed Nov 3, 2019

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 3, 2019

popzxc added 6 commits November 3, 2019 11:39

librustc_lexer: Add methods "first" and "second" to the "Cursor"

0825b35

librustc_lexer: Introduce "eat_while" and "eat_identifier" methods

72767a8

librustc_lexer: Make "eat_float_exponent" return bool instead of result

e0c45f7

librustc_lexer: Simplify "single_quoted_string" method

649a524

librustc_lexer: Simplify "double_quoted_string" method

d6f722d

librustc_lexer: Simplify "raw_double_quoted_string" method

6e350bd

popzxc force-pushed the refactor-librustc_parser branch from b93c988 to 13da2aa Compare November 3, 2019 11:54

petrochenkov reviewed Nov 3, 2019

rust-highfive assigned matklad and unassigned petrochenkov Nov 3, 2019

petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 3, 2019

matklad reviewed Nov 3, 2019

popzxc added 3 commits November 4, 2019 06:27

librustc_lexer: Simplify "lifetime_or_char" method

ecd2673

librustc_lexer: Reorder imports in lib.rs

e8b8d2a

librustc_lexer: Make nth_char method private

31735b0

popzxc force-pushed the refactor-librustc_parser branch from 13da2aa to 31735b0 Compare November 4, 2019 03:56

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 4, 2019

Centril mentioned this pull request Nov 6, 2019

Rollup of 9 pull requests #66143

Merged

bors merged commit 31735b0 into rust-lang:master Nov 6, 2019

popzxc deleted the refactor-librustc_parser branch November 6, 2019 09:46

SimonSapin mentioned this pull request Nov 12, 2019

sccache consistently errors on CI since Rust nightly-2019-11-07 servo/servo#24714

Closed
