Started with the new lexer implementation #432

Razican · 2020-05-31T11:17:35Z

This Pull Request fixes #294.

It changes the following:

- The lexer can now be created with anything that implements Read. Ideally, we should use a Cursor<String> when reading input from the user (in the console, for example), or a buffered reader when reading from files.
- Adds stream lexing, which can be used with code streams that are not yet completely available.
- Adds stream parsing. This means that the parser does not need to wait for all the tokens to be lexed before parsing starts.
- Adds goal symbols, which means we can correctly tell a division / apart from a regular expression literal starting with /.
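A minimal sketch of the first three points, under stated assumptions: the `Lexer`, `Word`, `words` and `example` names are hypothetical and are not the types in this PR, and a "token" here is just a whitespace-separated run of bytes. It only shows how a lexer generic over `Read` can be driven lazily as an iterator, so a consumer can pull tokens before the whole input is available.

```rust
use std::io::{self, BufReader, Bytes, Cursor, Read};

/// Hypothetical, heavily simplified token: a run of non-whitespace bytes.
#[derive(Debug, PartialEq)]
struct Word(String);

/// Hypothetical lexer that works over any `Read` implementation.
struct Lexer<R: Read> {
    bytes: Bytes<R>,
}

impl<R: Read> Lexer<R> {
    fn new(reader: R) -> Self {
        Self { bytes: reader.bytes() }
    }
}

/// Tokens are produced on demand, so a parser can start consuming them
/// before the rest of the input has been read.
impl<R: Read> Iterator for Lexer<R> {
    type Item = io::Result<Word>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = Vec::new();
        for byte in &mut self.bytes {
            match byte {
                Err(e) => return Some(Err(e)),
                Ok(b) if b.is_ascii_whitespace() => {
                    // Skip leading whitespace; stop at whitespace after a word.
                    if !buf.is_empty() {
                        break;
                    }
                }
                Ok(b) => buf.push(b),
            }
        }
        if buf.is_empty() {
            None
        } else {
            Some(Ok(Word(String::from_utf8_lossy(&buf).into_owned())))
        }
    }
}

/// Collects every word from a reader, stopping at the first I/O error.
fn words<R: Read>(reader: R) -> io::Result<Vec<String>> {
    Lexer::new(reader).map(|w| w.map(|Word(s)| s)).collect()
}

/// Usage with the two kinds of source mentioned above. A byte slice stands
/// in for a file so the sketch stays self-contained.
fn example() -> io::Result<(Vec<String>, Vec<String>)> {
    let from_console = words(Cursor::new(String::from("let a = 1;")))?;
    let from_file = words(BufReader::new(&b"x / y\n"[..]))?;
    Ok((from_console, from_file))
}
```

Reading byte by byte from an unbuffered file would be slow, which is presumably why a buffered reader is suggested for files.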

Note that this is still WIP. I have only laid the groundwork with the new cursor for the lexer, but I wanted to have it here in order to have benchmarks soon, and to receive feedback.

Lan2u · 2020-06-09T20:36:31Z

Have been looking through this / how the lexer used to work and I think I have a basic understanding of where stuff is. Is there an area within the new lexer that I could look into working on?

Razican · 2020-06-10T06:07:39Z

> Have been looking through this / how the lexer used to work and I think I have a basic understanding of where stuff is. Is there an area within the new lexer that I could look into working on?

Basically, porting the old lexer to the new architecture would be nice. You can create PRs to this branch. If you find you need something new from the cursor, let me know, and I can add it.

I might have some time this week to finish the unimplemented functions in the cursor. Then we need to have some extra logic to use the goal symbols.
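As a rough illustration of what that extra logic could look like, here is a minimal sketch. The `Goal` variants are named after the ECMAScript goal symbols InputElementDiv and InputElementRegExp; the `Token` enum and `lex_slash` function are hypothetical, and the regular expression scanning is deliberately naive (no escapes, character classes or flags).

```rust
/// Lexical goal chosen by the parser before asking for the next token.
/// Only two of the ECMAScript goal symbols are modelled here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Goal {
    /// InputElementDiv: a `/` is a division operator.
    Div,
    /// InputElementRegExp: a `/` starts a regular expression literal.
    RegExp,
}

/// Hypothetical token type, reduced to the two cases of interest.
#[derive(Debug, PartialEq)]
enum Token {
    Div,
    RegExp(String),
}

/// Lexes the token that starts at a `/`. `input` must begin at that slash.
/// Returns `None` if `input` does not start with `/`, or if a regular
/// expression is expected but no closing `/` is found.
fn lex_slash(input: &str, goal: Goal) -> Option<Token> {
    let rest = input.strip_prefix('/')?;
    match goal {
        Goal::Div => Some(Token::Div),
        Goal::RegExp => {
            // Simplification: the body ends at the next `/`.
            let end = rest.find('/')?;
            Some(Token::RegExp(rest[..end].to_string()))
        }
    }
}
```

The same input character produces a different token depending on the goal, which is why the parser has to tell the lexer which goal applies at each point.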

Lan2u · 2020-06-10T20:08:07Z

I have started by working my way through the old lex() function and moving across the code for each of the token types.

Moved across (if something wasn't implemented before, I haven't implemented it yet; TODOs etc. remain)

Lan2u · 2020-06-10T20:35:59Z

When it comes to matching the start of a token, it would be nice to keep the characters being matched on in the same file where that token's lexing is done. For example, in:

```rust
let token = match next_chr {
    '\r' | '\n' | '\u{2028}' | '\u{2029}' => Ok(Token::new(
        TokenKind::LineTerminator,
        Span::new(start, self.cursor.pos()),
    )),
    '"' | '\'' => StringLiteral::new(next_chr).lex(&mut self.cursor, start),
    TemplateLiteral::BEGIN_CHR => TemplateLiteral::new().lex(&mut self.cursor, start),
    _ => unimplemented!(),
};
```

I think it would be cleaner to move the '"' | '\'' for StringLiteral into the string file. This could be done by having a Literal::BeginChr(c) which is called for each literal type until one returns true, indicating it can start lexing. This obviously might come with some performance hit, so there might be a better way (macros?). Ideally something like a C macro:
#define STRING_LITERAL_CHECKS '"' | '\''

Co-authored-by: Iban Eguia <razican@protonmail.ch>

Lan2u · 2020-06-11T20:37:51Z

I see the cursor gets ASCII bytes - what if Unicode is used?

Razican · 2020-06-12T07:29:49Z

> This obviously might come with some performance hit, so there might be a better way (macros?). Ideally something like a C macro:
> #define STRING_LITERAL_CHECKS '"' | '\''
I see that you created some macros in the PR. I think that's the way to go for now. We'll see if in the future this gets a bit too difficult to maintain or can be improved.
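For reference, a minimal sketch of how such macros can work in Rust. The macro names, the `Start` enum and the `classify` function are hypothetical and not necessarily what the PR uses. Each macro expands to a pattern, so it can live next to the lexing code for its token type while still being used in the central `match`, with no runtime cost.

```rust
/// Would live in the line terminator module.
macro_rules! line_terminator_start {
    () => {
        ('\r' | '\n' | '\u{2028}' | '\u{2029}')
    };
}

/// Would live in the string literal module. Parenthesized so that the
/// expansion is a single pattern.
macro_rules! string_literal_start {
    () => {
        ('"' | '\'')
    };
}

/// Hypothetical classification of the first character of a token.
#[derive(Debug, PartialEq)]
enum Start {
    LineTerminator,
    StringLiteral,
    TemplateLiteral,
    Other,
}

/// Central dispatch: the patterns come from the macros above.
fn classify(c: char) -> Start {
    match c {
        line_terminator_start!() => Start::LineTerminator,
        string_literal_start!() => Start::StringLiteral,
        '`' => Start::TemplateLiteral,
        _ => Start::Other,
    }
}
```

Because the macros expand at compile time, the resulting `match` is the same as the hand-written one, which avoids the performance concern with calling a check on each literal type in turn.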

> I see the cursor gets ASCII bytes - what if Unicode is used?

The cursor goes through bytes, regardless of whether they are ASCII or not. Then, there is a wrapper that converts them to Unicode characters if needed.
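A minimal sketch of what such a wrapper could do, assuming the input is UTF-8. The `next_char`, `decode_all` and `invalid` functions are hypothetical and not the cursor API in this PR. The idea is to read one leading byte, work out the sequence length from it, read the continuation bytes, and let `std::str::from_utf8` validate the result.

```rust
use std::io::{self, Bytes, Read};

/// Builds an `InvalidData` error for malformed input.
fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Reads the next UTF-8 encoded character from a byte stream.
/// Returns `Ok(None)` at the end of the input.
fn next_char<R: Read>(bytes: &mut Bytes<R>) -> io::Result<Option<char>> {
    let first = match bytes.next() {
        Some(b) => b?,
        None => return Ok(None),
    };
    // The leading byte determines how long the sequence is.
    let len = match first {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => return Err(invalid("invalid leading byte")),
    };
    let mut buf = [first, 0, 0, 0];
    for slot in buf[1..len].iter_mut() {
        *slot = bytes.next().ok_or_else(|| invalid("truncated sequence"))??;
    }
    // `from_utf8` rejects overlong encodings and bad continuation bytes.
    let s = std::str::from_utf8(&buf[..len]).map_err(|_| invalid("invalid UTF-8"))?;
    Ok(s.chars().next())
}

/// Decodes a whole reader into characters, for demonstration purposes.
fn decode_all<R: Read>(reader: R) -> io::Result<Vec<char>> {
    let mut bytes = reader.bytes();
    let mut out = Vec::new();
    while let Some(c) = next_char(&mut bytes)? {
        out.push(c);
    }
    Ok(out)
}
```

ASCII input takes the one-byte path, so the common case stays cheap while line terminators such as U+2028 are still recognised.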

Lan2u · 2020-06-12T20:25:38Z

@Razican is it possible to allow putting tokens back onto the cursor? It would be useful for handling cases like regex (or alternatively give the option to peek more than a single character ahead).

Razican · 2020-06-12T20:27:24Z

> @Razican is it possible to allow putting tokens back onto the cursor? It would be useful for handling cases like regex (or alternatively give the option to peek more than a single character ahead).

Yep, we should be able to peek at most 4 characters. Maybe during the weekend I have time to implement that in the cursor.
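One possible shape for this, as a minimal sketch: a small lookahead buffer in front of the character source. `PeekCursor`, `peek_nth`, `next_char` and `MAX_PEEK` are hypothetical names, not the cursor in this PR, and the sketch works over any `char` iterator rather than a byte reader.

```rust
use std::collections::VecDeque;

/// Maximum number of characters that may be buffered ahead (assumption
/// taken from the "at most 4 characters" figure above).
const MAX_PEEK: usize = 4;

/// Hypothetical cursor with bounded lookahead.
struct PeekCursor<I: Iterator<Item = char>> {
    source: I,
    lookahead: VecDeque<char>,
}

impl<I: Iterator<Item = char>> PeekCursor<I> {
    fn new(source: I) -> Self {
        Self {
            source,
            lookahead: VecDeque::with_capacity(MAX_PEEK),
        }
    }

    /// Returns the character `n` positions ahead (0 is the next one)
    /// without consuming anything. Panics if `n >= MAX_PEEK`.
    fn peek_nth(&mut self, n: usize) -> Option<char> {
        assert!(n < MAX_PEEK, "cannot peek more than MAX_PEEK characters");
        // Fill the buffer only as far as needed.
        while self.lookahead.len() <= n {
            let c = self.source.next()?;
            self.lookahead.push_back(c);
        }
        self.lookahead.get(n).copied()
    }

    /// Consumes the next character, draining the buffer first.
    fn next_char(&mut self) -> Option<char> {
        self.lookahead.pop_front().or_else(|| self.source.next())
    }
}
```

With lookahead, the lexer does not need to put characters back: it can inspect the upcoming characters first and only consume them once it knows which token it is lexing.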

jasonwilliams · 2020-07-04T16:41:09Z

Is this PR superseded by #486?

Lan2u · 2020-07-04T16:49:06Z

> Is this PR superseded by #486?

I think so unless @Razican has local changes?

Razican · 2020-07-04T19:33:50Z

I have no further local changes, we can close this :)

Razican added the performance, parser, and lexer labels May 31, 2020

Razican mentioned this pull request Jun 2, 2020

New Public API #445

Closed

Started with the new lexer implementation

2f78ebe

Razican force-pushed the new_lexer branch from 4f3ad80 to 2f78ebe Compare June 10, 2020 17:04

Lan2u mentioned this pull request Jun 10, 2020

New Lexer: Minimal amount to allow compiling #477

Merged

New Lexer: Minimal amount to allow compiling (#477)

688fff5

Co-authored-by: Iban Eguia <razican@protonmail.ch>

Lan2u mentioned this pull request Jun 12, 2020

New lexer #486

Closed

Razican mentioned this pull request Jun 14, 2020

Initial template literal lexer implementation #254

Closed

Lan2u mentioned this pull request Jun 21, 2020

New Lexer: Updating parser to use new lexer #517

Closed

Razican closed this Jul 4, 2020

Razican deleted the new_lexer branch July 9, 2020 14:01
