Tokenizer implementation in Rust #2641
Conversation
Force-pushed from 6e61709 to cbb4d9f
    self.char_at(self.current)
};

if alnum && self.current_char.is_alphanumeric() {
this whole section is unnecessary
why not?
it's unnecessary complexity for Python performance
if index < self.size {
    self.char_at(index)
} else {
    '\0'
what happens if you have \0 as a char? isn't that confusing? why isn't this just an optional char?
Because using Option becomes cumbersome very quickly. You basically need to unwrap it explicitly everywhere you use it.
can you ever have \0 in a user-submitted char?
I can hardly imagine this being the case, since it's universally used as a null terminator for strings.
+1 for '\0'
I landed on the same conclusion separately.
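To make the trade-off in this thread concrete, here is a minimal sketch contrasting the two styles. The names (`Scanner`, `peek_sentinel`, `peek_option`, `starts_ident`) are invented for illustration and are not the PR's actual code:

```rust
// Hypothetical scanner used only to contrast the two peek styles discussed above.
struct Scanner {
    sql: Vec<char>,
    size: usize,
}

impl Scanner {
    // Sentinel style: out-of-range reads yield '\0', so call sites stay terse.
    fn peek_sentinel(&self, index: usize) -> char {
        if index < self.size { self.sql[index] } else { '\0' }
    }

    // Option style: the type system forces every call site to handle "no char".
    fn peek_option(&self, index: usize) -> Option<char> {
        if index < self.size { Some(self.sql[index]) } else { None }
    }

    fn starts_ident(&self, index: usize) -> bool {
        // The sentinel version reads much like the Python original.
        let terse = self.peek_sentinel(index).is_alphabetic();
        // The Option version needs map_or/unwrap at every use site, which is
        // the cumbersomeness mentioned above.
        let explicit = self.peek_option(index).map_or(false, char::is_alphabetic);
        debug_assert_eq!(terse, explicit);
        terse
    }
}

fn main() {
    let s = Scanner { sql: "select".chars().collect(), size: 6 };
    assert!(s.starts_ident(0));
    assert!(!s.starts_ident(42)); // out of range: '\0' / None, both fall through
}
```

Both variants behave the same here; the difference is only how much ceremony each call site carries.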
if end <= self.size {
    self.sql[start..end].iter().collect()
} else {
    String::from("")
is this a constant? or is it being allocated every time?
It's allocated on the heap every time. The type system doesn't permit using a constant value here, since the String type is mutable in Rust.
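As a side note, a hedged sketch of one alternative for the fallback branch (the `Scanner` struct and `text_between` name are invented for illustration, not the PR's code): the standard library documents that `String::new()` creates an empty String without allocating an initial buffer.

```rust
// Hypothetical helper mirroring the snippet above; the field names are assumed.
struct Scanner {
    sql: Vec<char>,
    size: usize,
}

impl Scanner {
    fn text_between(&self, start: usize, end: usize) -> String {
        if end <= self.size {
            // Collecting from a slice of chars builds a fresh, owned String.
            self.sql[start..end].iter().collect()
        } else {
            // Per the std docs, String::new() does not allocate an initial
            // buffer, so the out-of-range branch stays allocation-free.
            String::new()
        }
    }
}

fn main() {
    let s = Scanner { sql: "select 1".chars().collect(), size: 8 };
    assert_eq!(s.text_between(0, 6), "select");
    assert_eq!(s.text_between(0, 99), "");
}
```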
    self.scan_radix_string(16, TokenType::HEX_STRING);
}

fn scan_radix_string(&mut self, radix: u32, radix_token_type: TokenType) {
why is this called radix string?
I'm open to suggestions.
beautiful
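For readers following the thread, here is a minimal sketch of what scanning a "radix string" (a run of characters that are valid digits in a given radix, e.g. radix 16 for hex literals) could look like. The struct, fields, and token representation below are assumptions for illustration, not the PR's implementation:

```rust
// Illustrative only: a tiny scanner that consumes digits valid in `radix`
// and records them under the supplied token type.
#[derive(Debug, PartialEq)]
enum TokenType {
    HexString,
}

struct Scanner {
    sql: Vec<char>,
    size: usize,
    current: usize,
    tokens: Vec<(TokenType, String)>,
}

impl Scanner {
    fn scan_radix_string(&mut self, radix: u32, radix_token_type: TokenType) {
        let start = self.current;
        // Advance while the current char is a valid digit in the given radix,
        // e.g. 0-9a-f for radix 16, 0-1 for radix 2.
        while self.current < self.size && self.sql[self.current].is_digit(radix) {
            self.current += 1;
        }
        let text: String = self.sql[start..self.current].iter().collect();
        self.tokens.push((radix_token_type, text));
    }
}

fn main() {
    let mut s = Scanner {
        sql: "FF more sql".chars().collect(),
        size: 11,
        current: 0,
        tokens: Vec::new(),
    };
    s.scan_radix_string(16, TokenType::HexString);
    assert_eq!(s.tokens, vec![(TokenType::HexString, "FF".to_string())]);
    assert_eq!(s.current, 2); // stopped at the first non-hex character
}
```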
Force-pushed from 4975741 to de1ca96
Force-pushed from de1ca96 to 4936ba7
This update introduces the native implementation of the Tokenizer written in Rust. All parsing features available in the Python version have been implemented, but they haven't been tested using the complete test suite yet.
Here are the remaining items that are out of scope for this PR and will be addressed in future PRs:
- The TokenType enum, which should be sourced from the Python version to ensure that new tokens are added in one place only.
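
Purely to illustrate the single-source-of-truth concern in that last item, here is a hand-mirrored enum sketch. The variant list is assumed for the example; only HEX_STRING appears in the diff above:

```rust
// Illustrative only: a manually mirrored subset of token kinds. Until the
// follow-up work lands, any token added on the Python side would also need a
// matching variant here, which is exactly the duplication the item above
// aims to remove.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug)]
enum TokenType {
    HEX_STRING,
    BIT_STRING, // assumed variant, for illustration
    NUMBER,     // assumed variant, for illustration
}

fn main() {
    let token = TokenType::HEX_STRING;
    println!("{:?}", token);
}
```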