-
-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new TokeniseOption EnsureLF #336
Conversation
Could we replace the instructions
with a call to this function which will be more efficient
I tested this function with the following code. You may use it for the test function.
See and test it in the playground: https://play.golang.org/p/t4-rE3grHH8 |
Confirmed. Your method is x2-x3 times faster than mine.
|
func ensureLF(text string) string { | ||
buf := make([]byte, len(text)) | ||
var j int | ||
for i := 0; i < len(text); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is unicode safe is it? If a unicode code point contains \r\n
this will mangle the codepoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this would iterate over the bytes (uint8s). We have to use a for range loop here, so we iterate over the individual runes.
Here's a version which operates on the runes directly. It should be unicode-safe, but I didn't benchmark it yet.
|
ASCII characters are single byte unicode code points. Multi-byte code points have at least one of the two most significant bits set to one in all their bytes. There is no possible confusion with ASCII control characters. Working with runes will be less efficient and it's not necessary. |
Good point! |
* Add new TokeniseOption EnsureLF ref alecthomas#329 * Use efficient process suggested by @chmike
ref #329
related gohugoio/hugo#6596
I have been wondering if it is better to add this functionality to chroma in itself or chroma's client (such as Hugo) because it will be a breaking change somehow.
Walking around chroma's source code and playing with your online demo (it always uses
\n
and works fine), I am inclined to implement this functionality on chroma rather than on its client because I assume that logics behind chroma expect\n
as EOL .So I added
EnsureLF
asTokeniseOption
and the default values is true.Any comments are welcome.