Support \u in string literals #1982
Note: this is just explicitly failing on the unsupported code, so it should be a small task to pick up.
Hey! Is this issue still open? If so, I would like to give it a try. This is my first time working on open source, so any advice on getting started would be welcome. I have gone through the CONTRIBUTIONS and CODE_OF_CONDUCT.
I think @etalian is focused on #1971, so it's probably fine for you to grab this, @vinsdragonis.
So, after analyzing the files under So here's my approach to fixing it. We can create a standalone function that behaves like the The function could go something like this.
While this may not be the most efficient way to patch the problem, there are still a few other escape sequences and features yet to be implemented. If, for now, we only need to print a message and exit, this could prove helpful. Please let me know if I can go ahead with it. And please excuse me if my approach isn't up to the mark, as this is my first time trying open source.
While that would likely make the error message less jarring, the issue here is that an error is raised in the first place. NOTE: in C++, a |
Agreed with @etalian -- while the message isn't ideal, the intent of this issue is to fix it by adding support, not just change the message. I think UTF-8 is the right solution too.
K, I will look into this.
Okay, so after a bit of research on the function mentioned by @etalian, an alternate idea came up. Why not consider writing a function which can manually convert the hex value to a UTF-8 string?
This is the proposed function that could be used for hex to UTF-8 conversions:
No problem if this is not what is needed. I can always go with the UTF-8 function mentioned by @etalian if that is what's expected.
For a quick description of how UTF-8 works: https://fasterthanli.me/articles/working-with-strings-in-rust#a-very-quick-utf-8-primer As another example, the emoji 😁 corresponds to the Unicode literal |
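For reference, the code-point-to-UTF-8 step discussed above could look roughly like the C++ sketch below. This is not the function proposed earlier in this thread, nor the one @etalian mentioned; `EncodeUtf8` is a hypothetical name used only for illustration, and rejecting surrogates and values above U+10FFFF is an assumption of this sketch, not something the thread specifies.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper for illustration; not Carbon's actual implementation.
// Encodes one Unicode code point as UTF-8. Returns std::nullopt for values
// that cannot be encoded: surrogates (U+D800..U+DFFF) and anything above
// U+10FFFF (assumption of this sketch).
std::optional<std::string> EncodeUtf8(std::uint32_t code_point) {
  if (code_point > 0x10FFFF ||
      (code_point >= 0xD800 && code_point <= 0xDFFF)) {
    return std::nullopt;
  }
  std::string out;
  if (code_point < 0x80) {
    // 1 byte: 0xxxxxxx
    out.push_back(static_cast<char>(code_point));
  } else if (code_point < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out.push_back(static_cast<char>(0xC0 | (code_point >> 6)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  } else if (code_point < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out.push_back(static_cast<char>(0xE0 | (code_point >> 12)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  } else {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out.push_back(static_cast<char>(0xF0 | (code_point >> 18)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  }
  return out;
}
```

For example, `EncodeUtf8(0x1F601)` yields the four bytes F0 9F 98 81, which is the UTF-8 encoding of 😁.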
Hi! Would it be OK if I also took a shot at this issue? Like @vinsdragonis, this is my first time contributing to open source code as well. I've gone through the code of conduct and contributing files.
Sure, I'm happy to collaborate with anyone willing to help 😄
I added a first draft of the new code to my fork at https://github.com/Pritjam/carbon-lang. @vinsdragonis, could you take a look and see what you think/make changes? Specifically, I added code to string_helpers.cpp (in the UnescapeStringLiteral switch-case block) and a test in string_helpers_test.cpp.
@Pritjam, sure, I will have a look at this.
Support \u in string literals as per issue carbon-language#1982
I have drafted test cases for this. Unfortunately, the escape sequence doesn't seem to be recognized yet, since we haven't parsed the string to get those literals. Even so, the code for the escape sequence seems logically correct to me, though I will need to confirm that.
I think the escape sequences in the new test case aren't being recognized because they aren't formatted quite correctly. According to the docs for string literals (docs/design/lexical_conventions/string_literals.md), Unicode code points are specified with \u{HHHH...}, and the new test case omits the braces. As for ParseBlockStringLiteral(), I don't think we need to extract the literals there: that function calls UnescapeStringLiteral, so we just have to implement the Unicode literal parsing logic in UnescapeStringLiteral. This also makes sense because that's where the old code had a CARBON_FATAL() to show that Unicode wasn't implemented yet.
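To make the \u{HHHH...} format concrete, here is a rough C++ sketch of reading the braced hex digits that follow `\u`. It is not the code in string_helpers.cpp or in the fork; `ParseBracedCodePoint` and its `consumed` out-parameter are hypothetical names for illustration. Accepting both upper- and lower-case digits and rejecting values above U+10FFFF are choices made for this sketch, not rules taken from this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper for illustration; not the code in string_helpers.cpp.
// `text` begins right after the `\u`, i.e. at the expected `{`.
// On success, returns the code point and sets `consumed` to the number of
// characters read, including both braces. On failure, returns std::nullopt
// and leaves `consumed` unchanged.
std::optional<std::uint32_t> ParseBracedCodePoint(std::string_view text,
                                                  std::size_t& consumed) {
  if (text.empty() || text.front() != '{') {
    return std::nullopt;  // Missing opening brace, e.g. `\u1F601`.
  }
  std::uint32_t value = 0;
  std::size_t i = 1;
  for (; i < text.size() && text[i] != '}'; ++i) {
    const char c = text[i];
    std::uint32_t digit = 0;
    if (c >= '0' && c <= '9') {
      digit = static_cast<std::uint32_t>(c - '0');
    } else if (c >= 'A' && c <= 'F') {
      digit = static_cast<std::uint32_t>(c - 'A' + 10);
    } else if (c >= 'a' && c <= 'f') {
      // Lower-case digits are accepted here as a choice of this sketch.
      digit = static_cast<std::uint32_t>(c - 'a' + 10);
    } else {
      return std::nullopt;  // Not a hex digit.
    }
    value = value * 16 + digit;
    if (value > 0x10FFFF) {
      return std::nullopt;  // Beyond the Unicode range.
    }
  }
  if (i == 1 || i >= text.size()) {
    return std::nullopt;  // No digits at all, or no closing brace.
  }
  consumed = i + 1;  // Include the closing brace.
  return value;
}
```

With this sketch, the input `{1F601}rest` gives the value 0x1F601 with `consumed` set to 7, while `1F601` without braces is rejected, which matches the problem described above with the first version of the test case.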
So I think we have 3 objectives we're trying to accomplish here:
I think what we have right now accomplishes the first two objectives. Next, you should add a test case or some other way of testing the functionality (which you pretty much have; we just need to get your test case working), and then we can go ahead and make a pull request of the entire commit chain. What do you think of that?
Yeah, that seems good to me. As for the test case, I have fixed it by sticking to the format \u{HHHH}. Do check it out and let me know if it seems right.
I saw the change you made in the unicode.carbon test. It's almost correct! The one other change is, the string you're comparing against (on line 15 of that file) is still the old string, "str". I'll go ahead and fix that real quick, and then I think we should be good to make a PR. Also, do you know how to run the tests? It took me a bit to figure out as well, but it's super useful to be able to iteratively test your code. To run the unicode.carbon test, you have to run To run the |
Alright, the code is good to go. We should be able to make a PR now. Do you want to go ahead and make that?
Sure, I'll do that.
I think #2027 resolves this.
While writing doctests for string_literals.md, this came up:
Input:
Output: