Migrating to package:characters? #80
Comments
Hi Brett, The characters package is great, I've already converted some of my other code to use it. For example, the more package uses it for its configurable printers. I've also tried to use it for the char_matcher in the same package (which is used in similar form in PetitParser), with less success. I found it challenging to implement character-based predicates efficiently, for example character ranges or character sets (whitespaces, letters, digits, lower-case, upper-case, ...). PetitParser needs random access to characters. It currently does it mostly through What I am more concerned about is performance: I expect |
I'm sure we'd love some feedback at https://github.com/dart-lang/characters if you have a chance =) |
Now after more than a year, is there any Unicode support available? We are in 2021 and Unicode support is really required. |
This library is designed to parse over streams of bytes (or UTF-16 code units), and as such it is agnostic to the encoding of your input. For example, the xml library successfully parses Unicode input without either library needing to "understand" the underlying characters. While I understand that out-of-the-box decoding would be desirable, it comes at a hefty cost in performance. So far I haven't seen a compelling need to support it natively. A real-world use case where the current infrastructure doesn't work would definitely help to motivate investing time into this issue.
To only slightly hijack this thread, I'm wondering if it's possible to use the higher-level logic provided by the parser to extract values out of an arbitrary jsonDecode. It'd be nice if I could specify a pattern of a json object containing some strings and bools and a json list of further objects, and then the .map action could map that into a Dart object via a constructor. I suspect it's only a matter of coming up with a useful set of primitives, and then using the rest of the mechanics without change, but if there's been any thought about this, I'd be interested. I know the switch stuff makes some of this easier, but it still doesn't feel as satisfying as just writing a composable Parser rule. |
I'm not sure if this fits the bill, but I just discovered that I can't make a To make everything safe for characters above U+FFFF, I think you need to replace every instance of |
As mentioned in #147 (comment) you can create a parser that accepts surrogate pairs. That said, there should probably be helpers that would make this simpler and the |
What I want is to say I suppose I can make a big OptionParser with |
Understood, unfortunately this is currently not possible. Switching to use In the meantime, I suggest you work around this with the proposed solution; there is no need for a big
import 'package:petitparser/petitparser.dart';

// Match a UTF-16 high surrogate followed by a low surrogate.
final surrogatePair = seq2(
  pattern('\uD800-\uDBFF'),
  pattern('\uDC00-\uDFFF'),
);
// Combine the two code units into a single code point.
final decodedSurrogatePair = surrogatePair.map2((hi, lo) =>
    0x400 * (hi.codeUnitAt(0) - 0xD800) +
    (lo.codeUnitAt(0) - 0xDC00) +
    0x10000);
// Accept only code points in the range U+20000 to U+2FFFF.
final patternInQuestion = decodedSurrogatePair
    .where((value) => 0x20000 <= value && value <= 0x2FFFF);
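To make the arithmetic in the snippet above easier to check, here is a minimal sketch of the same surrogate-pair decoding using only dart:core. The helper names `decodeSurrogatePair` and `isInPlane2` are hypothetical and are not part of PetitParser; they only mirror the `map2` and `where` steps.

```dart
/// Hypothetical helper (not a PetitParser API): combines a UTF-16 high
/// surrogate and low surrogate into the code point they encode.
int decodeSurrogatePair(int hi, int lo) {
  if (hi < 0xD800 || hi > 0xDBFF) {
    throw ArgumentError.value(hi, 'hi', 'not a high surrogate');
  }
  if (lo < 0xDC00 || lo > 0xDFFF) {
    throw ArgumentError.value(lo, 'lo', 'not a low surrogate');
  }
  // The high surrogate carries the upper 10 bits, the low surrogate the
  // lower 10 bits, and the result is offset by 0x10000.
  return 0x10000 + 0x400 * (hi - 0xD800) + (lo - 0xDC00);
}

/// Hypothetical helper: true if the code point lies in U+20000..U+2FFFF,
/// the range tested by the `where` predicate.
bool isInPlane2(int codePoint) =>
    0x20000 <= codePoint && codePoint <= 0x2FFFF;
```

For instance, U+20000 is stored as the code units 0xD840 and 0xDC00, so `decodeSurrogatePair(0xD840, 0xDC00)` yields 0x20000 and passes the range check, while U+1F600 (0xD83D, 0xDE00) decodes correctly but falls outside the range.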
Thank you for the suggestion.
I agree with this. I had two possibilities in mind:
Do either of these seem compelling to you? |
I like the |
I started looking at this and it's fairly hairy.
It looks to me like a parallel API would be nicer. I wouldn't blame you for not wanting to deal with that. |
Agreed, a boolean positional argument is not desired. In this case probably better to introduce a parallel API ( I can look into some options later today. |
Actually I think some of the difficulty is in going back and forth between ints and strings. Ultimately I am looking to do range comparisons and use e.g. UnicodeCharMatcher from the I'll play around with this and see what I come up with. |
An initial prototype of what a Unicode character parser could look like: #183. It seems to work well in the tests, but I'm not sure how to create a nice API yet.
Thanks for that. I ended up with something similar but more specialized—my only use case allows me to get by with a Parser because I always discard the result. |
Hi Lukas,
I'm curious as to your feelings about the potential for porting PetitParser to the Characters package?
I'm wondering if the CharacterRange iterator is sufficient for PetitParser's backtracking requirements. The upside of migrating to the Characters package is that PetitParser would be parsing in terms of Unicode grapheme clusters instead of characters.
brett
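As a rough illustration of the distinction raised here, the sketch below uses only dart:core to compare the two units a Dart `String` exposes directly: UTF-16 code units and code points. The function names are hypothetical. Grapheme clusters are not available from dart:core; counting them would go through package:characters (for example `text.characters.length`), which is deliberately left out so the block stays dependency-free.

```dart
/// Hypothetical helper: number of UTF-16 code units, which is what
/// String.length reports and what codeUnitAt indexes.
int codeUnitCount(String text) => text.length;

/// Hypothetical helper: number of Unicode code points. A surrogate pair
/// counts once, but a base letter plus a combining mark still counts twice.
int codePointCount(String text) => text.runes.length;
```

With these, `'\u{20000}'` has two code units but one code point, while `'e\u0301'` (an "e" followed by a combining acute accent) has two code units and two code points even though a reader perceives a single character. That last case is the one only a grapheme-cluster view would treat as a single unit.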