Add IsCSharpIdentifier() and IsCSharpIdentifierPart() methods to Character class #1

Open
NightOwl888 opened this issue Dec 31, 2019 · 0 comments
Labels
good first issue Good for newcomers is:feature pri:low up for grabs This issue is open to be worked on by anyone

NightOwl888 commented Dec 31, 2019

Similar methods were part of the JDK implementation. The rules for how to implement these methods for C# are documented here.

The spot reserved for them, chosen to match the method order of the Apache Harmony implementation, is here.

See this usage example for a real-world perspective of how these methods work together to detect a valid class name.
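As a rough illustration of how the two methods could cooperate, here is a minimal sketch. It assumes the identifier rules as I read them in the C# language specification: a start character is an underscore or a letter (categories Lu, Ll, Lt, Lm, Lo, Nl), and a part character additionally allows Nd, Pc, Mn, Mc and Cf. The class name, the `GetCategory` helper and `IsValidIdentifier` are hypothetical names for this example, not existing members of the Character class, and the sketch ignores keywords, the `@` prefix and Unicode escape sequences.

```csharp
using System;
using System.Globalization;

public static class CSharpIdentifierSketch
{
    // Hypothetical helper: Unicode category of a code point, including supplementary ones.
    // Invalid code points and lone surrogates are reported as OtherNotAssigned,
    // which is never accepted as an identifier character below.
    private static UnicodeCategory GetCategory(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            return UnicodeCategory.OtherNotAssigned;
        return CharUnicodeInfo.GetUnicodeCategory(char.ConvertFromUtf32(codePoint), 0);
    }

    public static bool IsCSharpIdentifierStart(int codePoint)
    {
        if (codePoint == '_') return true;
        switch (GetCategory(codePoint))
        {
            case UnicodeCategory.UppercaseLetter:  // Lu
            case UnicodeCategory.LowercaseLetter:  // Ll
            case UnicodeCategory.TitlecaseLetter:  // Lt
            case UnicodeCategory.ModifierLetter:   // Lm
            case UnicodeCategory.OtherLetter:      // Lo
            case UnicodeCategory.LetterNumber:     // Nl
                return true;
            default:
                return false;
        }
    }

    public static bool IsCSharpIdentifierPart(int codePoint)
    {
        if (IsCSharpIdentifierStart(codePoint)) return true;
        switch (GetCategory(codePoint))
        {
            case UnicodeCategory.DecimalDigitNumber:    // Nd
            case UnicodeCategory.ConnectorPunctuation:  // Pc
            case UnicodeCategory.NonSpacingMark:        // Mn
            case UnicodeCategory.SpacingCombiningMark:  // Mc
            case UnicodeCategory.Format:                // Cf
                return true;
            default:
                return false;
        }
    }

    // Walks the string one code point at a time: the first must be a start
    // character, every following one a part character.
    public static bool IsValidIdentifier(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;
        int index = 0;
        bool first = true;
        while (index < text.Length)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, index))
            {
                codePoint = char.ConvertToUtf32(text, index);
                index += 2;
            }
            else
            {
                codePoint = text[index]; // a lone surrogate is rejected by GetCategory
                index += 1;
            }
            bool ok = first ? IsCSharpIdentifierStart(codePoint) : IsCSharpIdentifierPart(codePoint);
            if (!ok) return false;
            first = false;
        }
        return true;
    }
}
```

With this sketch, `CSharpIdentifierSketch.IsValidIdentifier("_count1")` returns true and `CSharpIdentifierSketch.IsValidIdentifier("1count")` returns false, since a decimal digit is a part character but not a start character.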

NOTES:

  • We also should have overloads for IsJavaIdentifier and IsJavaIdentifierStart, since this library is a bridge between Java and .NET and the original implementations might come in handy.
  • The Apache Harmony Character class is 10 years old, so we should take a look at the current JavaDocs to ensure the implementation is up to date.
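For comparison, the Java-flavored checks could look roughly like the sketch below. It follows my reading of the JavaDoc for `Character.isJavaIdentifierStart`, `Character.isJavaIdentifierPart` and `Character.isIdentifierIgnorable`; the class name and the `GetCategory` helper are hypothetical, and the rules should be re-verified against the current JavaDocs before anything like this is adopted.

```csharp
using System;
using System.Globalization;

public static class JavaIdentifierSketch
{
    // Hypothetical helper: Unicode category of a code point, including supplementary ones.
    // Invalid code points and lone surrogates are reported as OtherNotAssigned.
    private static UnicodeCategory GetCategory(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            return UnicodeCategory.OtherNotAssigned;
        return CharUnicodeInfo.GetUnicodeCategory(char.ConvertFromUtf32(codePoint), 0);
    }

    // Letters, letter numbers, currency symbols (such as '$')
    // and connecting punctuation (such as '_').
    public static bool IsJavaIdentifierStart(int codePoint)
    {
        switch (GetCategory(codePoint))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
            case UnicodeCategory.LetterNumber:
            case UnicodeCategory.CurrencySymbol:
            case UnicodeCategory.ConnectorPunctuation:
                return true;
            default:
                return false;
        }
    }

    // Non-whitespace ISO control characters plus everything in the Format category.
    public static bool IsIdentifierIgnorable(int codePoint)
    {
        if ((codePoint >= 0x0000 && codePoint <= 0x0008) ||
            (codePoint >= 0x000E && codePoint <= 0x001B) ||
            (codePoint >= 0x007F && codePoint <= 0x009F))
            return true;
        return GetCategory(codePoint) == UnicodeCategory.Format;
    }

    // Everything a start character allows, plus digits, combining marks,
    // non-spacing marks and ignorable characters.
    public static bool IsJavaIdentifierPart(int codePoint)
    {
        if (IsJavaIdentifierStart(codePoint) || IsIdentifierIgnorable(codePoint)) return true;
        switch (GetCategory(codePoint))
        {
            case UnicodeCategory.DecimalDigitNumber:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.NonSpacingMark:
                return true;
            default:
                return false;
        }
    }
}
```

The visible difference from the C# rules is that Java accepts currency symbols such as `$` and treats certain control characters as ignorable parts of an identifier.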

Note there is also a port of the Java identifier code to C# in Spatial4n, which might be useful for working out some of the more complex rules. However, we should prefer the implementation style of Apache Harmony and avoid the Regex class, if possible. The Regex class documentation might still offer some clues about how to handle certain character classes; see Character classes in regular expressions.

Also, the Spatial4n implementation has some shortcomings:

  • I suspect it actually detects Java identifiers rather than C# identifiers because of the link over to the Javadoc, which might not be the right choice for Spatial4n
  • The Unicode support is broken - it ignores characters outside of the range c < 0x00d800 && c > 0x00dfff

In the latter case, the code point would have to be converted to a surrogate pair before being passed into CharUnicodeInfo.GetUnicodeCategory(), as was done in other methods of the Character class, such as GetType(). However, since the Apache Harmony implementation uses the GetType() method directly, following that example will avoid this issue.
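The conversion described above can be sketched as follows. This is only an illustration of the technique, not the library's actual GetType() code: the class and method names are hypothetical. It relies on `char.ConvertFromUtf32`, which yields a one-char string for BMP code points and a surrogate pair otherwise, and on the `CharUnicodeInfo.GetUnicodeCategory(string, int)` overload, which reports the category of the whole pair when the index points at a high surrogate.

```csharp
using System;
using System.Globalization;

public static class CodePointCategorySketch
{
    public static UnicodeCategory GetCategory(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        // char.ConvertFromUtf32 rejects lone surrogates, so answer for them directly.
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            return UnicodeCategory.Surrogate;

        // One UTF-16 code unit for the BMP, a surrogate pair for anything above U+FFFF.
        string utf16 = char.ConvertFromUtf32(codePoint);

        // The (string, int) overload understands surrogate pairs.
        return CharUnicodeInfo.GetUnicodeCategory(utf16, 0);
    }
}
```

For example, `CodePointCategorySketch.GetCategory(0x10400)` (DESERET CAPITAL LETTER LONG I) goes through the surrogate-pair path and is classified as an uppercase letter, which a check limited to single `char` values would miss.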
