-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some tool bugfixes #3349
Some tool bugfixes #3349
Conversation
b44319d
to
9f0ffc0
Compare
@parrt please review |
Will switching Target to an enum not break all unofficial targets: Kotlin, Typescript… ?
Envoyé de mon iPhone
… Le 13 nov. 2021 à 20:05, Ivan Kochurkin ***@***.***> a écrit :
All tests are passing.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I don't think so because it's an internal enum that is not used from the external environment. To support unofficial target source code should be modified anyway, specifically, yet another enum value should be added together with related target class and template. Also, I added the test |
dedcb3a
to
9f70d8c
Compare
60a9736
to
c7cf61c
Compare
Remove backward compatibility exception v3TreeGrammarException
… (display \r\n instead of empty line) fixes antlr#2281, antlr#2885 Restored missed test PredFromAltTestedInLoopBack_1
Remove EOFInClosure test since now it's not actual on runtime
It looks like I was wrong and at least Kotlin target relies on the current structure, reflection, and contains target files in certain directories: https://github.com/Strumenta/antlr-kotlin/tree/master/antlr-kotlin-target/src/main. It's not clear structure but it works. That's why |
Create CodeGenerator only one time during grammar processing, refactor code
…on returns correct value
c7cf61c
to
6da860e
Compare
I think you have a separate PR for this. Should we kill the old one? |
The fix is included in this pull request, specifically the commit bab5a9e |
Ok. cool. is it ready for a final review, @KvanTTT ? Thanks for all your efforts! |
@@ -20,6 +19,11 @@ | |||
import java.util.Set; | |||
|
|||
public class JavaTarget extends Target { | |||
public final static String key = "Java"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should automate this to use the required convention we have that XTarget uses X as key. I should have done this earlier rather than hardcoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like automatization but I don't how to implement it in a better way with Java. The key value is being checked below in the static constructor (as you notice it should be X from XTarget). These constants are used in the following switch: https://github.com/antlr/antlr4/pull/3349/files#diff-0bbbf52a2f6fceb14a08812a3c96f56f6069ba4a8acab5032ac75ca5c1b0c181R24-R39 That's why it should be constant (switch does not accept not constants).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me experiment. Might be another reason to avoid the switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the switch, some logic is duplicated across several targets, that I'd like to avoid:
class JavaTarget {
...
protected override void appendUnicodeEscapedCodePoint(codePoint) {
if (Character.isSupplementaryCodePoint(codePoint)) {
// char is not an 'integral' type, so we have to explicitly convert
// to int before passing to the %X formatter or else it throws.
sb.append(String.format("\\u%04X", (int)Character.highSurrogate(codePoint)));
sb.append(String.format("\\u%04X", (int)Character.lowSurrogate(codePoint)));
}
else {
sb.append(String.format("\\u%04X", codePoint));
}
}
}
class JavaScriptTarget {
...
protected override void appendUnicodeEscapedCodePoint(codePoint) {
if (Character.isSupplementaryCodePoint(codePoint)) {
// char is not an 'integral' type, so we have to explicitly convert
// to int before passing to the %X formatter or else it throws.
sb.append(String.format("\\u%04X", (int)Character.highSurrogate(codePoint)));
sb.append(String.format("\\u%04X", (int)Character.lowSurrogate(codePoint)));
}
else {
sb.append(String.format("\\u%04X", codePoint));
}
}
}
The common code can be extracted into a separate method:
class JavaTarget {
...
protected override void appendUnicodeEscapedCodePoint(codePoint) {
appendUnicodeEscapedCodePointForJavaAndJavaScript(codePoint);
}
}
class JavaScriptTarget {
...
protected override void appendUnicodeEscapedCodePoint(codePoint) {
appendUnicodeEscapedCodePointForJavaAndJavaScript(codePoint);
}
}
but it's not clear how to name it. appendUnicodeEscapedCodePointForJavaAndJavaScript
does not look clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly this removes need to specify "Java" etc... in subclasses:
protected Target(CodeGenerator gen) {
...
String clazz = this.getClass().getSimpleName();
this.language = clazz.substring(0,clazz.indexOf("Target"));
but can't do string switch. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it look like to remove refs to virtual method? We would replace it with direct calls to appendEscapedCodePoint().
I am afraid it could break compatibility with other unofficial targets because it will not be possible to override escaping behavior. It's not possible to call an instance method from a static method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe we just add a comment to the virtual method to indicate we are leaving it in for flexibility and backward compatibility with external targets but that target implementations are free to call the static method as a support function if they like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per my other comment made just now, seems like we should use case "Java":
here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that these constants are also already used in tests besides switch
statement: https://github.com/antlr/antlr4/pull/3349/files#diff-453e4cf66eb5c1ed759da1d5a64049c1307117fc10c10e0b2fc4a2dcec17b9e1R27 Should I also replace JavaTarget.key
with Java
there? I still think it's more error-prone to use string literals instead of constants, but I accept your point here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, let's converted to a string and simplify this PR. You can tease me later when we decide we need a constant here. haha.
/** | ||
* Escape the Unicode code point appropriately for this language | ||
* and append the escaped value to {@code sb}. | ||
*/ | ||
abstract protected void appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb); | ||
protected void appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so now we group escaping to a central method but leave ability for target to override? makes sense but then we have a target overriding this and a diff implementation than what is in UnicodeEscapes.appendEscapedCodePoint. Isn't this what virtual method calls were all about? (removing IF/switches?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should either do switch and call directly or use virtual methods but not both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is in the static method appendEscapedCodePoint
. It's used both in another static method escapeCodePoint
and in Target.appendUnicodeEscapedCodePoint
. Static escapeCodePoint
is used in tests and there is no access to Target in these tests.
@Test
public void latinJavaEscape() {
checkUnicodeEscape("\\u0061", 0x0061, JavaTarget.key);
}
Target
requires CodeGenerator
that requires tool and grammar as constructor parameters. Maybe it's possible to use fake grammar, tool in tests for creating fake CodeGenerator
and Target
but it does not look clear.
That's why I've split methods in such a way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thoughts. yes. hmm...well, this argues for going completely to your switch but then we need an enum or const string.
@KvanTTT reviewing PR now. it looks like you have deleted: runtime-testsuite/resources/org/antlr/v4/test/runtime/descriptors/ParserExec/EOFInClosure.txt is that because you've placed it elsewhere? I added it recently I think because it was missing. can't remember. |
@@ -461,7 +461,7 @@ public void testActions(String templates, String actionName, String action, Stri | |||
AnalysisPipeline anal = new AnalysisPipeline(g); | |||
anal.process(); | |||
|
|||
CodeGenerator gen = new CodeGenerator(g); | |||
CodeGenerator gen = CodeGenerator.createCodeGenerator(g); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious why you converted this to a factory method. again I'm looking for gratuitous changes ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it returns null
if CodeGenerator
can not be created (and throws the error CANNOT_CREATE_TARGET_GENERATOR
). Contructor always returns not null value. Also, I removed duplication of some logic in CodeGnerator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to rename to CodeGenerator.create
instead of CodeGenerator.createCodeGenerator
(a bit redundancy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. yeah. that makes sense.
Because it's not relevant runtime test now. |
awesome. Thanks |
sb.append(String.format("\\U%08X", codePoint)); | ||
public static void appendEscapedCodePoint(StringBuilder sb, int codePoint, String language) { | ||
switch (language) { | ||
case CSharpTarget.key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, hang on a second. why isn't this just
switch ( language ) {
case "Cpp": ...
case "Java": ...
...
}
We have a requirement that XTarget be language X. That way we can simplify creating the constants by looking at the class name in the target constructor. In other words, Python3Target.key
must always be Python3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To prevent future potential mistakes in string literals if these constants will be used something else. For instance, BaseRuntimeTest
and other similar classes can use these constants as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general you're following a great principle here which is to use the type system versus strings, but code generators are specifically about flexibility and besides. People are always able to make errors and strings even here:
public final static String key = "CSharp";
As long as we are going for flexibility, we might as well remove all of the extra code you've added to each target and simplify the switch statement.
literally the only place we use the string is in the switch statement, if intellij is giving me the straight story. I can delete many lines of code from your PR and simplify multiple expressions by going back to a string.
Ok, this is looking really good @KvanTTT let's merge it after we change the switch statement to use literals. I.e., |
These constants can be used in |
I think you are combining multiple arguments here. I agreed to your recommendation that we leave both the virtual method and the switch statement. I'm simply trying to remove multiple changes across all targets and make the switch statement simpler. Not sure what you are referring to as to my inconsistency. |
Sorry, maybe I picked up the wrong words and it was a bit rude. I just want to add those constants can be used in other classes as well. And you still think it's better to remove them for the current moment? |
no worries. yes, I think that since the only place we ever refer to that constant is in the switch statement, it's worth getting rid of it along with the extra code needed to support it. |
getLanguage(): protected -> public as it was before appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb, boolean escape): protected -> private (it's a new helper method, no need for API now) Added comment for appendUnicodeEscapedCodePoint
looking good @KvanTTT are we ready to merge? |
Yes. Also, I've restored the API of Target, now it's exactly the same as it was before (except for some methods that are not abstract now). |
good work, thanks!! |
TREE_GRAMMAR : 'tree' WSNLCHARS* 'grammar' ;
rule from ANTLR 4 grammar because it was created for back compatibility with ANTLR 3. But at the current moment, it's not actual and causes another bug that is fixed by this PR.INVALID_ESCAPE_SEQUENCE
should be an error because the rest charset is undefined if it's a warning and it may cause unwanted behavior. Moreover, it's covered byEMPTY_STRINGS_AND_SETS_NOT_ALLOWED
error.EMOJI_MODIFIER: [\p{Grapheme_Cluster_Break=E_Base}]
(unsupported emoji modifier) throwsINVALID_ESCAPE_SEQUENCE
instead ofEMPTY_STRINGS_AND_SETS_NOT_ALLOWED
TargetType
enum that is used instead of language string, refactored, and simplified targets code.EOF_CLOSURE
error for preventing stack overflow errors on runtime.