Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some tool bugfixes #3349

Merged
merged 15 commits into from
Dec 28, 2021
Merged

Some tool bugfixes #3349

merged 15 commits into from
Dec 28, 2021

Conversation

KvanTTT
Copy link
Member

@KvanTTT KvanTTT commented Nov 6, 2021

  • I removed TREE_GRAMMAR : 'tree' WSNLCHARS* 'grammar' ; rule from ANTLR 4 grammar because it was created for back compatibility with ANTLR 3. But at the current moment, it's not actual and causes another bug that is fixed by this PR.
  • INVALID_ESCAPE_SEQUENCE should be an error because the rest charset is undefined if it's a warning and it may cause unwanted behavior. Moreover, it's covered by EMPTY_STRINGS_AND_SETS_NOT_ALLOWED error.
  • EMOJI_MODIFIER: [\p{Grapheme_Cluster_Break=E_Base}] (unsupported emoji modifier) throws INVALID_ESCAPE_SEQUENCE instead of EMPTY_STRINGS_AND_SETS_NOT_ALLOWED
  • Fixed a couple of critical errors related to stack overflow using lexer.
  • I introduced TargetType enum that is used instead of language string, refactored, and simplified targets code.
  • Added EOF_CLOSURE error for preventing stack overflow errors on runtime.

@KvanTTT KvanTTT force-pushed the tool-bugfixes branch 2 times, most recently from b44319d to 9f0ffc0 Compare November 12, 2021 20:16
@KvanTTT KvanTTT changed the title Some tool bugfixes and removing reflection usage from Go.stg Some tool bugfixes Nov 13, 2021
@KvanTTT
Copy link
Member Author

KvanTTT commented Nov 13, 2021

@parrt please review

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Nov 14, 2021 via email

@KvanTTT
Copy link
Member Author

KvanTTT commented Nov 14, 2021

I don't think so because it's an internal enum that is not used from the external environment. To support unofficial target source code should be modified anyway, specifically, yet another enum value should be added together with related target class and template.

Also, I added the test testTargetTypeCorrespondsToTargetClass that checks that every enum value corresponds to target class.

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 27, 2021

Will switching Target to an enum not break all unofficial targets: Kotlin, Typescript… ?

It looks like I was wrong and at least Kotlin target relies on the current structure, reflection, and contains target files in certain directories: https://github.com/Strumenta/antlr-kotlin/tree/master/antlr-kotlin-target/src/main. It's not clear structure but it works. That's why TargetType (Language) field should be reverted to String field instead of Enum.

@parrt parrt mentioned this pull request Dec 27, 2021
@parrt
Copy link
Member

parrt commented Dec 27, 2021

  • Added EOF_CLOSURE error for preventing stack overflow errors on runtime.

I think you have a separate PR for this. Should we kill the old one?

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 27, 2021

  • Added EOF_CLOSURE error for preventing stack overflow errors on runtime.

I think you have a separate PR for this. Should we kill the old one?

The fix is included in this pull request, specifically the commit bab5a9e

@parrt
Copy link
Member

parrt commented Dec 27, 2021

Ok. cool. is it ready for a final review, @KvanTTT ? Thanks for all your efforts!

@@ -20,6 +19,11 @@
import java.util.Set;

public class JavaTarget extends Target {
public final static String key = "Java";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should automate this to use the required convention we have that XTarget uses X as key. I should have done this earlier rather than hardcoding.

Copy link
Member Author

@KvanTTT KvanTTT Dec 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like automatization but I don't how to implement it in a better way with Java. The key value is being checked below in the static constructor (as you notice it should be X from XTarget). These constants are used in the following switch: https://github.com/antlr/antlr4/pull/3349/files#diff-0bbbf52a2f6fceb14a08812a3c96f56f6069ba4a8acab5032ac75ca5c1b0c181R24-R39 That's why it should be constant (switch does not accept not constants).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me experiment. Might be another reason to avoid the switch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the switch, some logic is duplicated across several targets, that I'd like to avoid:

class JavaTarget {
    ...
    protected override void appendUnicodeEscapedCodePoint(codePoint) {
        if (Character.isSupplementaryCodePoint(codePoint)) {
			// char is not an 'integral' type, so we have to explicitly convert
			// to int before passing to the %X formatter or else it throws.
			sb.append(String.format("\\u%04X", (int)Character.highSurrogate(codePoint)));
			sb.append(String.format("\\u%04X", (int)Character.lowSurrogate(codePoint)));
		}
		else {
			sb.append(String.format("\\u%04X", codePoint));
		}
    }
}

class JavaScriptTarget {
    ...
    protected override void appendUnicodeEscapedCodePoint(codePoint) {
        if (Character.isSupplementaryCodePoint(codePoint)) {
			// char is not an 'integral' type, so we have to explicitly convert
			// to int before passing to the %X formatter or else it throws.
			sb.append(String.format("\\u%04X", (int)Character.highSurrogate(codePoint)));
			sb.append(String.format("\\u%04X", (int)Character.lowSurrogate(codePoint)));
		}
		else {
			sb.append(String.format("\\u%04X", codePoint));
		}
    }
}

The common code can be extracted into a separate method:

class JavaTarget {
    ...
    protected override void appendUnicodeEscapedCodePoint(codePoint) {
        appendUnicodeEscapedCodePointForJavaAndJavaScript(codePoint);
    }
}

class JavaScriptTarget {
    ...
    protected override void appendUnicodeEscapedCodePoint(codePoint) {
        appendUnicodeEscapedCodePointForJavaAndJavaScript(codePoint);
    }
}

but it's not clear how to name it. appendUnicodeEscapedCodePointForJavaAndJavaScript does not look clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly this removes need to specify "Java" etc... in subclasses:

protected Target(CodeGenerator gen) {
  ...
  String clazz = this.getClass().getSimpleName();
  this.language = clazz.substring(0,clazz.indexOf("Target"));

but can't do string switch. :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it look like to remove refs to virtual method? We would replace it with direct calls to appendEscapedCodePoint().

I am afraid it could break compatibility with other unofficial targets because it will not be possible to override escaping behavior. It's not possible to call an instance method from a static method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe we just add a comment to the virtual method to indicate we are leaving it in for flexibility and backward compatibility with external targets but that target implementations are free to call the static method as a support function if they like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per my other comment made just now, seems like we should use case "Java": here :)

Copy link
Member Author

@KvanTTT KvanTTT Dec 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that these constants are also already used in tests besides switch statement: https://github.com/antlr/antlr4/pull/3349/files#diff-453e4cf66eb5c1ed759da1d5a64049c1307117fc10c10e0b2fc4a2dcec17b9e1R27 Should I also replace JavaTarget.key with Java there? I still think it's more error-prone to use string literals instead of constants, but I accept your point here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, let's converted to a string and simplify this PR. You can tease me later when we decide we need a constant here. haha.

/**
* Escape the Unicode code point appropriately for this language
* and append the escaped value to {@code sb}.
*/
abstract protected void appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb);
protected void appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so now we group escaping to a central method but leave ability for target to override? makes sense but then we have a target overriding this and a diff implementation than what is in UnicodeEscapes.appendEscapedCodePoint. Isn't this what virtual method calls were all about? (removing IF/switches?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should either do switch and call directly or use virtual methods but not both.

Copy link
Member Author

@KvanTTT KvanTTT Dec 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is in the static method appendEscapedCodePoint. It's used both in another static method escapeCodePoint and in Target.appendUnicodeEscapedCodePoint. Static escapeCodePoint is used in tests and there is no access to Target in these tests.

@Test
public void latinJavaEscape() {
	checkUnicodeEscape("\\u0061", 0x0061, JavaTarget.key);
}

Target requires CodeGenerator that requires tool and grammar as constructor parameters. Maybe it's possible to use fake grammar, tool in tests for creating fake CodeGenerator and Target but it does not look clear.

That's why I've split methods in such a way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thoughts. yes. hmm...well, this argues for going completely to your switch but then we need an enum or const string.

@parrt
Copy link
Member

parrt commented Dec 28, 2021

@KvanTTT reviewing PR now. it looks like you have deleted:

runtime-testsuite/resources/org/antlr/v4/test/runtime/descriptors/ParserExec/EOFInClosure.txt

is that because you've placed it elsewhere? I added it recently I think because it was missing. can't remember.

@@ -461,7 +461,7 @@ public void testActions(String templates, String actionName, String action, Stri
AnalysisPipeline anal = new AnalysisPipeline(g);
anal.process();

CodeGenerator gen = new CodeGenerator(g);
CodeGenerator gen = CodeGenerator.createCodeGenerator(g);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious why you converted this to a factory method. again I'm looking for gratuitous changes ;)

Copy link
Member Author

@KvanTTT KvanTTT Dec 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it returns null if CodeGenerator can not be created (and throws the error CANNOT_CREATE_TARGET_GENERATOR). Contructor always returns not null value. Also, I removed duplication of some logic in CodeGnerator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to rename to CodeGenerator.create instead of CodeGenerator.createCodeGenerator (a bit redundancy).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. yeah. that makes sense.

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 28, 2021

is that because you've placed it elsewhere?

Because it's not relevant runtime test now. stat : 'x' ('y' | EOF)*?; now throws error EOF_CLOSURE on generation step, see https://github.com/antlr/antlr4/pull/3349/files#diff-84648ab3df4fefe24aa050658ef718c13f75b5621e18060d3118460d44c7caceR374 , no need in runtime test.

@parrt
Copy link
Member

parrt commented Dec 28, 2021

awesome. Thanks

sb.append(String.format("\\U%08X", codePoint));
public static void appendEscapedCodePoint(StringBuilder sb, int codePoint, String language) {
switch (language) {
case CSharpTarget.key:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, hang on a second. why isn't this just

switch ( language ) {
    case "Cpp": ...
    case "Java": ...
...
}

We have a requirement that XTarget be language X. That way we can simplify creating the constants by looking at the class name in the target constructor. In other words, Python3Target.key must always be Python3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent future potential mistakes in string literals if these constants will be used something else. For instance, BaseRuntimeTest and other similar classes can use these constants as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general you're following a great principle here which is to use the type system versus strings, but code generators are specifically about flexibility and besides. People are always able to make errors and strings even here:

	public final static String key = "CSharp";

As long as we are going for flexibility, we might as well remove all of the extra code you've added to each target and simplify the switch statement.

literally the only place we use the string is in the switch statement, if intellij is giving me the straight story. I can delete many lines of code from your PR and simplify multiple expressions by going back to a string.

@parrt
Copy link
Member

parrt commented Dec 28, 2021

Ok, this is looking really good @KvanTTT let's merge it after we change the switch statement to use literals. I.e., Python3Target.key is always Python3 so we don't need anything complicated like a constant. after that we can merge!

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 28, 2021

I.e., Python3Target.key is always Python3

These constants can be used in BaseRuntimeTest because runtime tests also have a lot of duplicated logic that can be merged. And in other places with runtimes where it's more convenient to use switch expression instead of virtual methods. But since such a place is only single now you insist on constants removing for simplicity?

@parrt
Copy link
Member

parrt commented Dec 28, 2021

I think you are combining multiple arguments here. I agreed to your recommendation that we leave both the virtual method and the switch statement. I'm simply trying to remove multiple changes across all targets and make the switch statement simpler. Not sure what you are referring to as to my inconsistency.

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 28, 2021

Sorry, maybe I picked up the wrong words and it was a bit rude. I just want to add those constants can be used in other classes as well. And you still think it's better to remove them for the current moment?

@parrt
Copy link
Member

parrt commented Dec 28, 2021

Sorry, maybe I picked up the wrong words and it was a bit rude. I just want to add those constants can be used in other classes as well. And you still think it's better to remove them for the current moment?

no worries. yes, I think that since the only place we ever refer to that constant is in the switch statement, it's worth getting rid of it along with the extra code needed to support it.

getLanguage(): protected -> public as it was before

appendUnicodeEscapedCodePoint(int codePoint, StringBuilder sb, boolean escape): protected -> private (it's a new helper method, no need for API now)

Added comment for appendUnicodeEscapedCodePoint
@parrt
Copy link
Member

parrt commented Dec 28, 2021

looking good @KvanTTT are we ready to merge?

@KvanTTT
Copy link
Member Author

KvanTTT commented Dec 28, 2021

Yes. Also, I've restored the API of Target, now it's exactly the same as it was before (except for some methods that are not abstract now).

@parrt
Copy link
Member

parrt commented Dec 28, 2021

good work, thanks!!

@KvanTTT KvanTTT deleted the tool-bugfixes branch December 28, 2021 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants