-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make changes nescessary to build IL asm grammar with Bison. #89704
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsRelated to #4776
|
Would be great if my changes was tested with internal yacc, so while build infra prepare nescessary images and stuff we can test this little experiment |
Unrelated to this change, but do someone know where "Lexical tokens" entries in asmparse.grammar come from? Looks like some of them are not accurate, for example: |
%token <int32> INT32_V /* 3425 0x34FA 0352 */ | ||
%token <int64> INT64_V /* 342534523534534 0x34FA434644554 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead recommend INT32_T
and INT64_T
for naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume _T is token here. I use _V for value. but whatether, let maintainers decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents are: I personally like _V
for value. If it were _T
I would expect a token representing the type, not a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"_V" is fine.
src/coreclr/ilasm/asmparse.h
Outdated
@@ -314,6 +314,7 @@ class AsmParse : public ErrorReporter | |||
friend char* nextBlank(_In_ __nullterminated char*); | |||
friend int ProcessEOF(); | |||
friend unsigned __int8* skipType(unsigned __int8* ptr, BOOL fFixupType); | |||
friend unsigned corCountArgs(BinStr* args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary.
Instead, add a static unsigned corCountArgs(BinStr* args);
declaration in grammar_before.cpp.
See 424e9a2
src/coreclr/ilasm/grammar_after.cpp
Outdated
@@ -1557,7 +1557,7 @@ void FixupTyPars(BinStr* pbstype) | |||
FixupTyPars((PCOR_SIGNATURE)(pbstype->ptr()),(ULONG)(pbstype->length())); | |||
} | |||
/**************************************************************************/ | |||
static unsigned corCountArgs(BinStr* args) | |||
unsigned corCountArgs(BinStr* args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary change.
@RIP-webmaster for example how numbers as lexems are parsed is defined here. runtime/src/coreclr/ilasm/grammar_after.cpp Lines 1184 to 1229 in 2422429
Identifier rules runtime/src/coreclr/ilasm/grammar_after.cpp Lines 548 to 558 in 2422429
which is used here runtime/src/coreclr/ilasm/grammar_after.cpp Lines 903 to 918 in 2422429
all of that pretty standard lexer stuff encoded in the code directly. |
@@ -51,6 +51,7 @@ static char* newString(_In_ __nullterminated const char* str1); | |||
static void corEmitInt(BinStr* buff, unsigned data); | |||
static void AppendStringWithLength(BinStr* pbs, _In_ __nullterminated char* sz); | |||
static void AppendFieldToCustomBlob(BinStr* pBlob, _In_ BinStr* pField); | |||
static unsigned corCountArgs(BinStr* args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also append below lines after this line:
extern void yyerror(_In_ __nullterminated const char*);
extern Instr* SetupInstr(unsigned short);
extern int yylex();
See 3bf3d70#diff-45997acd93aca6458c6a5b1ff5ebcc1eb2953ccff2062309d4c8592f366d0384
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify why extern needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's optional, you don't have to add the extern
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
%token <int32> INT32_V /* 3425 0x34FA 0352 */ | ||
%token <int64> INT64_V /* 342534523534534 0x34FA434644554 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to change it if we use byacc (https://invisible-island.net/byacc/byacc.html) instead of bison.
And worth to note by using bison there will be license conflict between GPL and MIT.
So I would recommend to revert this change (the parser generated by byacc doesn't have issue around here) and use byacc instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a lawyer. @richlander , what do you think? This only applies to the parser that BISON generates from the grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting this can help prevent developers from generating code using bison by accident (because it would just fail), which can make sure we won't have license issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use byacc due to the licensing personally.
@TIHan please review this community PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks like a very simple change, though I do not know what the implications are with BISON being GPL and what the means for the parsers that are generated.
We need to get some insight with @richlander .
%token <int32> INT32_V /* 3425 0x34FA 0352 */ | ||
%token <int64> INT64_V /* 342534523534534 0x34FA434644554 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"_V" is fine.
%token <int32> INT32_V /* 3425 0x34FA 0352 */ | ||
%token <int64> INT64_V /* 342534523534534 0x34FA434644554 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a lawyer. @richlander , what do you think? This only applies to the parser that BISON generates from the grammar.
@kant2002 , we are going to check to see if the generated parser from BISON will be license compatible. If it isn't, will |
This pull request has been automatically marked |
This pull request will now be closed since it had been marked |
Re-opening, we can use this as long as we don't remove the special exception that BISON generates, very important. |
@kant2002 we are willing to accept this. Can you generate the new parser and make it part of this PR? |
I unlocked the thread, sorry about that. I didn't realize it was locked. |
Let me check. Give me couple days to find time for this one |
No worries and no rush. |
This pull request has been automatically marked |
This pull request will now be closed since it had been marked |
To generate with Bison
If bison is unacceptable due to licensing
Thanks @hez2010 for the hint.
Related to #4776