Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Skip PL/SQL selection directives and add sanity check for inquiry directive size #3654

Merged
merged 1 commit into from
Mar 21, 2023

Conversation

techee
Copy link
Contributor

@techee techee commented Mar 2, 2023

PLSQL selection directives are of the form

$IF boolean_static_expression $THEN
   text
[ $ELSIF boolean_static_expression $THEN
   text
]...
[ $ELSE
   text
]
$END

In addition, boolean_static_expression conditions like

$$MY_CUSTOM_VAR > 3

can appear which confuses the current parser because it
thinks $$ is the start of dollar-quoted string and drops
the rest of the code. In addition, as the double-quoted
string is getting bigger and bigger and tested against
inquiry directives by

lookupCaseKeyword (vStringValue (string), Lang_sql)

it takes huge amount of time to perform the repeated
lookups in the hash table.

This patch skips all the code between $IF and $END because
we can't know which of the branches to take and this is the
easiest way to avoid invalid code. In addition, this patch
also drops $ from all identifiers starting with $$ between
$IF and $END as these may be conditional variables in the
$IF block.


Another issue is that
$$ typed anywhere in the code, e.g. by accident, makes the rest of the
code appear to the parser as a dollar-quoted string which can be thousands
of bytes long. In this case lookupCaseKeyword() is called repeatedly on
this ever increasing string which consumes a lot of time and makes the
parser appear completely unresponsive for large files.

This patch adds a sanity check to perform lookupCaseKeyword() only for
strings of length smaller than 30 (currently the longest inquiry directive
keyword is 21 characters long so there should be some safe extra margin
even for longer keywords if added in the future).

Fixes #3647.

@codecov
Copy link

codecov bot commented Mar 2, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.04 🎉

Comparison is base (6f4a48a) 82.82% compared to head (b8d6a4a) 82.87%.

❗ Current head b8d6a4a differs from pull request most recent head 78a7342. Consider uploading reports for the commit 78a7342 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3654      +/-   ##
==========================================
+ Coverage   82.82%   82.87%   +0.04%     
==========================================
  Files         223      223              
  Lines       54502    54561      +59     
==========================================
+ Hits        45143    45219      +76     
+ Misses       9359     9342      -17     
Impacted Files Coverage Δ
parsers/sql.c 76.29% <100.00%> (+0.58%) ⬆️

... and 6 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@b4n
Copy link
Member

b4n commented Mar 12, 2023

This patch skips all the code between $IF and $END because
we can't know which of the branches to take and this is the
easiest way to avoid invalid code.

Shouldn't it rather take one branch, any branch? I don't know the syntax (and have been too lazy to look it up any closely), but if it's anything similar to e.g. C preprocessor taking no branch is the best way to get invalid syntax. I'd then take the first branch (because it's likely the easiest solution), and if it's not what's wanted, well… at least it gives something.

@techee
Copy link
Contributor Author

techee commented Mar 12, 2023

Shouldn't it rather take one branch, any branch?

Yeah, you are right, I'll have a look at it when I have time.

@techee
Copy link
Contributor Author

techee commented Mar 13, 2023

@b4n Done.

@techee
Copy link
Contributor Author

techee commented Mar 14, 2023

I've created #3664 which might be a better and more universal fix for the original "slow parser" problem as it fixes it at the level of the keyword table so individual parsers don't have to worry about it any more.

@techee
Copy link
Contributor Author

techee commented Mar 14, 2023

I've created #3664 which might be a better and more universal fix for the original "slow parser" problem as it fixes it at the level of the keyword table so individual parsers don't have to worry about it any more.

Just to be clear, it replaces the patch performing the string length check

if ((string->length < 30 && KEYWORD_inquiry_directive == lookupCaseKeyword (vStringValue (string), Lang_sql))

but the PL/SQL selection directive is still worth implementing.

@masatake
Copy link
Member

(I'm catching up now.)

@@ -294,6 +300,7 @@ static const keywordTable SqlKeywordTable [] = {
{ "drop", KEYWORD_drop },
{ "else", KEYWORD_else },
{ "elseif", KEYWORD_elseif },
{ "elsif", KEYWORD_elsif },
Copy link
Member

@masatake masatake Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an entry for KEYWORD_elsif to

static struct SqlReservedWord SqlReservedWord [SQLKEYWORD_COUNT] = {
	/*
	 * RESERVED_BIT: MYSQL & POSTGRESQL&SQL2016&SQL2011&SQL92 & ORACLE11g&PLSQL & SQLANYWERE
	 *
	 * {  0  } means we have not inspect whether the keyword is reserved or not.
	 */
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for noticing. I'll then squash the commits when you think there's not more work to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please. Could you remove c3b0324 and 9f532d3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

parsers/sql.c Outdated
@@ -797,8 +797,8 @@ static tokenType parseDollarQuote (vString *const string, const int delimiter, i
{
vStringPut (string, c);
if (empty_tag
&& (KEYWORD_inquiry_directive == lookupCaseKeyword (vStringValue (string),
Lang_sql)
&& ((string->length < 30 && KEYWORD_inquiry_directive == lookupCaseKeyword (vStringValue (string),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use '#define' for the 30 with a good name?
Finding a good name is not easy for me, but it may be easy for you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch checking string->length here can be dropped completely now. It is the reason why I created #3664 which is more universal and works for every parser automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

PLSQL selection directives are of the form

$IF boolean_static_expression $THEN
   text
[ $ELSIF boolean_static_expression $THEN
   text
]...
[ $ELSE
   text
]
$END

In addition, boolean_static_expression conditions like

$$MY_CUSTOM_VAR > 3

can appear which confuses the current parser because it
thinks $$ is the start of dollar-quoted string and drops
the rest of the code.

This patch keeps only the $IF branch and skips all the code
between $ELSE or $ELSIF and $END. In addition, this patch
also drops $ from all identifiers starting with $$ behind
$IF and $ELSIF as these may be conditional variables and
not dollar-quoted strings.
@masatake masatake merged commit 1b8c941 into universal-ctags:master Mar 21, 2023
@masatake
Copy link
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: slow parsing of certain SQL files
3 participants