Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix status line display of the executed SQL command #432

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

amotl
Copy link
Member

@amotl amotl commented Mar 16, 2024

Problem

When prefixing an SQL statement with an SQL comment (both per-line -- and block /* */), crash mistreats that comment as an SQL command when printing its status line.

image

Full Example:

$ echo '/*foo*/ select 1' | crash
CONNECT OK
+---+
| 1 |
+---+
| 1 |
+---+
FOO 1 row in set (0.001 sec)

Solution

Because crash already uses the sqlparse SQL parser, it wasn't too difficult to re-use its tokenizer already for extracting the proper canonical SQL command from the SQL expression.

$ echo '/*foo*/ select 1' | crash
CONNECT OK
+---+
| 1 |
+---+
| 1 |
+---+
SELECT 1 row in set (0.003 sec)

Review FYI

The fundamental change is very slim, but the patch grew a bit in size because software tests and corresponding mocking techniques needed to be adjusted a bit for better testability, and for backward-compatibility reasons with existing function interfaces (str vs. sql.Statement).

At a few of those spots, the diff also looks a bit strange, but the changes should be sound. Please take it into consideration when reviewing. When in doubt, look at the new code, not the diff. Thanks!

Details

sqlparse.sql.Statement.token_first() provides the skip_cm option already:

if skip_cm is True (default: False), comments are ignored too.

Using statement.get_type() would be even more DWIM:

The returned value is a string holding an upper-cased reprint of
the first DML or DDL keyword. If the first token in this group
isn't a DML or DDL keyword "UNKNOWN" is returned.

Whitespaces and comments at the beginning of the statement
are ignored.

However, that would introduce a regression that the outcome of parsing non-DML/DDL keywords like BEGIN would yield UNKNOWN, so we decided against using it for now. If you think it would be the better option, please advise correspondingly.

/cc @surister, @seut, @mfussenegger

@amotl amotl requested a review from matriv March 16, 2024 15:41
@amotl amotl force-pushed the amo/fix-statusline-command branch 3 times, most recently from a9dbd7b to d1c7950 Compare March 16, 2024 17:00
@amotl amotl marked this pull request as ready for review March 16, 2024 17:14
Comment on lines -487 to +489
return re.findall(r'[\w]+', statement)[0].upper()
statement = to_statement(expression)
return str(statement.token_first(skip_ws=True, skip_cm=True)).upper()
Copy link
Member Author

@amotl amotl Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the spot where the previous naive way of finding the canonical SQL command of an expression based on regular expressions was replaced by using a more elaborate method now, completely based on sqlparse's capabilities to skip whitespace and comments on behalf of the tokenized SQL statement.

All the other changes in this patch merely revolve around compensating for this improvement.

Comment on lines -239 to +243
cmd._exec_and_print = MagicMock()
cmd._exec = Mock()
cmd.process_iterable(sql.splitlines())
self.assertListEqual(cmd._exec_and_print.mock_calls, [
self.assertListEqual(cmd._exec.mock_calls, [
Copy link
Member Author

@amotl amotl Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That noise, and subsequent changes, have been needed to move the testing probe to a different spot. The corresponding tests verify that what goes in (user input SQL statement) will yield the correct outcome when being submitted to the database.

Previously, the probe was sitting on _exec_and_print by mocking it and verifying the calls on this mock. Now, that probing moved to the more low-level _exec method. This changes the execution behavior of the test suite, as the _exec_and_print method will now be executed fully 1, unlocking the possibility to test its effects, which hasn't been possible before.

Footnotes

  1. That is the reason why the test suite also needs to supply a mocked cursor now, because the ingredients of _exec_and_print need it.

Copy link
Member

@mfussenegger mfussenegger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks okay to me but I think we should look into replacing sqlparse with a parser generated from the CrateDB antlr grammar, to ensure it supports our SQL dialect.

See https://github.com/antlr/antlr4/blob/4.6/doc/python-target.md

@amotl
Copy link
Member Author

amotl commented Mar 18, 2024

we should look into replacing sqlparse with a parser generated from the CrateDB antlr grammar

Thanks. I've diverted this into GH-433. Please adjust the wording / emphasis where you see applicable.

`sqlparse.sql.Statement.token_first()` provides the `skip_cm` option
already:

  if *skip_cm* is ``True`` (default: ``False``), comments are
  ignored too.

Using `statement.get_type()` would be even more DWIM:

  The returned value is a string holding an upper-cased reprint of
  the first DML or DDL keyword. If the first token in this group
  isn't a DML or DDL keyword "UNKNOWN" is returned.

  Whitespaces and comments at the beginning of the statement
  are ignored.

However, that would introduce a regression that the outcome of
non-DML/DDL keywords like `BEGIN` would yield `UNKNOWN`.
@amotl amotl force-pushed the amo/fix-statusline-command branch from d1c7950 to 13a6485 Compare March 18, 2024 17:14
@amotl amotl merged commit c151975 into master Mar 18, 2024
21 checks passed
@amotl amotl deleted the amo/fix-statusline-command branch March 18, 2024 17:18
@BaurzhanSakhariev
Copy link

Not sure whether it's related but just in case https://github.com/crate/crate-alerts/issues/626

mfussenegger added a commit to crate/crate that referenced this pull request Mar 19, 2024
crate/crash#432 causes some undesired output:

    VALUES (1, 'one'), (2, 'two'), (3, 'three')
    +------+-------+
    | col1 | col2  |
    +------+-------+
    |    1 | one   |
    |    2 | two   |
    |    3 | three |
    +------+-------+
    VALUES (1, 'ONE'), (2, 'TWO'), (3, 'THREE') 3 rows in set (0.004 sec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants