Adds validation to allow only flint queries and sql SELECT queries to security lake type datasource #2959

Merged

Conversation

eirsep
Member

@eirsep eirsep commented Aug 30, 2024

Description

Adds validation so that only Flint queries and SQL SELECT queries are allowed against Security Lake type datasources.
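
To make the intended rule concrete, here is a minimal sketch of an allow-list check. It is not the code from this PR: the class name `SecurityLakeQueryGuard`, its methods, and the keyword list for Flint statements are all hypothetical, and it uses simple prefix matching where the actual change validates against the Spark SQL grammar. It also does not handle `WITH ... SELECT` queries or leading comments.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical guard for illustration only; not the validator added in this PR. */
final class SecurityLakeQueryGuard {

  // Plain SELECT statements. The word boundary keeps identifiers such as SELECTED out.
  private static final Pattern SELECT_PREFIX = Pattern.compile("^SELECT\\b");

  // Assumed, non-exhaustive shapes of Flint index statements (illustrative only).
  private static final Pattern FLINT_PREFIX =
      Pattern.compile(
          "^(CREATE|DROP|ALTER|REFRESH|SHOW|DESCRIBE|DESC|VACUUM|RECOVER)\\s+"
              + "(SKIPPING\\s+INDEX|INDEX|MATERIALIZED\\s+VIEW|FLINT\\s+INDEX)\\b");

  private SecurityLakeQueryGuard() {}

  /** Returns true when the query looks like a SELECT or a Flint index statement. */
  static boolean isAllowed(String query) {
    if (query == null || query.trim().isEmpty()) {
      return false;
    }
    String normalized = query.trim().toUpperCase(Locale.ROOT);
    return SELECT_PREFIX.matcher(normalized).find()
        || FLINT_PREFIX.matcher(normalized).find();
  }

  /** Throws IllegalArgumentException for anything outside the allow-list. */
  static void validate(String query) {
    if (!isAllowed(query)) {
      throw new IllegalArgumentException(
          "Unsupported sql statement for security lake data source. "
              + "Only select queries and flint queries are allowed");
    }
  }

  public static void main(String[] args) {
    System.out.println(isAllowed("SELECT * FROM t"));
    System.out.println(isAllowed("CREATE SKIPPING INDEX ON t (a VALUE_SET)"));
    System.out.println(isAllowed("DROP TABLE t"));
  }
}
```

A grammar-based check is more robust than prefix matching, since it can reject statements that merely start with an allowed keyword; the sketch only shows the shape of the allow-list decision.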

Related Issues

Resolves #2907

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@eirsep

This comment was marked as resolved.

@eirsep eirsep marked this pull request as ready for review September 3, 2024 23:25
@eirsep eirsep force-pushed the validate_security_lake_queries branch 2 times, most recently from 88861bf to 6302f62 Compare September 3, 2024 23:44
Comment on lines +97 to +100
logger.error(
    String.format(
        "Failed to parse sql statement context while validating sql query %s", sqlQuery),
    e);
Collaborator

Why do we add this error log?

Member Author

For better debuggability if there is a syntax error?

Collaborator

Since the normal flow goes through here, the log level should be info or debug.

Member Author

I am not sure this is an info- or debug-level log.
Imagine debugging a bad SQL query during a log deep dive. It would be easier to trace if we emit error logs, and users can also keep an eye on them.
This is an error-level log because we have caught an exception caused by user input or by another system. I strongly feel it should be retained as an error-level log.

Collaborator

I thought this skips SyntaxCheckException intentionally. @vamsi-amazon any thoughts?

Member

Yeah, I was a little apprehensive about adding error logs because the g4 files are sometimes not updated to the latest ones from the Spark repo, and this can result in syntax errors for valid queries. So, in case of syntax exceptions, I decided to forward the query to Spark.

Now that we are clear on which Spark version we are going to support, we should take the g4 files from that branch and throw a validation error in case of an exception. What do you think, @ykmr1224?
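
The two options discussed in this thread can be sketched side by side. This is an illustration under assumptions, not the merged code: `SyntaxFailurePolicyDemo`, `QueryParser`, `Policy`, and the local `SyntaxCheckException` stand-in are hypothetical names, and `java.util.logging` is used only to keep the example self-contained. The log levels mirror the two positions above: debug-level when a parse failure is tolerated and the query is forwarded, error-level when the failure causes a rejection.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of the two syntax-failure policies discussed in the review. */
final class SyntaxFailurePolicyDemo {

  private static final Logger LOG = Logger.getLogger(SyntaxFailurePolicyDemo.class.getName());

  /** Local stand-in for the parser's syntax exception type. */
  static final class SyntaxCheckException extends RuntimeException {
    SyntaxCheckException(String message) {
      super(message);
    }
  }

  /** Hypothetical parser hook; implementations throw SyntaxCheckException on bad input. */
  interface QueryParser {
    void parse(String sql);
  }

  enum Policy {
    /** Grammar files may lag behind Spark, so let Spark decide. */
    FORWARD_ON_SYNTAX_ERROR,
    /** Grammar is pinned to the supported Spark version, so fail fast. */
    REJECT_ON_SYNTAX_ERROR
  }

  private SyntaxFailurePolicyDemo() {}

  /** Returns true when the query should still be sent to Spark. */
  static boolean shouldForward(String sql, QueryParser parser, Policy policy) {
    try {
      parser.parse(sql);
      return true;
    } catch (SyntaxCheckException e) {
      if (policy == Policy.FORWARD_ON_SYNTAX_ERROR) {
        // Expected path when the local grammar is stale: keep the log quiet.
        LOG.log(Level.FINE, "Local parse failed, forwarding query to Spark", e);
        return true;
      }
      // Parse failure is treated as a real validation error.
      LOG.log(Level.SEVERE, "Failed to parse sql statement while validating query", e);
      return false;
    }
  }

  public static void main(String[] args) {
    QueryParser parser =
        sql -> {
          if (sql.contains("bad")) {
            throw new SyntaxCheckException("syntax error");
          }
        };
    System.out.println(shouldForward("bad query", parser, Policy.FORWARD_ON_SYNTAX_ERROR));
    System.out.println(shouldForward("bad query", parser, Policy.REJECT_ON_SYNTAX_ERROR));
  }
}
```

Which policy fits depends on whether the local grammar can be trusted to match the supported Spark version, which is the open question in the comments above.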

ykmr1224
ykmr1224 previously approved these changes Sep 4, 2024
Member

@vamsi-amazon vamsi-amazon left a comment

LGTM. If possible, can we add more types of queries to the SQLQueryUtils tests?

…asource

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
…y class

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
…ests

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
@eirsep eirsep force-pushed the validate_security_lake_queries branch from 6b781d0 to b8fc8c0 Compare September 4, 2024 19:08
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
@ykmr1224 ykmr1224 merged commit 6c5c685 into opensearch-project:main Sep 4, 2024
15 of 16 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 4, 2024
… security lake type datasource (#2959)

* allows only flint queries and select sql queries to security lake datasource

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

* add sql validator for security lake and refactor validateSparkSqlQuery class

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

* spotless fixes

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

* address review comments.

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

* address comment to extract validate logic into a separate method in tests

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

* add more tests to get more code coverage

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

---------

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
(cherry picked from commit 6c5c685)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ykmr1224 pushed a commit that referenced this pull request Sep 5, 2024
… security lake type datasource (#2959) (#2977)

* allows only flint queries and select sql queries to security lake datasource

* add sql validator for security lake and refactor validateSparkSqlQuery class

* spotless fixes

* address review comments.

* address comment to extract validate logic into a separate method in tests

* add more tests to get more code coverage

---------

(cherry picked from commit 6c5c685)

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ykmr1224 pushed a commit that referenced this pull request Sep 5, 2024
… security lake type datasource (#2959) (#2976)

* allows only flint queries and select sql queries to security lake datasource

* add sql validator for security lake and refactor validateSparkSqlQuery class

* spotless fixes

* address review comments.

* address comment to extract validate logic into a separate method in tests

* add more tests to get more code coverage

---------

(cherry picked from commit 6c5c685)

Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Limit Spark SQL queries to SELECT + FLINT commands when Lake Formation is enabled
3 participants