Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve check for UTF-8 in integration tests by testing from the JVM #6041

Merged
merged 6 commits into from
Jul 21, 2022

Conversation

NVnavkumar
Copy link
Collaborator

Fix for #6028.

This moves the check for UTF-8 to using the underlying JVM from the spark context instead of checking from the Python locale. Also checks are added to handle specific regular expression integration tests to ensure that UTF-8 is available to ensure the test can run properly.

Signed-off-by: Navin Kumar navink@nvidia.com

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar self-assigned this Jul 20, 2022
@NVnavkumar
Copy link
Collaborator Author

build

@@ -194,7 +194,7 @@
("SELECT * FROM test_table WHERE strF LIKE 'Y%'", "* WHERE strF LIKE 'Y%'"),
("SELECT * FROM test_table WHERE strF LIKE '%an' ", "* WHERE strF LIKE '%an'"),
("SELECT REPLACE(strF, 'Yuan', 'Eric') FROM test_table", "REPLACE(strF, 'Yuan', 'Eric')"),
("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')"),
# ("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that we do this elsewhere in the file but it would be great to add comments as to why something is commented out. Or even better just delete it and explain in the PR description and commit why

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering what might be the better solution, but wanted to get the PR going. Another option might be to separate out into separate test function in qa_nightly_select_test.py. Wonder if that would make more sense.

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar marked this pull request as ready for review July 21, 2022 06:20
@pxLi pxLi added bug Something isn't working test Only impacts tests labels Jul 21, 2022
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar
Copy link
Collaborator Author

build

jlowe
jlowe previously approved these changes Jul 21, 2022
Copy link
Contributor

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unnecessary method and would be nice to clean up the missing EOL at EOF on some files, but not must-fixes from my perspective.

Comment on lines 200 to 201
def is_jvm_charset_not_utf8():
return get_jvm_charset() != 'UTF-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is extra code to maintain that seems unnecessary. I'm assuming this was created for readability, but if we want to check for not is_jvm_charset_utf8(), then we would type exactly that:

  if not is_jvm_charset_utf8():
    ...

IMHO this is quite readable and precludes the need to maintain the separate method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up the code a bit based on this feedback in most recent commit

gerashegalov
gerashegalov previously approved these changes Jul 21, 2022
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar NVnavkumar dismissed stale reviews from gerashegalov and jlowe via 3c22c59 July 21, 2022 18:04
@NVnavkumar
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar merged commit 1b27375 into NVIDIA:branch-22.08 Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test Only impacts tests
Projects
None yet
4 participants