-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve check for UTF-8 in integration tests by testing from the JVM #6041
Improve check for UTF-8 in integration tests by testing from the JVM #6041
Conversation
Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
build |
build |
@@ -194,7 +194,7 @@ | |||
("SELECT * FROM test_table WHERE strF LIKE 'Y%'", "* WHERE strF LIKE 'Y%'"), | |||
("SELECT * FROM test_table WHERE strF LIKE '%an' ", "* WHERE strF LIKE '%an'"), | |||
("SELECT REPLACE(strF, 'Yuan', 'Eric') FROM test_table", "REPLACE(strF, 'Yuan', 'Eric')"), | |||
("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')"), | |||
# ("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that we do this elsewhere in the file but it would be great to add comments as to why something is commented out. Or even better just delete it and explain in the PR description and commit why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering what might be the better solution, but wanted to get the PR going. Another option might be to separate out into separate test function in qa_nightly_select_test.py
. Wonder if that would make more sense.
Signed-off-by: Navin Kumar <navink@nvidia.com>
build |
Signed-off-by: Navin Kumar <navink@nvidia.com>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An unnecessary method and would be nice to clean up the missing EOL at EOF on some files, but not must-fixes from my perspective.
def is_jvm_charset_not_utf8(): | ||
return get_jvm_charset() != 'UTF-8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is extra code to maintain that seems unnecessary. I'm assuming this was created for readability, but if we want to check for not is_jvm_charset_utf8(), then we would type exactly that:
if not is_jvm_charset_utf8():
...
IMHO this is quite readable and precludes the need to maintain the separate method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaned up the code a bit based on this feedback in most recent commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Navin Kumar <navink@nvidia.com>
build |
Fix for #6028.
This moves the check for UTF-8 to using the underlying JVM from the spark context instead of checking from the Python locale. Also checks are added to handle specific regular expression integration tests to ensure that UTF-8 is available to ensure the test can run properly.
Signed-off-by: Navin Kumar navink@nvidia.com