-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Install locales and generate en_US.UTF-8 #5627
Conversation
Signed-off-by: gashen <gashen@nvidia.com>
We may also need to update the integration test docker image. |
build |
jenkins/Dockerfile-blossom.ubuntu
Outdated
# install locale and use UTF-8 | ||
RUN apt-get install -y locales | ||
RUN locale-gen en_US.UTF-8 | ||
ENV LANG en_US.UTF-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the unit test in the Docker container, but what about someone who runs the test outside of this container (i.e.: everybody other than CI) who doesn't happen to have LANG=en_US.UTF-8 setup in their environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a minimum, we should document this requirement for correct behavior for regexp handling of Unicode text. We could also consider falling back to CPU if LANG != en_US.UTF-8
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to make sure this issue isn't considered "fixed" because we finally got the tests to pass. Filed #5629 to track it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated the PR as my comment. It doesn't fix the issue, but prepare the condition to fix it in test scripts.
I would prefer to put this in the https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/run_pyspark_from_build.sh (or the pytest file) directly instead of the dockerfile as its actually part of the test locales installation should be fine to be put inside the dockerfile. And would be nice to doc this requirement under https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/integration_tests/README.md#prerequisities |
Don't set LANG to UTF-8, leave it to the test script Signed-off-by: gashen <gashen@nvidia.com>
build |
Updated the PR to only install locales and generate the UTF-8, leave the LANG be set in test script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @GaryShen2008
@pxLi We also need to set IMHO, if only one case/test requires that change, it would be better to be put in the scala test code directly. |
Related to #5549, but not close it.
Install locales and generate en_US.UTF-8 in ubuntu docker image.
No need to update centos one because there's already en_US.UTF-8.
Signed-off-by: gashen gashen@nvidia.com