
[BUG] intermittent orc test_read_round_trip failed due to /tmp/hive location #6146

Closed
pxLi opened this issue Jul 29, 2022 · 3 comments · Fixed by #6726
Labels
bug Something isn't working test Only impacts tests

Comments

@pxLi
Collaborator

pxLi commented Jul 29, 2022

Describe the bug
We have seen this intermittently in a few pre-merge builds; it is not always reproducible.

I suspect it is caused by parallel test runs and bad timing, but it is not clear which test case introduces the side effect.

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x

[2022-07-28T23:58:46.386Z] _ test_read_round_trip[-{'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}-read_orc_sql-[Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Decimal(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)]), Struct(['child0', Byte],['child1', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Decimal(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)])]), Struct(['child0', Array(Short)],['child1', Double])]] _
[2022-07-28T23:58:46.386Z] [gw1] linux -- Python 3.8.13 /usr/bin/python
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z] spark_tmp_path = '/tmp/pyspark_tests//premerge-ci-2-jenkins-rapids-premerge-github-5244-s6dc8-qcr8c-gw1-21626-1785008513/'
[2022-07-28T23:58:46.386Z] orc_gens = [Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],[...al(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)])]), Struct(['child0', Array(Short)],['child1', Double])]
[2022-07-28T23:58:46.386Z] read_func = <function read_orc_sql at 0x7f63e2e605e0>
[2022-07-28T23:58:46.386Z] reader_confs = {'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}
[2022-07-28T23:58:46.386Z] v1_enabled_list = ''
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z]     @pytest.mark.order(2)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs, ids=idfn)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
[2022-07-28T23:58:46.386Z]     def test_read_round_trip(spark_tmp_path, orc_gens, read_func, reader_confs, v1_enabled_list):
[2022-07-28T23:58:46.386Z]         gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
[2022-07-28T23:58:46.386Z]         data_path = spark_tmp_path + '/ORC_DATA'
[2022-07-28T23:58:46.386Z]         with_cpu_session(
[2022-07-28T23:58:46.386Z]                 lambda spark : gen_df(spark, gen_list).write.orc(data_path))
[2022-07-28T23:58:46.386Z]         all_confs = copy_and_update(reader_confs, {'spark.sql.sources.useV1SourceList': v1_enabled_list})
[2022-07-28T23:58:46.386Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-07-28T23:58:46.386Z]                 read_func(data_path),
[2022-07-28T23:58:46.386Z]                 conf=all_confs)
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z] ../../src/main/python/orc_test.py:142: 
[2022-07-28T23:58:46.386Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-07-28T23:58:46.386Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:427: in _assert_gpu_and_cpu_are_equal
[2022-07-28T23:58:46.386Z]     run_on_cpu()
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:413: in run_on_cpu
[2022-07-28T23:58:46.386Z]     from_cpu = with_cpu_session(bring_back, conf=conf)
[2022-07-28T23:58:46.386Z] ../../src/main/python/spark_session.py:115: in with_cpu_session
[2022-07-28T23:58:46.386Z]     return with_spark_session(func, conf=copy)
[2022-07-28T23:58:46.386Z] ../../src/main/python/spark_session.py:99: in with_spark_session
[2022-07-28T23:58:46.386Z]     ret = func(_spark)
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-07-28T23:58:46.387Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-07-28T23:58:46.387Z] ../../src/main/python/orc_test.py:31: in <lambda>
[2022-07-28T23:58:46.387Z]     return lambda spark : spark.sql('select * from orc.`{}`'.format(data_path))
[2022-07-28T23:58:46.387Z] ../../../.download/spark-3.1.1-bin-hadoop3.2/python/pyspark/sql/session.py:723: in sql
[2022-07-28T23:58:46.387Z]     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
[2022-07-28T23:58:46.387Z] /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-5244-ci-2/.download/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-07-28T23:58:46.387Z]     return_value = get_return_value(
[2022-07-28T23:58:46.387Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z] a = ('xro13438', <py4j.java_gateway.GatewayClient object at 0x7f63bbbf14c0>, 'o62', 'sql')
[2022-07-28T23:58:46.387Z] kw = {}
[2022-07-28T23:58:46.387Z] converted = AnalysisException('java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current per...e.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)\n\t... 74 more\n', JavaObject id=o13439)
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z]     def deco(*a, **kw):
[2022-07-28T23:58:46.387Z]         try:
[2022-07-28T23:58:46.387Z]             return f(*a, **kw)
[2022-07-28T23:58:46.387Z]         except py4j.protocol.Py4JJavaError as e:
[2022-07-28T23:58:46.387Z]             converted = convert_exception(e.java_exception)
[2022-07-28T23:58:46.387Z]             if not isinstance(converted, UnknownException):
[2022-07-28T23:58:46.387Z]                 # Hide where the exception came from that shows a non-Pythonic
[2022-07-28T23:58:46.387Z]                 # JVM exception message.
[2022-07-28T23:58:46.387Z] >               raise converted from None
[2022-07-28T23:58:46.387Z] E               pyspark.sql.utils.AnalysisException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z] ../../../.download/spark-3.1.1-bin-hadoop3.2/python/pyspark/sql/utils.py:117: AnalysisException
@pxLi
Collaborator Author

pxLi commented Jul 29, 2022

This was first reported in one of the #5941 pre-merge builds.

@sameerz sameerz added ? - Needs Triage Need team to review and classify and removed ? - Needs Triage Need team to review and classify labels Jul 29, 2022
@pxLi
Collaborator Author

pxLi commented Aug 15, 2022

Closing this as it has not shown up in recent days. Please reopen if you see it again.

@pxLi pxLi closed this as completed Aug 15, 2022
@gerashegalov
Collaborator

gerashegalov commented Oct 7, 2022

Another instance of this issue popped up on CI.

@gerashegalov gerashegalov reopened this Oct 7, 2022
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Oct 7, 2022
Fixes NVIDIA#6146

- Create a placeholder for global initialization for pytests
- Add /tmp/hive provisioning. It's a world-writable dir where hive
  creates user-writable dirs `/tmp/hive/$USER`

Signed-off-by: Gera Shegalov <gera@apache.org>
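The commit message above describes pre-provisioning `/tmp/hive` before the test workers start. A minimal sketch of such a provisioning step, assuming a POSIX shell (this is an illustration, not the actual patch from the linked PR; `SCRATCH_DIR` is a hypothetical override hook): the error in the log shows Hive rejecting its root scratch dir when the permissions are 755 (`rwxr-xr-x`), which can happen when whichever parallel worker touches the path first creates it with the default umask. Creating the directory up front with sticky, world-writable mode 1777 sidesteps that race.

```shell
#!/usr/bin/env sh
# Hypothetical sketch, not the actual patch from the linked PR.
# Pre-create Hive's root scratch dir with sticky, world-writable
# permissions (1777) so a parallel test worker cannot leave it as
# 755 (rwxr-xr-x), which fails Hive's writability check.
# Hive then creates per-user subdirs such as /tmp/hive/$USER inside it.
# SCRATCH_DIR is an illustrative override hook for testing.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp/hive}"
mkdir -p "$SCRATCH_DIR"
chmod 1777 "$SCRATCH_DIR"
```

Running a step like this once, before pytest-xdist spawns its `gw*` workers, makes the scratch dir's permissions independent of test scheduling.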
@jlowe jlowe linked a pull request Oct 11, 2022 that will close this issue
jlowe pushed a commit that referenced this issue Oct 11, 2022
Fixes #6146

- Create a placeholder for global initialization for pytests
- Add /tmp/hive provisioning. It's a world-writable dir where hive
  creates user-writable dirs `/tmp/hive/$USER`

Signed-off-by: Gera Shegalov <gera@apache.org>