
[BUG] intermittent orc test_read_round_trip failed due to /tmp/hive location #6146

Closed
pxLi opened this issue Jul 29, 2022 · 3 comments · Fixed by #6726
Labels
bug Something isn't working test Only impacts tests

Comments

@pxLi
Collaborator

pxLi commented Jul 29, 2022

Describe the bug
We have seen this intermittently in a few pre-merge builds; it is not always reproducible.

I suspect it is caused by parallel test runs and bad timing, but it is not clear which test case introduces the side effect.

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x

[2022-07-28T23:58:46.386Z] _ test_read_round_trip[-{'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}-read_orc_sql-[Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Decimal(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)]), Struct(['child0', Byte],['child1', Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Decimal(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)])]), Struct(['child0', Array(Short)],['child1', Double])]] _
[2022-07-28T23:58:46.386Z] [gw1] linux -- Python 3.8.13 /usr/bin/python
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z] spark_tmp_path = '/tmp/pyspark_tests//premerge-ci-2-jenkins-rapids-premerge-github-5244-s6dc8-qcr8c-gw1-21626-1785008513/'
[2022-07-28T23:58:46.386Z] orc_gens = [Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],[...al(7,3)],['child11', Decimal(12,2)],['child12', Decimal(20,2)])]), Struct(['child0', Array(Short)],['child1', Double])]
[2022-07-28T23:58:46.386Z] read_func = <function read_orc_sql at 0x7f63e2e605e0>
[2022-07-28T23:58:46.386Z] reader_confs = {'spark.rapids.sql.format.orc.reader.type': 'PERFILE'}
[2022-07-28T23:58:46.386Z] v1_enabled_list = ''
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z]     @pytest.mark.order(2)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('reader_confs', reader_opt_confs, ids=idfn)
[2022-07-28T23:58:46.386Z]     @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
[2022-07-28T23:58:46.386Z]     def test_read_round_trip(spark_tmp_path, orc_gens, read_func, reader_confs, v1_enabled_list):
[2022-07-28T23:58:46.386Z]         gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
[2022-07-28T23:58:46.386Z]         data_path = spark_tmp_path + '/ORC_DATA'
[2022-07-28T23:58:46.386Z]         with_cpu_session(
[2022-07-28T23:58:46.386Z]                 lambda spark : gen_df(spark, gen_list).write.orc(data_path))
[2022-07-28T23:58:46.386Z]         all_confs = copy_and_update(reader_confs, {'spark.sql.sources.useV1SourceList': v1_enabled_list})
[2022-07-28T23:58:46.386Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-07-28T23:58:46.386Z]                 read_func(data_path),
[2022-07-28T23:58:46.386Z]                 conf=all_confs)
[2022-07-28T23:58:46.386Z] 
[2022-07-28T23:58:46.386Z] ../../src/main/python/orc_test.py:142: 
[2022-07-28T23:58:46.386Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-07-28T23:58:46.386Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:427: in _assert_gpu_and_cpu_are_equal
[2022-07-28T23:58:46.386Z]     run_on_cpu()
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:413: in run_on_cpu
[2022-07-28T23:58:46.386Z]     from_cpu = with_cpu_session(bring_back, conf=conf)
[2022-07-28T23:58:46.386Z] ../../src/main/python/spark_session.py:115: in with_cpu_session
[2022-07-28T23:58:46.386Z]     return with_spark_session(func, conf=copy)
[2022-07-28T23:58:46.386Z] ../../src/main/python/spark_session.py:99: in with_spark_session
[2022-07-28T23:58:46.386Z]     ret = func(_spark)
[2022-07-28T23:58:46.386Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-07-28T23:58:46.387Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-07-28T23:58:46.387Z] ../../src/main/python/orc_test.py:31: in <lambda>
[2022-07-28T23:58:46.387Z]     return lambda spark : spark.sql('select * from orc.`{}`'.format(data_path))
[2022-07-28T23:58:46.387Z] ../../../.download/spark-3.1.1-bin-hadoop3.2/python/pyspark/sql/session.py:723: in sql
[2022-07-28T23:58:46.387Z]     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
[2022-07-28T23:58:46.387Z] /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-5244-ci-2/.download/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-07-28T23:58:46.387Z]     return_value = get_return_value(
[2022-07-28T23:58:46.387Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z] a = ('xro13438', <py4j.java_gateway.GatewayClient object at 0x7f63bbbf14c0>, 'o62', 'sql')
[2022-07-28T23:58:46.387Z] kw = {}
[2022-07-28T23:58:46.387Z] converted = AnalysisException('java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current per...e.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)\n\t... 74 more\n', JavaObject id=o13439)
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z]     def deco(*a, **kw):
[2022-07-28T23:58:46.387Z]         try:
[2022-07-28T23:58:46.387Z]             return f(*a, **kw)
[2022-07-28T23:58:46.387Z]         except py4j.protocol.Py4JJavaError as e:
[2022-07-28T23:58:46.387Z]             converted = convert_exception(e.java_exception)
[2022-07-28T23:58:46.387Z]             if not isinstance(converted, UnknownException):
[2022-07-28T23:58:46.387Z]                 # Hide where the exception came from that shows a non-Pythonic
[2022-07-28T23:58:46.387Z]                 # JVM exception message.
[2022-07-28T23:58:46.387Z] >               raise converted from None
[2022-07-28T23:58:46.387Z] E               pyspark.sql.utils.AnalysisException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x
[2022-07-28T23:58:46.387Z] 
[2022-07-28T23:58:46.387Z] ../../../.download/spark-3.1.1-bin-hadoop3.2/python/pyspark/sql/utils.py:117: AnalysisException
@pxLi
Collaborator Author

pxLi commented Jul 29, 2022

This was first reported in one of the #5941 pre-merge builds.

@sameerz sameerz added ? - Needs Triage Need team to review and classify and removed ? - Needs Triage Need team to review and classify labels Jul 29, 2022
@pxLi
Collaborator Author

pxLi commented Aug 15, 2022

Closing this as it has not shown up in recent days. Please reopen if you see it again.

@pxLi pxLi closed this as completed Aug 15, 2022
@gerashegalov
Collaborator

gerashegalov commented Oct 7, 2022

Another instance of this issue popped up on CI.

@gerashegalov gerashegalov reopened this Oct 7, 2022
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue Oct 7, 2022
Fixes NVIDIA#6146

- Create a placeholder for global initialization for pytests
- Add /tmp/hive provisioning. It's a world-writable dir where hive
  creates user-writable dirs `/tmp/hive/$USER`

Signed-off-by: Gera Shegalov <gera@apache.org>
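The commit message above describes pre-provisioning `/tmp/hive` before the test workers start. A minimal sketch of such a provisioning step, assuming a POSIX shell (this is an illustration, not the actual patch from the linked PR; `SCRATCH_DIR` is a hypothetical override hook): the error in the log shows Hive rejecting its root scratch dir when the permissions are 755 (`rwxr-xr-x`), which can happen when whichever parallel worker touches the path first creates it with the default umask. Creating the directory up front with sticky, world-writable mode 1777 sidesteps that race.

```shell
#!/usr/bin/env sh
# Hypothetical sketch, not the actual patch from the linked PR.
# Pre-create Hive's root scratch dir with sticky, world-writable
# permissions (1777) so a parallel test worker cannot leave it as
# 755 (rwxr-xr-x), which fails Hive's writability check.
# Hive then creates per-user subdirs such as /tmp/hive/$USER inside it.
# SCRATCH_DIR is an illustrative override hook for testing.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp/hive}"
mkdir -p "$SCRATCH_DIR"
chmod 1777 "$SCRATCH_DIR"
```

Running a step like this once, before pytest-xdist spawns its `gw*` workers, makes the scratch dir's permissions independent of test scheduling.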
@jlowe jlowe linked a pull request Oct 11, 2022 that will close this issue
jlowe pushed a commit that referenced this issue Oct 11, 2022
Fixes #6146

- Create a placeholder for global initialization for pytests
- Add /tmp/hive provisioning. It's a world-writable dir where hive
  creates user-writable dirs `/tmp/hive/$USER`

Signed-off-by: Gera Shegalov <gera@apache.org>