collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist #9856

Closed
NvTimLiu opened this issue Nov 27, 2023 · 0 comments · Fixed by #9865

NvTimLiu commented Nov 27, 2023

Describe the bug
collection_ops_test.py failed with: Column 'None' does not exist.

The failures look random: this could be a non-deterministic test case (or the cudf dependencies may have been updated).

It failed in the latest nightly IT run [rapids-it-dataproc-2.1-ubuntu20/#211]

The previous job PASSED collection_ops_test.py (both jobs tested with the same jars): [rapids-it-dataproc-2.1-ubuntu20/#210]

I tried to reproduce it by running only collection_ops_test.py on Dataproc-2.1, and that run also PASSED.
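One possible explanation for "fails in one run, passes in the next" is that the outcome depends on the datagen seed, since the test calls `step_gen.start(random.Random(data_gen_seed))` and then `step_gen.gen()`. The sketch below is only an illustration of that idea. `NullableLongGen` and its `null_rate` are hypothetical stand-ins, not the real integration-test generators, and the assumption that the step generator can return `None` is not confirmed here.

```python
import random


class NullableLongGen:
    """Hypothetical stand-in for a nullable data generator (not the real class)."""

    def __init__(self, null_rate=0.1):
        # Assumed probability of producing a null; the real value is unknown here.
        self.null_rate = null_rate
        self._rand = None

    def start(self, rand):
        # Mirrors the start(random.Random(seed)) call used in the test.
        self._rand = rand

    def gen(self):
        # Return None for a fraction of draws, otherwise a small integer.
        if self._rand.random() < self.null_rate:
            return None
        return self._rand.randint(-100, 100)


def first_value(seed):
    """First value the toy generator produces for a given seed."""
    gen = NullableLongGen()
    gen.start(random.Random(seed))
    return gen.gen()


# The same seed always gives the same first value, so a failure should be
# reproducible when the failing run's seed is reused.
assert first_value(1701010504) == first_value(1701010504)

# Across many seeds, only some yield None first, which would look like a
# random failure from one nightly run to the next.
null_seeds = [s for s in range(1000) if first_value(s) is None]
print(len(null_seeds), "of 1000 seeds yield None as the first value")
```

If this reading is right, a repro attempt with a different seed could easily pass, and reusing the `DATAGEN_SEED` value printed in the test summary below would be the more reliable way to reproduce.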

 ----------------------------- Captured stdout call -----------------------------
 ### CPU RUN ###
 ___________________ test_sequence_with_step[Long-Long-Long2] ___________________
 
 start_gen = Long, stop_gen = Long, step_gen = Long
 
     @pytest.mark.parametrize('start_gen,stop_gen,step_gen', sequence_normal_integral_gens, ids=idfn)
     def test_sequence_with_step(start_gen, stop_gen, step_gen):
         # Get the datagen seed we use for all datagens, since we need to call start
         # on step_gen
         data_gen_seed = get_datagen_seed()
         # Get a step scalar from the 'step_gen' which follows the rules.
         step_gen.start(random.Random(data_gen_seed))
         step_lit = step_gen.gen()
 >       assert_gpu_and_cpu_are_equal_collect(
             lambda spark: three_col_df(spark, start_gen, stop_gen, step_gen).selectExpr(
                 "sequence(a, b, c)",
                 "sequence(a, b, {})".format(step_lit),
                 "sequence(a, 20, c)",
                 "sequence(a, 20, {})".format(step_lit),
                 "sequence(20, b, c)",
                 "sequence(20, 20, c)",
                 "sequence(20, b, {})".format(step_lit)))
 
 ../../src/main/python/collection_ops_test.py:268: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 ../../src/main/python/asserts.py:581: in assert_gpu_and_cpu_are_equal_collect
     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
 ../../src/main/python/asserts.py:486: in _assert_gpu_and_cpu_are_equal
     from_cpu = run_on_cpu()
 ../../src/main/python/asserts.py:471: in run_on_cpu
     from_cpu = with_cpu_session(bring_back, conf=conf)
 ../../src/main/python/spark_session.py:106: in with_cpu_session
     return with_spark_session(func, conf=copy)
 ../../src/main/python/spark_session.py:90: in with_spark_session
     ret = func(_spark)
 ../../src/main/python/asserts.py:205: in <lambda>
     bring_back = lambda spark: limit_func(spark).collect()
 ../../src/main/python/collection_ops_test.py:269: in <lambda>
     lambda spark: three_col_df(spark, start_gen, stop_gen, step_gen).selectExpr(
 /usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py:2048: in selectExpr
     jdf = self._jdf.selectExpr(self._jseq(expr))
 /usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321: in __call__
     return_value = get_return_value(
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 a = ('xro123408', <py4j.clientserver.JavaClient object at 0x7fd77cfb6cb0>, 'o123405', 'selectExpr')
 kw = {}, converted = AnalysisException()
 
     def deco(*a: Any, **kw: Any) -> Any:
         try:
             return f(*a, **kw)
         except Py4JJavaError as e:
             converted = convert_exception(e.java_exception)
             if not isinstance(converted, UnknownException):
                 # Hide where the exception came from that shows a non-Pythonic
                 # JVM exception message.
 >               raise converted from None
 E               pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you mean one of the following? [a, b, c]; line 1 pos 15;
 E               'Project [sequence(a#5214L, b#5215L, Some(c#5216L), Some(UTC)) AS sequence(a, b, c)#5220, unresolvedalias('sequence(a#5214L, b#5215L, 'None), Some(org.apache.spark.sql.Column$$Lambda$2325/0x00000008012da840@398da60b)), sequence(cast(a#5214L as bigint), cast(20 as bigint), Some(cast(c#5216L as bigint)), Some(UTC)) AS sequence(a, 20, c)#5221, unresolvedalias('sequence(a#5214L, 20, 'None), Some(org.apache.spark.sql.Column$$Lambda$2325/0x00000008012da840@398da60b)), sequence(cast(20 as bigint), cast(b#5215L as bigint), Some(cast(c#5216L as bigint)), Some(UTC)) AS sequence(20, b, c)#5222, sequence(cast(20 as bigint), cast(20 as bigint), Some(cast(c#5216L as bigint)), Some(UTC)) AS sequence(20, 20, c)#5223, unresolvedalias('sequence(20, b#5215L, 'None), Some(org.apache.spark.sql.Column$$Lambda$2325/0x00000008012da840@398da60b))]
 E               +- LogicalRDD [a#5214L, b#5215L, c#5216L], false
 
 /usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py:196: AnalysisException

 =========================== short test summary info ============================
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Byte-Byte-Byte0][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Short-Short-Short0][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Integer-Integer-Integer0][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Long-Long-Long0][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Byte-Byte-Byte1][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Short-Short-Short1][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Integer-Integer-Integer1][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Long-Long-Long1][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Byte-Byte-Byte2][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Short-Short-Short2][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Integer-Integer-Integer2][DATAGEN_SEED=1701010504] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_with_step[Long-Long-Long2][DATAGEN_SEED=1701010504, INJECT_OOM] - pyspark.sql.utils.AnalysisException: Column 'None' does not exist. Did you ...
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Integer][DATAGEN_SEED=1701010504, INJECT_OOM] - Failed: DID NOT RAISE <class 'Exception'>
 FAILED ../../src/main/python/collection_ops_test.py::test_sequence_too_long_sequence[Long][DATAGEN_SEED=1701010504, INJECT_OOM] - Failed: DID NOT RAISE <class 'Exception'>
 = 14 failed, 1264 passed, 326 skipped, 20509 deselected, 48 xfailed, 8 xpassed, 9 warnings in 2464.36s (0:41:04) =
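
The plan in the traceback shows `'sequence(a#5214L, b#5215L, 'None)`, which is consistent with `step_lit` being Python `None` when it was formatted into the expression string: `str(None)` is `None`, and Spark SQL would then treat `None` as a column name. A minimal sketch of that mechanism follows. `render_sql_literal` is a hypothetical helper used only for illustration; it is not part of the test suite and is not necessarily how #9865 addressed the problem.

```python
def render_sql_literal(value):
    """Hypothetical helper: render a Python scalar as a SQL literal string."""
    if value is None:
        # SQL spells the null literal NULL; Python's str(None) gives "None".
        return "NULL"
    return str(value)


# Assumption: the generator returned None for the step scalar.
step_lit = None

# What the test's string formatting would produce in that case.
naive = "sequence(a, b, {})".format(step_lit)
print(naive)  # sequence(a, b, None)

# The same expression with the null rendered as a SQL literal.
rendered = "sequence(a, b, {})".format(render_sql_literal(step_lit))
print(rendered)  # sequence(a, b, NULL)
```

Whether a null step is something the test should exercise at all is a separate question; another option would be to make sure the step scalar is never `None` before formatting it.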
NvTimLiu added the "bug" and "? - Needs Triage" labels Nov 27, 2023
NvTimLiu changed the title from "collection_ops_test.py failed with: Column 'None' does not exist." to "collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist" Nov 27, 2023
sameerz removed the "? - Needs Triage" label Nov 27, 2023
jlowe closed this as completed Nov 28, 2023