Describe the bug
Failed cases:
```
20:41:11 =========================== short test summary info ============================
FAILED ../../src/main/python/regexp_test.py::test_split_re_negative_limit - p...
FAILED ../../src/main/python/regexp_test.py::test_split_re_zero_limit - pyspa...
FAILED ../../src/main/python/regexp_test.py::test_split_re_one_limit - pyspar...
FAILED ../../src/main/python/regexp_test.py::test_split_re_positive_limit - p...
FAILED ../../src/main/python/regexp_test.py::test_split_re_no_limit - pyspark...
FAILED ../../src/main/python/regexp_test.py::test_re_replace - pyspark.sql.ut...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_repetition - pys...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs - pyspa...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_anchors - pyspar...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs_idx_out_of_bounds
FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs_escaped
FAILED ../../src/main/python/regexp_test.py::test_re_replace_escaped - pyspar...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_null - pyspark.s...
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace - pyspark.sq...
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_character_set_negated
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract - pyspark.sq...
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_no_match - p...
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_multiline - ...
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_multiline_negated_character_class
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_idx_0 - pysp...
FAILED ../../src/main/python/regexp_test.py::test_word_boundaries - pyspark.s...
FAILED ../../src/main/python/regexp_test.py::test_character_classes - pyspark...
FAILED ../../src/main/python/regexp_test.py::test_regexp_hexadecimal_digits
FAILED ../../src/main/python/regexp_test.py::test_regexp_whitespace - pyspark...
FAILED ../../src/main/python/regexp_test.py::test_regexp_horizontal_vertical_whitespace
FAILED ../../src/main/python/regexp_test.py::test_regexp_linebreak - pyspark....
FAILED ../../src/main/python/regexp_test.py::test_regexp_octal_digits - pyspa...
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_digit - pysp...
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_word - pyspa...
FAILED ../../src/main/python/regexp_test.py::test_predefined_character_classes
FAILED ../../src/main/python/regexp_test.py::test_rlike - pyspark.sql.utils.I...
FAILED ../../src/main/python/regexp_test.py::test_rlike_embedded_null - pyspa...
FAILED ../../src/main/python/regexp_test.py::test_rlike_escape - pyspark.sql....
FAILED ../../src/main/python/regexp_test.py::test_rlike_multi_line - pyspark....
FAILED ../../src/main/python/regexp_test.py::test_rlike_missing_escape - pysp...
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_all_idx_zero
FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_all_idx_positive
FAILED ../../src/main/python/regexp_test.py::test_rlike_unicode_support - pys...
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_unicode_support
FAILED ../../src/main/python/regexp_test.py::test_regexp_split_unicode_support
```
Most cases failed with:

```
pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
```
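For context, this exception is what the RAPIDS plugin's test mode raises when an operator (here `ProjectExec`, holding the regexp expressions) could not be replaced with a GPU version and fell back to the CPU. A hedged sketch of configs that may help diagnose which expression is falling back; the config names below are the standard spark-rapids settings, but verify them against the plugin version under test:

```shell
# Instead of failing the run, log why each operator/expression did not run
# on the GPU. spark.rapids.sql.test.enabled=true is what turns a CPU
# fallback into the "Part of the plan is not columnar" failure seen above.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.sql.explain=NOT_ON_GPU \
  --conf spark.rapids.sql.test.enabled=false \
  your_job.py   # hypothetical job name
```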
Details log:
```
____________________________ test_regexp_whitespace ____________________________

    def test_regexp_whitespace():
        gen = mk_str_gen('\u001e[abcd]\t\n{1,3} [0-9]\n {1,3}\x0b\t[abcd]\r\f[0-9]{0,10}')
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark: unary_op_df(spark, gen).selectExpr(
                'rlike(a, "\\\\s")',
                'rlike(a, "\\\\s{3}")',
                'rlike(a, "[abcd]+\\\\s+[0-9]+")',
                'rlike(a, "\\\\S{3}")',
                'rlike(a, "[abcd]+\\\\s+\\\\S{2,3}")',
                'regexp_extract(a, "([a-d]+)(\\\\s[0-9]+)([a-d]+)", 2)',
                'regexp_extract(a, "([a-d]+)(\\\\S+)([0-9]+)", 2)',
                'regexp_extract(a, "([a-d]+)(\\\\S+)([0-9]+)", 3)',
                'regexp_replace(a, "(\\\\s+)", "@")',
                'regexp_replace(a, "(\\\\S+)", "#")',
            ),
            conf=_regexp_conf)

../../src/main/python/regexp_test.py:489:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
    run_on_gpu()
../../src/main/python/asserts.py:422: in run_on_gpu
    from_gpu = with_gpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:132: in with_gpu_session
    return with_spark_session(func, conf=copy)
../../src/main/python/spark_session.py:99: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:201: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
/home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
    sock_info = self._jdf.collectToPython()
/home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
    return_value = get_return_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = ('xro6355', <py4j.java_gateway.GatewayClient object at 0x7f63cd4fcf70>, 'o6354', 'collectToPython')
kw = {}
converted = IllegalArgumentException('Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec\nProject [...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)

    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except py4j.protocol.Py4JJavaError as e:
            converted = convert_exception(e.java_exception)
            if not isinstance(converted, UnknownException):
                # Hide where the exception came from that shows a non-Pythonic
                # JVM exception message.
>               raise converted from None
E               pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
E               Project [a#879 RLIKE \s AS a RLIKE \s#881, a#879 RLIKE \s{3} AS a RLIKE \s{3}#882, a#879 RLIKE [abcd]+\s+[0-9]+ AS a RLIKE [abcd]+\s+[0-9]+#883, a#879 RLIKE \S{3} AS a RLIKE \S{3}#884, a#879 RLIKE [abcd]+\s+\S{2,3} AS a RLIKE [abcd]+\s+\S{2,3}#885, regexp_extract(a#879, ([a-d]+)(\s[0-9]+)([a-d]+), 2) AS regexp_extract(a, ([a-d]+)(\s[0-9]+)([a-d]+), 2)#886, regexp_extract(a#879, ([a-d]+)(\S+)([0-9]+), 2) AS regexp_extract(a, ([a-d]+)(\S+)([0-9]+), 2)#887, regexp_extract(a#879, ([a-d]+)(\S+)([0-9]+), 3) AS regexp_extract(a, ([a-d]+)(\S+)([0-9]+), 3)#888, regexp_replace(a#879, (\s+), @, 1) AS regexp_replace(a, (\s+), @, 1)#889, regexp_replace(a#879, (\S+), #, 1) AS regexp_replace(a, (\S+), #, 1)#890]
E               +- Scan ExistingRDD[a#879]

/home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:117: IllegalArgumentException
----------------------------- Captured stdout call -----------------------------
### CPU RUN ###
### GPU RUN ###
__________________ test_regexp_horizontal_vertical_whitespace __________________

    def test_regexp_horizontal_vertical_whitespace():
        gen = mk_str_gen(
            '''\xA0\u1680\u180e[abcd]\t\n{1,3} [0-9]\n {1,3}\x0b\t[abcd]\r\f[0-9]{0,10}
            [\u2001-\u200a]{1,3}\u202f\u205f\u3000\x85\u2028\u2029
            ''')
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark: unary_op_df(spark, gen).selectExpr(
                'rlike(a, "\\\\h{2}")',
                'rlike(a, "\\\\v{3}")',
                'rlike(a, "[abcd]+\\\\h+[0-9]+")',
                'rlike(a, "[abcd]+\\\\v+[0-9]+")',
                'rlike(a, "\\\\H")',
                'rlike(a, "\\\\V")',
                'rlike(a, "[abcd]+\\\\h+\\\\V{2,3}")',
                'regexp_extract(a, "([a-d]+)([0-9]+\\\\v)([a-d]+)", 2)',
                'regexp_extract(a, "([a-d]+)(\\\\H+)([0-9]+)", 2)',
                'regexp_extract(a, "([a-d]+)(\\\\V+)([0-9]+)", 3)',
                'regexp_replace(a, "(\\\\v+)", "@")',
                'regexp_replace(a, "(\\\\H+)", "#")',
            ),
            conf=_regexp_conf)

../../src/main/python/regexp_test.py:509:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
    run_on_gpu()
../../src/main/python/asserts.py:422: in run_on_gpu
    from_gpu = with_gpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:132: in with_gpu_session
    return with_spark_session(func, conf=copy)
../../src/main/python/spark_session.py:99: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:201: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
/home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
    sock_info = self._jdf.collectToPython()
/home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
```
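For reference, the failing tests above exercise predefined character classes (`\s`, `\S`, `\h`, `\v`, etc.) in Spark's Java-based regex engine. The `\s`/`\S` cases behave like Python's `re` module, so a rough standalone sketch of what the assertions compare (note that `\h`/`\v` horizontal/vertical whitespace classes are Java extensions with no direct `re` equivalent):

```python
import re

s = "ab\t\n 42"  # sample string containing a tab, newline, and space

# \s matches any whitespace character; \S matches any non-whitespace.
assert re.search(r"\s", s) is not None          # whitespace present
assert re.search(r"\s{3}", s) is not None       # "\t\n " is a run of three
assert re.search(r"[a-d]+\s+[0-9]+", s)         # letters, whitespace, digits

# regexp_replace-style substitutions collapse each matched run:
print(re.sub(r"\s+", "@", s))  # each whitespace run becomes "@" -> ab@42
print(re.sub(r"\S+", "#", s))  # each non-whitespace run becomes "#"
```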
related to #6041
Seems there is still some locale check issue. My mistake, this is an unrelated issue.