-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive #22270
Conversation
Test build #95432 has finished for PR 22270 at commit
|
retest this please |
Test build #95435 has finished for PR 22270 at commit
|
Test build #95457 has finished for PR 22270 at commit
|
import com.github.fommil.netlib.BLAS.{getInstance => blas} | ||
import org.apache.commons.io.FileUtils | ||
import org.apache.commons.io.filefilter.TrueFileFilter | ||
import org.scalatest.BeforeAndAfterEach | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert the blank lines?
withClue("transform should fail when ids exceed integer range. ") { | ||
val model = als.fit(df) | ||
def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = { | ||
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only want to disable ConvertToLocalRelation
in sql/core and sql/hive, maybe we can set this sql conf at MLTest
?
I don't see any usage of withSQLConf
in ml tests. It looks a bit weird to see it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya Thanks !! Actually currently i am just making changes to move forward with testing to identify the failures. I will open separate pr for code/test fix. So lets discuss the right way to fix the problem there ? I agree with your suggestion here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. :)
@@ -59,7 +60,8 @@ object TestHive | |||
.set("spark.sql.warehouse.dir", TestHiveContext.makeWarehouseDir().toURI.getPath) | |||
// SPARK-8910 | |||
.set("spark.ui.enabled", "false") | |||
.set("spark.unsafe.exceptionOnMemoryLeak", "true"))) | |||
.set("spark.unsafe.exceptionOnMemoryLeak", "true") | |||
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya Sure.. i will add.
Test build #95458 has finished for PR 22270 at commit
|
877ad96
to
7953ebd
Compare
Test build #95484 has finished for PR 22270 at commit
|
Test build #95516 has finished for PR 22270 at commit
|
Test build #95529 has finished for PR 22270 at commit
|
s""" | ||
for (int $i = 0; $i < $arr.numElements(); $i ++) { | ||
if ($arr.isNullAt($i)) { | ||
${ev.isNull} = true; | ||
${setIsNullCode} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ${setIsNullCode}
-> $setIsNullCode
btw, you found some bugs when excluding the ConvertToLocalRelation
rule in tests, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open separate JIRAs and PRs for the codegen fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile OK.. Sean.
} else if (${ctx.genEqual(right.dataType, value, getValue)}) { | ||
${ev.isNull} = false; | ||
${unsetIsNullCode} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu will change
@@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { | |||
|
|||
test("SPARK-9083: sort with non-deterministic expressions") { | |||
import org.apache.spark.util.random.XORShiftRandom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this import into the top? (this is not related to this pr though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu will do.
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
} | |||
|
|||
val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") | |||
intercept[RuntimeException] { | |||
intercept[Exception] { | |||
df5.select(map_from_arrays($"k", $"v")).collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the concrete exception this query throws?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu I will double check and get back. But i think it was SparkException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from driver. But when we disable it, the error is raised from the executor with a SparkException as the top level exception. Please let me know if my understanding is not correct.
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) | |||
nullSafeCodeGen(ctx, ev, (arr, value) => { | |||
val i = ctx.freshName("i") | |||
val getValue = CodeGenerator.getValue(arr, right.dataType, i) | |||
val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" | |||
val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" | |||
s""" | |||
for (int $i = 0; $i < $arr.numElements(); $i ++) { | |||
if ($arr.isNullAt($i)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is also not related to this pr though....) when left.dataType.asInstanceOf[ArrayType].containsNull = false
, I think we don't need this condition if ($arr.isNullAt($i)) {
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu You are right !! I will try to optimize this in the other pr i am going to open. please check if you like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many thanks! ya, plz ping me to review ;)
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
} | |||
|
|||
val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") | |||
intercept[RuntimeException] { | |||
intercept[Exception] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us capture the exception and compare the error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Thanks.. I am checking the message now.
Thanks for looking into this! I think we have to clean up our test framework later (after 2.4). We should identify the test cases that are actually testing the expressions, and run it with/without enabling the local relation optimization, to test both codegen and interpreted code paths. Since the current test suites are a little messy, this will be a lot of work, to reorganize them. I'm looking forward to seeing us accomplish it in Spark 3.0! |
@cloud-fan Thank you Wenchen. Do we want to fix the two codegen compile errors in 2.4 ? One is in ArrayContains and the other is in ArraySort. I will work on re-organizing the suites to test both codegen and non codegen path for spark 3.0. |
@dilipbiswal Let's fix the two functions in 2.4. Could you open a pr to fix the two functions? Thanks! |
@ueshin Thank you.. will do. |
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
} | |||
|
|||
val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") | |||
intercept[RuntimeException] { | |||
intercept[Exception] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also catch specific exception per https://github.com/databricks/scala-style-guide#testing-intercepting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from driver. But when we disable it, the error is raised from the executor with a SparkException as the top level exception. Thats the reason i changed it to intercept Exception
so that this test can run both when the rule is active vs when its not.
…/core and sql/hive
@dilipbiswal Any update on this PR? |
@gatorsmile Yeah.. i will push the changes tonight for you to take a look. |
Thank you! |
78cce41
to
507f89c
Compare
Test build #95735 has finished for PR 22270 at commit
|
retest this please |
Test build #95742 has finished for PR 22270 at commit
|
cc @gatorsmile |
LGTM Thanks! Merged to master/2.4 |
…ases of sql/core and sql/hive ## What changes were proposed in this pull request? In SharedSparkSession and TestHive, we need to disable the rule ConvertToLocalRelation for better test case coverage. ## How was this patch tested? Identify the failures after excluding "ConvertToLocalRelation" rule. Closes #22270 from dilipbiswal/SPARK-25267-final. Authored-by: Dilip Biswal <dbiswal@us.ibm.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com> (cherry picked from commit 6d7bc5a) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Thanks a lot @gatorsmile |
val seed = 33 | ||
val df = (1 to 100).map(Tuple1.apply).toDF("i") | ||
val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't follow this thread closely. Why do we need these repartition(1) changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I was just trying get this test case to pass when ConvertToLocalRelation
is enabled as well as disabled. So when this rule is active, i saw that all the calls to random.nextXXX happens in one thread. When this rule is disabled, the random values get evaluated under the project
operator and gets called from multiple threads. Thats why i am repartitioning the data frame to enforce single threaded execution. Is this not the right thing to do ? Please let me know ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we still test the local relation conversion, which might be more common to users as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon We are leaving this optimization on for MLTest as of now. Should we open it up for TestHive and keep it disabled it for SharedSparkSession ? cc @gatorsmile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change It's okay. Was wondering if we actually make the coverage lower for local relation specifically, or if some other tests should be added additionally.
@@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
} | |||
|
|||
val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") | |||
intercept[RuntimeException] { | |||
val msg1 = intercept[Exception] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #22270 (comment)
Didn't we disable the local relation test? Why don't we catch explicit SparkExection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon Yeah... we could have caught SparkException here. My intention was to have this test case pass both when location relation optimization is on and off. Thats why i changed it a a generic exception along with verifying the error text.
What changes were proposed in this pull request?
In SharedSparkSession and TestHive, we need to disable the rule ConvertToLocalRelation for better test case coverage.
How was this patch tested?
Identify the failures after excluding "ConvertToLocalRelation" rule.