forked from apache/spark
sync #3
Merged
…sages with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")`, but it is currently `"0"`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks to WeichenXu123 for reporting this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
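The difference between the two iterator forms can be sketched outside Spark. The following is a plain-Python analogue, not the Scala test itself: `collect`, `flattened`, and `single` are hypothetical helpers that stand in for a per-partition function followed by `collect()` on a single partition.

```python
from typing import Callable, Iterator, List


def collect(partition_fn: Callable[[Iterator[str]], Iterator], partition: List[str]) -> list:
    """Toy stand-in for applying a per-partition function and collecting the result."""
    return list(partition_fn(iter(partition)))


def flattened(messages: Iterator[str]) -> Iterator[str]:
    # Analogue of `messages.toList.iterator`: one output record per message.
    return iter(list(messages))


def single(messages: Iterator[str]) -> Iterator[List[str]]:
    # Analogue of `Iterator.single(messages.toList)`: one record holding the whole list.
    return iter([list(messages)])


msgs = ["0", "1", "2", "3"]
head_flattened = collect(flattened, msgs)[0]  # first record is a single message
head_single = collect(single, msgs)[0]        # first record is the full list
```

With the flattened form the head of the collected result is just the first message, which is why the old assertion only ever saw `"0"`.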
### What changes were proposed in this pull request?

`HistoryServerDiskManager.updateAccessTime()` recomputed the application store directory size every time it was triggered. This work can be avoided because the new size is already computed outside the function call.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

Closes #28579 from jiangxb1987/updateInfo.

Authored-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
…p string

### What changes were proposed in this pull request?

This PR allows missing hour fields when parsing a date/timestamp string, with 0 as the default value.

If the year field is missing, this PR still fails the query by default, but provides a new legacy config to allow it and use 1970 as the default value. It's not a good default value, as it is not a leap year, which means that it would never parse Feb 29. We just pick it for backward compatibility.

### Why are the changes needed?

To keep backward compatibility with Spark 2.4.

### Does this PR introduce _any_ user-facing change?

Yes.

Spark 2.4:
```
scala> sql("select to_timestamp('16', 'dd')").show
+------------------------+
|to_timestamp('16', 'dd')|
+------------------------+
|     1970-01-16 00:00:00|
+------------------------+

scala> sql("select to_date('16', 'dd')").show
+-------------------+
|to_date('16', 'dd')|
+-------------------+
|         1970-01-16|
+-------------------+

scala> sql("select to_timestamp('2019 40', 'yyyy mm')").show
+----------------------------------+
|to_timestamp('2019 40', 'yyyy mm')|
+----------------------------------+
|               2019-01-01 00:40:00|
+----------------------------------+

scala> sql("select to_timestamp('2019 10:10:10', 'yyyy hh:mm:ss')").show
+----------------------------------------------+
|to_timestamp('2019 10:10:10', 'yyyy hh:mm:ss')|
+----------------------------------------------+
|                           2019-01-01 10:10:10|
+----------------------------------------------+
```

In branch 3.0:
```
scala> sql("select to_timestamp('16', 'dd')").show
+--------------------+
|to_timestamp(16, dd)|
+--------------------+
|                null|
+--------------------+

scala> sql("select to_date('16', 'dd')").show
+---------------+
|to_date(16, dd)|
+---------------+
|           null|
+---------------+

scala> sql("select to_timestamp('2019 40', 'yyyy mm')").show
+------------------------------+
|to_timestamp(2019 40, yyyy mm)|
+------------------------------+
|           2019-01-01 00:00:00|
+------------------------------+

scala> sql("select to_timestamp('2019 10:10:10', 'yyyy hh:mm:ss')").show
+------------------------------------------+
|to_timestamp(2019 10:10:10, yyyy hh:mm:ss)|
+------------------------------------------+
|                       2019-01-01 00:00:00|
+------------------------------------------+
```

After this PR, the behavior becomes the same as 2.4 if the legacy config is enabled.

### How was this patch tested?

New tests.

Closes #28576 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
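The defaulting rule described in this commit, and the reason 1970 is an awkward default year, can be illustrated with a small Python sketch. This is not Spark's parser: `DEFAULTS`, `with_defaults`, and `feb_29_parses` are hypothetical names, and the parsed fields are passed in as a dictionary rather than extracted from a pattern.

```python
import calendar
from datetime import datetime

# Assumed defaults mirroring the commit text: year 1970, time fields 0.
DEFAULTS = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def with_defaults(parsed: dict) -> datetime:
    """Fill every field the pattern did not supply with its default value."""
    fields = {**DEFAULTS, **parsed}
    return datetime(**fields)


# Analogue of to_timestamp('16', 'dd') with the legacy behavior.
day_only = with_defaults({"day": 16})

# Analogue of to_timestamp('2019 40', 'yyyy mm'): missing hour defaults to 0.
year_minute = with_defaults({"year": 2019, "minute": 40})


def feb_29_parses() -> bool:
    """Feb 29 without a year falls back to 1970, which has no Feb 29."""
    try:
        with_defaults({"month": 2, "day": 29})
        return True
    except ValueError:
        return False


leap_1970 = calendar.isleap(1970)
```

Because `calendar.isleap(1970)` is false, any input that supplies only "Feb 29" is rejected under this default, which is the drawback the commit message points out.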
…ween Hive and Spark

### What changes were proposed in this pull request?

Add docs to the SQL migration guide.

### Why are the changes needed?

To let users know more about the cast scenarios in which Hive and Spark generate different results.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No testing needed.

Closes #28605 from GuoPhilipse/spark-docs.

Lead-authored-by: GuoPhilipse <guofei_ok@126.com>
Co-authored-by: GuoPhilipse <46367746+GuoPhilipse@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?

This PR mainly adds two things.

1. Real headless browser support for UI tests
2. A test suite using headless Chrome as one instance of those browsers.

Also, for environments where Chrome and Chrome driver are not installed, the `ChromeUITest` tag is added to filter out the test suite.

### Why are the changes needed?

In the current master, there are two problems with UI tests.

1. Lots of tests, especially JavaScript-related ones, are done manually. Appearance is better confirmed by eye, but logic should ideally be tested by test cases.
2. Compared to real web browsers, HtmlUnit doesn't seem to support JavaScript well enough.

I previously added a JavaScript-related test for SPARK-31534 using HtmlUnit, a simple library-based headless browser for testing. The test I added works, but a JavaScript-related error is shown in unit-tests.log.

```
======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:904)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:835)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:807)
    at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
    at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
    at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
    at java.lang.Thread.run(Thread.java:748)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)". (http://192.168.1.209:60724/static/jquery-3.4.1.min.js#2)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1009)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$4.doRun(JavaScriptEngine.java:828)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:889)
    ... 10 more
JavaScriptException value = Error: TOOLTIP: Option "sanitizeFn" provided type "window" but expected type "(null|function)".
== CALLING JAVASCRIPT ==
function () { throw e; }
======= EXCEPTION END ========
```

I tried to upgrade HtmlUnit to 2.40.0, but what is worse, the test stopped working even though the page works on real browsers like Chrome, Safari and Firefox without error.

```
[info] UISeleniumSuite:
[info] - SPARK-31534: text for tooltip should be escaped *** FAILED *** (17 seconds, 745 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 2 times over 12.910785232 seconds. Last failure message: com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: Assignment to undefined "regeneratorRuntime" in strict mode (http://192.168.1.209:62132/static/vis-timeline-graph2d.min.js#52(Function)#1)
```

To resolve those problems, it's better to support a real headless browser for UI tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tested with the following patterns. Both Chrome and Chrome driver should be installed to test.

1. sbt / with chromedriver / include tag (expected to succeed)
   `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
2. sbt / with chromedriver / exclude tag (expected to be ignored)
   `build/sbt -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly org.apache.spark.ui.ChromeUISeleniumSuite -l org.apache.spark.tags.ChromeUITest"`
3. sbt / without chromedriver / include tag (expected to fail)
   `build/sbt "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
4. sbt / without chromedriver / exclude tag (expected to be skipped)
   `build/sbt -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest "testOnly org.apache.spark.ui.ChromeUISeleniumSuite"`
5. Maven / with chromedriver / include tag (expected to succeed)
   `build/mvn -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
6. Maven / with chromedriver / exclude tag (expected to be skipped)
   `build/mvn -Dtest.exclude.tags="org.apache.spark.tags.ChromeUITest" -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
7. Maven / without chromedriver / include tag (expected to fail)
   `build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`
8. Maven / without chromedriver / exclude tag (expected to be skipped)
   `build/mvn -Dtest.exclude.tags=org.apache.spark.tags.ChromeUITest -Dtest=none -DwildcardSuites=org.apache.spark.ui.ChromeUISeleniumSuite test`

Closes #28578 from sarutak/real-headless-browser-support.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
…TAMP_MICROS functions

### What changes were proposed in this pull request?

Add and register three new functions: `TIMESTAMP_SECONDS`, `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS`. A test is added.

Reference: [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=en#timestamp_seconds)

### Why are the changes needed?

People will have a convenient way to get timestamps from seconds, milliseconds and microseconds.

### Does this PR introduce _any_ user-facing change?

Yes, people will have the following ways to get a timestamp:

```scala
sql("select TIMESTAMP_SECONDS(t.a) as timestamp from values(1230219000),(-1230219000) as t(a)").show(false)
```
```
+-------------------------+
|timestamp |
+-------------------------+
|2008-12-25 23:30:00|
|1931-01-07 16:30:00|
+-------------------------+
```
```scala
sql("select TIMESTAMP_MILLIS(t.a) as timestamp from values(1230219000123),(-1230219000123) as t(a)").show(false)
```
```
+-------------------------------+
|timestamp |
+-------------------------------+
|2008-12-25 23:30:00.123|
|1931-01-07 16:29:59.877|
+-------------------------------+
```
```scala
sql("select TIMESTAMP_MICROS(t.a) as timestamp from values(1230219000123123),(-1230219000123123) as t(a)").show(false)
```
```
+------------------------------------+
|timestamp |
+------------------------------------+
|2008-12-25 23:30:00.123123|
|1931-01-07 16:29:59.876877|
+------------------------------------+
```

### How was this patch tested?

Unit test.

Closes #28534 from TJX2014/master-SPARK-31710.

Authored-by: TJX2014 <xiaoxingstack@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
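The arithmetic behind the three functions is an offset from the Unix epoch in the given unit. The Python sketch below reproduces the values shown above using only the standard library; the function names are hypothetical stand-ins, and the fixed UTC+8 display zone is an assumption inferred from the printed output, not something the commit states.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: the session that produced the output above used a UTC+8 zone.
DISPLAY_TZ = timezone(timedelta(hours=8))


def timestamp_seconds(n: int) -> datetime:
    """Epoch plus n seconds, shown in the assumed display zone."""
    return (EPOCH + timedelta(seconds=n)).astimezone(DISPLAY_TZ)


def timestamp_millis(n: int) -> datetime:
    """Epoch plus n milliseconds, shown in the assumed display zone."""
    return (EPOCH + timedelta(milliseconds=n)).astimezone(DISPLAY_TZ)


def timestamp_micros(n: int) -> datetime:
    """Epoch plus n microseconds, shown in the assumed display zone."""
    return (EPOCH + timedelta(microseconds=n)).astimezone(DISPLAY_TZ)


def show(ts: datetime) -> str:
    # Drop the zone so the value can be compared with the wall-clock strings above.
    return ts.replace(tzinfo=None).isoformat(sep=" ")


seconds_pos = show(timestamp_seconds(1230219000))
seconds_neg = show(timestamp_seconds(-1230219000))
millis_neg = show(timestamp_millis(-1230219000123))
micros_neg = show(timestamp_micros(-1230219000123123))
```

Negative inputs count backwards from the epoch, which is why the fractional part of the negative millisecond example reads `.877` and not `.123`.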
GuoPhilipse pushed a commit that referenced this pull request on Jun 10, 2020
### What changes were proposed in this pull request?

This PR proposes to make `PythonFunction` hold `Seq[Byte]` instead of `Array[Byte]`, so that the cache manager can compare whether the byte arrays have the same values.

### Why are the changes needed?

Currently the cache manager doesn't use the cache for a `udf` if the `udf` is created again, even if the function is the same.

```py
>>> func = lambda x: x

>>> df = spark.range(1)
>>> df.select(udf(func)("id")).cache()
```
```py
>>> df.select(udf(func)("id")).explain()
== Physical Plan ==
*(2) Project [pythonUDF0#14 AS <lambda>(id)#12]
+- BatchEvalPython [<lambda>(id#0L)], [pythonUDF0#14]
   +- *(1) Range (0, 1, step=1, splits=12)
```

This is because `PythonFunction` holds `Array[Byte]`, and the `equals` method of an array returns true only when both arrays are the same instance.

### Does this PR introduce _any_ user-facing change?

Yes, if the user reuses the Python function for the UDF, the cache manager will detect the same function and use the cache for it.

### How was this patch tested?

I added a test case and tested manually.

```py
>>> df.select(udf(func)("id")).explain()
== Physical Plan ==
InMemoryTableScan [<lambda>(id)#12]
   +- InMemoryRelation [<lambda>(id)#12], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(2) Project [pythonUDF0#5 AS <lambda>(id)#3]
            +- BatchEvalPython [<lambda>(id#0L)], [pythonUDF0#5]
               +- *(1) Range (0, 1, step=1, splits=12)
```

Closes apache#28774 from ueshin/issues/SPARK-31945/udf_cache.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
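The cache miss comes down to identity equality versus value equality. The sketch below is a loose Python analogue, not the JVM behavior itself: `IdentityPayload` is a hypothetical class that, like a JVM array, compares by instance, while Python `bytes` plays the role of a value-compared `Seq[Byte]`. The `lookup` helper is also hypothetical and only models "is an equal entry already cached".

```python
class IdentityPayload:
    """Stand-in for a JVM Array[Byte]: no __eq__ defined, so == compares instances."""

    def __init__(self, payload: bytes):
        self.payload = payload


def lookup(cache: list, key) -> bool:
    """Return True if some cached entry compares equal to the key."""
    return any(key == cached for cached in cache)


# Two separately created wrappers around identical bytes, as when udf(func)
# is called twice with the same function.
first = IdentityPayload(b"\x01\x02")
second = IdentityPayload(b"\x01\x02")
hit_with_identity = lookup([first], second)

# Value-compared payloads: equal contents are enough for a hit.
hit_with_values = lookup([bytes([1, 2])], bytes([1, 2]))
```

Switching the held type to one with value equality is what lets a re-created UDF match the cached plan.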
GuoPhilipse pushed a commit that referenced this pull request on Jul 9, 2020
… without WindowExpression

### What changes were proposed in this pull request?

Add a WindowFunction check at `CheckAnalysis`.

### Why are the changes needed?

Provide a friendly error message.

**BEFORE**
```scala
scala> sql("select rank() from values(1)").show
java.lang.UnsupportedOperationException: Cannot generate code for expression: rank()
```

**AFTER**
```scala
scala> sql("select rank() from values(1)").show
org.apache.spark.sql.AnalysisException: Window function rank() requires an OVER clause.;;
Project [rank() AS RANK()#3]
+- LocalRelation [col1#2]
```

### Does this PR introduce _any_ user-facing change?

Yes, users will be given a better error message.

### How was this patch tested?

Pass the newly added UT.

Closes apache#28808 from ulysses-you/SPARK-31975.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
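The kind of check this commit describes can be sketched as a tree walk that rejects a window function appearing outside a window expression. This is a toy Python model under stated assumptions, not Spark's `CheckAnalysis`: `Expr`, `AnalysisError`, and `check_window_functions` are hypothetical names, and only the error wording is taken from the output above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class AnalysisError(Exception):
    """Hypothetical stand-in for an analysis-time failure."""


@dataclass
class Expr:
    name: str
    children: List["Expr"] = field(default_factory=list)
    is_window_function: bool = False    # e.g. rank()
    is_window_expression: bool = False  # e.g. the node created by an OVER clause


def check_window_functions(expr: Expr, inside_window: bool = False) -> None:
    """Fail early if a window function is not wrapped by a window expression."""
    if expr.is_window_function and not inside_window:
        raise AnalysisError(f"Window function {expr.name}() requires an OVER clause.")
    for child in expr.children:
        check_window_functions(child, inside_window or expr.is_window_expression)


rank = Expr("rank", is_window_function=True)

# select rank() from ... : the function sits directly under the projection.
bare = Expr("project", children=[rank])
bare_error: Optional[str]
try:
    check_window_functions(bare)
    bare_error = None
except AnalysisError as err:
    bare_error = str(err)

# select rank() over (...) from ... : the function is wrapped, so the check passes.
windowed = Expr("project", children=[Expr("over", children=[rank], is_window_expression=True)])
check_window_functions(windowed)
```

Raising during analysis is what replaces the later, less helpful code generation failure with a message that names the missing clause.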
sync