SHS-NG M2: Store FsHistoryProvider listing data in LevelDB. #4

vanzin · 2017-04-17T20:09:31Z

The application listing is still generated from event logs, but is now stored
in LevelDB. No data (except for the internal LevelDB pages) is kept in memory.
The actual app UIs are, as of now, still untouched.

The provider stores things internally using the public REST API types; I believe
this is better going forward since it will make it easier to get rid of the
internal history server API which is mostly redundant at this point.

I also added a finalizer to LevelDBIterator, to make sure that resources are
eventually released. This helps when code iterates but does not exhaust the
iterator, thus not triggering the auto-close code.

This change adds an in-memory implementation of KVStore that can be used by the live UI. The implementation is definitely not optimized, neither for speed nor space, but should be fast enough for using in the listener bus. The change slightly changes the semantics of KVStore views; now, by default, they don't guarantee ordering, which makes it faster to iterate over data in the in-memory store when trimming the number of elements (a feature that will be useful later on). The disk store by default still returns elements in sorted order.

The application listing is still generated from event logs, but is now stored in a KVStore instance. By default an in-memory store is used, but a new config allows setting a local disk path to store the data, in which case a LevelDB store will be created. The provider stores things internally using the public REST API types; I believe this is better going forward since it will make it easier to get rid of the internal history server API which is mostly redundant at this point. I also added a finalizer to LevelDBIterator, to make sure that resources are eventually released. This helps when code iterates but does not exhaust the iterator, thus not triggering the auto-close code. HistoryServerSuite was modified to not re-start the history server unnecessarily; this makes the json validation tests run more quickly.

…pressions ## What changes were proposed in this pull request? This PR changes the direction of expression transformation in the DecimalPrecision rule. Previously, the expressions were transformed down, which led to incorrect result types when decimal expressions had other decimal expressions as their operands. The root cause of this issue was in visiting outer nodes before their children. Consider the example below: ``` val inputSchema = StructType(StructField("col", DecimalType(26, 6)) :: Nil) val sc = spark.sparkContext val rdd = sc.parallelize(1 to 2).map(_ => Row(BigDecimal(12))) val df = spark.createDataFrame(rdd, inputSchema) // Works correctly since no nested decimal expression is involved // Expected result type: (26, 6) * (26, 6) = (38, 12) df.select($"col" * $"col").explain(true) df.select($"col" * $"col").printSchema() // Gives a wrong result since there is a nested decimal expression that should be visited first // Expected result type: ((26, 6) * (26, 6)) * (26, 6) = (38, 12) * (26, 6) = (38, 18) df.select($"col" * $"col" * $"col").explain(true) df.select($"col" * $"col" * $"col").printSchema() ``` The example above gives the following output: ``` // Correct result without sub-expressions == Parsed Logical Plan == 'Project [('col * 'col) AS (col * col)#4] +- LogicalRDD [col#1] == Analyzed Logical Plan == (col * col): decimal(38,12) Project [CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS (col * col)#4] +- LogicalRDD [col#1] == Optimized Logical Plan == Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4] +- LogicalRDD [col#1] == Physical Plan == *Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4] +- Scan ExistingRDD[col#1] // Schema root |-- (col * col): decimal(38,12) (nullable = true) // Incorrect result with sub-expressions == Parsed Logical Plan == 'Project [(('col * 'col) * 'col) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Analyzed Logical Plan == ((col * col) * col): decimal(38,12) Project [CheckOverflow((promote_precision(cast(CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Optimized Logical Plan == Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Physical Plan == *Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11] +- Scan ExistingRDD[col#1] // Schema root |-- ((col * col) * col): decimal(38,12) (nullable = true) ``` ## How was this patch tested? This PR was tested with available unit tests. Moreover, there are tests to cover previously failing scenarios. Author: aokolnychyi <anton.okolnychyi@sap.com> Closes apache#18583 from aokolnychyi/spark-21332.

…pressions ## What changes were proposed in this pull request? This PR changes the direction of expression transformation in the DecimalPrecision rule. Previously, the expressions were transformed down, which led to incorrect result types when decimal expressions had other decimal expressions as their operands. The root cause of this issue was in visiting outer nodes before their children. Consider the example below: ``` val inputSchema = StructType(StructField("col", DecimalType(26, 6)) :: Nil) val sc = spark.sparkContext val rdd = sc.parallelize(1 to 2).map(_ => Row(BigDecimal(12))) val df = spark.createDataFrame(rdd, inputSchema) // Works correctly since no nested decimal expression is involved // Expected result type: (26, 6) * (26, 6) = (38, 12) df.select($"col" * $"col").explain(true) df.select($"col" * $"col").printSchema() // Gives a wrong result since there is a nested decimal expression that should be visited first // Expected result type: ((26, 6) * (26, 6)) * (26, 6) = (38, 12) * (26, 6) = (38, 18) df.select($"col" * $"col" * $"col").explain(true) df.select($"col" * $"col" * $"col").printSchema() ``` The example above gives the following output: ``` // Correct result without sub-expressions == Parsed Logical Plan == 'Project [('col * 'col) AS (col * col)#4] +- LogicalRDD [col#1] == Analyzed Logical Plan == (col * col): decimal(38,12) Project [CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS (col * col)#4] +- LogicalRDD [col#1] == Optimized Logical Plan == Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4] +- LogicalRDD [col#1] == Physical Plan == *Project [CheckOverflow((col#1 * col#1), DecimalType(38,12)) AS (col * col)#4] +- Scan ExistingRDD[col#1] // Schema root |-- (col * col): decimal(38,12) (nullable = true) // Incorrect result with sub-expressions == Parsed Logical Plan == 'Project [(('col * 'col) * 'col) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Analyzed Logical Plan == ((col * col) * col): decimal(38,12) Project [CheckOverflow((promote_precision(cast(CheckOverflow((promote_precision(cast(col#1 as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) as decimal(26,6))) * promote_precision(cast(col#1 as decimal(26,6)))), DecimalType(38,12)) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Optimized Logical Plan == Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11] +- LogicalRDD [col#1] == Physical Plan == *Project [CheckOverflow((cast(CheckOverflow((col#1 * col#1), DecimalType(38,12)) as decimal(26,6)) * col#1), DecimalType(38,12)) AS ((col * col) * col)#11] +- Scan ExistingRDD[col#1] // Schema root |-- ((col * col) * col): decimal(38,12) (nullable = true) ``` ## How was this patch tested? This PR was tested with available unit tests. Moreover, there are tests to cover previously failing scenarios. Author: aokolnychyi <anton.okolnychyi@sap.com> Closes apache#18583 from aokolnychyi/spark-21332. (cherry picked from commit 0be5fb4) Signed-off-by: gatorsmile <gatorsmile@gmail.com>

…ver)QueryTestSuite ### What changes were proposed in this pull request? This PR adds 2 changes regarding exception handling in `SQLQueryTestSuite` and `ThriftServerQueryTestSuite` - fixes an expected output sorting issue in `ThriftServerQueryTestSuite` as if there is an exception then there is no need for sort - introduces common exception handling in those 2 suites with a new `handleExceptions` method ### Why are the changes needed? Currently `ThriftServerQueryTestSuite` passes on master, but it fails on one of my PRs (apache#23531) with this error (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111651/testReport/org.apache.spark.sql.hive.thriftserver/ThriftServerQueryTestSuite/sql_3/): ``` org.scalatest.exceptions.TestFailedException: Expected " [Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit org.apache.spark.SparkException] ", but got " [org.apache.spark.SparkException Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit] " Result did not match for query #4 WITH RECURSIVE r(level) AS ( VALUES (0) UNION ALL SELECT level + 1 FROM r ) SELECT * FROM r ``` The unexpected reversed order of expected output (error message comes first, then the exception class) is due to this line: https://github.com/apache/spark/pull/26028/files#diff-b3ea3021602a88056e52bf83d8782de8L146. It should not sort the expected output if there was an error during execution. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing UTs. Closes apache#26028 from peter-toth/SPARK-29359-better-exception-handling. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Yuming Wang <wgyumg@gmail.com>

vanzin force-pushed the shs-ng/M2 branch from 9a3c105 to 12080a6 Compare April 17, 2017 21:41

vanzin force-pushed the shs-ng/M2 branch 2 times, most recently from ee7aba3 to 03f1a8e Compare April 26, 2017 18:11

vanzin force-pushed the shs-ng/M1 branch from 11ca222 to f1f599c Compare April 26, 2017 18:11

vanzin force-pushed the shs-ng/M2 branch 2 times, most recently from 464bfc8 to d5c6fdc Compare April 27, 2017 18:14

vanzin force-pushed the shs-ng/M1 branch from d563636 to 5ff50be Compare April 27, 2017 18:14

vanzin force-pushed the shs-ng/M2 branch from d5c6fdc to 7ba4f04 Compare April 27, 2017 21:31

vanzin force-pushed the shs-ng/M1 branch from 5ff50be to 86ea388 Compare April 27, 2017 21:32

vanzin force-pushed the shs-ng/M2 branch from 7ba4f04 to fa9b1a8 Compare May 1, 2017 22:58

vanzin force-pushed the shs-ng/M1 branch from 86ea388 to 98838e6 Compare May 1, 2017 22:58

vanzin force-pushed the shs-ng/M2 branch from fa9b1a8 to 1df5e55 Compare May 5, 2017 21:19

vanzin force-pushed the shs-ng/M1 branch from 98838e6 to b4559a7 Compare May 5, 2017 21:19

vanzin force-pushed the shs-ng/M2 branch from 1df5e55 to a2e9220 Compare May 5, 2017 22:57

vanzin force-pushed the shs-ng/M1 branch from b4559a7 to a77f0cb Compare May 5, 2017 22:57

vanzin force-pushed the shs-ng/M2 branch from a2e9220 to dbb4721 Compare May 8, 2017 17:25

vanzin force-pushed the shs-ng/M1 branch from a77f0cb to 5197c21 Compare May 8, 2017 17:25

vanzin force-pushed the shs-ng/M2 branch 2 times, most recently from 251f642 to 02a5342 Compare May 15, 2017 20:45

vanzin force-pushed the shs-ng/M2 branch from 02a5342 to 4d3e6b6 Compare May 26, 2017 18:53

Marcelo Vanzin added 3 commits May 26, 2017 11:54

SHS-NG M1: small fix in loop condition.

c433065

vanzin force-pushed the shs-ng/M2 branch from 4d3e6b6 to 2fa4f3f Compare May 30, 2017 23:04

vanzin closed this May 30, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SHS-NG M2: Store FsHistoryProvider listing data in LevelDB. #4

SHS-NG M2: Store FsHistoryProvider listing data in LevelDB. #4

vanzin commented Apr 17, 2017

SHS-NG M2: Store FsHistoryProvider listing data in LevelDB. #4

SHS-NG M2: Store FsHistoryProvider listing data in LevelDB. #4

Conversation

vanzin commented Apr 17, 2017