forked from apache/spark
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort listed files by name. Use local files for WAL tests. #19
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
||
before { | ||
pathForTest = hdfsUrl + getRandomString() | ||
tempDir = Files.createTempDir() | ||
dirForTest = "file:///" + tempDir.toString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! Thats what i had in mind when I had asked to put all common initializations in before - both file and dir initialized here.
tdas
added a commit
that referenced
this pull request
Oct 23, 2014
Sort listed files by name. Use local files for WAL tests.
Thanks man!!! This was awesome!! Just using LocalFileSystem from HDFS. Didnt think that the flush of that will work. |
tdas
pushed a commit
that referenced
this pull request
Oct 24, 2014
As part of the effort to avoid data loss on Spark Streaming driver failure, we want to implement a write ahead log that can write received data to HDFS. This allows the received data to be persist across driver failures. So when the streaming driver is restarted, it can find and reprocess all the data that were received but not processed. This was primarily implemented by @harishreedharan. This is still WIP, as he is going to improve the unitests by using HDFS mini cluster. Author: Hari Shreedharan <hshreedharan@apache.org> Author: Tathagata Das <tathagata.das1565@gmail.com> Closes apache#2882 from tdas/driver-ha-wal and squashes the following commits: e4bee20 [Tathagata Das] Removed synchronized, Path.getFileSystem is threadsafe 55514e2 [Tathagata Das] Minor changes based on PR comments. d29fddd [Tathagata Das] Merge pull request #20 from harishreedharan/driver-ha-wal a317a4d [Hari Shreedharan] Directory deletion should not fail tests 9514dc8 [Tathagata Das] Added unit tests to test reading of corrupted data and other minor edits 3881706 [Tathagata Das] Merge pull request #19 from harishreedharan/driver-ha-wal 4705fff [Hari Shreedharan] Sort listed files by name. Use local files for WAL tests. eb356ca [Tathagata Das] Merge pull request #18 from harishreedharan/driver-ha-wal 82ce56e [Hari Shreedharan] Fix file ordering issue in WALManager tests 5ff90ee [Hari Shreedharan] Fix tests to not ignore ordering and also assert all data is present ef8db09 [Tathagata Das] Merge pull request #17 from harishreedharan/driver-ha-wal 7e40e56 [Hari Shreedharan] Restore old build directory after tests 587b876 [Hari Shreedharan] Fix broken test. Call getFileSystem only from synchronized method. b4be0c1 [Hari Shreedharan] Remove unused method edcbee1 [Hari Shreedharan] Tests reading and writing data using writers now use Minicluster. 5c70d1f [Hari Shreedharan] Remove underlying stream from the WALWriter. 4ab602a [Tathagata Das] Refactored write ahead stuff from streaming.storage to streaming.util b06be2b [Tathagata Das] Adding missing license. 5182ffb [Hari Shreedharan] Added documentation 172358d [Tathagata Das] Pulled WriteAheadLog-related stuff from tdas/spark/tree/driver-ha-working
tdas
pushed a commit
that referenced
this pull request
Jan 21, 2015
…ommands. Adding support for defining schema in foreign DDL commands. Now foreign DDL support commands like: ``` CREATE TEMPORARY TABLE avroTable USING org.apache.spark.sql.avro OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro") ``` With this PR user can define schema instead of infer from file, so support ddl command as follows: ``` CREATE TEMPORARY TABLE avroTable(a int, b string) USING org.apache.spark.sql.avro OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro") ``` Author: scwf <wangfei1@huawei.com> Author: Yin Huai <yhuai@databricks.com> Author: Fei Wang <wangfei1@huawei.com> Author: wangfei <wangfei1@huawei.com> Closes apache#3431 from scwf/ddl and squashes the following commits: 7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin 38f634e [Yin Huai] Remove Option from createRelation. 65e9c73 [Yin Huai] Revert all changes since applying a given schema has not been testd. a852b10 [scwf] remove cleanIdentifier f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin baf79b5 [Yin Huai] Test special characters quoted by backticks. 50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to StringType. 1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin f5c22b0 [Yin Huai] Refactor code and update test cases. f1cffe4 [Yin Huai] Revert "minor refactory" b621c8f [scwf] minor refactory d02547f [scwf] fix HiveCompatibilitySuite test failure 8dfbf7a [scwf] more tests for complex data type ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin 91ad91b [Yin Huai] Parse data types in DDLParser. cf982d2 [scwf] fixed test failure 445b57b [scwf] address comments 02a662c [scwf] style issue 44eb70c [scwf] fix decimal parser issue 83b6fc3 [scwf] minor fix 9bf12f8 [wangfei] adding test case 7787ec7 [wangfei] added SchemaRelationProvider 0ba70df [wangfei] draft version
tdas
pushed a commit
that referenced
this pull request
Jun 8, 2016
## What changes were proposed in this pull request? This issue add a new optimizer `ReorderAssociativeOperator` by taking advantage of integral associative property. Currently, Spark works like the following. 1) Can optimize `1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + a` into `45 + a`. 2) Cannot optimize `a + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9`. This PR can handle Case 2 for **Add/Multiply** expression whose data types are `ByteType`, `ShortType`, `IntegerType`, and `LongType`. The followings are the plan comparison between `before` and `after` this issue. **Before** ```scala scala> sql("select a+1+2+3+4+5+6+7+8+9 from (select explode(array(1)) a)").explain == Physical Plan == WholeStageCodegen : +- Project [(((((((((a#7 + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9) AS (((((((((a + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9)#8] : +- INPUT +- Generate explode([1]), false, false, [a#7] +- Scan OneRowRelation[] scala> sql("select a*1*2*3*4*5*6*7*8*9 from (select explode(array(1)) a)").explain == Physical Plan == *Project [(((((((((a#18 * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9) AS (((((((((a * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9)#19] +- Generate explode([1]), false, false, [a#18] +- Scan OneRowRelation[] ``` **After** ```scala scala> sql("select a+1+2+3+4+5+6+7+8+9 from (select explode(array(1)) a)").explain == Physical Plan == WholeStageCodegen : +- Project [(a#7 + 45) AS (((((((((a + 1) + 2) + 3) + 4) + 5) + 6) + 7) + 8) + 9)#8] : +- INPUT +- Generate explode([1]), false, false, [a#7] +- Scan OneRowRelation[] scala> sql("select a*1*2*3*4*5*6*7*8*9 from (select explode(array(1)) a)").explain == Physical Plan == *Project [(a#18 * 362880) AS (((((((((a * 1) * 2) * 3) * 4) * 5) * 6) * 7) * 8) * 9)#19] +- Generate explode([1]), false, false, [a#18] +- Scan OneRowRelation[] ``` This PR is greatly generalized by cloud-fan 's key ideas; he should be credited for the work he did. ## How was this patch tested? Pass the Jenkins tests including new testsuite. Author: Dongjoon Hyun <dongjoon@apache.org> Closes apache#12850 from dongjoon-hyun/SPARK-15076.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.