Branch 1.2 #3880
Commits on Nov 21, 2014
[SPARK-4522][SQL] Parse schema with missing metadata.
This is just a quick fix for 1.2. SPARK-4523 describes a more complete solution. Author: Michael Armbrust <michael@databricks.com> Closes #3392 from marmbrus/parquetMetadata and squashes the following commits: bcc6626 [Michael Armbrust] Parse schema with missing metadata. (cherry picked from commit 90a6a46) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 668643b
[SPARK-4472][Shell] Print "Spark context available as sc." only when SparkContext is created successfully
It is misleading to print "Spark context available as sc." when the SparkContext was not created successfully. Author: zsxwing <zsxwing@gmail.com> Closes #3341 from zsxwing/SPARK-4472 and squashes the following commits: 4850093 [zsxwing] Print "Spark context available as sc." only when SparkContext is created successfully (cherry picked from commit f1069b8) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 6f70e02
SPARK-4532: Fix bug in detection of Hive in Spark 1.2
Because the Hive profile is no longer defined in the root pom, we need to check specifically in the sql/hive pom when we perform the check in make-distribution.sh. Author: Patrick Wendell <pwendell@gmail.com> Closes #3398 from pwendell/make-distribution and squashes the following commits: 8a58279 [Patrick Wendell] Fix bug in detection of Hive in Spark 1.2 (cherry picked from commit a81918c) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 6a01689
[SPARK-4531] [MLlib] cache serialized java object
Pyrolite is pretty slow (compared to the ad-hoc serializer in 1.1) and causes a large performance regression in 1.2, because we cache the serialized Python objects in the JVM and deserialize them into Java objects in each step. This PR changes the code to cache the deserialized JavaRDD instead of the PythonRDD, avoiding the Pyrolite deserialization. It should have similar memory usage as before, but be much faster. Author: Davies Liu <davies@databricks.com> Closes #3397 from davies/cache and squashes the following commits: 7f6e6ce [Davies Liu] Update -> Updater 4b52edd [Davies Liu] using named argument 63b984e [Davies Liu] fix 7da0332 [Davies Liu] add unpersist() dff33e1 [Davies Liu] address comments c2bdfc2 [Davies Liu] refactor d572f00 [Davies Liu] Merge branch 'master' into cache f1063e1 [Davies Liu] cache serialized java object (cherry picked from commit ce95bd8) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 9309ddf
Commits on Nov 22, 2014
[SPARK-4431][MLlib] Implement efficient foreachActive for dense and sparse vector
Previously, we were using Breeze's activeIterator to access the non-zero elements in dense/sparse vectors. Due to the overhead, we switched back to a native `while loop` in #SPARK-4129. However, #SPARK-4129 requires de-referencing dv.values/sv.values on each access to a value, which is very expensive. Also, in MultivariateOnlineSummarizer, we're using Breeze's dense vector to store the partial stats, which is very expensive compared with using a primitive Scala array. In this PR, an efficient foreachActive is implemented to unify the code path for dense and sparse vector operations, which makes the codebase easier to maintain. The Breeze dense vector is replaced by a primitive array to reduce the overhead further. Benchmarking with the mnist8m dataset on a single JVM, with the first 200 samples loaded in memory and repeated 5000 times. Before the change: Sparse Vector - 30.02, Dense Vector - 38.27. With this PR: Sparse Vector - 6.29, Dense Vector - 11.72. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3288 from dbtsai/activeIterator and squashes the following commits: 844b0e6 [DB Tsai] formating 03dd693 [DB Tsai] futher performance tunning. 1907ae1 [DB Tsai] address feedback 98448bb [DB Tsai] Made the override final, and had a local copy of variables which made the accessing a single step operation. c0cbd5a [DB Tsai] fix a bug 6441f92 [DB Tsai] Finished SPARK-4431 (cherry picked from commit b5d17ef) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 4b68cab
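A minimal sketch of the foreachActive pattern this commit describes, with illustrative class names rather than the exact MLlib code; the key ideas are a single callback-based code path for both vector layouts and local copies of the backing arrays:
```scala
sealed trait SimpleVector {
  // Apply f(index, value) to every stored (active) entry.
  def foreachActive(f: (Int, Double) => Unit): Unit
}

class DenseVec(val values: Array[Double]) extends SimpleVector {
  override def foreachActive(f: (Int, Double) => Unit): Unit = {
    val localValues = values // local copy: no field dereference per element
    var i = 0
    while (i < localValues.length) { f(i, localValues(i)); i += 1 }
  }
}

class SparseVec(val indices: Array[Int], val values: Array[Double]) extends SimpleVector {
  override def foreachActive(f: (Int, Double) => Unit): Unit = {
    val localIndices = indices
    val localValues = values
    var i = 0
    while (i < localValues.length) { f(localIndices(i), localValues(i)); i += 1 }
  }
}
```
A caller such as a summarizer can then accumulate statistics with one loop body regardless of the vector layout.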
Commits on Nov 24, 2014
SPARK-4457. Document how to build for Hadoop versions greater than 2.4
Author: Sandy Ryza <sandy@cloudera.com> Closes #3322 from sryza/sandy-spark-4457 and squashes the following commits: 5e72b77 [Sandy Ryza] Feedback 0cf05c1 [Sandy Ryza] Caveat be8084b [Sandy Ryza] SPARK-4457. Document how to build for Hadoop versions greater than 2.4 (cherry picked from commit 29372b6) Signed-off-by: Thomas Graves <tgraves@apache.org>
Commit: 1a12ca3
[SPARK-4479][SQL] Avoids unnecessary defensive copies when sort based shuffle is on
This PR is a workaround for SPARK-4479. Two changes are introduced: when merge sort is bypassed in `ExternalSorter`, 1. also bypass RDD element buffering, as buffering is the reason that `MutableRow`-backed row objects must be copied, and 2. avoid defensive copies in the `Exchange` operator. Author: Cheng Lian <lian@databricks.com> Closes #3422 from liancheng/avoids-defensive-copies and squashes the following commits: 591f2e9 [Cheng Lian] Passes all shuffle suites 0c3c91e [Cheng Lian] Fixes shuffle write metrics when merge sort is bypassed ed5df3c [Cheng Lian] Fixes styling changes f75089b [Cheng Lian] Avoids unnecessary defensive copies when sort based shuffle is on (cherry picked from commit a6d7b61) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: ee1bc89
This file is for Hive 0.13.1 I think. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #3432 from darabos/patch-2 and squashes the following commits: 4fd22ed [Daniel Darabos] Fix comment. This file is for Hive 0.13.1. (cherry picked from commit d5834f0) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 1e3d22b
[SQL] Fix path in HiveFromSpark
It requires us to run ```HiveFromSpark``` from a specific directory, because ```HiveFromSpark``` uses a relative path; this leads to a ```run-example``` error (http://apache-spark-developers-list.1001551.n3.nabble.com/src-main-resources-kv1-txt-not-found-in-example-of-HiveFromSpark-td9100.html). Author: scwf <wangfei1@huawei.com> Closes #3415 from scwf/HiveFromSpark and squashes the following commits: ed3d6c9 [scwf] revert no need change b00e20c [scwf] fix path usring spark_home dbd321b [scwf] fix path in hivefromspark (cherry picked from commit b384119) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 0e7fa7f
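A sketch of the kind of fix the message describes, assuming the path is resolved against a SPARK_HOME environment variable; the exact resolution logic in the example may differ:
```scala
// Resolve the example's data file against SPARK_HOME instead of the
// process working directory, so run-example works from any directory.
val sparkHome = sys.env.getOrElse("SPARK_HOME", ".")
val kv1Path = s"$sparkHome/examples/src/main/resources/kv1.txt"
```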
[SPARK-4487][SQL] Fix attribute reference resolution error when using ORDER BY.
When we use an ORDER BY clause, attributes referenced by the projection are resolved first (1), and then attributes referenced in the ORDER BY clause are resolved (2). But when resolving attributes referenced in the ORDER BY clause, the resolution result generated in (1) is discarded, so for example the following query fails: SELECT c1 + c2 FROM mytable ORDER BY c1; The query above fails because when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3363 from sarutak/SPARK-4487 and squashes the following commits: fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer 6e60c20 [Kousuke Saruta] Fixed conflicts cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487 282d529 [Kousuke Saruta] Fixed attributes reference resolution error b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature 317b7fb [Kousuke Saruta] WIP (cherry picked from commit dd1c9cb) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 97b7eb4
[SPARK-4145] Web UI job pages
This PR adds two new pages to the Spark Web UI:
- A jobs overview page, which shows details on running / completed / failed jobs.
- A job details page, which displays information on an individual job's stages.
The jobs overview page is now the default UI homepage; the old homepage is still accessible at `/stages`.
### Screenshots
#### New UI homepage
![image](https://cloud.githubusercontent.com/assets/50748/5119035/fd0a69e6-701f-11e4-89cb-db7e9705714f.png)
#### Job details page
(This is effectively a per-job version of the stages page that can be extended later with other things, such as DAG visualizations)
![image](https://cloud.githubusercontent.com/assets/50748/5134910/50b340d4-70c7-11e4-88e1-6b73237ea7c8.png)
### Key changes in this PR
- Rename `JobProgressPage` to `AllStagesPage`
- Expose `StageInfo` objects in the `SparkListenerJobStart` event; add backwards-compatibility tests to JsonProtocol.
- Add additional data structures to `JobProgressListener` to map from stages to jobs.
- Add several fields to `JobUIData`.
I also added ~150 lines of Selenium tests as I uncovered UI issues while developing this patch.
### Limitations
If a job contains stages that aren't run, then its overall job progress bar may be an underestimate of the total job progress; in other words, a completed job may appear to have a progress bar that's not at 100%. If stages or tasks fail, the progress bar will not go backwards to reflect the true amount of remaining work.
Author: Josh Rosen <joshrosen@databricks.com> Closes #3009 from JoshRosen/job-page and squashes the following commits: eb05e90 [Josh Rosen] Disable kill button in completed stages tables. f00c851 [Josh Rosen] Fix JsonProtocol compatibility b89c258 [Josh Rosen] More JSON protocol backwards-compatibility fixes. ff804cd [Josh Rosen] Don't write "Stage Ids" field in JobStartEvent JSON. 6f17f3f [Josh Rosen] Only store StageInfos in SparkListenerJobStart event. 2bbf41a [Josh Rosen] Update job progress bar to reflect skipped tasks/stages. 61c265a [Josh Rosen] Add "skipped stages" table; only display non-empty tables. 1f45d44 [Josh Rosen] Incorporate a bunch of minor review feedback. 0b77e3e [Josh Rosen] More bug fixes for phantom stages. 034aa8d [Josh Rosen] Use `.max()` to find result stage for job. eebdc2c [Josh Rosen] Don't display pending stages for completed jobs. 67080ba [Josh Rosen] Ensure that "phantom stages" don't cause memory leaks. 7d10b97 [Josh Rosen] Merge remote-tracking branch 'apache/master' into job-page d69c775 [Josh Rosen] Fix table sorting on all jobs page. 5eb39dc [Josh Rosen] Add pending stages table to job page. f2a15da [Josh Rosen] Add status field to job details page. 171b53c [Josh Rosen] Move `startTime` to the start of SparkContext. e2f2c43 [Josh Rosen] Fix sorting of stages in job details page. 8955f4c [Josh Rosen] Display information for pending stages on jobs page. 8ab6c28 [Josh Rosen] Compute numTasks from job start stage infos. 5884f91 [Josh Rosen] Add StageInfos to SparkListenerJobStart event. 79793cd [Josh Rosen] Track indices of completed stage to avoid overcounting when failures occur. d62ea7b [Josh Rosen] Add failing Selenium test for stage overcounting issue. 1145c60 [Josh Rosen] Display text instead of progress bar for stages. 3d0a007 [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page 8a2351b [Josh Rosen] Add help tooltip to Spark Jobs page. b7bf30e [Josh Rosen] Add stages progress bar; fix bug where active stages show as completed. 4846ce4 [Josh Rosen] Hide "(Job Group)" if no jobs were submitted in job groups. 4d58e55 [Josh Rosen] Change label to "Tasks (for all stages)" 85e9c85 [Josh Rosen] Extract startTime into separate variable. 1cf4987 [Josh Rosen] Fix broken kill links; add Selenium test to avoid future regressions. 56701fa [Josh Rosen] Move last stage name / description logic out of markup. a475ea1 [Josh Rosen] Add progress bars to jobs page. 45343b8 [Josh Rosen] More comments 4b206fb [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page bfce2b9 [Josh Rosen] Address review comments, except for progress bar. 4487dcb [Josh Rosen] [SPARK-4145] Web UI job pages 2568a6c [Josh Rosen] Rename JobProgressPage to AllStagesPage (cherry picked from commit 4a90276) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 2d35cc0
[SPARK-4518][SPARK-4519][Streaming] Refactored file stream to prevent files from being processed multiple times
Because of a corner case, a file already selected for batch t can get considered again for batch t+2. This refactoring fixes it by remembering all the files selected in the last 1 minute, so that this corner case does not arise. It also uses the Spark context's hadoop configuration to access the file system API for listing directories. pwendell Please take a look. I still have not run long-running integration tests, so I cannot say for sure whether this has indeed solved the issue. You could do a first pass on this in the meantime. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3419 from tdas/filestream-fix2 and squashes the following commits: c19dd8a [Tathagata Das] Addressed PR comments. 513b608 [Tathagata Das] Updated docs. d364faf [Tathagata Das] Added the current time condition back 5526222 [Tathagata Das] Removed unnecessary imports. 38bb736 [Tathagata Das] Fix long line. 203bbc7 [Tathagata Das] Un-ignore tests. eaef4e1 [Tathagata Das] Fixed SPARK-4519 9dbd40a [Tathagata Das] Refactored FileInputDStream to remember last few batches. (cherry picked from commit cb0e9b0) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 6fa3e41
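A hedged sketch of the "remember recently selected files" idea; the class name, API, and one-minute window are illustrative, not the actual FileInputDStream internals:
```scala
import scala.collection.mutable

class RecentFileTracker(rememberDurationMs: Long = 60 * 1000L) {
  private val selectedAt = mutable.HashMap.empty[String, Long] // path -> selection time

  // Return only files not selected within the remember window, and record them,
  // so a file picked for batch t cannot be picked again for batch t+2.
  def selectNewFiles(candidates: Seq[String], nowMs: Long): Seq[String] = {
    selectedAt.retain { case (_, t) => nowMs - t < rememberDurationMs }
    val fresh = candidates.filterNot(selectedAt.contains)
    fresh.foreach(path => selectedAt(path) = nowMs)
    fresh
  }
}
```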
Commits on Nov 25, 2014
[SPARK-4562] [MLlib] speedup vector
This PR changes the underlying array of DenseVector to numpy.ndarray to avoid the conversion, because most users will be using numpy.array. It also improves the serialization of DenseVector. Before this change:

trial | trainingTime | testTime
------|--------------|---------
0 | 5.126 | 1.786
1 | 2.698 | 1.693

After the change:

trial | trainingTime | testTime
------|--------------|---------
0 | 4.692 | 0.554
1 | 2.307 | 0.525

This could partially fix the performance regression during tests. Author: Davies Liu <davies@databricks.com> Closes #3420 from davies/ser2 and squashes the following commits: 0e1e6f3 [Davies Liu] fix tests 426f5db [Davies Liu] impove toArray() 44707ec [Davies Liu] add name for ISO-8859-1 fa7d791 [Davies Liu] address comments 1cfb137 [Davies Liu] handle zero sparse vector 2548ee2 [Davies Liu] fix tests 9e6389d [Davies Liu] bugfix 470f702 [Davies Liu] speed up DenseMatrix f0d3c40 [Davies Liu] speedup SparseVector ef6ce70 [Davies Liu] speed up dense vector (cherry picked from commit b660de7) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 9ea67fc
get raw vectors for further processing in Word2Vec
e.g. clustering Author: tkaessmann <tobias.kaessmann@s24.com> Closes #3309 from tkaessmann/branch-1.2 and squashes the following commits: e3a3142 [tkaessmann] changes the comment for getVectors 58d3d83 [tkaessmann] removes sign from comment a5be213 [tkaessmann] fixes getVectors to fit code guidelines 3782fa9 [tkaessmann] get raw vectors for further processing
Commit: 2acbd28
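A usage sketch of the accessor this commit adds: `getVectors` exposes the raw word-to-vector map for downstream processing such as clustering. The training setup below is an assumption for illustration:
```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.Word2Vec

val sc: SparkContext = ??? // an existing SparkContext is assumed
val corpus = sc.textFile("corpus.txt").map(_.split(" ").toSeq)

val model = new Word2Vec().fit(corpus)
// Raw vectors, e.g. as input to a clustering algorithm.
val vectors: Map[String, Array[Float]] = model.getVectors
```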
[SPARK-4578] fix asDict() with nested Row()
The Row object is created on the fly once the field is accessed, so we should access the fields by getattr() in asDict(). Author: Davies Liu <davies@databricks.com> Closes #3434 from davies/fix_asDict and squashes the following commits: b20f1e7 [Davies Liu] fix asDict() with nested Row() (cherry picked from commit 050616b) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 8371bc2
[SPARK-4548][SPARK-4517] improve performance of python broadcast
Re-implement the Python broadcast using files: 1) serialize the python object using cPickle and write it to disk; 2) create a wrapper in the JVM (for the dumped file) that reads data from it during serialization; 3) use TorrentBroadcast or HttpBroadcast to transfer the (compressed) data to executors; 4) during deserialization, write the data to disk; 5) pass the path to the Python worker, which reads the data from disk and unpickles it into a python object on first access. This fixes the performance regression introduced in #2659; it has similar performance to 1.1, but supports objects larger than 2G and improves memory efficiency (only one compressed copy in the driver and executor). Testing with a 500M broadcast and 4 tasks (excluding the benefit from reused workers in 1.2):

name | 1.1 | 1.2 with this patch | improvement
-----|-----|---------------------|------------
python-broadcast-w-bytes | 25.20 | 9.33 | 170.13%
python-broadcast-w-set | 4.13 | 4.50 | -8.35%

Testing with 100 tasks (16 CPUs):

name | 1.1 | 1.2 with this patch | improvement
-----|-----|---------------------|------------
python-broadcast-w-bytes | 38.16 | 8.40 | 353.98%
python-broadcast-w-set | 23.29 | 9.59 | 142.80%

Author: Davies Liu <davies@databricks.com> Closes #3417 from davies/pybroadcast and squashes the following commits: 50a58e0 [Davies Liu] address comments b98de1d [Davies Liu] disable gc while unpickle e5ee6b9 [Davies Liu] support large string 09303b8 [Davies Liu] read all data into memory dde02dd [Davies Liu] improve performance of python broadcast (cherry picked from commit 6cf5076) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 841f247
[SPARK-4266] [Web-UI] Reduce stage page load time.
The commit changes the java script used to show/hide additional metrics in order to reduce page load time. SPARK-4016 significantly increased page load time for the stage page when stages had a lot (thousands or tens of thousands) of tasks, due to the additional Javascript to hide some metrics by default and stripe the tables. This commit reduces page load time in two ways: (1) Now, all of the metrics that are hidden by default are hidden by setting "display: none;" using CSS for the page, rather than hiding them using javascript after the page loads. Without this change, for stages with thousands of tasks, there was a few second delay after page load, where first the additional metrics were shown, and then after a delay were hidden once the relevant JS finished running. (2) CSS is used to stripe all of the tables except for the summary table. The summary table needs javascript to do the striping because some rows are hidden, but the javascript striping is slower, which again resulted in a delay when it was used for the task table (where for a few seconds after page load, all of the rows in the task table would be white, while the browser finished running the JS to stripe the table). cc pwendell This change is intended to be backported to 1.2 to avoid a regression in UI performance when users run large jobs. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3328 from kayousterhout/SPARK-4266 and squashes the following commits: f964091 [Kay Ousterhout] [SPARK-4266] [Web-UI] Reduce stage page load time. (cherry picked from commit d24d5bf) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
Commit: 47d4fce
[SPARK-4525] Mesos should decline unused offers
Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly. I've also done some minor renaming/clean-up of variables in this class and tests. Author: Patrick Wendell <pwendell@gmail.com> Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3436 from pwendell/mesos-issue and squashes the following commits: 58c35b5 [Patrick Wendell] Adding unit test for this situation c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers (cherry picked from commit b043c27) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 4b47973
Revert "[SPARK-4525] Mesos should decline unused offers"
This reverts commit 4b47973. I accidentally committed this using my own authorship credential. However, I should have given authorship to the original author: Jongyoul Lee.
Commit: e7b8bf0
[SPARK-4525] Mesos should decline unused offers
Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly. I've also done some minor renaming/clean-up of variables in this class and tests. Author: Patrick Wendell <pwendell@gmail.com> Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3436 from pwendell/mesos-issue and squashes the following commits: 58c35b5 [Patrick Wendell] Adding unit test for this situation c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers (cherry picked from commit b043c27) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 10e4339
[SQL] Compute timeTaken correctly
```timeTaken``` should not count the time of printing result. Author: w00228970 <wangfei1@huawei.com> Closes #3423 from scwf/time-taken-bug and squashes the following commits: da7e102 [w00228970] compute time taken correctly (cherry picked from commit 723be60) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 259cb26
[DOC][Build] Wrong cmd for build spark with apache hadoop 2.4.X and hive 12
Author: wangfei <wangfei1@huawei.com> Closes #3335 from scwf/patch-10 and squashes the following commits: d343113 [wangfei] add '-Phive' 60d595e [wangfei] [DOC] Wrong cmd for build spark with apache hadoop 2.4.X and Hive 12 support (cherry picked from commit 0fe54cf) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 1f4d1ac
[SPARK-4596][MLLib] Refactorize Normalizer to make code cleaner
In this refactoring, the performance is slightly increased by removing the overhead of the breeze vector. The bottleneck is still in breeze norm, which is implemented via activeIterator. This inefficiency of breeze norm will be addressed in the next PR. At least, this PR makes the code more consistent across the codebase. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3446 from dbtsai/normalizer and squashes the following commits: e20a2b9 [DB Tsai] first commit (cherry picked from commit 89f9122) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 7457199
[SPARK-4526][MLLIB] GradientDescent gets a wrong gradient value according to the gradient formula
This is caused by the miniBatchSize parameter: the number of rows `RDD.sample` returns is not fixed. cc mengxr Author: GuoQiang Li <witgo@qq.com> Closes #3399 from witgo/GradientDescent and squashes the following commits: 13cb228 [GuoQiang Li] review commit 668ab66 [GuoQiang Li] Double to Long b6aa11a [GuoQiang Li] Check miniBatchSize is greater than 0 0b5c3e3 [GuoQiang Li] Minor fix 12e7424 [GuoQiang Li] GradientDescent get a wrong gradient value according to the gradient formula, which is caused by the miniBatchSize parameter. (cherry picked from commit f515f94) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: d117f8f
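A sketch of the fix's idea under simplified assumptions (a scalar stand-in for the gradient, illustrative names): since `RDD.sample` returns a random number of rows, normalize the summed gradient by the count actually sampled rather than by the expected miniBatchFraction * n:
```scala
import org.apache.spark.{SparkConf, SparkContext}

object MiniBatchSketch {
  // Stand-in for a real per-point loss gradient.
  def pointGradient(x: Double): Double = 2.0 * x

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
    val data = sc.parallelize(1 to 10000).map(_.toDouble)

    // Bernoulli sampling: the returned count is random, so count it explicitly.
    val (gradSum, batchSize) = data
      .sample(withReplacement = false, fraction = 0.1, seed = 42L)
      .map(x => (pointGradient(x), 1L))
      .reduce { case ((g1, c1), (g2, c2)) => (g1 + g2, c1 + c2) }

    // The fix's essence: divide by the actual sampled count, not the expected one.
    val gradient = if (batchSize > 0) gradSum / batchSize else 0.0
    println(s"batchSize = $batchSize, gradient = $gradient")
    sc.stop()
  }
}
```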
[SPARK-4535][Streaming] Fix the error in comments
change `NetworkInputDStream` to `ReceiverInputDStream` change `ReceiverInputTracker` to `ReceiverTracker` Author: q00251598 <qiyadong@huawei.com> Closes #3400 from watermen/fix-comments and squashes the following commits: 75d795c [q00251598] change 'NetworkInputDStream' to 'ReceiverInputDStream' && change 'ReceiverInputTracker' to 'ReceiverTracker' (cherry picked from commit a51118a) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> Conflicts: examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
Commit: 42b9d0d
[SPARK-4381][Streaming] Add warning log when user set spark.master to local in Spark Streaming and there's no job executed
Author: jerryshao <saisai.shao@intel.com> Closes #3244 from jerryshao/SPARK-4381 and squashes the following commits: d2486c7 [jerryshao] Improve the warning log d726e85 [jerryshao] Add local[1] to the filter condition eca428b [jerryshao] Add warning log (cherry picked from commit fef27b2) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: b026546
[SPARK-4344][DOCS] adding documentation on spark.yarn.user.classpath.first
The documentation for the two parameters is the same, with a pointer from the standalone parameter to the yarn parameter. Author: arahuja <aahuja11@gmail.com> Closes #3209 from arahuja/yarn-classpath-first-param and squashes the following commits: 51cb9b2 [arahuja] [SPARK-4344][DOCS] adding documentation for YARN on userClassPathFirst (cherry picked from commit d240760) Signed-off-by: Thomas Graves <tgraves@apache.org>
Commit: a689ab9
[SPARK-4601][Streaming] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI
When running NetworkWordCount, the description of the word count jobs is set to "getCallsite at DStream:xxx". It should instead be the line number of the streaming application that has the output operation that led to the job being created. This happens because the callsite is incorrectly set in the thread launching the jobs. This PR fixes that. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3455 from tdas/streaming-callsite-fix and squashes the following commits: 69fc26f [Tathagata Das] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI (cherry picked from commit 69cd53e) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 96f76fc
[SPARK-4581][MLlib] Refactorize StandardScaler to improve the transformation performance
The following optimizations are done to improve the StandardScaler model transformation performance: 1) Convert the Breeze dense vector to a primitive vector to reduce the overhead. 2) Since the mean can potentially be a sparse vector, we explicitly convert it to a dense primitive vector. 3) Keep a local reference to the `shift` and `factor` arrays so the JVM can locate the values with one operation call. 4) In the pattern matching part, use the mllib SparseVector/DenseVector instead of breeze's vector to make the codebase cleaner.

Benchmark with the mnist8m dataset, before:
DenseVector withMean and withStd: 50.97secs
DenseVector withMean and withoutStd: 42.11secs
DenseVector withoutMean and withStd: 8.75secs
SparseVector withoutMean and withStd: 5.437secs

With this PR:
DenseVector withMean and withStd: 5.76secs
DenseVector withMean and withoutStd: 5.28secs
DenseVector withoutMean and withStd: 5.30secs
SparseVector withoutMean and withStd: 1.27secs

Note that without the local reference copy of the `factor` and `shift` arrays, the runtime is almost three times slower:
DenseVector withMean and withStd: 18.15secs
DenseVector withMean and withoutStd: 18.05secs
DenseVector withoutMean and withStd: 18.54secs
SparseVector withoutMean and withStd: 2.01secs

The following code,
```scala
while (i < size) {
  values(i) = (values(i) - shift(i)) * factor(i)
  i += 1
}
```
will generate the bytecode
```
L13
LINENUMBER 106 L13
FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I] []
ILOAD 7
ILOAD 6
IF_ICMPGE L14
L15
LINENUMBER 107 L15
ALOAD 5
ILOAD 7
ALOAD 5
ILOAD 7
DALOAD
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.shift ()[D
ILOAD 7
DALOAD
DSUB
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
ILOAD 7
DALOAD
DMUL
DASTORE
L16
LINENUMBER 108 L16
ILOAD 7
ICONST_1
IADD
ISTORE 7
GOTO L13
```
while with the local references to the `shift` and `factor` arrays, the bytecode will be
```
L14
LINENUMBER 107 L14
ALOAD 0
INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
ASTORE 9
L15
LINENUMBER 108 L15
FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector [D org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I [D] []
ILOAD 8
ILOAD 7
IF_ICMPGE L16
L17
LINENUMBER 109 L17
ALOAD 6
ILOAD 8
ALOAD 6
ILOAD 8
DALOAD
ALOAD 2
ILOAD 8
DALOAD
DSUB
ALOAD 9
ILOAD 8
DALOAD
DMUL
DASTORE
L18
LINENUMBER 110 L18
ILOAD 8
ICONST_1
IADD
ISTORE 8
GOTO L15
```
You can see that with the local references, both arrays stay on the stack, so the JVM can access the values without calling `INVOKESPECIAL`. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3435 from dbtsai/standardscaler and squashes the following commits: 85885a9 [DB Tsai] revert to have lazy in shift array. daf2b06 [DB Tsai] Address the feedback cdb5cef [DB Tsai] small change 9c51eef [DB Tsai] style fc795e4 [DB Tsai] update 5bffd3d [DB Tsai] first commit (cherry picked from commit bf1a6aa) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 1e356a8
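A compact sketch of the local-reference optimization the message benchmarks, using an illustrative class rather than the actual StandardScalerModel:
```scala
class Scaler(private val shift: Array[Double], private val factor: Array[Double]) {
  def transformInPlace(values: Array[Double]): Unit = {
    // Copy the fields into locals once: the hot loop then reads them from
    // the operand stack instead of issuing an INVOKESPECIAL per element.
    val localShift = shift
    val localFactor = factor
    var i = 0
    while (i < values.length) {
      values(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
  }
}
```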
[SPARK-4196][SPARK-4602][Streaming] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles
Solves two JIRAs in one shot: - Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints - Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's hadoop configuration Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3457 from tdas/savefiles-fix and squashes the following commits: bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles. (cherry picked from commit 8838ad7) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: a9944c8
Commit: a2c01ae
[SPARK-4592] Avoid duplicate worker registrations in standalone mode
**Summary.** On failover, the Master may receive duplicate registrations from the same worker, causing the worker to exit. This is caused by this commit 4afe9a4, which adds logic for the worker to re-register with the master in case of failures. However, the following race condition may occur: (1) Master A fails and Worker attempts to reconnect to all masters (2) Master B takes over and notifies Worker (3) Worker responds by registering with Master B (4) Meanwhile, Worker's previous reconnection attempt reaches Master B, causing the same Worker to register with Master B twice **Fix.** Instead of attempting to register with all known masters, the worker should re-register with only the one that it has been communicating with. This is safe because the fact that a failover has occurred means the old master must have died. Then, when the worker is finally notified of a new master, it gives up on the old one in favor of the new one. **Caveat.** Even this fix is subject to more obscure race conditions. For instance, if Master B fails and Master A recovers immediately, then Master A may still observe duplicate worker registrations. However, this and other potential race conditions summarized in [SPARK-4592](https://issues.apache.org/jira/browse/SPARK-4592), are much, much less likely than the one described above, which is deterministically reproducible. Author: Andrew Or <andrew@databricks.com> Closes #3447 from andrewor14/standalone-failover and squashes the following commits: 0d9716c [Andrew Or] Move re-registration logic to actor for thread-safety 79286dc [Andrew Or] Preserve old behavior for initial retries 83b321c [Andrew Or] Tweak wording 1fce6a9 [Andrew Or] Active master actor could be null in the beginning b6f269e [Andrew Or] Avoid duplicate worker registrations (cherry picked from commit 1b2ab1c) Signed-off-by: Andrew Or <andrew@databricks.com>
Andrew Or committed Nov 25, 2014
Commit: ee03175
[SPARK-4546] Improve HistoryServer first time user experience
The documentation points the user to run the following ``` sbin/start-history-server.sh ``` The first thing this does is throw an exception that complains a log directory is not specified. The exception message itself does not say anything about what to set. Instead we should have a default and a landing page with a better message. The new default log directory is `file:/tmp/spark-events`. This is what it looks like as of this PR: ![after](https://issues.apache.org/jira/secure/attachment/12682985/after.png) Author: Andrew Or <andrew@databricks.com> Closes #3411 from andrewor14/minor-history-improvements and squashes the following commits: f33d6b3 [Andrew Or] Point user to set config if default log dir does not exist fc4c17a [Andrew Or] Improve HistoryServer UX (cherry picked from commit 9afcbe4) Signed-off-by: Andrew Or <andrew@databricks.com>
Andrew Or committed Nov 25, 2014
Commit: 58c840d
Fix SPARK-4471: blockManagerIdFromJson function throws exception while B...
Fix [SPARK-4471](https://issues.apache.org/jira/browse/SPARK-4471): blockManagerIdFromJson function throws exception while BlockManagerId be null in MetadataFetchFailedException Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3340 from suyanNone/fix-blockmanagerId-jnothing-2 and squashes the following commits: 159f9a3 [hushan[胡珊]] Refine test code for blockmanager is null 4380d73 [hushan[胡珊]] remove useless blank line 3ccf651 [hushan[胡珊]] Fix SPARK-4471: blockManagerIdFromJson function throws exception while metadata fetch failed (cherry picked from commit 9bdf5da) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 93b914d
Commits on Nov 26, 2014
[Spark-4509] Revert EC2 tag-based cluster membership patch
This PR reverts changes related to tag-based cluster membership. As discussed in SPARK-3332, we didn't figure out a safe strategy to use tags to determine cluster membership, because tagging is not atomic. The following changes are reverted: SPARK-2333: 94053a7 SPARK-3213: 7faf755 SPARK-3608: 78d4220. I tested launch, login, and destroy. It is easy to check the diff by comparing it to Josh's patch for branch-1.1: https://github.com/apache/spark/pull/2225/files JoshRosen I sent the PR to master. It might be easier for us to keep master and branch-1.2 the same at this time. We can always re-apply the patch once we figure out a stable solution. Author: Xiangrui Meng <meng@databricks.com> Closes #3453 from mengxr/SPARK-4509 and squashes the following commits: f0b708b [Xiangrui Meng] revert 94053a7 4298ea5 [Xiangrui Meng] revert 7faf755 35963a1 [Xiangrui Meng] Revert "SPARK-3608 Break if the instance tag naming succeeds" (cherry picked from commit 7eba0fb) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: a48ea3c
[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
Currently, the LogLoss used by GradientBoostedTrees has 2 issues: * the gradient (and therefore loss) does not match that used by Friedman (1999) * the error computation uses 0/1 accuracy, not log loss This PR updates LogLoss. It also adds some doc for boosting and forests. I tested it on sample data and made sure the log loss is monotonically decreasing with each boosting iteration. CC: mengxr manishamde codedeft Author: Joseph K. Bradley <joseph@databricks.com> Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits: cfec17e [Joseph K. Bradley] removed forgotten temp comments a27eb6d [Joseph K. Bradley] corrections to last log loss commit ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability 5e52bff [Joseph K. Bradley] * Removed the 1/2 from SquaredError. This also required updating the test suite since it effectively doubles the gradient and loss. * Added doc for developers within RandomForest. * Small cleanup in test suite (generating data only once) e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting (cherry picked from commit c251fd7) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 6880b46
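For reference, a sketch of the loss this fix aligns with: Friedman's (1999) negative binomial log-likelihood, scaled by 2, for labels y in {-1, +1} and model margin F(x). The exact constants used by MLlib should be checked against the source:
```latex
L(y, F) = 2 \log\left(1 + e^{-2 y F}\right),
\qquad
\frac{\partial L}{\partial F} = \frac{-4 y}{1 + e^{2 y F}}
```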
Commit: 37d58aa
[SPARK-4604][MLLIB] make MatrixFactorizationModel public
User could construct an MF model directly. I added a note about the performance. Author: Xiangrui Meng <meng@databricks.com> Closes #3459 from mengxr/SPARK-4604 and squashes the following commits: f64bcd3 [Xiangrui Meng] organize imports ed08214 [Xiangrui Meng] check preconditions and unit tests a624c12 [Xiangrui Meng] make MatrixFactorizationModel public (cherry picked from commit b5fb141) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 2756d0d
[SPARK-4516] Cap default number of Netty threads at 8
In practice, only 2-4 cores should be required to transfer roughly 10 Gb/s, and each core that we use will have an initial overhead of roughly 32 MB of off-heap memory, which comes at a premium. Thus, this value should still retain maximum throughput and reduce wasted off-heap memory allocation. It can be overridden by setting the number of serverThreads and clientThreads manually in Spark's configuration. Author: Aaron Davidson <aaron@databricks.com> Closes #3469 from aarondav/fewer-pools2 and squashes the following commits: 087c59f [Aaron Davidson] [SPARK-4516] Cap default number of Netty threads at 8 (cherry picked from commit f5f2d27) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 1e12f59
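A hedged sketch of the manual override the message mentions. The shuffle-module key names below are an assumption based on TransportConf's naming scheme and should be verified against your Spark version:
```scala
import org.apache.spark.SparkConf

// Raise the pools above the capped default if profiling shows the
// transfer is thread-bound (key names are assumptions, not confirmed here).
val conf = new SparkConf()
  .set("spark.shuffle.io.serverThreads", "16")
  .set("spark.shuffle.io.clientThreads", "16")
```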
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit d7ac601.
Commit: b028aaf
Revert "Preparing Spark release v1.2.0-snapshot1"
This reverts commit 38c1fbd.
Commit: 0127178
Preparing Spark release v1.2.0-rc1
Ubuntu committed Nov 26, 2014
Commit: db7f4a8
Preparing development version 1.2.1-SNAPSHOT
Ubuntu committed Nov 26, 2014
Commit: d7b1ecb
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit d7b1ecb.
Commit: 68a217c
Revert "Preparing Spark release v1.2.0-rc1"
This reverts commit db7f4a8.
Commit: ce6200b
Preparing Spark release v1.2.0-rc1
Commit: 5247dd8
Preparing development version 1.2.1-SNAPSHOT
Commit: 79df6b4
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit 79df6b4.
Commit: 37bc7a8
Revert "Preparing Spark release v1.2.0-rc1"
This reverts commit 5247dd8.
Commit: de8029b
Commit: dfb8c65
Preparing Spark release v1.2.0-rc1
Commit: cc2c05e
Preparing development version 1.2.1-SNAPSHOT
Commit: 380eba5
[SPARK-4516] Avoid allocating Netty PooledByteBufAllocators unnecessarily
Turns out we are allocating an allocator pool for every TransportClient (which means that the number increases with the number of nodes in the cluster), when really we should just reuse one for all clients. This patch, as expected, greatly decreases off-heap memory allocation, and appears to make allocation only proportional to the number of cores. Author: Aaron Davidson <aaron@databricks.com> Closes #3465 from aarondav/fewer-pools and squashes the following commits: 36c49da [Aaron Davidson] [SPARK-4516] Avoid allocating unnecessarily Netty PooledByteBufAllocators (cherry picked from commit 346bc17) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: c7185f0
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit 380eba5.
Commit: 537d699
Revert "Preparing Spark release v1.2.0-rc1"
This reverts commit cc2c05e.
Commit: 8f5ebcb
Revert "[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + d…
…oc updates" This reverts commit 6880b46.
Commit: 17a4b8e
Revert "[SPARK-4604][MLLIB] make MatrixFactorizationModel public"
This reverts commit 2756d0d.
Commit: 69d021b
[SPARK-4612] Reduce task latency and increase scheduling throughput by making configuration initialization lazy
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L337 creates a configuration object for every task that is launched, even if there is no new dependent file/JAR to update. This is a heavyweight creation that should be avoided when there is no new file/JAR to update, so this PR makes that creation lazy. A quick local test with the spark-perf scheduling throughput tests gives the following numbers in local standalone scheduler mode, for 1 job with 10000 tasks: before, 7.8395 seconds; after, 2.6415 seconds = 3x increase in task scheduling throughput. pwendell JoshRosen Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3463 from tdas/lazy-config and squashes the following commits: c791c1e [Tathagata Das] Reduce task latency by making configuration initialization lazy (cherry picked from commit e7f4d25) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: e866972
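A minimal sketch of the pattern under illustrative names: a `lazy val` defers building the heavyweight Configuration until a launched task actually has new files or JARs to fetch:
```scala
import org.apache.hadoop.conf.Configuration

class TaskDependencyManager {
  // Built at most once, and only if some task actually accesses it.
  lazy val hadoopConf: Configuration = new Configuration()

  def updateDependencies(newFiles: Map[String, Long]): Unit = {
    if (newFiles.nonEmpty) {
      val conf = hadoopConf // first access triggers construction
      // ... fetch each new file using conf ...
    }
  }
}
```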
Removing confusing TripletFields
After additional discussion with rxin, I think having all the possible `TripletField` options is confusing. This pull request reduces the triplet fields to:
```java
/**
 * None of the triplet fields are exposed.
 */
public static final TripletFields None = new TripletFields(false, false, false);

/**
 * Expose only the edge field and not the source or destination field.
 */
public static final TripletFields EdgeOnly = new TripletFields(false, false, true);

/**
 * Expose the source and edge fields but not the destination field. (Same as Src)
 */
public static final TripletFields Src = new TripletFields(true, false, true);

/**
 * Expose the destination and edge fields but not the source field. (Same as Dst)
 */
public static final TripletFields Dst = new TripletFields(false, true, true);

/**
 * Expose all the fields (source, edge, and destination).
 */
public static final TripletFields All = new TripletFields(true, true, true);
```
Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com> Closes #3472 from jegonzal/SimplifyTripletFields and squashes the following commits: 91796b5 [Joseph E. Gonzalez] removing confusing triplet fields (cherry picked from commit 288ce58) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 9f3b159
[BRANCH-1.2][SPARK-4604][MLLIB] make MatrixFactorizationModel public
We reverted #3459 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3459 back to branch-1.2 with correct imports. Github is out-of-sync now. The real changes are the last two commits. Author: Xiangrui Meng <meng@databricks.com> Closes #3473 from mengxr/SPARK-4604-1.2 and squashes the following commits: a7638a5 [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2 b749000 [Xiangrui Meng] [SPARK-4604][MLLIB] make MatrixFactorizationModel public
Commit: 9b63900
[BRANCH-1.2][SPARK-4614][MLLIB] Slight API changes in Matrix and Matrices
This is #3468 for branch-1.2, same content except mima excludes. Author: Xiangrui Meng <meng@databricks.com> Closes #3482 from mengxr/SPARK-4614-1.2 and squashes the following commits: ea4f08d [Xiangrui Meng] hide transposeMultiply; add rng to rand and randn; add unit tests
Commit: 8fc19e5
[BRANCH-1.2][SPARK-4583][MLLIB] LogLoss for GradientBoostedTrees fix + doc updates
We reverted #3439 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3439 back to branch-1.2 with correct imports. Github is out-of-sync now. The real changes are the last two commits. Author: Joseph K. Bradley <joseph@databricks.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3474 from mengxr/SPARK-4583-1.2 and squashes the following commits: aca2abb [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2 6b5564a [Joseph K. Bradley] [SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
Commit: 69550f7
Commits on Nov 27, 2014
[SPARK-732][SPARK-3628][CORE][RESUBMIT] eliminate duplicate update on accumulator
https://issues.apache.org/jira/browse/SPARK-3628 In the current implementation, the accumulator is updated for every successfully finished task, even if the task is from a resubmitted stage, which makes the accumulator counter-intuitive. In this patch, I changed the way the DAGScheduler updates the accumulator: the DAGScheduler maintains a hash table mapping the stage id to the received <accumulator_id, value> pairs. Only when the stage becomes independent (no job needs it any more) do we accumulate the values of the <accumulator_id, value> pairs; when a task finishes, we check whether the hash table contains that stageId, and save the accumulator_id, value only when the task is the first finished task of a new stage or the stage is running for the first attempt... Author: CodingCat <zhunansjtu@gmail.com> Closes #2524 from CodingCat/SPARK-732-1 and squashes the following commits: 701a1e8 [CodingCat] roll back change on Accumulator.scala 1433e6f [CodingCat] make MIMA happy b233737 [CodingCat] address Matei's comments 02261b8 [CodingCat] rollback some changes 6b0aff9 [CodingCat] update document 2b2e8cf [CodingCat] updateAccumulator 83b75f8 [CodingCat] style fix 84570d2 [CodingCat] re-enable the bad accumulator guard 1e9e14d [CodingCat] add NPE guard 21b6840 [CodingCat] simplify the patch 88d1f03 [CodingCat] fix rebase error f74266b [CodingCat] add test case for resubmitted result stage 5cf586f [CodingCat] de-duplicate on task level 138f9b3 [CodingCat] make MIMA happy 67593d2 [CodingCat] make if allowing duplicate update as an option of accumulator (cherry picked from commit 5af53ad) Signed-off-by: Matei Zaharia <matei@databricks.com>
Commit: 66cc243
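A sketch of the de-duplication idea under simplified assumptions (Long-valued accumulators, illustrative names), not the actual DAGScheduler code:
```scala
import scala.collection.mutable

class AccumulatorUpdates {
  private val finishedPartitions = mutable.HashMap.empty[Int, mutable.HashSet[Int]]
  private val totals = mutable.HashMap.empty[Long, Long].withDefaultValue(0L)

  // Apply a finished task's updates at most once per (stage, partition),
  // so tasks from resubmitted stages cannot double-count.
  def onTaskSuccess(stageId: Int, partition: Int, updates: Map[Long, Long]): Unit = {
    val done = finishedPartitions.getOrElseUpdate(stageId, mutable.HashSet.empty[Int])
    if (done.add(partition)) {
      for ((accumId, value) <- updates) totals(accumId) += value
    }
  }

  def value(accumId: Long): Long = totals(accumId)
}
```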
[Release] Automate generation of contributors list
This commit provides a script that computes the contributors list by linking the github commits with JIRA issues. Automatically translating github usernames remains a TODO at this point.
Andrew Or committed Nov 27, 2014
Commit: a0aa07b
[SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
Author: roxchkplusony <roxchkplusony@gmail.com> Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits: aba9184 [roxchkplusony] replace warning message per review 5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler (cherry picked from commit 84376d3) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: bfba8bf
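A sketch of the guard this commit describes, with an illustrative registry standing in for the scheduler backend's executor map:
```scala
import scala.collection.mutable

class KillGuard {
  val registeredExecutors = mutable.HashSet.empty[String]

  // Only dispatch a kill to executors the scheduler still knows about.
  def killTask(executorId: String, taskId: Long): Unit = {
    if (registeredExecutors.contains(executorId)) {
      sendKill(executorId, taskId)
    } else {
      println(s"Attempted to kill task $taskId on unregistered executor $executorId")
    }
  }

  private def sendKill(executorId: String, taskId: Long): Unit = {
    // RPC to the executor would go here.
  }
}
```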
Commits on Nov 28, 2014
[SPARK-4613][Core] Java API for JdbcRDD
This PR introduces a set of Java APIs for using `JdbcRDD`: 1. Trait (interface) `JdbcRDD.ConnectionFactory`: equivalent to the `getConnection: () => Connection` parameter in the `JdbcRDD` constructor. 2. Two overloaded versions of `JdbcRDD.create`: used to create a `JavaRDD` that wraps a `JdbcRDD`. Author: Cheng Lian <lian@databricks.com> Closes #3478 from liancheng/japi-jdbc-rdd and squashes the following commits: 9a54625 [Cheng Lian] Only shutdowns a single DB rather than the whole Derby driver d4cedc5 [Cheng Lian] Moves Java JdbcRDD test case to a separate test suite ffcdf2e [Cheng Lian] Java API for JdbcRDD (cherry picked from commit 120a350) Signed-off-by: Matei Zaharia <matei@databricks.com>
Commit: 0928004
[SPARK-4619][Storage]delete redundant time suffix
The time suffix already exists in Utils.getUsedTimeMs(startTime), so there is no need to append it again; delete it. Author: maji2014 <maji3@asiainfo.com> Closes #3475 from maji2014/SPARK-4619 and squashes the following commits: df0da4e [maji2014] delete redundant time suffix (cherry picked from commit ceb6281) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: e924426
[SPARK-4308][SQL] Sets SQL operation state to ERROR when exception is thrown
In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API. Author: Cheng Lian <lian@databricks.com> Closes #3175 from liancheng/fix-op-state and squashes the following commits: 6d4c1fe [Cheng Lian] Sets SQL operation state to ERROR when exception is thrown
Commit: 7fa5fff
[SPARK-4645][SQL] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
This PR disables HiveThriftServer2 asynchronous execution by setting the `runInBackground` argument in `ExecuteStatementOperation` to `false`, and reverting `SparkExecuteStatementOperation.run` in the Hive 13 shim to the Hive 12 version. This change makes Simba ODBC driver v1.0.0.1000 work. Author: Cheng Lian <lian@databricks.com> Closes #3506 from liancheng/disable-async-exec and squashes the following commits: 593804d [Cheng Lian] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
Commit: 8cf1227
-
[SPARK-4193][BUILD] Disable doclint in Java 8 to prevent build errors
Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3058 from ueshin/issues/SPARK-4193 and squashes the following commits: e096bb1 [Takuya UESHIN] Add a plugin declaration to pluginManagement. 6762ec2 [Takuya UESHIN] Fix usage of -Xdoclint javadoc option. fdb280a [Takuya UESHIN] Fix Javadoc errors. 4745f3c [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4193 923e2f0 [Takuya UESHIN] Use doclint option `-missing` instead of `none`. 30d6718 [Takuya UESHIN] Fix Javadoc errors. b548017 [Takuya UESHIN] Disable doclint in Java 8 to prevent from build error. (cherry picked from commit e464f0a) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 3219834
-
[SPARK-4584] [yarn] Remove security manager from Yarn AM.
The security manager adds a lot of overhead to the runtime of the app, and causes a severe performance regression. Even stubbing out all unneeded methods (all except checkExit()) does not help. So, instead, penalize users who do an explicit System.exit() by leaving them in "undefined behavior" territory: if they do that, the Yarn backend won't be able to report the final app status to the RM. The result is that the final status of the application might not match the user's expectations. One side-effect of the change is that users who do an explicit System.exit() will lose the AM retry functionality. Since there is no way to know if the exit was because of success or failure, the AM right now errs on the side of it being a successful exit. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3484 from vanzin/SPARK-4584 and squashes the following commits: 21f2502 [Marcelo Vanzin] Do not retry apps that use System.exit(). 4198b3b [Marcelo Vanzin] [SPARK-4584] [yarn] Remove security manager from Yarn AM. (cherry picked from commit 915f8ee) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 8cec431
-
Preparing Spark release v1.2.0-rc1
Commit: 39c7d1c
-
Preparing development version 1.2.1-SNAPSHOT
Commit: fc7bff0
-
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit fc7bff0.
Commit: 6e0269c
-
Revert "Preparing Spark release v1.2.0-rc1"
This reverts commit 39c7d1c.
Commit: 88f1a6a
-
Commit: eb4d457
-
Preparing Spark release v1.2.0-rc1
Commit: 1056e9e
-
Preparing development version 1.2.1-SNAPSHOT
Commit: 00316cc
-
HOTFIX: Rolling back incorrect version change
Commit: 3a4609e
Commits on Nov 29, 2014
-
[SPARK-4597] Use proper exception and reset variable in Utils.createTempDir()
`File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3449 from viirya/fix_createtempdir and squashes the following commits: 36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable. (cherry picked from commit 49fe879) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 854fade
Commits on Nov 30, 2014
-
[DOCS][BUILD] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.
To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before running Maven; otherwise inter-module dependencies are broken. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #3361 from ueshin/docs/building-spark_2.11 and squashes the following commits: 1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'. (cherry picked from commit 0fcd24c) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: e07dbd8
-
SPARK-2143 [WEB UI] Add Spark version to UI footer
This PR adds the Spark version number to the UI footer; this is how it looks: ![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png) Author: Sean Owen <sowen@cloudera.com> Closes #3410 from srowen/SPARK-2143 and squashes the following commits: e9b3a7a [Sean Owen] Add Spark version to footer
Commit: d324728
Commits on Dec 1, 2014
-
[SPARK-4656][Doc] Typo in Programming Guide markdown
Grammatical error in Programming Guide document Author: lewuathe <lewuathe@me.com> Closes #3412 from Lewuathe/typo-programming-guide and squashes the following commits: a3e2f00 [lewuathe] Typo in Programming Guide markdown (cherry picked from commit a217ec5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: c899f03
-
[DOC] Fixes formatting typo in SQL programming guide
Author: Cheng Lian <lian@databricks.com> Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits: 865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide (cherry picked from commit 2a4d389) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 0f4dad4
-
SPARK-2192 [BUILD] Examples Data Not in Binary Distribution
Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI. Author: Sean Owen <sowen@cloudera.com> Closes #3480 from srowen/SPARK-2192 and squashes the following commits: 47688f1 [Sean Owen] Add data/ to distributions (cherry picked from commit 6384f42) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 9b8a769
-
[SPARK-4661][Core] Minor code and docs cleanup
Author: zsxwing <zsxwing@gmail.com> Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits: 03cbe3f [zsxwing] Minor code and docs cleanup (cherry picked from commit 30a86ac) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 67a2c13
-
Documentation: add description for repartitionAndSortWithinPartitions
Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3390 from msiddalingaiah/master and squashes the following commits: cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions (cherry picked from commit 2b233f5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
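A small usage sketch of the operation being documented (the data is invented): it shuffles records according to the partitioner and sorts them by key within each partition in a single pass, which is more efficient than repartitioning and then sorting separately.
```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x

def demo(sc: SparkContext): Unit = {
  val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b"), (1, "z")))
  // Partition by key hash into 2 partitions, sorting by key inside each one.
  val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))
  sorted.glom().collect().foreach(part => println(part.mkString(", ")))
}
```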
Commit: 35bc338
-
[SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown
Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on. Author: Cheng Lian <lian@databricks.com> Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits: 2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown (cherry picked from commit 5db8dca) Signed-off-by: Michael Armbrust <michael@databricks.com>
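For reference, flipping the documented flag looks roughly like this in a Spark 1.2 spark-shell session; the data path is a placeholder:
```scala
// spark-shell sketch; /path/to/data is hypothetical.
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true") // off by default
sqlContext.parquetFile("/path/to/data").registerTempTable("t")
// Predicates like this one can now be pushed down into the Parquet reader:
sqlContext.sql("SELECT * FROM t WHERE id > 100").collect()
```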
Commit: 9c9b4bd
-
Commit: e0a6d36
-
[SPARK-4358][SQL] Let BigDecimal do checking type compatibility
Removes hardcoded max and min values for numeric types and lets BigDecimal do the type-compatibility checking. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3208 from viirya/more_numericLit and squashes the following commits: e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal. 1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer. cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast. 91fe489 [Liang-Chi Hsieh] add Byte and Short. 1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility. (cherry picked from commit b57365a) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: f2bb90a
-
[SPARK-4650][SQL] Support multiple columns in countDistinct, like count(distinct c1, c2, ...), in Spark SQL
Adds support for multiple columns in the countDistinct function, like count(distinct c1,c2..), in Spark SQL. Author: ravipesala <ravindra.pesala@huawei.com> Author: Michael Armbrust <michael@databricks.com> Closes #3511 from ravipesala/countdistinct and squashes the following commits: cc4dbb1 [ravipesala] style 070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL (cherry picked from commit 6a9ff19) Signed-off-by: Michael Armbrust <michael@databricks.com>
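In SQL terms the new capability looks like this (spark-shell style; the table and columns are hypothetical):
```scala
// COUNT(DISTINCT ...) now accepts multiple columns instead of exactly one.
sqlContext.sql("SELECT COUNT(DISTINCT c1, c2) FROM t").collect()
```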
Commit: 5006aab
-
[SPARK-4658][SQL] Code documentation issue in DDL of datasource API
Commit: b39cfee
-
Commit: 31cf51b
-
Commit: e66f816
Commits on Dec 2, 2014
-
[SPARK-4529] [SQL] support view with column alias
Support view definition like CREATE VIEW view3(valoo) TBLPROPERTIES ("fear" = "factor") AS SELECT upper(value) FROM src WHERE key=86; [valoo as the alias of upper(value)]. This is missing part of SPARK-4239, for a fully view support. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3396 from adrian-wang/viewcolumn and squashes the following commits: 4d001d0 [Daoyuan Wang] support view with column alias (cherry picked from commit 4df60a8) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 445fc95
-
[SPARK-4611][MLlib] Implement the efficient vector norm
The vector norm in breeze is implemented by `activeIterator`, which is known to be very slow. In this PR, an efficient vector norm is implemented, and with this API, `Normalizer` and `k-means` see big performance improvements. Benchmark against the mnist8m dataset: a) `Normalizer`: 68.25s (dense) and 17.01s (sparse) before, vs. 12.71s (dense) and 2.73s (sparse) with this PR. b) `k-means`: 83.46s (dense) and 61.60s (sparse) before, vs. 70.04s (dense) and 59.05s (sparse) with this PR. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3462 from dbtsai/norm and squashes the following commits: 63c7165 [DB Tsai] typo 0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back 6fa616c [DB Tsai] address feedback 9b7cb56 [DB Tsai] move norm to static method 0b632e6 [DB Tsai] kmeans dbed124 [DB Tsai] style c1a877c [DB Tsai] first commit (cherry picked from commit 64f3175) Signed-off-by: Xiangrui Meng <meng@databricks.com>
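The gist of the speedup, as an illustrative sketch rather than the exact MLlib code: iterate directly over the backing value array instead of going through breeze's per-element activeIterator.
```scala
// Illustrative only. For a dense vector this loops over all values; for a
// sparse vector the same loop runs over its (much shorter) values array.
def fastL2Norm(values: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < values.length) {
    sum += values(i) * values(i)
    i += 1
  }
  math.sqrt(sum)
}
```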
Commit: 3783e15
-
[SPARK-4686] Link to allowed master URLs is broken
The link points to the old scala programming guide; it should point to the submitting applications page. This should be backported to 1.1.2 (it's been broken as of 1.0). Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits: a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken (cherry picked from commit d9a148b) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
Commit: b97c27f
-
[SPARK-4536][SQL] Add sqrt and abs to Spark SQL DSL
Spark SQL has embedded sqrt and abs, but the DSL doesn't support those functions. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits: 07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite 8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator 1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator 0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL (cherry picked from commit e75e04f) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 1850d90
-
[SPARK-4663][SQL] Add finally to avoid resource leak
Author: baishuo <vc_java@hotmail.com> Closes #3526 from baishuo/master-trycatch and squashes the following commits: d446e14 [baishuo] correct the code style b36bf96 [baishuo] correct the code style ae0e447 [baishuo] add finally to avoid resource leak (cherry picked from commit 69b6fed) Signed-off-by: Michael Armbrust <michael@databricks.com>
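The general pattern the fix applies, shown on a plain file writer rather than the actual Hive output code:
```scala
import java.io.{BufferedWriter, FileWriter}

// Close the resource in finally so it is released even when the write throws.
def writeAll(path: String, lines: Seq[String]): Unit = {
  val writer = new BufferedWriter(new FileWriter(path))
  try {
    lines.foreach { line => writer.write(line); writer.newLine() }
  } finally {
    writer.close() // runs on success and on failure, avoiding the leak
  }
}
```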
Commit: aa3d369
-
[SPARK-4676][SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
val jsc = new org.apache.spark.api.java.JavaSparkContext(sc) val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc) val nrdd = jhc.hql("select null from spark_test.for_test") println(nrdd.schema) Then the error is thrown as follows: scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$) at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43) Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Author: Michael Armbrust <michael@databricks.com> Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits: e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master (cherry picked from commit 1066427) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 06129cd
-
[SPARK-4593][SQL] Return null when denominator is 0
SELECT max(1/0) FROM src would return a very large number, which is obviously not right. For hive-0.12, Hive would return `Infinity` for 1/0, while for hive-0.13.1, it is `NULL`. I think it is better to keep our behavior consistent with the newer Hive version. This PR ensures that when the divisor is 0, the result of the expression is NULL, same as hive-0.13.1. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3443 from adrian-wang/div and squashes the following commits: 2e98677 [Daoyuan Wang] fix code gen for divide 0 85c28ba [Daoyuan Wang] temp 36236a5 [Daoyuan Wang] add test cases 6f5716f [Daoyuan Wang] fix comments cee92bd [Daoyuan Wang] avoid evaluation 2 times 22ecd9a [Daoyuan Wang] fix style cf28c58 [Daoyuan Wang] divide fix 2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type (cherry picked from commit f6df609) Signed-off-by: Michael Armbrust <michael@databricks.com>
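The new semantics in miniature (a simplified model, not the actual Catalyst Divide expression), using Option to stand in for SQL null:
```scala
// None (null) if either operand is null or the divisor is zero,
// matching Hive 0.13.1's behavior for 1/0.
def divide(left: Option[Double], right: Option[Double]): Option[Double] =
  (left, right) match {
    case (Some(l), Some(r)) if r != 0.0 => Some(l / r)
    case _                              => None
  }
```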
Commit: 97dc238
-
[SPARK-4670] [SQL] wrong symbol for bitwise not
We should use `~` instead of `-` for bitwise NOT. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3528 from adrian-wang/symbol and squashes the following commits: affd4ad [Daoyuan Wang] fix code gen test case 56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type f55fbae [Daoyuan Wang] wrong symbol for bitwise not (cherry picked from commit 1f5ddf1) Signed-off-by: Michael Armbrust <michael@databricks.com>
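The distinction the fix encodes, shown with plain Scala integers:
```scala
val x = 1
println(~x) // -2: bitwise NOT flips every bit (~n == -n - 1)
println(-x) // -1: numeric negation, a different operation
// After the fix, Spark SQL's BitwiseNot uses ~ rather than -.
```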
Commit: adc5d6f
-
[SPARK-4695][SQL] Get result using executeCollect
Uses ```executeCollect``` to collect the result, because executeCollect is a custom implementation of collect in Spark SQL which is better than the RDD's collect. Author: wangfei <wangfei1@huawei.com> Closes #3547 from scwf/executeCollect and squashes the following commits: a5ab68e [wangfei] Revert "adding debug info" a60d680 [wangfei] fix test failure 0db7ce8 [wangfei] adding debug info 184c594 [wangfei] using executeCollect instead collect (cherry picked from commit 3ae0cda) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 658fe8f
Commits on Dec 3, 2014
-
[Release] Translate unknown author names automatically
Andrew Or committed Dec 3, 2014
Commit: 5e026a3
-
[SPARK-4672][GraphX] Perform checkpoint() on PartitionsRDD to shorten the lineage
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 Iterative GraphX applications always have long lineage, while checkpoint() on EdgeRDD and VertexRDD themselves cannot shorten the lineage. In contrast, if we perform checkpoint() on their PartitionsRDD, the long lineage can be cut off. Moreover, the existing operations such as cache() in this code are performed on the PartitionsRDD, so checkpoint() should do the same. More details and explanation can be found in the JIRA. Author: JerryLead <JerryLead@163.com> Author: Lijie Xu <csxulijie@gmail.com> Closes #3549 from JerryLead/my_graphX_checkpoint and squashes the following commits: d1aa8d8 [JerryLead] Perform checkpoint() on PartitionsRDD not VertexRDD and EdgeRDD themselves ff08ed4 [JerryLead] Merge branch 'master' of https://github.com/apache/spark c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark 52799e3 [Lijie Xu] Merge pull request #1 from apache/master (cherry picked from commit fc0a147) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
Commit: f1859fc
-
[SPARK-4672][GraphX] Non-transient PartitionsRDDs will lead to StackOverflow error
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 In a nutshell, if `val partitionsRDD` in EdgeRDDImpl and VertexRDDImpl is non-transient, the serialization chain can become very long in iterative algorithms and finally lead to a StackOverflow error. More details and explanation can be found in the JIRA. Author: JerryLead <JerryLead@163.com> Author: Lijie Xu <csxulijie@gmail.com> Closes #3544 from JerryLead/my_graphX and squashes the following commits: 628f33c [JerryLead] set PartitionsRDD to be transient in EdgeRDDImpl and VertexRDDImpl c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark 52799e3 [Lijie Xu] Merge pull request #1 from apache/master (cherry picked from commit 17c162f) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
Commit: 528cce8
-
[SPARK-4672][Core] Checkpoint() should clear f to shorten the serialization chain
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a `$outer` that references EdgeRDD/VertexRDD, which causes the task's serialization chain to become very long in iterative GraphX applications. As a result, a StackOverflow error will occur. If we set "f = null" in `clearDependencies()`, checkpoint() can cut off the long serialization chain. More details and explanation can be found in the JIRA. Author: JerryLead <JerryLead@163.com> Author: Lijie Xu <csxulijie@gmail.com> Closes #3545 from JerryLead/my_core and squashes the following commits: f7faea5 [JerryLead] checkpoint() should clear the f to avoid StackOverflow error c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark 52799e3 [Lijie Xu] Merge pull request #1 from apache/master (cherry picked from commit 77be8b9) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
Commit: 667f7ff
-
[SPARK-4710] [mllib] Eliminate MLlib compilation warnings
Renamed StreamingKMeans to StreamingKMeansExample to avoid warning about name conflict with StreamingKMeans class. Added import to DecisionTreeRunner to eliminate warning. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #3568 from jkbradley/ml-compilation-warnings and squashes the following commits: 64d6bc4 [Joseph K. Bradley] Updated DecisionTreeRunner.scala and StreamingKMeans.scala to eliminate compilation warnings, including renaming StreamingKMeans to StreamingKMeansExample. (cherry picked from commit 4ac2151) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: fb14bfd
-
[SPARK-4708][MLLib] Make k-means run two to three times faster with dense/sparse samples
Note that the usage of `breezeSquaredDistance` in `org.apache.spark.mllib.util.MLUtils.fastSquaredDistance` is in the critical path, and `breezeSquaredDistance` is slow. We should replace it with our own implementation. Benchmark against the mnist8m dataset: before, DenseVector 70.04s and SparseVector 59.05s; with this PR, DenseVector 30.58s and SparseVector 21.14s. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3565 from dbtsai/kmean and squashes the following commits: 08bc068 [DB Tsai] restyle de24662 [DB Tsai] address feedback b185a77 [DB Tsai] cleanup 4554ddd [DB Tsai] first commit (cherry picked from commit 7fc49ed) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 8ff7a28
-
[SPARK-4717][MLlib] Optimize BLAS library to avoid de-referencing multiple times in loops
Have a local reference to the `values` and `indices` arrays in the `Vector` object so the JVM can locate the value with one operation. See `SPARK-4581` for a similar optimization and the bytecode analysis. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3577 from dbtsai/blasopt and squashes the following commits: 62d38c4 [DB Tsai] formating 0316cef [DB Tsai] first commit (cherry picked from commit d005429) Signed-off-by: Xiangrui Meng <meng@databricks.com>
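An illustrative sketch of the optimization (not the exact mllib BLAS code): fetch the backing arrays into locals once, so the hot loop does plain array loads instead of re-reading the fields through the vector object on every iteration.
```scala
import org.apache.spark.mllib.linalg.SparseVector

def sparseDot(x: SparseVector, y: Array[Double]): Double = {
  val xValues  = x.values  // local reference, read once
  val xIndices = x.indices // local reference, read once
  var sum = 0.0
  var k = 0
  while (k < xIndices.length) {
    sum += xValues(k) * y(xIndices(k))
    k += 1
  }
  sum
}
```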
Commit: b63e941
-
SPARK-2624 add datanucleus jars to the container in yarn-cluster
If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add them to the container. This patch modifies the yarn deployment class to copy all `datanucleus-*` jars found in `[spark-home]/libs` to the container. Author: Jim Lim <jim@quixey.com> Closes #3238 from jimjh/SPARK-2624 and squashes the following commits: 3633071 [Jim Lim] SPARK-2624 update documentation and comments fe95125 [Jim Lim] SPARK-2624 keep java imports together 6c31fe0 [Jim Lim] SPARK-2624 update documentation 6690fbf [Jim Lim] SPARK-2624 add tests d28d8e9 [Jim Lim] SPARK-2624 add spark.yarn.datanucleus.dir option 84e6cba [Jim Lim] SPARK-2624 add datanucleus jars to the container in yarn-cluster
Jim Lim authored and Andrew Or committed Dec 3, 2014
Commit: 163fd78
-
[SPARK-4701] Typo in sbt/sbt
Modified typo. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3560 from tsudukim/feature/SPARK-4701 and squashes the following commits: ed2a3f1 [Masayoshi TSUZUKI] Another whitespace position error. 1af3a35 [Masayoshi TSUZUKI] [SPARK-4701] Typo in sbt/sbt (cherry picked from commit 96786e3) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 614e686
-
[SPARK-4715][Core] Make sure tryToAcquire won't return a negative value
ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`. Author: zsxwing <zsxwing@gmail.com> Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits: a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value (cherry picked from commit edd3cd4) Signed-off-by: Andrew Or <andrew@databricks.com>
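The invariant the fix enforces, as a minimal sketch with names simplified from ShuffleMemoryManager:
```scala
// Never grant a negative amount, even if other tasks have already
// over-reserved the pool; clamp the free capacity at zero instead.
def grantable(maxMemory: Long, alreadyGranted: Long, requested: Long): Long = {
  val free = maxMemory - alreadyGranted // may be negative when over-reserved
  math.min(requested, math.max(0L, free))
}
```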
Commit: 1ee65b4
-
[SPARK-4642] Add description about spark.yarn.queue to running-on-YARN document.
Added a description of this parameter: spark.yarn.queue. Modified the description of the default value of this parameter: spark.yarn.submit.file.replication. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3500 from tsudukim/feature/SPARK-4642 and squashes the following commits: ce99655 [Masayoshi TSUZUKI] better gramatically. 21cf624 [Masayoshi TSUZUKI] Removed intentionally undocumented properties. 88cac9b [Masayoshi TSUZUKI] [SPARK-4642] Documents about running-on-YARN needs update (cherry picked from commit 692f493) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 4a71e08
-
[HOT FIX] [YARN] Check whether `/lib` exists before listing its files
This is caused by a975dc3 Author: Andrew Or <andrew@databricks.com> Closes #3589 from andrewor14/yarn-hot-fix and squashes the following commits: a4fad5f [Andrew Or] Check whether lib directory exists before listing its files (cherry picked from commit 90ec643) Signed-off-by: Andrew Or <andrew@databricks.com>
Andrew Or committed Dec 3, 2014
Commit: 38cb2c3
-
[SPARK-4552][SQL] Avoid exception when reading empty parquet data through Hive
This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way. Author: Michael Armbrust <michael@databricks.com> Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits: 2781d9f [Michael Armbrust] Handle empty lists for newParquet 04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive (cherry picked from commit 513ef82) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 4793197
-
[SPARK-4498][core] Don't transition ExecutorInfo to RUNNING until Driver adds Executor
The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged message to master. Else, appInfo.resetRetryCount() is never called and failing Executors will eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting. Author: Mark Hamstra <markhamstra@gmail.com> Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits: 8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
Commit: 6b6b779
Commits on Dec 4, 2014
-
[SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.
cc aarondav kayousterhout pwendell This should go into 1.2? Author: Reynold Xin <rxin@databricks.com> Closes #3579 from rxin/SPARK-4085 and squashes the following commits: 255b4fd [Reynold Xin] Updated test. f9814d9 [Reynold Xin] Code review feedback. 2afaf35 [Reynold Xin] [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file. (cherry picked from commit 1826372) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: fe28ee2
-
[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer
I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section). CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #3569 from jkbradley/lr-doc and squashes the following commits: 654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization 5035ad0 [Joseph K. Bradley] updated based on review 94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice on choosing an optimization method (cherry picked from commit 27ab0b8) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 4259ca8
-
[SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix
Major changes: * Added programming guide sections for tree ensembles * Added examples for tree ensembles * Updated DecisionTree programming guide with more info on parameters * **API change**: Standardized the tree parameter for the number of classes (for classification) Minor changes: * Updated decision tree documentation * Updated existing tree and tree ensemble examples * Use train/test split, and compute test error instead of training error. * Fixed decision_tree_runner.py to actually use the number of classes it computes from data. (small bug fix) Note: I know this is a lot of lines, but most is covered by: * Programming guide sections for gradient boosting and random forests. (The changes are probably best viewed by generating the docs locally.) * New examples (which were copied from the programming guide) * The "numClasses" renaming I have run all examples and relevant unit tests. CC: mengxr manishamde codedeft Author: Joseph K. Bradley <joseph@databricks.com> Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com> Closes #3461 from jkbradley/ensemble-docs and squashes the following commits: 70a75f3 [Joseph K. Bradley] updated forest vs boosting comparison d1de753 [Joseph K. Bradley] Added note about toString and toDebugString for DecisionTree to migration guide 8e87f8f [Joseph K. Bradley] Combined GBT and RandomForest guides into one ensembles guide 6fab846 [Joseph K. Bradley] small fixes based on review b9f8576 [Joseph K. Bradley] updated decision tree doc 375204c [Joseph K. Bradley] fixed python style 2b60b6e [Joseph K. Bradley] merged Java RandomForest examples into 1 file. added header. Fixed small bug in same example in the programming guide. 706d332 [Joseph K. Bradley] updated python DT runner to print full model if it is small c76c823 [Joseph K. Bradley] added migration guide for mllib abe5ed7 [Joseph K. Bradley] added examples for random forest in Java and Python to examples folder 07fc11d [Joseph K. Bradley] Renamed numClassesForClassification to numClasses everywhere in trees and ensembles. This is a breaking API change, but it was necessary to correct an API inconsistency in Spark 1.1 (where Python DecisionTree used numClasses but Scala used numClassesForClassification). cdfdfbc [Joseph K. Bradley] added examples for GBT 6372a2b [Joseph K. Bradley] updated decision tree examples to use random split. tested all of them. ad3e695 [Joseph K. Bradley] added gbt and random forest to programming guide. still need to update their examples (cherry picked from commit 657a888) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 9880bb4
-
[Release] Correctly translate contributors name in release notes
This commit involves three main changes: (1) It separates the translation of contributor names from the generation of the contributors list. This is largely motivated by the Github API limit; even if we exceed this limit, we should at least be able to proceed manually as before. This is why the translation logic is abstracted into its own script translate-contributors.py. (2) When we look for candidate replacements for invalid author names, we should look for the assignees of the associated JIRAs too. As a result, the intermediate file must keep track of these. (3) This provides an interactive mode with which the user can sit at the terminal and manually pick the candidate replacement that he/she thinks makes the most sense. As before, there is a non-interactive mode that picks the first candidate that the script considers "valid." TODO: We should have a known_contributors file that stores known mappings so we don't have to go through all of this translation every time. This is also valuable because some contributors simply cannot be automatically translated. Conflicts: .gitignore
Andrew Or committed Dec 4, 2014
Commit: f9e1f89
-
[SPARK-4685] Include all spark.ml and spark.mllib packages in JavaDoc's MLlib group
This is #3554 from Lewuathe except that I put both `spark.ml` and `spark.mllib` in the group `MLlib`. Closes #3554 jkbradley Author: lewuathe <lewuathe@me.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3598 from mengxr/Lewuathe-modify-javadoc-setting and squashes the following commits: 184609a [Xiangrui Meng] merge spark.ml and spark.mllib into the same group in javadoc f7535e6 [lewuathe] [SPARK-4685] Update JavaDoc settings to include spark.ml and all spark.mllib subpackages in the right sections (cherry picked from commit 20bfea4) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 2605acb
-
[SQL] Minor: Avoid calling Seq#size in a loop
Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal. Author: Aaron Davidson <aaron@databricks.com> Closes #3593 from aarondav/seq-opt and squashes the following commits: 962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop (cherry picked from commit c6c7165) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: dec838b
-
[docs] Fix outdated comment in tuning guide
When you use the SPARK_JAVA_OPTS env variable, Spark complains:
```
SPARK_JAVA_OPTS was detected (set to ' -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ').
This is deprecated in Spark 1.0+. Please instead use:
 - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
 - ./spark-submit with --driver-java-options to set -X options for a driver
 - spark.executor.extraJavaOptions to set -X options for executors
 - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
```
This updates the docs to redirect the user to the relevant part of the configuration docs. CC: mengxr but please CC someone else as needed Author: Joseph K. Bradley <joseph@databricks.com> Closes #3592 from jkbradley/tuning-doc and squashes the following commits: 0760ce1 [Joseph K. Bradley] fixed outdated comment in tuning guide (cherry picked from commit 529439b) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: bf720ef
-
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes
Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works. (cherry picked from commit 469a6e5) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 266a814
-
[FIX][DOC] Fix broken links in ml-guide.md
and some minor changes in ScalaDoc. Author: Xiangrui Meng <meng@databricks.com> Closes #3601 from mengxr/SPARK-4575-fix and squashes the following commits: c559768 [Xiangrui Meng] minor code update ce94da8 [Xiangrui Meng] Java Bean -> JavaBean 0b5c182 [Xiangrui Meng] fix links in ml-guide (cherry picked from commit 7e758d7) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: 34fdca0
-
[SPARK-4683][SQL] Add a beeline.cmd to run on Windows
Tested locally with a Win7 VM. Connected to a Spark SQL Thrift server instance running on Mac OS X with the following command line:
```
bin\beeline.cmd -u jdbc:hive2://10.0.2.2:10000 -n lian
```
Author: Cheng Lian <lian@databricks.com> Closes #3599 from liancheng/beeline.cmd and squashes the following commits: 79092e7 [Cheng Lian] Windows script for BeeLine (cherry picked from commit 28c7aca) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 2fbe488
-
Revert "HOTFIX: Rolling back incorrect version change"
This reverts commit 3a4609e.
Commit: 2c6e287
-
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit 00316cc.
Commit: 701019b
-
Revert "Preparing Spark release v1.2.0-rc1"
This reverts commit 1056e9e.
Commit: 078894c
-
[SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes
In yarn-cluster and standalone-cluster modes, we don't know where the driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf then this will lead to errors when the driver launches. This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits: ed1a25c [WangTaoTheTonic] revert unrelated formatting issue 02c4e49 [WangTao] add comment 32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext 667cf24 [WangTaoTheTonic] document fix ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode 2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode (cherry picked from commit 8106b1e) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: d9aee07
-
[HOTFIX] Fixing two issues with the release script.
1. The version replacement was still producing some false changes. 2. Uploads to the staging repo specifically. Author: Patrick Wendell <pwendell@gmail.com> Closes #3608 from pwendell/release-script and squashes the following commits: 3c63294 [Patrick Wendell] Fixing two issues with the release script: (cherry picked from commit 8dae26f) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: ead01b6
-
Commit: 2b72c56
-
Commit: bc05df8
-
[SPARK-4745] Fix get_existing_cluster() function with multiple security groups
The current get_existing_cluster() function would only find an instance belonging to a cluster if the instance's security groups == cluster_name + "-master" (or "-slaves"). This fix allows for multiple security groups by checking if the cluster_name + "-master" security group is in the list of groups for a particular instance. Author: alexdebrie <alexdebrie1@gmail.com> Closes #3596 from alexdebrie/master and squashes the following commits: 9d51232 [alexdebrie] Fix get_existing_cluster() function with multiple security groups (cherry picked from commit 794f3ae) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: a00d0aa
-
[SPARK-4459] Change groupBy type parameter from K to U
Please see https://issues.apache.org/jira/browse/SPARK-4459 Author: Saldanha <saldaal1@phusca-l24858.wlan.na.novartis.net> Closes #3327 from alokito/master and squashes the following commits: 54b1095 [Saldanha] [SPARK-4459] changed type parameter for keyBy from K to U d5f73c3 [Saldanha] [SPARK-4459] added keyBy test 316ad77 [Saldanha] SPARK-4459 changed type parameter for groupBy from K to U. 62ddd4b [Saldanha] SPARK-4459 added failing unit test (cherry picked from commit 743a889) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 0d159de
-
[SPARK-4652][DOCS] Add docs about spark-git-repo option
There might be some cases when a WIP Spark version needs to be run on an EC2 cluster. In order to set up this type of cluster more easily, add the --spark-git-repo option description to the EC2 documentation. Author: lewuathe <lewuathe@me.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #3513 from Lewuathe/doc-for-development-spark-cluster and squashes the following commits: 6dae8ee [lewuathe] Wrap consistent with other descriptions cfaf9be [lewuathe] Add docs about spark-git-repo option (Editing / cleanup by Josh Rosen) (cherry picked from commit ab8177d) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f5c5647
Commits on Dec 5, 2014
-
[SPARK-4421] Wrong link in spark-standalone.html
Modified the link of building Spark. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3279 from tsudukim/feature/SPARK-4421 and squashes the following commits: 56e31c1 [Masayoshi TSUZUKI] Modified the link of building Spark. (cherry picked from commit ddfc09c) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: b905e11
-
Commit: 63b1bc1
-
[SPARK-4464] Descriptions about configuration options need to be modified in docs.
Added descriptions about -h and -host. Modified descriptions about -i and -ip, which are now deprecated. Added a description about --properties-file. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3329 from tsudukim/feature/SPARK-4464 and squashes the following commits: 6c07caf [Masayoshi TSUZUKI] [SPARK-4464] Description about configuration options need to be modified in docs. (cherry picked from commit ca37903) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 6c43631
-
Revert "[HOT FIX] [YARN] Check whether
/lib
exists before listing i……ts files" This reverts commit 38cb2c3.
Andrew Or committed Dec 5, 2014
Commit: 325babe
-
Revert "SPARK-2624 add datanucleus jars to the container in yarn-clus…
…ter" This reverts commit a975dc3.
Andrew Or committed Dec 5, 2014
Commit: a8d8077
-
[SPARK-4753][SQL] Use catalyst for partition pruning in newParquet.
Author: Michael Armbrust <michael@databricks.com> Closes #3613 from marmbrus/parquetPartitionPruning and squashes the following commits: 4f138f8 [Michael Armbrust] Use catalyst for partition pruning in newParquet. (cherry picked from commit f5801e8) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: d12ea49
-
[SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server
Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly). Author: Cheng Lian <lian@databricks.com> Closes #3621 from liancheng/kryo-by-default and squashes the following commits: 70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server (cherry picked from commit 6f61e1f) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: e8d8077
-
Commit: 11446a6
Commits on Dec 6, 2014
-
[SPARK-3623][GraphX] GraphX should support the checkpoint operation
Author: GuoQiang Li <witgo@qq.com> Closes #2631 from witgo/SPARK-3623 and squashes the following commits: a70c500 [GuoQiang Li] Remove java related 4d1e249 [GuoQiang Li] Add comments e682724 [GuoQiang Li] Graph should support the checkpoint operation (cherry picked from commit e895e0c) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
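A usage sketch of the new operation; the checkpoint directory is a placeholder:
```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

def checkpointDemo(sc: SparkContext): Unit = {
  sc.setCheckpointDir("/tmp/spark-checkpoints") // placeholder location
  val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
  val graph = Graph.fromEdges(edges, defaultValue = "user")
  graph.checkpoint()      // the operation added by this patch
  graph.vertices.count()  // materializes the graph, completing the checkpoint
}
```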
Commit: 27d9f13
Commits on Dec 8, 2014
-
[SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
This patch just replaces a native quick sorter with Sorter(TimSort) in Spark. It yields performance gains of ~8% in my quick experiments. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> Closes #3507 from maropu/TimSortInEdgePartitionBuilderSpike and squashes the following commits: 8d4e5d2 [Takeshi Yamamuro] Remove a wildcard import 3527e00 [Takeshi Yamamuro] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark (cherry picked from commit 2e6b736) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
Commit: a4ae7c8
-
[SPARK-4620] Add unpersist in Graph and GraphImpl
Adds an interface to uncache both vertices and edges of Graph/GraphImpl. This interface is useful when iterative graph operations build a new graph in each iteration, and the vertices and edges of previous iterations are no longer needed for following iterations. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> This patch had conflicts when merged, resolved by Committer: Ankur Dave <ankurdave@gmail.com> Closes #3476 from maropu/UnpersistInGraphSpike and squashes the following commits: 77a006a [Takeshi Yamamuro] Add unpersist in Graph and GraphImpl (cherry picked from commit 8817fc7) Signed-off-by: Ankur Dave <ankurdave@gmail.com>
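A usage sketch for iterative jobs; the `blocking` parameter is shown as an assumption about the signature:
```scala
import org.apache.spark.graphx.Graph

// Release the previous iteration's graph once the new one is materialized.
def step[VD, ED](prev: Graph[VD, ED], next: Graph[VD, ED]): Graph[VD, ED] = {
  next.cache()
  next.vertices.count()            // force materialization of the new graph
  prev.unpersist(blocking = false) // then drop the old vertices and edges
  next
}
```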
Commit: 6b9e8b0
-
[SPARK-4774] [SQL] Makes HiveFromSpark more portable
HiveFromSpark read the kv1.txt file from SPARK_HOME/examples/src/main/resources/kv1.txt which assumed you had a source tree checked out. Now we copy the kv1.txt file to a temporary file and delete it when the jvm shuts down. This allows us to run this example outside of a spark source tree. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3628 from ksakellis/kostas-spark-4774 and squashes the following commits: 6770f83 [Kostas Sakellis] [SPARK-4774] [SQL] Makes HiveFromSpark more portable (cherry picked from commit d6a972b) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 9ed5641
Commits on Dec 9, 2014
-
SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
Author: Sandy Ryza <sandy@cloudera.com> Closes #3624 from sryza/sandy-spark-4770 and squashes the following commits: bd81a3a [Sandy Ryza] SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN (cherry picked from commit cda94d1) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f416032
-
[SPARK-4769] [SQL] CTAS does not work when reading from temporary tables
This is the code refactor and follow ups for #2570 Author: Cheng Hao <hao.cheng@intel.com> Closes #3336 from chenghao-intel/createtbl and squashes the following commits: 3563142 [Cheng Hao] remove the unused variable e215187 [Cheng Hao] eliminate the compiling warning 4f97f14 [Cheng Hao] fix bug in unittest 5d58812 [Cheng Hao] revert the API changes b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS (cherry picked from commit 51b1fe1) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 31a6d4f
[SPARK-4785][SQL] Initialize Hive UDFs on the driver and serialize them with a wrapper
Different from Hive 0.12.0, in Hive 0.13.1 UDF/UDAF/UDTF (aka Hive function) objects should only be initialized once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with Kryo or XML serializer. Several utility ser/de methods are provided in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created in Hive. Spark's Kryo serializer wasn't used because there's no available SparkConf instance. Author: Cheng Hao <hao.cheng@intel.com> Author: Cheng Lian <lian@databricks.com> Closes #3640 from chenghao-intel/udf_serde and squashes the following commits: 8e13756 [Cheng Hao] Update the comment 74466a3 [Cheng Hao] refactor as feedbacks 396c0e1 [Cheng Hao] avoid Simple UDF to be serialized e9c3212 [Cheng Hao] update the comment 19cbd46 [Cheng Hao] support udf instance ser/de after initialization (cherry picked from commit 383c555) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: e686742
[SPARK-4765] Make GC time always shown in UI.
This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task. cc pwendell Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3622 from kayousterhout/gc_time and squashes the following commits: 15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark] e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI. (cherry picked from commit 1f51106) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
Commit: 5a3a3cc

Commits on Dec 10, 2014
SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable
Author: Sandy Ryza <sandy@cloudera.com> Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits: cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable (cherry picked from commit 5e4c06f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 51da2c5
SPARK-4805 [CORE] BlockTransferMessage.toByteArray() trips assertion
Allocate enough room for type byte as well as message, to avoid tripping assertion about capacity of the buffer Author: Sean Owen <sowen@cloudera.com> Closes #3650 from srowen/SPARK-4805 and squashes the following commits: 9e1d502 [Sean Owen] Allocate enough room for type byte as well as message, to avoid tripping assertion about capacity of the buffer (cherry picked from commit d8f84f2) Signed-off-by: Aaron Davidson <aaron@databricks.com>
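A minimal sketch of the fix's idea, with hypothetical helper names; the point is simply reserving one extra byte for the message-type tag in addition to the encoded body:

```scala
import java.nio.ByteBuffer

object EncodeSketch {
  // Hypothetical encoder: 1 byte for the message type, then the body.
  def toByteArray(typeId: Byte, bodyLength: Int)(writeBody: ByteBuffer => Unit): Array[Byte] = {
    val buf = ByteBuffer.allocate(1 + bodyLength) // was effectively allocate(bodyLength)
    buf.put(typeId)
    writeBody(buf)
    assert(buf.remaining() == 0, "buffer not exactly filled")
    buf.array()
  }
}
```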
Commit: b0d64e5
[SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.
It's been reported that when the number of disks is large and the number of nodes is small, Netty network throughput is low compared with NIO. We suspect the problem is that only a small number of disks are utilized to serve shuffle files at any given point, due to connection reuse. This patch adds a new config parameter to specify the number of concurrent connections between two peer nodes, defaulting to 2. Author: Reynold Xin <rxin@databricks.com> Closes #3625 from rxin/SPARK-4740 and squashes the following commits: ad4241a [Reynold Xin] Updated javadoc. f33c72b [Reynold Xin] Code review feedback. 0fefabb [Reynold Xin] Use double check in synchronization. 41dfcb2 [Reynold Xin] Added test case. 9076b4a [Reynold Xin] Fixed two NPEs. 3e1306c [Reynold Xin] Minor style fix. 4f21673 [Reynold Xin] [SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty. (cherry picked from commit 2b9b726) Signed-off-by: Reynold Xin <rxin@databricks.com>
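Assuming the configuration key this change introduces (`spark.shuffle.io.numConnectionsPerPeer`), raising the per-peer connection count might look like this sketch:

```scala
import org.apache.spark.SparkConf

object NettyConnSketch {
  // Key name as described here; the default is 2 per the description above.
  val conf = new SparkConf()
    .set("spark.shuffle.io.numConnectionsPerPeer", "4") // more parallel transfer streams per peer
}
```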
Commit: 441ec34
Commit: 5e5d8f4
[Minor] Use <sup> tag for help icon in web UI page header
This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing. Before: ![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png) After: ![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png) Author: Josh Rosen <joshrosen@databricks.com> Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits: bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header. (cherry picked from commit f79c1cf) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: ff6f59b
Revert "Preparing development version 1.2.1-SNAPSHOT"
This reverts commit bc05df8.
Commit: a4d4a97
Revert "Preparing Spark release v1.2.0-rc2"
This reverts commit 2b72c56.
Commit: e4f20bd
Commit: a428c44
Commit: d70c729
[SPARK-4771][Docs] Document standalone cluster supervise mode
tdas looks like streaming already refers to the supervise mode. The link from there is broken though. Author: Andrew Or <andrew@databricks.com> Closes #3627 from andrewor14/document-supervise and squashes the following commits: 9ca0908 [Andrew Or] Wording changes 2b55ed2 [Andrew Or] Document standalone cluster supervise mode
Andrew Or committed Dec 10, 2014
Commit: 1da1937
SPARK-3526 Add section about data locality to the tuning guide
cc kayousterhout I have a few outstanding questions from compiling this documentation: - What's the difference between NO_PREF and ANY? I understand the implications of the ordering but don't know what an example of each would be - Why is NO_PREF ahead of RACK_LOCAL? I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other. Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better? - Will there be a datacenter-local locality level in the future? Apache Cassandra for example has this level Author: Andrew Ash <andrew@andrewash.com> Closes #2519 from ash211/SPARK-3526 and squashes the following commits: 44cff28 [Andrew Ash] Link to spark.locality parameters rather than copying the list 6d5d966 [Andrew Ash] Stay focused on Spark, no astronaut architecture mumbo-jumbo 20e0e31 [Andrew Ash] SPARK-3526 Add section about data locality to the tuning guide (cherry picked from commit 652b781) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 1eb3ec5

Commits on Dec 11, 2014
[SPARK-4806] Streaming doc update for 1.2
Important updates to the streaming programming guide - Make the fault-tolerance properties easier to understand, with information about write ahead logs - Update the information about deploying the spark streaming app with information about Driver HA - Update Receiver guide to discuss reliable vs unreliable receivers. Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Author: Josh Rosen <rosenville@gmail.com> Closes #3653 from tdas/streaming-doc-update-1.2 and squashes the following commits: f53154a [Tathagata Das] Addressed Josh's comments. ce299e4 [Tathagata Das] Minor update. ca19078 [Tathagata Das] Minor change f746951 [Tathagata Das] Mentioned performance problem with WAL 7787209 [Tathagata Das] Merge branch 'streaming-doc-update-1.2' of github.com:tdas/spark into streaming-doc-update-1.2 2184729 [Tathagata Das] Updated Kafka and Flume guides with reliability information. 2f3178c [Tathagata Das] Added more information about writing reliable receivers in the custom receiver guide. 91aa5aa [Tathagata Das] Improved API Docs menu 5707581 [Tathagata Das] Added Pythn API badge b9c8c24 [Tathagata Das] Merge pull request #26 from JoshRosen/streaming-programming-guide b8c8382 [Josh Rosen] minor fixes a4ef126 [Josh Rosen] Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings 65f66cd [Josh Rosen] Fix broken link to fault-tolerance semantics section. f015397 [Josh Rosen] Minor grammar / pluralization fixes. 3019f3a [Josh Rosen] Fix minor Markdown formatting issues aa8bb87 [Tathagata Das] Small update. 195852c [Tathagata Das] Updated based on Josh's comments, updated receiver reliability and deploying section, and also updated configuration. 17b99fb [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-doc-update-1.2 a0217c0 [Tathagata Das] Changed Deploying menu layout 67fcffc [Tathagata Das] Added cluster mode + supervise example to submitting application guide. e45453b [Tathagata Das] Update streaming guide, added deploying section. 192c7a7 [Tathagata Das] Added more info about Python API, and rewrote the checkpointing section. (cherry picked from commit b004150) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: c3b0713

Commits on Dec 12, 2014
[SPARK-4825] [SQL] CTAS fails to resolve when created using saveAsTable
Fix bug when query like:
```
test("save join to table") {
  val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
  sql("CREATE TABLE test1 (key INT, value STRING)")
  testData.insertInto("test1")
  sql("CREATE TABLE test2 (key INT, value STRING)")
  testData.insertInto("test2")
  testData.insertInto("test2")
  sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
  checkAnswer(
    table("test"),
    sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
```
Author: Cheng Hao <hao.cheng@intel.com> Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits: e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS e004895 [Cheng Hao] fix bug (cherry picked from commit 0abbff2) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: c82e99d

Commits on Dec 14, 2014
fixed spelling errors in documentation
changed "form" to "from" in 3 documentation entries for Kafka integration Author: Peter Klipfel <peter@klipfel.me> Closes #3691 from peterklipfel/master and squashes the following commits: 0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation (cherry picked from commit 2a2983f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 6eec4bc

Commits on Dec 15, 2014
Commit: 2ec78a1
[SPARK-4826] Fix generation of temp file names in WAL tests
This PR should fix SPARK-4826, an issue where a bug in how we generate temp. file names was causing spurious test failures in the write ahead log suites. Closes #3695. Closes #3701. Author: Josh Rosen <joshrosen@databricks.com> Closes #3704 from JoshRosen/SPARK-4826 and squashes the following commits: f2307f5 [Josh Rosen] Use Spark Utils class for directory creation/deletion a693ddb [Josh Rosen] remove unused Random import b275e41 [Josh Rosen] Move creation of temp. dir to beforeEach/afterEach. 9362919 [Josh Rosen] [SPARK-4826] Fix bug in generation of temp file names. in WAL suites. 86c1944 [Josh Rosen] Revert "HOTFIX: Disabling failing block manager test" (cherry picked from commit f6b8591) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: c5a9ae6
[SPARK-4668] Fix some documentation typos.
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3523 from ryan-williams/tweaks and squashes the following commits: d2eddaa [Ryan Williams] code review feedback ce27fc1 [Ryan Williams] CoGroupedRDD comment nit c6cfad9 [Ryan Williams] remove unnecessary if statement b74ea35 [Ryan Williams] comment fix b0221f0 [Ryan Williams] fix a gendered pronoun c71ffed [Ryan Williams] use names on a few boolean parameters 89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io 83e8358 [Ryan Williams] fix pom.xml typo dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md (cherry picked from commit 8176b7a) Signed-off-by: Patrick Wendell <pwendell@gmail.com> Conflicts: pom.xml
Commit: ec19175

Commits on Dec 16, 2014
[Minor][Core] fix comments in MapOutputTracker
Using driver and executor in the comments of ```MapOutputTracker``` is more clear. Author: wangfei <wangfei1@huawei.com> Closes #3700 from scwf/commentFix and squashes the following commits: aa68524 [wangfei] master and worker should be driver and executor (cherry picked from commit 5c24759) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f1f27ec
SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger
This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions. Author: Sean Owen <sowen@cloudera.com> Closes #3692 from srowen/SPARK-4814 and squashes the following commits: caca704 [Sean Owen] Disable assertions just for Hive f71e783 [Sean Owen] Enable assertions for SBT and Maven build (cherry picked from commit 81112e4) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 6bd8a96
[DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line
* This commit hopes to avoid the confusion I faced when trying to submit a regular, valid multi-line JSON file, also see http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html Author: Peter Vandenabeele <peter@vandenabeele.com> Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits: 1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text 6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt" fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line (cherry picked from commit 1a9e35e) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 4f9916f
[SPARK-4847][SQL] Fix "extraStrategies cannot take effect in SQLContext" issue
Author: jerryshao <saisai.shao@intel.com> Closes #3698 from jerryshao/SPARK-4847 and squashes the following commits: 4741130 [jerryshao] Make later added extraStrategies effect when calling strategies (cherry picked from commit dc8280d) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: 1b6fc23

Commits on Dec 17, 2014
[Release] Major improvements to generate contributors script
This commit introduces several major improvements to the script that generates the contributors list for release notes, notably: (1) Use release tags instead of a range of commits. Across branches, commits are not actually strictly two-dimensional, and so it is not sufficient to specify a start hash and an end hash. Otherwise, we end up counting commits that were already merged in an older branch. (2) Match PR numbers in addition to commit hashes. This is related to the first point in that if a PR is already merged in an older minor release tag, it should be filtered out here. This requires us to do some intelligent regex parsing on the commit description in addition to just relying on the GitHub API. (3) Relax author validity check. The old code fails on a name that has many middle names, for instance. The test was just too strict. (4) Use GitHub authentication. This allows us to make far more requests through the GitHub API than before (5000 as opposed to 60 per hour). (5) Translate from Github username, not commit author name. This is important because the commit author name is not always configured correctly by the user. For instance, the username "falaki" used to resolve to just "Hossein", which was treated as a github username and translated to something else that is completely arbitrary. (6) Add an option to use the untranslated name. If there is not a satisfactory candidate to replace the untranslated name with, at least allow the user to not translate it.
Andrew Or committed Dec 17, 2014
Commit: 0fb0047
[Release] Cache known author translations locally
This bypasses unnecessary calls to the Github and JIRA API. Additionally, having a local cache allows us to remember names that we had to manually discover ourselves.
Andrew Or committed Dec 17, 2014
Commit: 8a69ed3
[Release] Update contributors list format and sort it
Additionally, we now warn the user when a duplicate author name arises, in which case he/she needs to resolve it manually.
Andrew Or committed Dec 17, 2014
Commit: beb75ac
Commit: b5919d1
[SPARK-4595][Core] Fix MetricsServlet not work issue
`MetricsServlet` handler should be added to the web UI after initialized by `MetricsSystem`, otherwise servlet handler cannot be attached. Author: Saisai Shao <saisai.shao@intel.com> Author: Josh Rosen <joshrosen@databricks.com> Author: jerryshao <saisai.shao@intel.com> Closes #3444 from jerryshao/SPARK-4595 and squashes the following commits: 434d17e [Saisai Shao] Merge pull request #10 from JoshRosen/metrics-system-cleanup 87a2292 [Josh Rosen] Guard against misuse of MetricsSystem methods. f779fe0 [jerryshao] Fix MetricsServlet not work issue (cherry picked from commit cf50631) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 2f00a29
[SPARK-4764] Ensure that files are fetched atomically
tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic. Author: Christophe Préaud <christophe.preaud@kelkoo.com> Closes #2855 from preaudc/master and squashes the following commits: 9ba89ca [Christophe Préaud] Ensure that files are fetched atomically 54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master' c6a5590 [Christophe Préaud] Revert commit 8ea871f 7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 8ea871f [Christophe Préaud] Ensure that files are fetched atomically (cherry picked from commit ab2abcb) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
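A sketch of the idea, assuming a hypothetical `download` callback: creating the temp file in the target's own directory means the final rename never crosses filesystems, so it can be done atomically.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

object AtomicFetchSketch {
  // Create the temp file next to the target, then atomically move it in place.
  def fetchAtomically(targetFile: File)(download: File => Unit): Unit = {
    val tempFile = File.createTempFile("fetch", ".tmp", targetFile.getParentFile)
    download(tempFile)
    Files.move(tempFile.toPath, targetFile.toPath, StandardCopyOption.ATOMIC_MOVE)
  }
}
```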
Commit: e1d839e
Commit: 7ecf30e
SPARK-3926 [CORE] Reopened: result of JavaRDD collectAsMap() is not serializable
My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter. Author: Sean Owen <sowen@cloudera.com> Closes #3587 from srowen/SPARK-3926 and squashes the following commits: 8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE 7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test (cherry picked from commit e829bfa) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 26dfac6
[SPARK-4691][shuffle] Restructure a few lines in shuffle code
In HashShuffleReader.scala and HashShuffleWriter.scala, no need to judge "dep.aggregator.isEmpty" again as this is judged by "dep.aggregator.isDefined" In SortShuffleWriter.scala, "dep.aggregator.isEmpty" is better than "!dep.aggregator.isDefined" ? Author: maji2014 <maji3@asiainfo.com> Closes #3553 from maji2014/spark-4691 and squashes the following commits: bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala 10d0cf0 [maji2014] change a elegant way d8f52dc [maji2014] code optimization for judgement (cherry picked from commit b310744) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 51081e4
[SPARK-4714] BlockManager.dropFromMemory() should check whether block has been removed after synchronizing on BlockInfo instance.
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already been removed. The three methods share the same logic for acquiring the `info` lock:
```
info = blockInfo.get(id)
if (info != null) {
  info.synchronized {
    // do something
  }
}
```
So there is a chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread that entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed. But in `dropFromMemory` it may be problematic, since it may drop block data that was already removed into the disk store, and this calls data store operations that are not designed to handle missing blocks. This patch fixes the issue by adding a check to `dropFromMemory` to test whether the block has been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn (cherry picked from commit 30dca92) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
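A self-contained sketch of the added guard, with illustrative names: after acquiring the per-block lock, re-check that the block is still registered before dropping anything.

```scala
import java.util.concurrent.ConcurrentHashMap

object DropFromMemorySketch {
  final class BlockInfo
  private val blockInfo = new ConcurrentHashMap[String, BlockInfo]()

  def dropFromMemory(id: String): Unit = {
    val info = blockInfo.get(id)
    if (info != null) {
      info.synchronized {
        if (!blockInfo.containsKey(id)) {
          // A racing thread removed the block while we waited for the lock.
          println(s"Block $id already removed; nothing to drop")
        } else {
          // ... safe to drop the block's in-memory data here ...
        }
      }
    }
  }
}
```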
Commit: 0ebbccb
[SPARK-4772] Clear local copies of accumulators as soon as we're done with them
Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker. This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up. Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com> Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits: a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark. 537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task. 39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them (cherry picked from commit 94b377f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
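A minimal sketch of the pattern (types and names are illustrative): a thread-local registry cleared in a `finally` block, so per-thread copies are released the moment the task ends.

```scala
import scala.collection.mutable

object LocalAccumsSketch {
  // Per-thread registry of accumulator copies.
  private val localAccums = new ThreadLocal[mutable.Map[Long, Any]] {
    override def initialValue(): mutable.Map[Long, Any] = mutable.Map.empty
  }

  def runTask(body: => Unit): Unit = {
    try {
      body
    } finally {
      localAccums.get().clear() // free the copies even if the task threw
    }
  }
}
```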
Commit: e635168
SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions
This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it. Author: Sean Owen <sowen@cloudera.com> Closes #3690 from srowen/SPARK-785 and squashes the following commits: 8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey (cherry picked from commit 2a28bc6) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 76c88c6
[SPARK-4841] fix zip with textFile()
UTF8Deserializer can not be used in BatchedSerializer, so always use PickleSerializer() when changing batchSize in zip(). Also, if two RDDs already have the same batch size, they do not need to be re-serialized. Author: Davies Liu <davies@databricks.com> Closes #3706 from davies/fix_4841 and squashes the following commits: 20ce3a3 [Davies Liu] fix bug in _reserialize() e3ebf7c [Davies Liu] add comment 379d2c8 [Davies Liu] fix zip with textFile() (cherry picked from commit c246b95) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 0429ec3
[SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc
+ small doc edit + include edit to make IntelliJ happy CC: davies mengxr Note to davies -- this does not fix the "WARNING: Literal block expected; none found." warnings since that seems to involve spacing which IntelliJ does not like. (Those warnings occur when generating the Python docs.) Author: Joseph K. Bradley <joseph@databricks.com> Closes #3669 from jkbradley/python-warnings and squashes the following commits: 4587868 [Joseph K. Bradley] fixed warning 8cb073c [Joseph K. Bradley] Updated based on davies recommendation c51eca4 [Joseph K. Bradley] Updated rst file for pyspark.mllib.rand doc. Small doc edit. Small include edit to make IntelliJ happy. (cherry picked from commit affc3f4) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: f305e7d

Commits on Dec 18, 2014
Add mesos specific configurations into doc
Author: Timothy Chen <tnachen@gmail.com> Closes #3349 from tnachen/mesos_doc and squashes the following commits: 737ef49 [Timothy Chen] Add TOC 5ca546a [Timothy Chen] Update description around cores requested. 26283a5 [Timothy Chen] Add mesos specific configurations into doc (cherry picked from commit d9956f8) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 19efa5b
Commit: ef5c236
[SPARK-4880] remove spark.locality.wait in Analytics
spark.locality.wait set to 100000 in examples/graphx/Analytics.scala. Should be left to the user. Author: Ernest <earneyzxl@gmail.com> Closes #3730 from Earne/SPARK-4880 and squashes the following commits: d79ed04 [Ernest] remove spark.locality.wait in Analytics (cherry picked from commit a7ed6f3) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: e7f9dd5

Commits on Dec 19, 2014
[SPARK-4884]: Improve Partition docs
Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884 Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3722 from msiddalingaiah/master and squashes the following commits: 79e679f [Madhu Siddalingaiah] [DOC]: improve documentation 51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions (cherry picked from commit d5a596d) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 61c9b89
[SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
This is used in NioBlockTransferService here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66 Author: Aaron Davidson <aaron@databricks.com> Closes #3688 from aarondav/SPARK-4837 and squashes the following commits: ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config (cherry picked from commit 105293a) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 075b399
[SPARK-4754] Refactor SparkContext into ExecutorAllocationClient
This is such that the `ExecutorAllocationManager` does not take in the `SparkContext` with all of its dependencies as an argument. This prevents future developers of this class to tie down this class further with the `SparkContext`, which has really become quite a monstrous object. cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`. Author: Andrew Or <andrew@databricks.com> Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits: 187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient (cherry picked from commit 9804a75) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala
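A sketch of the refactor's shape: the allocation manager programs against a narrow client trait instead of the whole SparkContext. The method names below are an assumption for illustration, not the exact Spark API.

```scala
// Narrow interface the manager depends on (names are hypothetical).
trait ExecutorAllocationClient {
  def requestExecutors(numAdditionalExecutors: Int): Boolean
  def killExecutors(executorIds: Seq[String]): Boolean
}

// The manager no longer needs the full SparkContext, only the client.
class ExecutorAllocationManager(client: ExecutorAllocationClient) {
  def scaleUp(n: Int): Unit = { client.requestExecutors(n); () }
}
```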
Andrew Or committed Dec 19, 2014
Commit: ca37639
SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
Author: Sandy Ryza <sandy@cloudera.com> Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits: cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics (cherry picked from commit 283263f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: fd7bb9d
[SPARK-4889] update history server example cmds
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3736 from ryan-williams/hist and squashes the following commits: 421d8ff [Ryan Williams] add another random typo fix 76d6a4c [Ryan Williams] remove hdfs example a2d0f82 [Ryan Williams] code review feedback 9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds (cherry picked from commit cdb2c64) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 6aa88cc
[SPARK-4896] don’t redundantly overwrite executor JAR deps
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #2848 from ryan-williams/fetch-file and squashes the following commits: c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently 8e39c16 [Ryan Williams] code review feedback 788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps (cherry picked from commit 7981f96) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f930fe8

Commits on Dec 20, 2014
Commit: 4da1039
SPARK-2641: Passing num executors to spark arguments from properties file
Since we can set spark executor memory and executor cores using property file, we must also be allowed to set the executor instances. Author: Kanwaljit Singh <kanwaljit.singh@guavus.com> Closes #1657 from kjsingh/branch-1.0 and squashes the following commits: d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
Kanwaljit Singh authored and Andrew Or committed Dec 20, 2014
Commit: a1a1361
[SPARK-4140] Document dynamic allocation
Once the external shuffle service is also documented, the dynamic allocation section will link to it. Let me know if the whole dynamic allocation should be moved to its separate page; I personally think the organization might be cleaner that way. This patch builds on top of oza's work in #3689. aarondav pwendell Author: Andrew Or <andrew@databricks.com> Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@gmail.com> Closes #3731 from andrewor14/document-dynamic-allocation and squashes the following commits: 1281447 [Andrew Or] Address a few comments b9843f2 [Andrew Or] Document the configs as well 246fb44 [Andrew Or] Merge branch 'SPARK-4839' of github.com:oza/spark into document-dynamic-allocation 8c64004 [Andrew Or] Add documentation for dynamic allocation (without configs) 6827b56 [Tsuyoshi Ozawa] Fixing a documentation of spark.dynamicAllocation.enabled. 53cff58 [Tsuyoshi Ozawa] Adding a documentation about dynamic resource allocation. (cherry picked from commit 15c03e1) Signed-off-by: Andrew Or <andrew@databricks.com>
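A minimal sketch of enabling the feature being documented; the key names are the standard dynamic-allocation configs, and the shuffle service is the companion setting the docs tie it to:

```scala
import org.apache.spark.SparkConf

object DynAllocSketch {
  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    // External shuffle service so executors can be removed without
    // losing the shuffle files they wrote.
    .set("spark.shuffle.service.enabled", "true")
}
```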
Andrew Or committed Dec 20, 2014
Commit: 96d5b00
[Minor] Build Failed: value defaultProperties not found
Mvn build failed: value defaultProperties not found. Maybe related to this PR: 1d64812. andrewor14 can you look at this problem? Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits: 8e2917c [huangzhaowei] Build Failed: value defaultProperties not found (cherry picked from commit a764960) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 4346a2b

Commits on Dec 22, 2014
[SPARK-2075][Core] Make the compiler generate the same byte code for Hadoop 1.+ and Hadoop 2.+
`NullWritable` is a `Comparable` rather than a `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it. It will generate different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the compiler will generate the same code. I used the following commands to confirm the generated byte codes are the same:
```
mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
diff ~/hadoop1.txt ~/hadoop2.txt
```
However, the compiler will generate different code for the classes that call methods of `JobContext`/`TaskAttemptContext`. `JobContext`/`TaskAttemptContext` is a class in Hadoop 1.+, so calling its methods uses `invokevirtual`, while it's an interface in Hadoop 2.+, which uses `invokeinterface`. To fix this, we can use reflection to call `JobContext`/`TaskAttemptContext.getConfiguration`. Author: zsxwing <zsxwing@gmail.com> Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits: 39d9df2 [zsxwing] Fix the code style e4ad8b5 [zsxwing] Use null for the implicit Ordering 734bac9 [zsxwing] Explicitly set the implicit parameters ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD (cherry picked from commit 6ee6aa7) Signed-off-by: Reynold Xin <rxin@databricks.com>
Commit: 665653d
[SPARK-2075][Core] backport for branch-1.2
backport #3740 for branch-1.2 Author: zsxwing <zsxwing@gmail.com> Closes #3758 from zsxwing/SPARK-2075-branch-1.2 and squashes the following commits: b57d440 [zsxwing] SPARK-2075 backport for branch-1.2
Commit: b896963
[SPARK-4915][YARN] Fix classname to be specified for external shuffle service.
Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@lab.ntt.co.jp> Closes #3757 from oza/SPARK-4915 and squashes the following commits: 3b0d6d6 [Tsuyoshi Ozawa] Fix classname to be specified for external shuffle service. (cherry picked from commit 96606f6) Signed-off-by: Andrew Or <andrew@databricks.com>
Tsuyoshi Ozawa authored and Andrew Or committed Dec 22, 2014
Commit: 31d42c4
[SPARK-4883][Shuffle] Add a name to the directoryCleaner thread
Author: zsxwing <zsxwing@gmail.com> Closes #3734 from zsxwing/SPARK-4883 and squashes the following commits: e6f2b61 [zsxwing] Fix the name cc74727 [zsxwing] Add a name to the directoryCleaner thread (cherry picked from commit 8773705) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: 70e69ef
[Minor] Improve some code in BroadcastTest for short
Use `val arr1 = (0 until num).toArray` instead of allocating `new Array[Int](num)` and filling it with `for (i <- 0 until arr1.length) { arr1(i) = i }`, for brevity. Author: carlmartin <carlmartinmax@gmail.com> Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits: 43adb70 [carlmartin] Improve some code in BroadcastTest for short
Commit: c7396b5
[SPARK-4864] Add documentation to Netty-based configs
Author: Aaron Davidson <aaron@databricks.com> Closes #3713 from aarondav/netty-configs and squashes the following commits: 8a8b373 [Aaron Davidson] Address Patrick's comments 3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs (cherry picked from commit fbca6b6) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 4b2bded
[SPARK-4920][UI]:current spark version in UI is not striking.
It is not convenient to see the Spark version. We can keep the same style with Spark website. ![spark_version](https://cloud.githubusercontent.com/assets/7402327/5527025/1c8c721c-8a35-11e4-8d6a-2734f3c6bdf8.jpg) Author: genmao.ygm <genmao.ygm@alibaba-inc.com> Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits: 0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking. (cherry picked from commit de9d7d2) Signed-off-by: Andrew Or <andrew@databricks.com>
Commit: a8a8e0e
[SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join
In Scala, `map` and `flatMap` of `Iterable` will copy the contents of the `Iterable` to a new `Seq`. For example,
```Scala
val iterable = Seq(1, 2, 3).map(v => {
  println(v)
  v
})
println("Iterable map done")

val iterator = Seq(1, 2, 3).iterator.map(v => {
  println(v)
  v
})
println("Iterator map done")
```
outputs
```
1
2
3
Iterable map done
Iterator map done
```
So we should use `iterator` to reduce the memory consumed by join. Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E Author: zsxwing <zsxwing@gmail.com> Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits: 48ee7b9 [zsxwing] Remove the explicit types 95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join (cherry picked from commit c233ab3) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 58e3702

Commits on Dec 23, 2014
Commit: f86fe08
[SPARK-4931][Yarn][Docs] Fix the format of running-on-yarn.md
Currently, the format about log4j in running-on-yarn.md is a bit messy. ![running-on-yarn](https://cloud.githubusercontent.com/assets/1000778/5535248/204c4b64-8ab4-11e4-83c3-b4722ea0ad9d.png) Author: zsxwing <zsxwing@gmail.com> Closes #3774 from zsxwing/SPARK-4931 and squashes the following commits: 4a5f853 [zsxwing] Fix the format of running-on-yarn.md (cherry picked from commit 2d215ae) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 9fb86b8
[SPARK-4834] [standalone] Clean up application files after app finishes.
Commit 7aacb7b added support for sharing downloaded files among multiple executors of the same app. That works great in Yarn, since the app's directory is cleaned up after the app is done. But Spark standalone mode didn't do that, so the lock/cache files created by that change were left around and could eventually fill up the disk hosting /tmp. To solve that, create app-specific directories under the local dirs when launching executors. Multiple executors launched by the same Worker will use the same app directories, so they should be able to share the downloaded files. When the application finishes, a new message is sent to all workers telling them the application has finished; once that message has been received, and all executors registered for the application shut down, then those directories will be cleaned up by the Worker. Note: Unit testing this is hard (if even possible), since local-cluster mode doesn't seem to leave the Master/Worker daemons running long enough after `sc.stop()` is called for the clean up protocol to take effect. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3705 from vanzin/SPARK-4834 and squashes the following commits: b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization. 50eb4b9 [Marcelo Vanzin] Review feedback. c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes. (cherry picked from commit dd15536) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: ec11ffd
[SPARK-4932] Add help comments in Analytics
Trivial modifications for usability. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> Closes #3775 from maropu/AddHelpCommentInAnalytics and squashes the following commits: fbea8f5 [Takeshi Yamamuro] Add help comments in Analytics (cherry picked from commit 9c251c5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: e74ce14
[SPARK-4914][Build] Cleans lib_managed before compiling with Hive 0.13.1
This PR tries to fix the Hive tests failure encountered in PR #3157 by cleaning `lib_managed` before building assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time. This PR also takes the chance to clean up some minor typos and formatting issues in the comments. [1]: #3157 (comment) Author: Cheng Lian <lian@databricks.com> Closes #3756 from liancheng/clean-lib-managed and squashes the following commits: e2bd21d [Cheng Lian] Adds lib_managed to clean set c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1 (cherry picked from commit 395b771) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 7b5ba85
[SPARK-4730][YARN] Warn against deprecated YARN settings
See https://issues.apache.org/jira/browse/SPARK-4730. Author: Andrew Or <andrew@databricks.com> Closes #3590 from andrewor14/yarn-settings and squashes the following commits: 36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings dcd1316 [Andrew Or] Warn against deprecated YARN settings (cherry picked from commit 27c5399) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 6a46cc3
[SPARK-4802] [streaming] Remove receiverInfo once receiver is de-registered
Once the streaming receiver is de-registered at the executor, the `ReceiverTrackerActor` needs to remove the corresponding receiverInfo from the `receiverInfo` map at `ReceiverTracker`. Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io> Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits: 6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review 3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered (cherry picked from commit 10d69e9) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 01adf45
[SPARK-4671][Streaming] Do not replicate streaming block when WAL is enabled
Currently a streaming block will be replicated when a specific storage level is set; since the WAL is already fault tolerant, replication is needless and will hurt the throughput of the streaming application. Hi tdas, as discussed about this issue, I fixed it with this implementation; I'm not sure if this is the way you want, would you mind taking a look at it? Thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #3534 from jerryshao/SPARK-4671 and squashes the following commits: 500b456 [jerryshao] Do not replicate streaming block when WAL is enabled (cherry picked from commit 3f5f4cc) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: aa78c23

Commits on Dec 24, 2014
[SPARK-4606] Send EOF to child JVM when there's no more data to read.
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3460 from vanzin/SPARK-4606 and squashes the following commits: 031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read. (cherry picked from commit 7e2deb7) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 1a4e2ba

Commits on Dec 25, 2014
[SPARK-4873][Streaming] Use `Future.zip` instead of `Future.flatMap` (for-loop) in WriteAheadLogBasedBlockHandler
Use `Future.zip` instead of `Future.flatMap` (for-loop). `zip` implies these two Futures will run concurrently, while `flatMap` usually means one Future depends on the other one. Author: zsxwing <zsxwing@gmail.com> Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits: 46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop) (cherry picked from commit b4d0db8) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
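A runnable sketch of the difference, using stand-in futures (the sleeps are illustrative, not the handler's real work): with `flatMap` the second future is not even created until the first completes, while with `zip` both are created up front and run concurrently.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ZipVsFlatMapSketch {
  def storeBlock(): Future[Unit] = Future { Thread.sleep(100) } // stand-in work
  def writeToLog(): Future[Unit] = Future { Thread.sleep(100) } // stand-in work

  def main(args: Array[String]): Unit = {
    // flatMap (what a for-comprehension desugars to): ~200 ms total.
    val sequential = storeBlock().flatMap(_ => writeToLog())
    Await.result(sequential, 1.second)

    // zip: both futures start immediately, ~100 ms total.
    val f1 = storeBlock()
    val f2 = writeToLog()
    Await.result(f1.zip(f2), 1.second)
  }
}
```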
Commit: 17d6f54
Fix "Building Spark With Maven" link in README.md
Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html) Author: Denny Lee <denny.g.lee@gmail.com> Closes #3802 from dennyglee/patch-1 and squashes the following commits: 15f601a [Denny Lee] Update README.md (cherry picked from commit 08b18c7) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 475ab6e

Commits on Dec 26, 2014
[SPARK-4537][Streaming] Expand StreamingSource to add more metrics
Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting. Author: jerryshao <saisai.shao@intel.com> Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits: 00f5f7f [jerryshao] Change the code style and add totalProcessedRecords 44721a6 [jerryshao] Further address the comments c097ddc [jerryshao] Address the comments 02dd44f [jerryshao] Fix the addressed comments c7a9376 [jerryshao] Expand StreamingSource to add more metrics (cherry picked from commit f205fe4) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: acf5c63
Commit: 391080b

Commits on Dec 27, 2014
[SPARK-3787][BUILD] Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version
This PR is another solution. When we build with sbt with a hadoop profile but without a hadoop version property, like `sbt/sbt -Phadoop-2.2 assembly`, the jar name always uses the default version (1.0.4). When we build with maven under the same conditions, the default version for each profile is used. For instance, if we build like `mvn -Phadoop-2.2 package`, the jar name uses hadoop2.2.0, the default version for hadoop-2.2. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3046 from sarutak/fix-assembly-jarname-2 and squashes the following commits: 41ef90e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname-2 50c8676 [Kousuke Saruta] Merge branch 'fix-assembly-jarname-2' of github.com:sarutak/spark into fix-assembly-jarname-2 52a1cd2 [Kousuke Saruta] Fixed comflicts dd30768 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname2 f1c90bb [Kousuke Saruta] Fixed SparkBuild.scala in order to read `hadoop.version` property from pom.xml af6b100 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname c81806b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname ad1f96e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname b2318eb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname 5fc1259 [Kousuke Saruta] Fixed typo. eebbb7d [Kousuke Saruta] Fixed wrong jar name
Commit: 2e0af87
HOTFIX: Slight tweak on previous commit.
Meant to merge this in when committing SPARK-3787.
Commit: 3c4acac
[SPARK-4952][Core] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
Author: GuoQiang Li <witgo@qq.com> Closes #3788 from witgo/SPARK-4952 and squashes the following commits: d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails (cherry picked from commit 080ceb7) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Commit: 23d64cf

Commits on Dec 29, 2014
[SPARK-4966][YARN] The MemoryOverhead value is not set correctly
Commit: 2cd446a
-
[SPARK-4982][DOC] spark.ui.retainedJobs description is wrong in Spark UI configuration guide
Author: wangxiaojing <u9jing@gmail.com> Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits: fe2ad5f [wangxiaojing] change stages to jobs (cherry picked from commit 6645e52) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 7604666
-
SPARK-4968: takeOrdered to skip reduce step in case mappers return no…
… partitions takeOrdered should skip the reduce step when the mapped RDDs have no partitions. This prevents the exception below, seen for example when running the query SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100; Error trace: java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.reduce(RDD.scala:863) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136) Author: Yash Datta <Yash.Datta@guavus.com> Closes #3830 from saucam/fix_takeorder and squashes the following commits: 5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions (cherry picked from commit 9bc0df6) Signed-off-by: Reynold Xin <rxin@databricks.com>
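A minimal sketch of the guard in plain Scala collections — not the actual RDD.takeOrdered code: `reduce` on an empty collection throws UnsupportedOperationException, so return early when the map side produced nothing.

```
// Hedged sketch of the fix: model partitions as local Seqs and skip the
// reduce step when no partition contributed any rows.
object TakeOrderedSketch {
  def takeOrdered[T: Ordering](partitions: Seq[Seq[T]], num: Int): Seq[T] = {
    // Map side: each partition keeps only its own top `num` elements.
    val mapped = partitions.map(_.sorted.take(num)).filter(_.nonEmpty)
    if (mapped.isEmpty) Seq.empty // guard: nothing to reduce, avoid the exception
    else mapped.reduce((a, b) => (a ++ b).sorted.take(num))
  }

  def main(args: Array[String]): Unit = {
    println(takeOrdered(Seq(Seq.empty[Int], Seq.empty[Int]), 3)) // List() instead of a crash
    println(takeOrdered(Seq(Seq(5, 1, 4), Seq(2, 3)), 3))        // List(1, 2, 3)
  }
}
```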
Commit: e81c869
Commits on Dec 30, 2014
-
[SPARK-4920][UI] add version on master and worker page for standalone…
Commit: e20d632
-
[SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works…
… with KryoSerializer This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used. The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer. Author: Josh Rosen <joshrosen@databricks.com> Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits: 0466c7a [Josh Rosen] Register PythonBroadcast with Kryo. d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug. 069d8a7 [Josh Rosen] Add failing test for SPARK-4882 (cherry picked from commit efa80a5) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
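A hedged sketch of the registration approach the commit describes — the payload class here is a stand-in, not Spark's PythonBroadcast: register the class with Kryo and attach Kryo's JavaSerializer so instances round-trip through Java serialization.

```
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.JavaSerializer

// Stand-in for a class whose Java serialization logic Kryo must not bypass.
class OpaquePayload(val path: String) extends java.io.Serializable

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val kryo = new Kryo()
    // Delegate this class to Java serialization, as the fix does for PythonBroadcast.
    kryo.register(classOf[OpaquePayload], new JavaSerializer())
    // registrationRequired makes unregistered classes fail fast -- the commit
    // notes this setting would have caught the bug earlier.
    kryo.setRegistrationRequired(true)
  }
}
```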
Commit: 42809db
-
[SPARK-4908][SQL] Prevent multiple concurrent hive native commands
This is just a quick fix that locks when calling `runHive`. If we can find a way to avoid the error without a global lock that would be better. Author: Michael Armbrust <michael@databricks.com> Closes #3834 from marmbrus/hiveConcurrency and squashes the following commits: bf25300 [Michael Armbrust] prevent multiple concurrent hive native commands (cherry picked from commit 480bd1d) Signed-off-by: Michael Armbrust <michael@databricks.com>
Commit: cde8a31
-
[SPARK-4386] Improve performance when writing Parquet files
Convert type of RowWriteSupport.attributes to Array. Analysis of performance for writing very wide tables shows that time is spent predominantly in apply method on attributes var. Type of attributes previously was LinearSeqOptimized and apply is O(N) which made write O(N squared). Measurements on 575 column table showed this change made a 6x improvement in write times. Author: Michael Davies <Michael.BellDavies@gmail.com> Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits: 892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files (cherry picked from commit 7425bec) Signed-off-by: Michael Armbrust <michael@databricks.com>
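The reasoning in miniature — a sketch, not the RowWriteSupport code: `apply(i)` on a linear Seq such as `List` walks i nodes, so touching every column of a wide row is quadratic in the number of columns, while a one-time conversion to `Array` makes each lookup O(1).

```
// Hedged sketch of the complexity fix. For a row of width N:
// - attributes(i) on a List costs O(i), so the write loop is O(N^2)
// - attributes(i) on an Array costs O(1), so the write loop is O(N)
object WideRowWriteSketch {
  def writeRow(row: Seq[Any], attributes: IndexedSeq[String]): Unit = {
    var i = 0
    while (i < row.length) {
      val columnName = attributes(i) // O(1) on Array/IndexedSeq
      // ... write row(i) under columnName ...
      i += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val attrsAsList: List[String] = List.tabulate(575)(i => s"col$i")
    val attrsAsArray: Array[String] = attrsAsList.toArray // one O(N) conversion up front
    writeRow(Seq.fill(575)(0), attrsAsArray)
  }
}
```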
Commit: 7a24541
-
[SPARK-4813][Streaming] Fix the issue that ContextWaiter didn't handl…
…e 'spurious wakeup' Used `Condition` to rewrite `ContextWaiter` because it provides a convenient API `awaitNanos` for timeout. Author: zsxwing <zsxwing@gmail.com> Closes #3661 from zsxwing/SPARK-4813 and squashes the following commits: 52247f5 [zsxwing] Add explicit unit type be42bcf [zsxwing] Update as per review suggestion e06bd4f [zsxwing] Fix the issue that ContextWaiter didn't handle 'spurious wakeup' (cherry picked from commit 6a89782) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
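A minimal sketch of the spurious-wakeup-safe pattern — not the actual ContextWaiter: wait on a `Condition` inside a loop that re-checks the predicate, using `awaitNanos` to carry the remaining timeout across wakeups.

```
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// Hedged sketch: a spurious wakeup returns from awaitNanos without the
// predicate being set, so the while loop re-checks and keeps waiting.
class WaiterSketch {
  private val lock = new ReentrantLock()
  private val condition = lock.newCondition()
  private var stopped = false

  def notifyStop(): Unit = {
    lock.lock()
    try { stopped = true; condition.signalAll() } finally { lock.unlock() }
  }

  /** Returns true if stopped, false if the timeout elapsed first. */
  def waitForStopOrTimeout(timeoutMs: Long): Boolean = {
    var nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    lock.lock()
    try {
      while (!stopped && nanos > 0) {
        nanos = condition.awaitNanos(nanos) // returns the time still remaining
      }
      stopped
    } finally { lock.unlock() }
  }
}
```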
Commit: edc96d8
Commits on Dec 31, 2014
-
[SPARK-1010] Clean up uses of System.setProperty in unit tests
Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures. This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself). For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and to automatically restores them on test completion / failure. See the block comment at the top of the ResetSystemProperties class for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits: 0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools 3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext 4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties 4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering. 0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite. 7a3d224 [Josh Rosen] Fix trait ordering 3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite 655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite 3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite 8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait. 633a84a [Josh Rosen] Remove use of system properties in FileServerSuite 25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite 1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite 5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite 0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite 51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite 60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite 14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite 628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite 9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite. 4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class. (cherry picked from commit 352ed6b) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
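A hedged sketch of the mixin idea — simplified, and the real trait in Spark may differ in details such as how the snapshot is cloned: capture the system properties before each test and restore them afterwards, even if the test fails.

```
import java.util.Properties
import org.scalatest.{BeforeAndAfterEach, Suite}

// Hedged sketch of a snapshot/restore mixin for ScalaTest suites.
trait ResetSystemPropertiesSketch extends BeforeAndAfterEach { this: Suite =>
  private var oldProperties: Properties = _

  override def beforeEach(): Unit = {
    // clone() because System.getProperties returns the live, mutable object
    oldProperties = System.getProperties.clone().asInstanceOf[Properties]
    super.beforeEach()
  }

  override def afterEach(): Unit = {
    try super.afterEach()
    finally System.setProperties(oldProperties) // restore even on test failure
  }
}
```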
Commit: ad3dc81
-
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Man…
…ifest. Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object. Author: Brennon York <brennon.york@capitalone.com> Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits: 5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment 14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class 8d20936 [Brennon York] updated to reset the error message back to the default a043039 [Brennon York] updated to split the uri and jar vals 8da7cbf [Brennon York] fixes SPARK-4298 (cherry picked from commit 8e14c5e) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
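A hedged sketch of the normalization described above — simplified relative to the real spark-submit code: treat the primary resource as a URI and pass its `getPath` to `JarFile` before reading `Main-Class` from the manifest.

```
import java.net.URI
import java.util.jar.JarFile

object MainClassFromManifestSketch {
  // Returns the Main-Class of a local jar, or None if it cannot be read.
  def mainClass(primaryResource: String): Option[String] = {
    val uri = new URI(primaryResource)
    // Normalize "file:/path/app.jar" (or a bare path) to a filesystem path;
    // JarFile cannot open a URI string directly.
    val scheme = uri.getScheme
    if (scheme != null && scheme != "file") return None // remote jars can't be opened locally
    val jar = new JarFile(uri.getPath)
    try Option(jar.getManifest)
      .flatMap(m => Option(m.getMainAttributes.getValue("Main-Class")))
    finally jar.close()
  }

  def main(args: Array[String]): Unit =
    args.headOption.foreach(r => println(mainClass(r)))
}
```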
Commit: 7c9c25b
-
[HOTFIX] Disable Spark UI in SparkSubmitSuite tests
This should fix a major cause of build breaks when running many parallel tests.
Commit: 076de46
-
[SPARK-4790][STREAMING] Fix ReceivedBlockTrackerSuite to wait for old files to get deleted before continuing
Since the deletes happen asynchronously, the getFileStatus call might throw an exception in older HDFS versions if the delete happens between the time listFiles is called on the directory and the time getFileStatus is called on the file. This PR addresses this by adding an option to delete the files synchronously and then waiting for the deletion to complete before proceeding. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #3726 from harishreedharan/spark-4790 and squashes the following commits: bbbacd1 [Hari Shreedharan] Call cleanUpOldLogs only once in the tests. 3255f17 [Hari Shreedharan] Add test for async deletion. Remove method from ReceiverTracker that does not take waitForCompletion. e4c83ec [Hari Shreedharan] Making waitForCompletion a mandatory param. Remove eventually from WALSuite since the cleanup method returns only after all files are deleted. af00fd1 [Hari Shreedharan] [SPARK-4790][STREAMING] Fix ReceivedBlockTrackerSuite waits for old files to get deleted before continuing. (cherry picked from commit 3610d3c) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: bd70ff9
-
[SPARK-5028][Streaming]Add total received and processed records metri…
…cs to Streaming UI This is a follow-up work of [SPARK-4537](https://issues.apache.org/jira/browse/SPARK-4537). Adding total received records and processed records metrics back to UI. ![screenshot](https://dl.dropboxusercontent.com/u/19230832/screenshot.png) Author: jerryshao <saisai.shao@intel.com> Closes #3852 from jerryshao/SPARK-5028 and squashes the following commits: c8c4877 [jerryshao] Add total received and processed metrics to Streaming UI (cherry picked from commit fdc2aa4) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 14dbd83
Commits on Jan 1, 2015
-
[SPARK-5035] [Streaming] ReceiverMessage trait should extend Serializ…
…able Spark Streaming's ReceiverMessage trait should extend Serializable in order to fix a subtle bug that only occurs when running on a real cluster: if you attempt to send a fire-and-forget message to a remote Akka actor and that message cannot be serialized, this leads to more-or-less silent failures. As an optimization, Akka skips message serialization for messages sent within the same JVM. As a result, Spark's unit tests will never fail due to non-serializable Akka messages, but these will cause mostly-silent failures when running on a real cluster. Before this patch, here was the code for ReceiverMessage:

```
/** Messages sent to the NetworkReceiver. */
private[streaming] sealed trait ReceiverMessage
private[streaming] object StopReceiver extends ReceiverMessage
```

Since ReceiverMessage does not extend Serializable and StopReceiver is a regular `object`, not a `case object`, StopReceiver will throw serialization errors. As a result, graceful receiver shutdown is broken on real clusters (and local-cluster mode) but works in local modes. If you want to reproduce this, try running the word count example from the Streaming Programming Guide in the Spark shell:

```
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

val ssc = new StreamingContext(sc, Seconds(10))
// Create a DStream that will connect to hostname:port, like localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print()
ssc.start()
Thread.sleep(10000)
ssc.stop(true, true)
```

Prior to this patch, this would work correctly in local mode but fail when running against a real cluster (it would report that some receivers were not shut down). Author: Josh Rosen <joshrosen@databricks.com> Closes #3857 from JoshRosen/SPARK-5035 and squashes the following commits: 71d0eae [Josh Rosen] [SPARK-5035] ReceiverMessage trait should extend Serializable. (cherry picked from commit fe6efac) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
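Going by the title, the fix itself is presumably a one-liner; a sketch of the "after" state, since the actual diff isn't reproduced here:

```
/** Messages sent to the NetworkReceiver. */
private[streaming] sealed trait ReceiverMessage extends Serializable
private[streaming] object StopReceiver extends ReceiverMessage
```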
Commit: 434ea00
-
[HOTFIX] Bind web UI to ephemeral port in DriverSuite
The job launched by DriverSuite should bind the web UI to an ephemeral port, since it looks like port contention in this test has caused a large number of Jenkins failures when many builds are started simultaneously. Our tests already disable the web UI, but this doesn't affect subprocesses launched by our tests. In this case, I've opted to bind to an ephemeral port instead of disabling the UI because disabling features in this test may mask its ability to catch certain bugs. See also: e24d3a9 Author: Josh Rosen <joshrosen@databricks.com> Closes #3873 from JoshRosen/driversuite-webui-port and squashes the following commits: 48cd05c [Josh Rosen] [HOTFIX] Bind web UI to ephemeral port in DriverSuite. (cherry picked from commit 0128398) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
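For reference, a hedged sketch of how a child process can ask for an ephemeral UI port, assuming the standard `spark.ui.port` setting where 0 means "pick any free port":

```
import org.apache.spark.{SparkConf, SparkContext}

object EphemeralUiPortSketch {
  def main(args: Array[String]): Unit = {
    // Port 0 delegates the choice to the OS, so many concurrent test JVMs
    // don't race for the same fixed UI port.
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("EphemeralUiPortSketch")
      .set("spark.ui.port", "0")
    val sc = new SparkContext(conf)
    sc.stop()
  }
}
```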
Commit: da9a4b9
Commits on Jan 2, 2015
-
Commit: 33f0b14
Commits on Jan 4, 2015
-
[SPARK-5058] Updated broken links
Updated the broken link pointing to the KafkaWordCount example to the correct one. Author: sigmoidanalytics <mayur@sigmoidanalytics.com> Closes #3877 from sigmoidanalytics/patch-1 and squashes the following commits: 3e19b31 [sigmoidanalytics] Updated broken links (cherry picked from commit 342612b) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 93617dd
-
[SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs
Author: Dale <tigerquoll@outlook.com> Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits: 5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block. 2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization. (cherry picked from commit 3fddc94) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
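A hedged sketch of the pattern — names are illustrative, not SparkContext's actual fields: if scheduler construction throws, `stop()` cleans up, and a finally block rethrows so the original failure is not lost.

```
// Hedged sketch: stop() runs for cleanup, but the finally block rethrows
// the wrapped original exception even if stop() itself fails.
object StopOnInitErrorSketch {
  class MiniContext {
    def stop(): Unit = println("context stopped, resources released")
    private def initDagScheduler(): Unit = sys.error("DAGScheduler init failed")

    try initDagScheduler()
    catch {
      case e: Exception =>
        try stop()
        finally throw new IllegalStateException("Error while constructing DAGScheduler", e)
    }
  }

  def main(args: Array[String]): Unit =
    try new MiniContext()
    catch { case e: Exception => println(s"${e.getMessage}: ${e.getCause.getMessage}") }
}
```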
Commit: 9dbb62e
Commits on Jan 5, 2015
-
[SPARK-4631] unit test for MQTT
Please review the unit test for MQTT Author: bilna <bilnap@am.amrita.edu> Author: Bilna P <bilna.p@gmail.com> Closes #3844 from Bilna/master and squashes the following commits: acea3a3 [bilna] Adding dependency with scope test 28681fa [bilna] Merge remote-tracking branch 'upstream/master' fac3904 [bilna] Correction in Indentation and coding style ed9db4c [bilna] Merge remote-tracking branch 'upstream/master' 4b34ee7 [Bilna P] Update MQTTStreamSuite.scala 04503cf [bilna] Added embedded broker service for mqtt test 89d804e [bilna] Merge remote-tracking branch 'upstream/master' fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master' 4b58094 [Bilna P] Update MQTTStreamSuite.scala b1ac4ad [bilna] Added BeforeAndAfter 5f6bfd2 [bilna] Added BeforeAndAfter e8b6623 [Bilna P] Update MQTTStreamSuite.scala 5ca6691 [Bilna P] Update MQTTStreamSuite.scala 8616495 [bilna] [SPARK-4631] unit test for MQTT (cherry picked from commit e767d7d) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Commit: 67e2eb6
-
[SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs
This patch disables output spec. validation for jobs launched through Spark Streaming, since this interferes with checkpoint recovery. Hadoop OutputFormats have a `checkOutputSpecs` method which performs certain checks prior to writing output, such as checking whether the output directory already exists. SPARK-1100 added checks for FileOutputFormat, SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just FileOutputFormat. In Spark Streaming, we might have to re-process a batch during checkpoint recovery, so `save` actions may be called multiple times. In addition to `DStream`'s own save actions, users might use `transform` or `foreachRDD` and call the `RDD` and `PairRDD` save actions. When output spec. validation is enabled, the second calls to these actions will fail due to existing output. This patch automatically disables output spec. validation for jobs submitted by the Spark Streaming scheduler. This is done by using Scala's `DynamicVariable` to propagate the bypass setting without having to mutate SparkConf or introduce a global variable. Author: Josh Rosen <joshrosen@databricks.com> Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits: 36eaf35 [Josh Rosen] Add comment explaining use of transform() in test. 6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform() 7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; update conf. guide bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not refer to Spark Streaming. e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic. 762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs. (cherry picked from commit 939ba1f) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
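A hedged sketch of the `DynamicVariable` technique — field and method names are illustrative, not Spark's actual ones: the scheduler wraps job execution in `withValue(true)`, and the save path consults the flag, so the bypass propagates down the call stack without touching SparkConf or any global mutable state.

```
import scala.util.DynamicVariable

object OutputSpecValidationSketch {
  val disableOutputSpecValidation = new DynamicVariable[Boolean](false)

  def saveAsHadoopFile(): Unit =
    if (disableOutputSpecValidation.value)
      println("skipping output spec validation (streaming re-run)")
    else
      println("validating output specs before write")

  def main(args: Array[String]): Unit = {
    saveAsHadoopFile() // batch path: validates as usual
    disableOutputSpecValidation.withValue(true) {
      saveAsHadoopFile() // streaming path: validation bypassed, however deep the call
    }
  }
}
```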
Commit: a0bb88e
-
[SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos enviro…
…nment at all. Fixed the scope of runAsSparkUser, moving it from MesosExecutorDriver.run to MesosExecutorBackend.launchTask; see the JIRA issue for more details. Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits: 46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed unused import 3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed comments and adjusted indentations 2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask (cherry picked from commit 1c0e7ce) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: f979205
-
[SPARK-5089][PYSPARK][MLLIB] Fix vector convert
This is a small change addressing a potentially significant bug in how PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark should automatically upcast to float64s, but currently this wasn't actually happening. As a result, non-float64 would be silently parsed inappropriately during SerDe, yielding erroneous results when running, for example, KMeans. The PR includes the fix, as well as a new test for the correct conversion behavior. davies Author: freeman <the.freeman.lab@gmail.com> Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits: 764db47 [freeman] Add a test for proper conversion behavior 704f97e [freeman] Return array after changing type (cherry picked from commit 6c6f325) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Commit: cf55a2b
Commits on Jan 6, 2015
-
[HOTFIX] Add missing SparkContext._ import to fix 1.2 build.
This fixes a build break caused by a0bb88e
Commit: db83acb
Commits on Jan 7, 2015
-
[YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA
Currently, yarn-client exits immediately when an HA failover happens, regardless of how many times the AM should retry. The likely reason is that the default final status only accounted for sys.exit, so yarn-client gets no benefit from HA. We should therefore distinguish the default final status between client and cluster mode, because a SUCCEEDED status can make HA fail in client mode, while UNDEFINED can cause spurious error reports in cluster mode when sys.exit is used. Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3771 from SaintBacchus/YarnHA and squashes the following commits: c02bfcc [huangzhaowei] Improve the comment of the funciton 'getDefaultFinalStatus' 0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA (cherry picked from commit 5fde661) Signed-off-by: Thomas Graves <tgraves@apache.org>
Commit: 7a4be0b
-
[SPARK-5132][Core] Correct stage attempt ID key in stageInfoFromJson
stageInfoToJson writes the key as "Stage Attempt Id", but stageInfoFromJson was reading "Attempt Id"; this fixes the mismatch. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3932 from suyanNone/json-stage and squashes the following commits: 41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson (cherry picked from commit d345ebe) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Commit: 1770c51