Branch 1.2 #3880 (Closed)

wants to merge 462 commits into from
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Nov 21, 2014

  1. [SPARK-4522][SQL] Parse schema with missing metadata.

    This is just a quick fix for 1.2.  SPARK-4523 describes a more complete solution.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3392 from marmbrus/parquetMetadata and squashes the following commits:
    
    bcc6626 [Michael Armbrust] Parse schema with missing metadata.
    
    (cherry picked from commit 90a6a46)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Nov 21, 2014 (commit 668643b)
  2. [SPARK-4472][Shell] Print "Spark context available as sc." only when SparkContext is created successfully

    It's weird to print "Spark context available as sc" when SparkContext creation fails.
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3341 from zsxwing/SPARK-4472 and squashes the following commits:
    
    4850093 [zsxwing] Print "Spark context available as sc." only when SparkContext is created successfully
    
    (cherry picked from commit f1069b8)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zsxwing authored and rxin committed Nov 21, 2014 (commit 6f70e02)
  3. SPARK-4532: Fix bug in detection of Hive in Spark 1.2

    Because the Hive profile is no longer defined in the root pom,
    we need to check specifically in the sql/hive pom when we
    perform the check in make-distribution.sh.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #3398 from pwendell/make-distribution and squashes the following commits:
    
    8a58279 [Patrick Wendell] Fix bug in detection of Hive in Spark 1.2
    
    (cherry picked from commit a81918c)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    pwendell committed Nov 21, 2014 (commit 6a01689)
  4. [SPARK-4531] [MLlib] cache serialized java object

    Pyrolite is pretty slow (compared to the ad hoc serializer in 1.1), which causes a large performance regression in 1.2, because we cache the serialized Python objects in the JVM and deserialize them into Java objects at each step.
    
    This PR changes the caching to the deserialized JavaRDD instead of the PythonRDD, to avoid the Pyrolite deserialization. It should have similar memory usage as before, but be much faster.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3397 from davies/cache and squashes the following commits:
    
    7f6e6ce [Davies Liu] Update -> Updater
    4b52edd [Davies Liu] using named argument
    63b984e [Davies Liu] fix
    7da0332 [Davies Liu] add unpersist()
    dff33e1 [Davies Liu] address comments
    c2bdfc2 [Davies Liu] refactor
    d572f00 [Davies Liu] Merge branch 'master' into cache
    f1063e1 [Davies Liu] cache serialized java object
    
    (cherry picked from commit ce95bd8)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    Davies Liu authored and mengxr committed Nov 21, 2014 (commit 9309ddf)

Commits on Nov 22, 2014

  1. [SPARK-4431][MLlib] Implement efficient foreachActive for dense and sparse vector
    
    Previously, we were using Breeze's activeIterator to access the non-zero elements
    in dense/sparse vectors. Due to the overhead, we switched back to a native `while loop`
    in #SPARK-4129.
    
    However, #SPARK-4129 requires dereferencing dv.values/sv.values on each access
    to a value, which is very expensive. Also, in MultivariateOnlineSummarizer,
    we're using Breeze's dense vector to store the partial stats, which is very expensive compared
    with using a primitive Scala array.
    
    In this PR, an efficient foreachActive is implemented to unify the code path for dense and sparse
    vector operations, which makes the codebase easier to maintain. The Breeze dense vector is replaced
    by a primitive array to reduce the overhead further.
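
    A minimal sketch of the foreachActive pattern, with simplified, assumed class shapes (the real MLlib vector classes carry more state and validation):

    ```scala
    // Hypothetical, simplified vector types illustrating the foreachActive idea.
    sealed trait SimpleVector {
      /** Applies f to every active (i.e. explicitly stored) (index, value) pair. */
      def foreachActive(f: (Int, Double) => Unit): Unit
    }

    final class SimpleDenseVector(val values: Array[Double]) extends SimpleVector {
      override def foreachActive(f: (Int, Double) => Unit): Unit = {
        val localValues = values // local reference: one-step access inside the loop
        var i = 0
        while (i < localValues.length) {
          f(i, localValues(i))
          i += 1
        }
      }
    }

    final class SimpleSparseVector(val indices: Array[Int], val values: Array[Double])
        extends SimpleVector {
      override def foreachActive(f: (Int, Double) => Unit): Unit = {
        val localIndices = indices
        val localValues = values
        var i = 0
        while (i < localValues.length) {
          f(localIndices(i), localValues(i))
          i += 1
        }
      }
    }
    ```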
    
    Benchmarking with the mnist8m dataset on a single JVM,
    with the first 200 samples loaded in memory, repeated 5000 times.
    
    Before change:
    Sparse Vector - 30.02
    Dense Vector - 38.27
    
    With this PR:
    Sparse Vector - 6.29
    Dense Vector - 11.72
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3288 from dbtsai/activeIterator and squashes the following commits:
    
    844b0e6 [DB Tsai] formating
    03dd693 [DB Tsai] futher performance tunning.
    1907ae1 [DB Tsai] address feedback
    98448bb [DB Tsai] Made the override final, and had a local copy of variables which made the accessing a single step operation.
    c0cbd5a [DB Tsai] fix a bug
    6441f92 [DB Tsai] Finished SPARK-4431
    
    (cherry picked from commit b5d17ef)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Nov 22, 2014 (commit 4b68cab)

Commits on Nov 24, 2014

  1. SPARK-4457. Document how to build for Hadoop versions greater than 2.4

    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes #3322 from sryza/sandy-spark-4457 and squashes the following commits:
    
    5e72b77 [Sandy Ryza] Feedback
    0cf05c1 [Sandy Ryza] Caveat
    be8084b [Sandy Ryza] SPARK-4457. Document how to build for Hadoop versions greater than 2.4
    
    (cherry picked from commit 29372b6)
    Signed-off-by: Thomas Graves <tgraves@apache.org>
    sryza authored and tgravescs committed Nov 24, 2014 (commit 1a12ca3)
  2. [SPARK-4479][SQL] Avoids unnecessary defensive copies when sort based shuffle is on
    
    This PR is a workaround for SPARK-4479. Two changes are introduced: when merge sort is bypassed in `ExternalSorter`,
    
    1. RDD element buffering is also bypassed, since buffering is the reason that `MutableRow`-backed row objects must be copied, and
    2. defensive copies in the `Exchange` operator are avoided.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3422 from liancheng/avoids-defensive-copies and squashes the following commits:
    
    591f2e9 [Cheng Lian] Passes all shuffle suites
    0c3c91e [Cheng Lian] Fixes shuffle write metrics when merge sort is bypassed
    ed5df3c [Cheng Lian] Fixes styling changes
    f75089b [Cheng Lian] Avoids unnecessary defensive copies when sort based shuffle is on
    
    (cherry picked from commit a6d7b61)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    liancheng authored and marmbrus committed Nov 24, 2014 (commit ee1bc89)
  3. [SQL] Fix comment in HiveShim

    This file is for Hive 0.13.1 I think.
    
    Author: Daniel Darabos <darabos.daniel@gmail.com>
    
    Closes #3432 from darabos/patch-2 and squashes the following commits:
    
    4fd22ed [Daniel Darabos] Fix comment. This file is for Hive 0.13.1.
    
    (cherry picked from commit d5834f0)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    darabos authored and marmbrus committed Nov 24, 2014 (commit 1e3d22b)
  4. [SQL] Fix path in HiveFromSpark

    It requires us to run ```HiveFromSpark``` in a specific dir because ```HiveFromSpark``` uses a relative path, which leads to a ```run-example``` error (http://apache-spark-developers-list.1001551.n3.nabble.com/src-main-resources-kv1-txt-not-found-in-example-of-HiveFromSpark-td9100.html).
    
    Author: scwf <wangfei1@huawei.com>
    
    Closes #3415 from scwf/HiveFromSpark and squashes the following commits:
    
    ed3d6c9 [scwf] revert no need change
    b00e20c [scwf] fix path usring spark_home
    dbd321b [scwf] fix path in hivefromspark
    
    (cherry picked from commit b384119)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    scwf authored and marmbrus committed Nov 24, 2014 (commit 0e7fa7f)
  5. [SPARK-4487][SQL] Fix attribute reference resolution error when using ORDER BY
    
    When we use an ORDER BY clause, first the attributes referenced by the projection are resolved (1),
    and then the attributes referenced in the ORDER BY clause are resolved (2).
    But when resolving the attributes referenced in the ORDER BY clause, the resolution result generated in (1) is discarded, so, for example, the following query fails.
    
        SELECT c1 + c2 FROM mytable ORDER BY c1;
    
    The query above fails because when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes #3363 from sarutak/SPARK-4487 and squashes the following commits:
    
    fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer
    6e60c20 [Kousuke Saruta] Fixed conflicts
    cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487
    282d529 [Kousuke Saruta] Fixed attributes reference resolution error
    b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature
    317b7fb [Kousuke Saruta] WIP
    
    (cherry picked from commit dd1c9cb)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    sarutak authored and marmbrus committed Nov 24, 2014 (commit 97b7eb4)
  6. [SPARK-4145] Web UI job pages

    This PR adds two new pages to the Spark Web UI:
    
    - A jobs overview page, which shows details on running / completed / failed jobs.
    - A job details page, which displays information on an individual job's stages.
    
    The jobs overview page is now the default UI homepage; the old homepage is still accessible at `/stages`.
    
    ### Screenshots
    
    #### New UI homepage
    
    ![image](https://cloud.githubusercontent.com/assets/50748/5119035/fd0a69e6-701f-11e4-89cb-db7e9705714f.png)
    
    #### Job details page
    
    (This is effectively a per-job version of the stages page that can be extended later with other things, such as DAG visualizations)
    
    ![image](https://cloud.githubusercontent.com/assets/50748/5134910/50b340d4-70c7-11e4-88e1-6b73237ea7c8.png)
    
    ### Key changes in this PR
    
    - Rename `JobProgressPage` to `AllStagesPage`
    - Expose `StageInfo` objects in the `SparkListenerJobStart` event; add backwards-compatibility tests to JsonProtocol.
    - Add additional data structures to `JobProgressListener` to map from stages to jobs.
    - Add several fields to `JobUIData`.
    
    I also added ~150 lines of Selenium tests as I uncovered UI issues while developing this patch.
    
    ### Limitations
    
    If a job contains stages that aren't run, then its overall job progress bar may be an underestimate of the total job progress; in other words, a completed job may appear to have a progress bar that's not at 100%.
    
    If stages or tasks fail, then the progress bar will not go backwards to reflect the true amount of remaining work.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3009 from JoshRosen/job-page and squashes the following commits:
    
    eb05e90 [Josh Rosen] Disable kill button in completed stages tables.
    f00c851 [Josh Rosen] Fix JsonProtocol compatibility
    b89c258 [Josh Rosen] More JSON protocol backwards-compatibility fixes.
    ff804cd [Josh Rosen] Don't write "Stage Ids" field in JobStartEvent JSON.
    6f17f3f [Josh Rosen] Only store StageInfos in SparkListenerJobStart event.
    2bbf41a [Josh Rosen] Update job progress bar to reflect skipped tasks/stages.
    61c265a [Josh Rosen] Add “skipped stages” table; only display non-empty tables.
    1f45d44 [Josh Rosen] Incorporate a bunch of minor review feedback.
    0b77e3e [Josh Rosen] More bug fixes for phantom stages.
    034aa8d [Josh Rosen] Use `.max()` to find result stage for job.
    eebdc2c [Josh Rosen] Don’t display pending stages for completed jobs.
    67080ba [Josh Rosen] Ensure that "phantom stages" don't cause memory leaks.
    7d10b97 [Josh Rosen] Merge remote-tracking branch 'apache/master' into job-page
    d69c775 [Josh Rosen] Fix table sorting on all jobs page.
    5eb39dc [Josh Rosen] Add pending stages table to job page.
    f2a15da [Josh Rosen] Add status field to job details page.
    171b53c [Josh Rosen] Move `startTime` to the start of SparkContext.
    e2f2c43 [Josh Rosen] Fix sorting of stages in job details page.
    8955f4c [Josh Rosen] Display information for pending stages on jobs page.
    8ab6c28 [Josh Rosen] Compute numTasks from job start stage infos.
    5884f91 [Josh Rosen] Add StageInfos to SparkListenerJobStart event.
    79793cd [Josh Rosen] Track indices of completed stage to avoid overcounting when failures occur.
    d62ea7b [Josh Rosen] Add failing Selenium test for stage overcounting issue.
    1145c60 [Josh Rosen] Display text instead of progress bar for stages.
    3d0a007 [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page
    8a2351b [Josh Rosen] Add help tooltip to Spark Jobs page.
    b7bf30e [Josh Rosen] Add stages progress bar; fix bug where active stages show as completed.
    4846ce4 [Josh Rosen] Hide "(Job Group") if no jobs were submitted in job groups.
    4d58e55 [Josh Rosen] Change label to "Tasks (for all stages)"
    85e9c85 [Josh Rosen] Extract startTime into separate variable.
    1cf4987 [Josh Rosen] Fix broken kill links; add Selenium test to avoid future regressions.
    56701fa [Josh Rosen] Move last stage name / description logic out of markup.
    a475ea1 [Josh Rosen] Add progress bars to jobs page.
    45343b8 [Josh Rosen] More comments
    4b206fb [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page
    bfce2b9 [Josh Rosen] Address review comments, except for progress bar.
    4487dcb [Josh Rosen] [SPARK-4145] Web UI job pages
    2568a6c [Josh Rosen] Rename JobProgressPage to AllStagesPage:
    
    (cherry picked from commit 4a90276)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    JoshRosen authored and pwendell committed Nov 24, 2014 (commit 2d35cc0)
  7. [SPARK-4518][SPARK-4519][Streaming] Refactored file stream to prevent files from being processed multiple times
    
    Because of a corner case, a file already selected for batch t can get considered again for batch t+2. This refactoring fixes it by remembering all the files selected in the last 1 minute, so that this corner case does not arise. It also uses the Spark context's Hadoop configuration to access the file system API for listing directories.
    
    pwendell Please take a look. I still have not run long-running integration tests, so I cannot say for sure whether this has indeed solved the issue. You could do a first pass on this in the meantime.
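
    A hedged sketch of the remembering scheme (names assumed; not the actual FileInputDStream code):

    ```scala
    import scala.collection.mutable

    // Remember files selected within the last `rememberWindowMs` so that a file
    // picked for batch t is never re-selected for a later batch such as t+2.
    class RecentFileTracker(rememberWindowMs: Long) {
      private val selectedAt = mutable.Map.empty[String, Long] // path -> selection time

      def trySelect(path: String, nowMs: Long): Boolean = {
        // Forget entries older than the remember window.
        val expired = selectedAt.collect { case (p, t) if nowMs - t > rememberWindowMs => p }
        selectedAt --= expired
        if (selectedAt.contains(path)) {
          false // selected recently: skip to avoid duplicate processing
        } else {
          selectedAt(path) = nowMs
          true
        }
      }
    }
    ```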
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #3419 from tdas/filestream-fix2 and squashes the following commits:
    
    c19dd8a [Tathagata Das] Addressed PR comments.
    513b608 [Tathagata Das] Updated docs.
    d364faf [Tathagata Das] Added the current time condition back
    5526222 [Tathagata Das] Removed unnecessary imports.
    38bb736 [Tathagata Das] Fix long line.
    203bbc7 [Tathagata Das] Un-ignore tests.
    eaef4e1 [Tathagata Das] Fixed SPARK-4519
    9dbd40a [Tathagata Das] Refactored FileInputDStream to remember last few batches.
    
    (cherry picked from commit cb0e9b0)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    tdas committed Nov 24, 2014 (commit 6fa3e41)

Commits on Nov 25, 2014

  1. [SPARK-4562] [MLlib] speedup vector

    This PR changes the underlying array of DenseVector to numpy.ndarray to avoid the conversion, because most users will be using numpy.array.
    
    It also improves the serialization of DenseVector.
    
    Before this change:
    
    trial | trainingTime | testTime
    ------|--------------|---------
    0     | 5.126        | 1.786
    1     | 2.698        | 1.693
    
    After the change:
    
    trial | trainingTime | testTime
    ------|--------------|---------
    0     | 4.692        | 0.554
    1     | 2.307        | 0.525
    
    This could partially fix the performance regression observed during tests.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3420 from davies/ser2 and squashes the following commits:
    
    0e1e6f3 [Davies Liu] fix tests
    426f5db [Davies Liu] impove toArray()
    44707ec [Davies Liu] add name for ISO-8859-1
    fa7d791 [Davies Liu] address comments
    1cfb137 [Davies Liu] handle zero sparse vector
    2548ee2 [Davies Liu] fix tests
    9e6389d [Davies Liu] bugfix
    470f702 [Davies Liu] speed up DenseMatrix
    f0d3c40 [Davies Liu] speedup SparseVector
    ef6ce70 [Davies Liu] speed up dense vector
    
    (cherry picked from commit b660de7)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    Davies Liu authored and mengxr committed Nov 25, 2014 (commit 9ea67fc)
  2. get raw vectors for further processing in Word2Vec

    e.g. clustering
    
    Author: tkaessmann <tobias.kaessmann@s24.com>
    
    Closes #3309 from tkaessmann/branch-1.2 and squashes the following commits:
    
    e3a3142 [tkaessmann] changes the comment for getVectors
    58d3d83 [tkaessmann] removes sign from comment
    a5be213 [tkaessmann] fixes getVectors to fit code guidelines
    3782fa9 [tkaessmann] get raw vectors for further processing
    tkaessmann authored and mengxr committed Nov 25, 2014 (commit 2acbd28)
  3. [SPARK-4578] fix asDict() with nested Row()

    The Row object is created on the fly once a field is accessed, so we should access the fields by getattr() in asDict().
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3434 from davies/fix_asDict and squashes the following commits:
    
    b20f1e7 [Davies Liu] fix asDict() with nested Row()
    
    (cherry picked from commit 050616b)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    Davies Liu authored and pwendell committed Nov 25, 2014 (commit 8371bc2)
  4. [SPARK-4548][SPARK-4517] improve performance of python broadcast

    Re-implement the Python broadcast using files:
    
    1) Serialize the Python object using cPickle and write it to disk.
    2) Create a wrapper in the JVM (for the dumped file); it reads data from the file during serialization.
    3) Use TorrentBroadcast or HttpBroadcast to transfer the data (compressed) to the executors.
    4) During deserialization, write the data to disk.
    5) Pass the path to the Python worker, which reads the data from disk and unpickles it into a Python object on first access.
    
    It fixes the performance regression introduced in #2659; it has performance similar to 1.1 but supports objects larger than 2G, and it also improves memory efficiency (only one compressed copy in the driver and in each executor).
    
    Testing with a 500M broadcast and 4 tasks (excluding the benefit from the reused worker in 1.2):
    
    name | 1.1 | 1.2 with this patch | improvement
    -----|-----|---------------------|------------
    python-broadcast-w-bytes | 25.20 | 9.33 | 170.13%
    python-broadcast-w-set   | 4.13  | 4.50 | -8.35%
    
    Testing with 100 tasks (16 CPUs):
    
    name | 1.1 | 1.2 with this patch | improvement
    -----|-----|---------------------|------------
    python-broadcast-w-bytes | 38.16 | 8.40 | 353.98%
    python-broadcast-w-set   | 23.29 | 9.59 | 142.80%
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3417 from davies/pybroadcast and squashes the following commits:
    
    50a58e0 [Davies Liu] address comments
    b98de1d [Davies Liu] disable gc while unpickle
    e5ee6b9 [Davies Liu] support large string
    09303b8 [Davies Liu] read all data into memory
    dde02dd [Davies Liu] improve performance of python broadcast
    
    (cherry picked from commit 6cf5076)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Davies Liu authored and JoshRosen committed Nov 25, 2014 (commit 841f247)
  5. [SPARK-4266] [Web-UI] Reduce stage page load time.

    The commit changes the JavaScript used to show/hide additional
    metrics in order to reduce page load time. SPARK-4016 significantly
    increased page load time for the stage page when stages had a lot
    (thousands or tens of thousands) of tasks, due to the additional
    JavaScript to hide some metrics by default and stripe the tables.
    This commit reduces page load time in two ways:
    
    (1) Now, all of the metrics that are hidden by default are
    hidden by setting "display: none;" using CSS for the page,
    rather than hiding them using JavaScript after the page loads.
    Without this change, for stages with thousands of tasks, there
    was a few-second delay after page load, where first the additional
    metrics were shown, and then after a delay were hidden once the
    relevant JS finished running.
    
    (2) CSS is used to stripe all of the tables except for the summary
    table. The summary table needs JavaScript to do the striping because
    some rows are hidden, but the JavaScript striping is slower, which
    again resulted in a delay when it was used for the task table (where,
    for a few seconds after page load, all of the rows in the task table
    would be white while the browser finished running the JS to stripe
    the table).
    
    cc pwendell
    
    This change is intended to be backported to 1.2 to avoid a regression in
    UI performance when users run large jobs.
    
    Author: Kay Ousterhout <kayousterhout@gmail.com>
    
    Closes #3328 from kayousterhout/SPARK-4266 and squashes the following commits:
    
    f964091 [Kay Ousterhout] [SPARK-4266] [Web-UI] Reduce stage page load time.
    
    (cherry picked from commit d24d5bf)
    Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
    kayousterhout committed Nov 25, 2014 (commit 47d4fce)
  6. [SPARK-4525] Mesos should decline unused offers

    Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly.
    
    I've also done some minor renaming/clean-up of variables in this class and tests.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    Author: Jongyoul Lee <jongyoul@gmail.com>
    
    Closes #3436 from pwendell/mesos-issue and squashes the following commits:
    
    58c35b5 [Patrick Wendell] Adding unit test for this situation
    c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix
    f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers
    
    (cherry picked from commit b043c27)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    pwendell committed Nov 25, 2014 (commit 4b47973)
  7. Revert "[SPARK-4525] Mesos should decline unused offers"

    This reverts commit 4b47973.
    
    I accidentally committed this using my own authorship credential. However,
    I should have given authorship to the original author: Jongyoul Lee.
    pwendell committed Nov 25, 2014 (commit e7b8bf0)
  8. [SPARK-4525] Mesos should decline unused offers

    Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly.
    
    I've also done some minor renaming/clean-up of variables in this class and tests.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    Author: Jongyoul Lee <jongyoul@gmail.com>
    
    Closes #3436 from pwendell/mesos-issue and squashes the following commits:
    
    58c35b5 [Patrick Wendell] Adding unit test for this situation
    c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix
    f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers
    
    (cherry picked from commit b043c27)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    jongyoul authored and pwendell committed Nov 25, 2014 (commit 10e4339)
  9. [SQL] Compute timeTaken correctly

    ```timeTaken``` should not include the time spent printing the result.
    
    Author: w00228970 <wangfei1@huawei.com>
    
    Closes #3423 from scwf/time-taken-bug and squashes the following commits:
    
    da7e102 [w00228970] compute time taken correctly
    
    (cherry picked from commit 723be60)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    scwf authored and rxin committed Nov 25, 2014 (commit 259cb26)
  10. [DOC][Build] Wrong cmd for building Spark with Apache Hadoop 2.4.X and Hive 12
    
    Author: wangfei <wangfei1@huawei.com>
    
    Closes #3335 from scwf/patch-10 and squashes the following commits:
    
    d343113 [wangfei] add '-Phive'
    60d595e [wangfei] [DOC] Wrong cmd for build spark with apache hadoop 2.4.X and Hive 12 support
    
    (cherry picked from commit 0fe54cf)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    scwf authored and pwendell committed Nov 25, 2014 (commit 1f4d1ac)
  11. [SPARK-4596][MLLib] Refactorize Normalizer to make code cleaner

    In this refactoring, performance is slightly increased by removing
    the overhead of the Breeze vector. The bottleneck is still in the Breeze norm,
    which is implemented with activeIterator.
    
    This inefficiency of the Breeze norm will be addressed in the next PR. At least,
    this PR makes the code more consistent across the codebase.
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3446 from dbtsai/normalizer and squashes the following commits:
    
    e20a2b9 [DB Tsai] first commit
    
    (cherry picked from commit 89f9122)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Nov 25, 2014 (commit 7457199)
  12. [SPARK-4526][MLLIB] GradientDescent gets a wrong gradient value according to the gradient formula
    
    This is caused by the miniBatchSize parameter: the number of elements `RDD.sample` returns is not fixed.
    cc mengxr
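
    A hedged sketch of the corrected normalization (assumed helper, not the real GradientDescent code): divide the summed gradient by the actual number of sampled points rather than the nominal mini-batch size.

    ```scala
    // gradientSum: per-dimension gradient summed over the sampled points.
    // sampledCount: the actual number of points RDD.sample returned.
    def averageGradient(gradientSum: Array[Double], sampledCount: Long): Array[Double] = {
      require(sampledCount > 0, "mini-batch must contain at least one sample")
      gradientSum.map(_ / sampledCount.toDouble)
    }
    ```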
    
    Author: GuoQiang Li <witgo@qq.com>
    
    Closes #3399 from witgo/GradientDescent and squashes the following commits:
    
    13cb228 [GuoQiang Li] review commit
    668ab66 [GuoQiang Li] Double to Long
    b6aa11a [GuoQiang Li] Check miniBatchSize is greater than 0
    0b5c3e3 [GuoQiang Li] Minor fix
    12e7424 [GuoQiang Li] GradientDescent get a wrong gradient value according to the gradient formula, which is caused by the miniBatchSize parameter.
    
    (cherry picked from commit f515f94)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    witgo authored and mengxr committed Nov 25, 2014 (commit d117f8f)
  13. [SPARK-4535][Streaming] Fix the error in comments

    change `NetworkInputDStream` to `ReceiverInputDStream`
    change `ReceiverInputTracker` to `ReceiverTracker`
    
    Author: q00251598 <qiyadong@huawei.com>
    
    Closes #3400 from watermen/fix-comments and squashes the following commits:
    
    75d795c [q00251598] change 'NetworkInputDStream' to 'ReceiverInputDStream' && change 'ReceiverInputTracker' to 'ReceiverTracker'
    
    (cherry picked from commit a51118a)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    
    Conflicts:
    	examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
    watermen authored and tdas committed Nov 25, 2014 (commit 42b9d0d)
  14. [SPARK-4381][Streaming] Add warning log when user sets spark.master to local in Spark Streaming and there's no job executed
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3244 from jerryshao/SPARK-4381 and squashes the following commits:
    
    d2486c7 [jerryshao] Improve the warning log
    d726e85 [jerryshao] Add local[1] to the filter condition
    eca428b [jerryshao] Add warning log
    
    (cherry picked from commit fef27b2)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    jerryshao authored and tdas committed Nov 25, 2014 (commit b026546)
  15. [SPARK-4344][DOCS] adding documentation on spark.yarn.user.classpath.first
    
    The documentation for the two parameters is the same, with a pointer from the standalone parameter to the YARN parameter.
    
    Author: arahuja <aahuja11@gmail.com>
    
    Closes #3209 from arahuja/yarn-classpath-first-param and squashes the following commits:
    
    51cb9b2 [arahuja] [SPARK-4344][DOCS] adding documentation for YARN on userClassPathFirst
    
    (cherry picked from commit d240760)
    Signed-off-by: Thomas Graves <tgraves@apache.org>
    arahuja authored and tgravescs committed Nov 25, 2014 (commit a689ab9)
  16. [SPARK-4601][Streaming] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI
    
    When running NetworkWordCount, the descriptions of the word count jobs are set to "getCallsite at DStream:xxx". They should instead be set to the line of the streaming application containing the output operation that led to the job being created. This happens because the call site is incorrectly set in the thread launching the jobs; this PR fixes that.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #3455 from tdas/streaming-callsite-fix and squashes the following commits:
    
    69fc26f [Tathagata Das] Set correct call site for streaming jobs so that it is displayed correctly on the Spark UI
    
    (cherry picked from commit 69cd53e)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    tdas committed Nov 25, 2014 (commit 96f76fc)
  17. [SPARK-4581][MLlib] Refactorize StandardScaler to improve the transformation performance
    
    The following optimizations are done to improve the StandardScaler model
    transformation performance.
    
    1) Convert the Breeze dense vector to a primitive vector to reduce the overhead.
    2) Since the mean can potentially be a sparse vector, we explicitly convert it to a dense primitive vector.
    3) Keep a local reference to the `shift` and `factor` arrays so the JVM can locate the values with a single operation.
    4) In the pattern matching part, we use the MLlib SparseVector/DenseVector instead of Breeze's vectors to
    make the codebase cleaner.
    
    Benchmark with mnist8m dataset:
    
    Before,
    DenseVector withMean and withStd: 50.97secs
    DenseVector withMean and withoutStd: 42.11secs
    DenseVector withoutMean and withStd: 8.75secs
    SparseVector withoutMean and withStd: 5.437secs
    
    With this PR,
    DenseVector withMean and withStd: 5.76secs
    DenseVector withMean and withoutStd: 5.28secs
    DenseVector withoutMean and withStd: 5.30secs
    SparseVector withoutMean and withStd: 1.27secs
    
    Note that without the local reference copies of the `factor` and `shift` arrays,
    the runtime is almost three times slower.
    
    DenseVector withMean and withStd: 18.15secs
    DenseVector withMean and withoutStd: 18.05secs
    DenseVector withoutMean and withStd: 18.54secs
    SparseVector withoutMean and withStd: 2.01secs
    
    The following code,
    ```scala
    while (i < size) {
       values(i) = (values(i) - shift(i)) * factor(i)
       i += 1
    }
    ```
    will generate the bytecode
    ```
       L13
        LINENUMBER 106 L13
       FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I] []
        ILOAD 7
        ILOAD 6
        IF_ICMPGE L14
       L15
        LINENUMBER 107 L15
        ALOAD 5
        ILOAD 7
        ALOAD 5
        ILOAD 7
        DALOAD
        ALOAD 0
        INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.shift ()[D
        ILOAD 7
        DALOAD
        DSUB
        ALOAD 0
        INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
        ILOAD 7
        DALOAD
        DMUL
        DASTORE
       L16
        LINENUMBER 108 L16
        ILOAD 7
        ICONST_1
        IADD
        ISTORE 7
        GOTO L13
    ```
    while with local references to the `shift` and `factor` arrays, the bytecode will be
    ```
       L14
        LINENUMBER 107 L14
        ALOAD 0
        INVOKESPECIAL org/apache/spark/mllib/feature/StandardScalerModel.factor ()[D
        ASTORE 9
       L15
        LINENUMBER 108 L15
       FRAME FULL [org/apache/spark/mllib/feature/StandardScalerModel org/apache/spark/mllib/linalg/Vector [D org/apache/spark/mllib/linalg/Vector org/apache/spark/mllib/linalg/DenseVector T [D I I [D] []
        ILOAD 8
        ILOAD 7
        IF_ICMPGE L16
       L17
        LINENUMBER 109 L17
        ALOAD 6
        ILOAD 8
        ALOAD 6
        ILOAD 8
        DALOAD
        ALOAD 2
        ILOAD 8
        DALOAD
        DSUB
        ALOAD 9
        ILOAD 8
        DALOAD
        DMUL
        DASTORE
       L18
        LINENUMBER 110 L18
        ILOAD 8
        ICONST_1
        IADD
        ISTORE 8
        GOTO L15
    ```
    
    You can see that with the local references, both arrays are kept on the stack, so the JVM can access the values without calling `INVOKESPECIAL`.
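
    In source form, the local-reference pattern looks like the following sketch (assumed names, not the actual StandardScalerModel code):

    ```scala
    class ScalerSketch(shift: Array[Double], factor: Array[Double]) {
      def transformInPlace(values: Array[Double]): Unit = {
        // Hoist the fields into locals so the loop reads the arrays off the
        // stack instead of issuing a getter call (INVOKESPECIAL) per element.
        val localShift = shift
        val localFactor = factor
        var i = 0
        while (i < values.length) {
          values(i) = (values(i) - localShift(i)) * localFactor(i)
          i += 1
        }
      }
    }
    ```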
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3435 from dbtsai/standardscaler and squashes the following commits:
    
    85885a9 [DB Tsai] revert to have lazy in shift array.
    daf2b06 [DB Tsai] Address the feedback
    cdb5cef [DB Tsai] small change
    9c51eef [DB Tsai] style
    fc795e4 [DB Tsai] update
    5bffd3d [DB Tsai] first commit
    
    (cherry picked from commit bf1a6aa)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Nov 25, 2014 (commit 1e356a8)
  18. [SPARK-4196][SPARK-4602][Streaming] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles
    
    Solves two JIRAs in one shot:
    - Makes the ForeachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
    - Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #3457 from tdas/savefiles-fix and squashes the following commits:
    
    bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
    b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.
    
    (cherry picked from commit 8838ad7)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    tdas committed Nov 25, 2014 (commit a9944c8)
  19. a2c01ae (commit message not shown)
  20. [SPARK-4592] Avoid duplicate worker registrations in standalone mode

    **Summary.** On failover, the Master may receive duplicate registrations from the same worker, causing the worker to exit. This is caused by this commit 4afe9a4, which adds logic for the worker to re-register with the master in case of failures. However, the following race condition may occur:
    
    (1) Master A fails and Worker attempts to reconnect to all masters
    (2) Master B takes over and notifies Worker
    (3) Worker responds by registering with Master B
    (4) Meanwhile, Worker's previous reconnection attempt reaches Master B, causing the same Worker to register with Master B twice
    
    **Fix.** Instead of attempting to register with all known masters, the worker should re-register with only the one that it has been communicating with. This is safe because the fact that a failover has occurred means the old master must have died. Then, when the worker is finally notified of a new master, it gives up on the old one in favor of the new one.
    
    **Caveat.** Even this fix is subject to more obscure race conditions. For instance, if Master B fails and Master A recovers immediately, then Master A may still observe duplicate worker registrations. However, this and other potential race conditions summarized in [SPARK-4592](https://issues.apache.org/jira/browse/SPARK-4592), are much, much less likely than the one described above, which is deterministically reproducible.
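
    A minimal sketch of the fix's idea (assumed names, no actor machinery):

    ```scala
    class WorkerSketch(knownMasters: Seq[String]) {
      private var activeMaster: Option[String] = None

      def onMasterChanged(newMaster: String): Unit =
        activeMaster = Some(newMaster) // give up on the old master in favor of the new one

      def reregister(): Unit = activeMaster match {
        case Some(master) => register(master) // retry only the current master: no duplicate races
        case None         => knownMasters.foreach(register) // initial registration only
      }

      private def register(master: String): Unit =
        println(s"registering with $master")
    }
    ```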
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3447 from andrewor14/standalone-failover and squashes the following commits:
    
    0d9716c [Andrew Or] Move re-registration logic to actor for thread-safety
    79286dc [Andrew Or] Preserve old behavior for initial retries
    83b321c [Andrew Or] Tweak wording
    1fce6a9 [Andrew Or] Active master actor could be null in the beginning
    b6f269e [Andrew Or] Avoid duplicate worker registrations
    
    (cherry picked from commit 1b2ab1c)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Nov 25, 2014 (commit ee03175)
  21. [SPARK-4546] Improve HistoryServer first time user experience

    The documentation points the user to run the following
    ```
    sbin/start-history-server.sh
    ```
    The first thing this does is throw an exception that complains a log directory is not specified. The exception message itself does not say anything about what to set. Instead we should have a default and a landing page with a better message. The new default log directory is `file:/tmp/spark-events`.
    
    This is what it looks like as of this PR:
    
    ![after](https://issues.apache.org/jira/secure/attachment/12682985/after.png)
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3411 from andrewor14/minor-history-improvements and squashes the following commits:
    
    f33d6b3 [Andrew Or] Point user to set config if default log dir does not exist
    fc4c17a [Andrew Or] Improve HistoryServer UX
    
    (cherry picked from commit 9afcbe4)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Nov 25, 2014 (commit 58c840d)
  22. Fix SPARK-4471: blockManagerIdFromJson function throws exception while BlockManagerId is null

    Fix [SPARK-4471](https://issues.apache.org/jira/browse/SPARK-4471): the blockManagerIdFromJson function throws an exception when BlockManagerId is null in MetadataFetchFailedException.
    
    Author: hushan[胡珊] <hushan@xiaomi.com>
    
    Closes #3340 from suyanNone/fix-blockmanagerId-jnothing-2 and squashes the following commits:
    
    159f9a3 [hushan[胡珊]] Refine test code for blockmanager is null
    4380d73 [hushan[胡珊]] remove useless blank line
    3ccf651 [hushan[胡珊]] Fix SPARK-4471: blockManagerIdFromJson function throws exception while metadata fetch failed
    
    (cherry picked from commit 9bdf5da)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    suyanNone authored and Andrew Or committed Nov 25, 2014 (commit 93b914d)

Commits on Nov 26, 2014

  1. [SPARK-4509] Revert EC2 tag-based cluster membership patch

    This PR reverts changes related to tag-based cluster membership. As discussed in SPARK-3332, we didn't figure out a safe strategy to use tags to determine cluster membership, because tagging is not atomic. The following changes are reverted:
    
    SPARK-2333: 94053a7
    SPARK-3213: 7faf755
    SPARK-3608: 78d4220.
    
    I tested launch, login, and destroy. It is easy to check the diff by comparing it to Josh's patch for branch-1.1:
    
    https://github.com/apache/spark/pull/2225/files
    
    JoshRosen I sent the PR to master. It might be easier for us to keep master and branch-1.2 the same at this time. We can always re-apply the patch once we figure out a stable solution.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3453 from mengxr/SPARK-4509 and squashes the following commits:
    
    f0b708b [Xiangrui Meng] revert 94053a7
    4298ea5 [Xiangrui Meng] revert 7faf755
    35963a1 [Xiangrui Meng] Revert "SPARK-3608 Break if the instance tag naming succeeds"
    
    (cherry picked from commit 7eba0fb)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    mengxr authored and Andrew Or committed Nov 26, 2014 (commit a48ea3c)
  2. [SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates

    Currently, the LogLoss used by GradientBoostedTrees has 2 issues:
    * the gradient (and therefore loss) does not match that used by Friedman (1999)
    * the error computation uses 0/1 accuracy, not log loss
    
    This PR updates LogLoss.
    It also adds some doc for boosting and forests.
    
    I tested it on sample data and made sure the log loss is monotonically decreasing with each boosting iteration.
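
    For reference, a hedged statement of the intended loss, Friedman's two-class negative binomial log-likelihood, assuming labels $y \in \{-1, +1\}$ and margin $F(x)$:

    $$L(y, F(x)) = 2\log\left(1 + e^{-2yF(x)}\right), \qquad \frac{\partial L}{\partial F(x)} = \frac{-4y}{1 + e^{2yF(x)}}$$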
    
    CC: mengxr manishamde codedeft
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits:
    
    cfec17e [Joseph K. Bradley] removed forgotten temp comments
    a27eb6d [Joseph K. Bradley] corrections to last log loss commit
    ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability
    5e52bff [Joseph K. Bradley] * Removed the 1/2 from SquaredError.  This also required updating the test suite since it effectively doubles the gradient and loss. * Added doc for developers within RandomForest. * Small cleanup in test suite (generating data only once)
    e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting
    
    (cherry picked from commit c251fd7)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Nov 26, 2014 (commit 6880b46)
  3. 37d58aa (commit message not shown)
  4. [SPARK-4604][MLLIB] make MatrixFactorizationModel public

    Users can construct an MF model directly. I added a note about the performance.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3459 from mengxr/SPARK-4604 and squashes the following commits:
    
    f64bcd3 [Xiangrui Meng] organize imports
    ed08214 [Xiangrui Meng] check preconditions and unit tests
    a624c12 [Xiangrui Meng] make MatrixFactorizationModel public
    
    (cherry picked from commit b5fb141)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    mengxr committed Nov 26, 2014 (commit 2756d0d)
  5. [SPARK-4516] Cap default number of Netty threads at 8

    In practice, only 2-4 cores should be required to transfer roughly 10 Gb/s, and each core that we use will have an initial overhead of roughly 32 MB of off-heap memory, which comes at a premium.
    
    Thus, this value should still retain maximum throughput and reduce wasted off-heap memory allocation. It can be overridden by setting the number of serverThreads and clientThreads manually in Spark's configuration.
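
    A hedged sketch of the capping logic (illustrative helper; config handling assumed):

    ```scala
    // Use an explicit serverThreads/clientThreads setting when given;
    // otherwise default to min(available cores, 8).
    def defaultNumNettyThreads(explicitSetting: Option[Int]): Int =
      explicitSetting.filter(_ > 0).getOrElse {
        math.min(Runtime.getRuntime.availableProcessors(), 8)
      }
    ```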
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #3469 from aarondav/fewer-pools2 and squashes the following commits:
    
    087c59f [Aaron Davidson] [SPARK-4516] Cap default number of Netty threads at 8
    
    (cherry picked from commit f5f2d27)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    aarondav authored and pwendell committed Nov 26, 2014 (commit 1e12f59)
  6. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit d7ac601.
    pwendell committed Nov 26, 2014 (commit b028aaf)
  7. Revert "Preparing Spark release v1.2.0-snapshot1"

    This reverts commit 38c1fbd.
    pwendell committed Nov 26, 2014 (commit 0127178)
  8. Preparing Spark release v1.2.0-rc1

    Ubuntu committed Nov 26, 2014 (commit db7f4a8)
  9. Preparing development version 1.2.1-SNAPSHOT

    Ubuntu committed Nov 26, 2014 (commit d7b1ecb)
  10. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit d7b1ecb.
    pwendell committed Nov 26, 2014 (commit 68a217c)
  11. Revert "Preparing Spark release v1.2.0-rc1"

    This reverts commit db7f4a8.
    pwendell committed Nov 26, 2014 (commit ce6200b)
  12. Preparing Spark release v1.2.0-rc1 (commit 5247dd8)
  13. Preparing development version 1.2.1-SNAPSHOT (commit 79df6b4)
  14. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit 79df6b4.
    pwendell committed Nov 26, 2014 (commit 37bc7a8)
  15. Revert "Preparing Spark release v1.2.0-rc1"

    This reverts commit 5247dd8.
    pwendell committed Nov 26, 2014 (commit de8029b)
  16. dfb8c65 (commit message not shown)
  17. Preparing Spark release v1.2.0-rc1 (commit cc2c05e)
  18. Preparing development version 1.2.1-SNAPSHOT (commit 380eba5)
  19. [SPARK-4516] Avoid allocating Netty PooledByteBufAllocators unnecessarily
    
    It turns out we were allocating an allocator pool for every TransportClient (which means that the number of pools increases with the number of nodes in the cluster), when really we should just reuse one for all clients.
    
    This patch, as expected, greatly decreases off-heap memory allocation, and appears to make allocation only proportional to the number of cores.
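
    A hedged sketch of the reuse (PooledAllocator stands in for Netty's PooledByteBufAllocator; names are illustrative):

    ```scala
    final class PooledAllocator(numArenas: Int)
    final class ClientSketch(val allocator: PooledAllocator)

    object ClientFactorySketch {
      // One pool per JVM, created lazily on first use and shared by all clients,
      // so off-heap arenas scale with cores rather than with cluster size.
      private lazy val sharedAllocator =
        new PooledAllocator(math.min(Runtime.getRuntime.availableProcessors(), 8))

      def createClient(): ClientSketch = new ClientSketch(sharedAllocator)
    }
    ```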
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #3465 from aarondav/fewer-pools and squashes the following commits:
    
    36c49da [Aaron Davidson] [SPARK-4516] Avoid allocating unnecessarily Netty PooledByteBufAllocators
    
    (cherry picked from commit 346bc17)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    aarondav authored and pwendell committed Nov 26, 2014 (commit c7185f0)
  20. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit 380eba5.
    pwendell committed Nov 26, 2014 (commit 537d699)
  21. Revert "Preparing Spark release v1.2.0-rc1"

    This reverts commit cc2c05e.
    pwendell committed Nov 26, 2014 (commit 8f5ebcb)
  22. Revert "[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + d…

    …oc updates"
    
    This reverts commit 6880b46.
    pwendell committed Nov 26, 2014 (commit 17a4b8e)
  23. 69d021b (commit message not shown)
  24. [SPARK-4612] Reduce task latency and increase scheduling throughput by making configuration initialization lazy
    
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L337 creates a configuration object for every task that is launched, even if there is no new dependent file/JAR to update. This is a heavyweight creation that should be avoided if there is no new file/JAR to update. This PR makes that creation lazy. A quick local test with the spark-perf scheduling throughput tests gives the following numbers in local standalone scheduler mode.
    1 job with 10000 tasks: before 7.8395 seconds, after 2.6415 seconds = 3x increase in task scheduling throughput
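
    A hedged sketch of the lazy-initialization idea (names illustrative, not the actual Executor internals):

    ```scala
    class TaskRunnerSketch(newFiles: Seq[String]) {
      // Heavyweight construction: runs at most once, and only if forced below.
      private lazy val config: Map[String, String] = Map("fetch.timeout" -> "60s")

      def updateDependencies(): Unit =
        if (newFiles.nonEmpty) {
          // Only tasks that bring new files/JARs pay the construction cost.
          newFiles.foreach(f => fetch(f, config))
        }

      private def fetch(file: String, conf: Map[String, String]): Unit =
        println(s"fetching $file with $conf")
    }
    ```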
    
    pwendell JoshRosen
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes #3463 from tdas/lazy-config and squashes the following commits:
    
    c791c1e [Tathagata Das] Reduce task latency by making configuration initialization lazy
    
    (cherry picked from commit e7f4d25)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    tdas authored and rxin committed Nov 26, 2014 (commit e866972)
  25. Removing confusing TripletFields

    After additional discussion with rxin, I think having all the possible `TripletField` options is confusing.  This pull request reduces the triplet fields to:
    
    ```java
      /**
       * None of the triplet fields are exposed.
       */
      public static final TripletFields None = new TripletFields(false, false, false);
    
      /**
       * Expose only the edge field and not the source or destination field.
       */
      public static final TripletFields EdgeOnly = new TripletFields(false, false, true);
    
      /**
       * Expose the source and edge fields but not the destination field. (Same as Src)
       */
      public static final TripletFields Src = new TripletFields(true, false, true);
    
      /**
       * Expose the destination and edge fields but not the source field. (Same as Dst)
       */
      public static final TripletFields Dst = new TripletFields(false, true, true);
    
      /**
       * Expose all the fields (source, edge, and destination).
       */
      public static final TripletFields All = new TripletFields(true, true, true);
    ```
    
    Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
    
    Closes #3472 from jegonzal/SimplifyTripletFields and squashes the following commits:
    
    91796b5 [Joseph E. Gonzalez] removing confusing triplet fields
    
    (cherry picked from commit 288ce58)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    jegonzal authored and rxin committed Nov 26, 2014 (commit 9f3b159)
  26. [BRANCH-1.2][SPARK-4604][MLLIB] make MatrixFactorizationModel public

    We reverted #3459 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3459 back to branch-1.2 with correct imports.
    
    GitHub is out-of-sync now. The real changes are the last two commits.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3473 from mengxr/SPARK-4604-1.2 and squashes the following commits:
    
    a7638a5 [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2
    b749000 [Xiangrui Meng] [SPARK-4604][MLLIB] make MatrixFactorizationModel public
    mengxr committed Nov 26, 2014 (commit 9b63900)
  27. [BRANCH-1.2][SPARK-4614][MLLIB] Slight API changes in Matrix and Matrices
    
    This is #3468 for branch-1.2, same content except mima excludes.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3482 from mengxr/SPARK-4614-1.2 and squashes the following commits:
    
    ea4f08d [Xiangrui Meng] hide transposeMultiply; add rng to rand and randn; add unit tests
    mengxr committed Nov 26, 2014 (commit 8fc19e5)
  28. [BRANCH-1.2][SPARK-4583][MLLIB] LogLoss for GradientBoostedTrees fix + doc updates
    
    We reverted #3439 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3439 back to branch-1.2 with correct imports.
    
    GitHub is out of sync now. The real changes are the last two commits.
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3474 from mengxr/SPARK-4583-1.2 and squashes the following commits:
    
    aca2abb [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2
    6b5564a [Joseph K. Bradley] [SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
    jkbradley authored and mengxr committed Nov 26, 2014
    Commit 69550f7

Commits on Nov 27, 2014

  1. [SPARK-732][SPARK-3628][CORE][RESUBMIT] eliminate duplicate update on accumulator
    
    https://issues.apache.org/jira/browse/SPARK-3628
    
    In the current implementation, the accumulator is updated for every successfully finished task, even when the task comes from a resubmitted stage, which makes the accumulator value counter-intuitive.
    
    In this patch, I changed the way the DAGScheduler updates accumulators.
    
    The DAGScheduler maintains a hash table mapping each stage id to the received <accumulator_id, value> pairs. When a task finishes, we check whether the hash table already contains its stage id, and save its <accumulator_id, value> pairs only when the task is the first finished task of a new stage or the stage is running its first attempt. Only when the stage becomes independent (no job needs it any more) do we accumulate the values of those pairs.
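
    A minimal Scala sketch of the dedup idea (the names `seen` and `mergeAccumUpdates` are illustrative, not the actual DAGScheduler code):

    ```scala
    import scala.collection.mutable

    // Track which (stageId, partitionId) pairs have already contributed
    // accumulator updates; a task re-run by a resubmitted stage is then
    // ignored instead of being double-counted.
    val seen = mutable.HashSet.empty[(Int, Int)]

    def mergeAccumUpdates(stageId: Int, partitionId: Int,
                          updates: Map[Long, Any]): Unit = {
      if (seen.add((stageId, partitionId))) { // true only on first sighting
        updates.foreach { case (accumId, partialValue) =>
          // merge partialValue into the driver-side accumulator for accumId
        }
      }
    }
    ```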
    
    Author: CodingCat <zhunansjtu@gmail.com>
    
    Closes #2524 from CodingCat/SPARK-732-1 and squashes the following commits:
    
    701a1e8 [CodingCat] roll back change on Accumulator.scala
    1433e6f [CodingCat] make MIMA happy
    b233737 [CodingCat] address Matei's comments
    02261b8 [CodingCat] rollback  some changes
    6b0aff9 [CodingCat] update document
    2b2e8cf [CodingCat] updateAccumulator
    83b75f8 [CodingCat] style fix
    84570d2 [CodingCat] re-enable  the bad accumulator guard
    1e9e14d [CodingCat] add NPE guard
    21b6840 [CodingCat] simplify the patch
    88d1f03 [CodingCat] fix rebase error
    f74266b [CodingCat] add test case for resubmitted result stage
    5cf586f [CodingCat] de-duplicate on task level
    138f9b3 [CodingCat] make MIMA happy
    67593d2 [CodingCat] make if allowing duplicate update as an option of accumulator
    
    (cherry picked from commit 5af53ad)
    Signed-off-by: Matei Zaharia <matei@databricks.com>
    CodingCat authored and mateiz committed Nov 27, 2014
    Commit 66cc243
  2. [Release] Automate generation of contributors list

    This commit provides a script that computes the contributors list
    by linking the GitHub commits with JIRA issues. Automatically
    translating GitHub usernames remains a TODO at this point.
    Andrew Or committed Nov 27, 2014
    Commit a0aa07b
  3. [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
    
    Author: roxchkplusony <roxchkplusony@gmail.com>
    
    Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits:
    
    aba9184 [roxchkplusony] replace warning message per review
    5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
    
    (cherry picked from commit 84376d3)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    roxchkplusony authored and rxin committed Nov 27, 2014
    Commit bfba8bf

Commits on Nov 28, 2014

  1. [SPARK-4613][Core] Java API for JdbcRDD

    This PR introduces a set of Java APIs for using `JdbcRDD`:
    
    1. Trait (interface) `JdbcRDD.ConnectionFactory`: equivalent to the `getConnection: () => Connection` parameter in `JdbcRDD` constructor.
    2. Two overloaded versions of `JdbcRDD.create`: used to create a `JavaRDD` that wraps a `JdbcRDD` (see the sketch below).
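
    A minimal Scala sketch of the existing constructor that the new Java-friendly APIs wrap (the Derby URL and the FOO table are hypothetical; the query must contain two `?` placeholders for the partition bounds):

    ```scala
    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val rdd = new JdbcRDD(
      sc,                                                       // SparkContext
      () => DriverManager.getConnection("jdbc:derby:memory:testdb"),
      "SELECT DATA FROM FOO WHERE ? <= ID AND ID <= ?",         // bound markers
      1, 100, 3,                                                // lower, upper, numPartitions
      (rs: ResultSet) => rs.getInt(1))                          // mapRow
    ```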
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3478 from liancheng/japi-jdbc-rdd and squashes the following commits:
    
    9a54625 [Cheng Lian] Only shutdowns a single DB rather than the whole Derby driver
    d4cedc5 [Cheng Lian] Moves Java JdbcRDD test case to a separate test suite
    ffcdf2e [Cheng Lian] Java API for JdbcRDD
    
    (cherry picked from commit 120a350)
    Signed-off-by: Matei Zaharia <matei@databricks.com>
    liancheng authored and mateiz committed Nov 28, 2014
    Commit 0928004
  2. [SPARK-4619][Storage]delete redundant time suffix

    The time suffix already exists in Utils.getUsedTimeMs(startTime), so there is no need to append it again; delete the redundant suffix.
    
    Author: maji2014 <maji3@asiainfo.com>
    
    Closes #3475 from maji2014/SPARK-4619 and squashes the following commits:
    
    df0da4e [maji2014] delete redundant time suffix
    
    (cherry picked from commit ceb6281)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    maji2014 authored and rxin committed Nov 28, 2014
    Commit e924426
  3. [SPARK-4308][SQL] Sets SQL operation state to ERROR when exception is thrown
    
    In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3175 from liancheng/fix-op-state and squashes the following commits:
    
    6d4c1fe [Cheng Lian] Sets SQL operation state to ERROR when exception is thrown
    liancheng authored and pwendell committed Nov 28, 2014
    Commit 7fa5fff
  4. [SPARK-4645][SQL] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
    
    This PR disables HiveThriftServer2 asynchronous execution by setting `runInBackground` argument in `ExecuteStatementOperation` to `false`, and reverting `SparkExecuteStatementOperation.run` in Hive 13 shim to Hive 12 version. This change makes Simba ODBC driver v1.0.0.1000 work.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3506 from liancheng/disable-async-exec and squashes the following commits:
    
    593804d [Cheng Lian] Disables asynchronous execution in Hive 0.13.1 HiveThriftServer2
    liancheng authored and pwendell committed Nov 28, 2014
    Commit 8cf1227
  5. [SPARK-4193][BUILD] Disable doclint in Java 8 to prevent build errors
    
    Author: Takuya UESHIN <ueshin@happy-camper.st>
    
    Closes #3058 from ueshin/issues/SPARK-4193 and squashes the following commits:
    
    e096bb1 [Takuya UESHIN] Add a plugin declaration to pluginManagement.
    6762ec2 [Takuya UESHIN] Fix usage of -Xdoclint javadoc option.
    fdb280a [Takuya UESHIN] Fix Javadoc errors.
    4745f3c [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4193
    923e2f0 [Takuya UESHIN] Use doclint option `-missing` instead of `none`.
    30d6718 [Takuya UESHIN] Fix Javadoc errors.
    b548017 [Takuya UESHIN] Disable doclint in Java 8 to prevent from build error.
    
    (cherry picked from commit e464f0a)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    ueshin authored and pwendell committed Nov 28, 2014
    Commit 3219834
  6. [SPARK-4584] [yarn] Remove security manager from Yarn AM.

    The security manager adds a lot of overhead to the runtime of the
    app, and causes a severe performance regression. Even stubbing out
    all unneeded methods (all except checkExit()) does not help.
    
    So, instead, penalize users who do an explicit System.exit() by leaving
    them in "undefined behavior" territory: if they do that, the Yarn
    backend won't be able to report the final app status to the RM.
    The result is that the final status of the application might not match
    the user's expectations.
    
    One side-effect of the change is that users who do an explicit
    System.exit() will lose the AM retry functionality. Since there is
    no way to know if the exit was because of success or failure, the
    AM right now errs on the side of it being a successful exit.
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes #3484 from vanzin/SPARK-4584 and squashes the following commits:
    
    21f2502 [Marcelo Vanzin] Do not retry apps that use System.exit().
    4198b3b [Marcelo Vanzin] [SPARK-4584] [yarn] Remove security manager from Yarn AM.
    
    (cherry picked from commit 915f8ee)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    Marcelo Vanzin authored and pwendell committed Nov 28, 2014
    Commit 8cec431
  7. Preparing Spark release v1.2.0-rc1

    Commit 39c7d1c
  8. Preparing development version 1.2.1-SNAPSHOT

    Commit fc7bff0
  9. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit fc7bff0.
    pwendell committed Nov 28, 2014
    Commit 6e0269c
  10. Revert "Preparing Spark release v1.2.0-rc1"

    This reverts commit 39c7d1c.
    pwendell committed Nov 28, 2014
    Commit 88f1a6a
  11. Commit eb4d457
  12. Preparing Spark release v1.2.0-rc1

    Commit 1056e9e
  13. Preparing development version 1.2.1-SNAPSHOT

    Commit 00316cc
  14. HOTFIX: Rolling back incorrect version change

    Commit 3a4609e

Commits on Nov 29, 2014

  1. [SPARK-4597] Use proper exception and reset variable in Utils.createTempDir()
    
    `File.exists()` and `File.mkdirs()` throw only `SecurityException`, not `IOException`. Also, when an exception is thrown, `dir` should be reset too (see the sketch below).
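
    A minimal sketch of the corrected retry loop (simplified; the attempt limit is illustrative):

    ```scala
    import java.io.{File, IOException}
    import java.util.UUID

    def createTempDir(root: String): File = {
      var attempts = 0
      var dir: File = null
      while (dir == null) {
        attempts += 1
        if (attempts > 10) {
          throw new IOException("Failed to create a temp directory under " + root)
        }
        try {
          dir = new File(root, "spark-" + UUID.randomUUID.toString)
          if (dir.exists() || !dir.mkdirs()) {
            dir = null
          }
        } catch {
          // exists()/mkdirs() throw SecurityException, not IOException
          case _: SecurityException => dir = null  // reset dir before retrying
        }
      }
      dir
    }
    ```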
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes #3449 from viirya/fix_createtempdir and squashes the following commits:
    
    36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.
    
    (cherry picked from commit 49fe879)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    viirya authored and JoshRosen committed Nov 29, 2014
    Commit 854fade

Commits on Nov 30, 2014

  1. [DOCS][BUILD] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.
    
    To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before Maven executes; otherwise inter-module dependencies are broken.
    
    Author: Takuya UESHIN <ueshin@happy-camper.st>
    
    Closes #3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:
    
    1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.
    
    (cherry picked from commit 0fcd24c)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    ueshin authored and pwendell committed Nov 30, 2014
    Commit e07dbd8
  2. SPARK-2143 [WEB UI] Add Spark version to UI footer

    This PR adds the Spark version number to the UI footer; this is how it looks:
    
    ![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3410 from srowen/SPARK-2143 and squashes the following commits:
    
    e9b3a7a [Sean Owen] Add Spark version to footer
    srowen authored and JoshRosen committed Nov 30, 2014
    Commit d324728

Commits on Dec 1, 2014

  1. [SPARK-4656][Doc] Typo in Programming Guide markdown

    Grammatical error in Programming Guide document
    
    Author: lewuathe <lewuathe@me.com>
    
    Closes #3412 from Lewuathe/typo-programming-guide and squashes the following commits:
    
    a3e2f00 [lewuathe] Typo in Programming Guide markdown
    
    (cherry picked from commit a217ec5)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Lewuathe authored and JoshRosen committed Dec 1, 2014
    Commit c899f03
  2. [DOC] Fixes formatting typo in SQL programming guide

    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3498 from liancheng/fix-sql-doc-typo and squashes the following commits:
    
    865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide
    
    (cherry picked from commit 2a4d389)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    liancheng authored and JoshRosen committed Dec 1, 2014
    Commit 0f4dad4
  3. SPARK-2192 [BUILD] Examples Data Not in Binary Distribution

    Simply, add data/ to distributions. This adds about 291KB (compressed) to the tarball, FYI.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3480 from srowen/SPARK-2192 and squashes the following commits:
    
    47688f1 [Sean Owen] Add data/ to distributions
    
    (cherry picked from commit 6384f42)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    srowen authored and mengxr committed Dec 1, 2014
    Commit 9b8a769
  4. [SPARK-4661][Core] Minor code and docs cleanup

    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits:
    
    03cbe3f [zsxwing] Minor code and docs cleanup
    
    (cherry picked from commit 30a86ac)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zsxwing authored and rxin committed Dec 1, 2014
    Commit 67a2c13
  5. Documentation: add description for repartitionAndSortWithinPartitions

    Author: Madhu Siddalingaiah <madhu@madhu.com>
    
    Closes #3390 from msiddalingaiah/master and squashes the following commits:
    
    cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
    332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
    cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
    
    (cherry picked from commit 2b233f5)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    msiddalingaiah authored and JoshRosen committed Dec 1, 2014
    Commit 35bc338
  6. [SPARK-4258][SQL][DOC] Documents spark.sql.parquet.filterPushdown

    Documents `spark.sql.parquet.filterPushdown`, explains why it's turned off by default and when it's safe to be turned on.
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3440 from liancheng/parquet-filter-pushdown-doc and squashes the following commits:
    
    2104311 [Cheng Lian] Documents spark.sql.parquet.filterPushdown
    
    (cherry picked from commit 5db8dca)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    liancheng authored and marmbrus committed Dec 1, 2014
    Commit 9c9b4bd
  7. [SQL] add @group tab in limit() and count()

    The @group tab is missing in the scaladoc.
    
    Author: Jacky Li <jacky.likun@gmail.com>
    
    Closes #3458 from jackylk/patch-7 and squashes the following commits:
    
    0121a70 [Jacky Li] add @group tab in limit() and count()
    
    (cherry picked from commit bafee67)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    Jacky Li authored and marmbrus committed Dec 1, 2014
    Commit e0a6d36
  8. [SPARK-4358][SQL] Let BigDecimal do checking type compatibility

    Remove hardcoded max and min values for types; let BigDecimal check type compatibility.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes #3208 from viirya/more_numericLit and squashes the following commits:
    
    e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
    1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
    cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
    91fe489 [Liang-Chi Hsieh] add Byte and Short.
    1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
    
    (cherry picked from commit b57365a)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    viirya authored and marmbrus committed Dec 1, 2014
    Commit f2bb90a
  9. [SPARK-4650][SQL] Support multiple columns in the countDistinct function, like count(distinct c1,c2..), in Spark SQL
    
    Adds support for multiple columns in the countDistinct function, e.g. count(distinct c1, c2, ...), in Spark SQL, as illustrated below.
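
    A quick illustration (the table `t` and its columns are hypothetical):

    ```scala
    // Previously only a single column was accepted inside count(distinct ...);
    // this change accepts the multi-column form as well.
    sqlContext.sql("SELECT COUNT(DISTINCT c1, c2) FROM t").collect()
    ```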
    
    Author: ravipesala <ravindra.pesala@huawei.com>
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3511 from ravipesala/countdistinct and squashes the following commits:
    
    cc4dbb1 [ravipesala] style
    070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
    
    (cherry picked from commit 6a9ff19)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    ravipesala authored and marmbrus committed Dec 1, 2014
    Commit 5006aab
  10. [SPARK-4658][SQL] Code documentation issue in DDL of datasource API

    Author: ravipesala <ravindra.pesala@huawei.com>
    
    Closes #3516 from ravipesala/ddl_doc and squashes the following commits:
    
    d101fdf [ravipesala] Style issues fixed
    d2238cd [ravipesala] Corrected documentation
    
    (cherry picked from commit bc35381)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    ravipesala authored and marmbrus committed Dec 1, 2014
    Commit b39cfee
  11. [SQL] Minor fix for doc and comment

    Author: wangfei <wangfei1@huawei.com>
    
    Closes #3533 from scwf/sql-doc1 and squashes the following commits:
    
    962910b [wangfei] doc and comment fix
    
    (cherry picked from commit 7b79957)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    scwf authored and marmbrus committed Dec 1, 2014
    Commit 31cf51b
  12. [SQL][DOC] Date type in SQL programming guide

    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #3535 from adrian-wang/datedoc and squashes the following commits:
    
    18ff1ed [Daoyuan Wang] [DOC] Date type
    
    (cherry picked from commit 5edbcbf)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    adrian-wang authored and marmbrus committed Dec 1, 2014
    Commit e66f816

Commits on Dec 2, 2014

  1. [SPARK-4529] [SQL] support view with column alias

    Support view definition like
    
    CREATE VIEW view3(valoo)
    TBLPROPERTIES ("fear" = "factor")
    AS SELECT upper(value) FROM src WHERE key=86;
    
    (valoo is the alias of upper(value).) This is the missing part of SPARK-4239, needed for full view support.
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #3396 from adrian-wang/viewcolumn and squashes the following commits:
    
    4d001d0 [Daoyuan Wang] support view with column alias
    
    (cherry picked from commit 4df60a8)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    adrian-wang authored and marmbrus committed Dec 2, 2014
    Commit 445fc95
  2. [SPARK-4611][MLlib] Implement the efficient vector norm

    The vector norm in Breeze is implemented via `activeIterator`, which is known to be very slow.
    In this PR, an efficient vector norm is implemented; with this API, `Normalizer` and
    `k-means` see a big performance improvement (see the sketch after the benchmarks below).
    
    Here is the benchmark against mnist8m dataset.
    
    a) `Normalizer`
    Before
    DenseVector: 68.25secs
    SparseVector: 17.01secs
    
    With this PR
    DenseVector: 12.71secs
    SparseVector: 2.73secs
    
    b) `k-means`
    Before
    DenseVector: 83.46secs
    SparseVector: 61.60secs
    
    With this PR
    DenseVector: 70.04secs
    SparseVector: 59.05secs
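
    A minimal sketch of the dense case (illustrative, not the exact MLlib code): a tight loop over the raw values array replaces Breeze's `activeIterator`.

    ```scala
    // L2 norm of a dense vector in a single pass over the backing array.
    def fastL2Norm(values: Array[Double]): Double = {
      var sum = 0.0
      var i = 0
      while (i < values.length) {
        sum += values(i) * values(i)
        i += 1
      }
      math.sqrt(sum)
    }
    ```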
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3462 from dbtsai/norm and squashes the following commits:
    
    63c7165 [DB Tsai] typo
    0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
    6fa616c [DB Tsai] address feedback
    9b7cb56 [DB Tsai] move norm to static method
    0b632e6 [DB Tsai] kmeans
    dbed124 [DB Tsai] style
    c1a877c [DB Tsai] first commit
    
    (cherry picked from commit 64f3175)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Dec 2, 2014
    Commit 3783e15
  3. [SPARK-4686] Link to allowed master URLs is broken

    The link points to the old scala programming guide; it should point to the submitting applications page.
    
    This should be backported to 1.1.2 (it's been broken as of 1.0).
    
    Author: Kay Ousterhout <kayousterhout@gmail.com>
    
    Closes #3542 from kayousterhout/SPARK-4686 and squashes the following commits:
    
    a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken
    
    (cherry picked from commit d9a148b)
    Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
    kayousterhout committed Dec 2, 2014
    Commit b97c27f
  4. [SPARK-4536][SQL] Add sqrt and abs to Spark SQL DSL

    Spark SQL has built-in sqrt and abs, but the DSL doesn't support those functions.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
    
    07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
    8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
    1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
    0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
    
    (cherry picked from commit e75e04f)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    sarutak authored and marmbrus committed Dec 2, 2014
    Commit 1850d90
  5. [SPARK-4663][sql]add finally to avoid resource leak

    Author: baishuo <vc_java@hotmail.com>
    
    Closes #3526 from baishuo/master-trycatch and squashes the following commits:
    
    d446e14 [baishuo] correct the code style
    b36bf96 [baishuo] correct the code style
    ae0e447 [baishuo] add finally to avoid resource leak
    
    (cherry picked from commit 69b6fed)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    baishuo authored and marmbrus committed Dec 2, 2014
    Commit aa3d369
  6. [SPARK-4676][SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
    
    val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
    val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
    val nrdd = jhc.hql("select null from spark_test.for_test")
    println(nrdd.schema)
    Then the error is thrown as follows:
    scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
    at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
    
    Author: YanTangZhai <hakeemzhai@tencent.com>
    Author: yantangzhai <tyz0303@163.com>
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits:
    
    e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
    4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
    896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null
    6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
    e249846 [YanTangZhai] Merge pull request #10 from apache/master
    d26d982 [YanTangZhai] Merge pull request #9 from apache/master
    76d4027 [YanTangZhai] Merge pull request #8 from apache/master
    03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
    8a00106 [YanTangZhai] Merge pull request #6 from apache/master
    cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
    cdef539 [YanTangZhai] Merge pull request #1 from apache/master
    
    (cherry picked from commit 1066427)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    YanTangZhai authored and marmbrus committed Dec 2, 2014
    Commit 06129cd
  7. [SPARK-4593][SQL] Return null when denominator is 0

    SELECT max(1/0) FROM src
    would return a very large number, which is obviously not right.
    For hive-0.12, Hive would return `Infinity` for 1/0, while for hive-0.13.1, it is `NULL` for 1/0.
    I think it is better to align our behavior with the newer Hive version.
    This PR ensures that when the divisor is 0, the result of the expression is NULL, the same as hive-0.13.1.
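
    An illustration of the intended behavior (`src` is Hive's usual test table; the printed shape is approximate):

    ```scala
    // Before this patch: 1/0 evaluated to a huge finite Double.
    // After this patch: any division by zero yields NULL, as in Hive 0.13.1.
    sqlContext.sql("SELECT max(1/0) FROM src").collect()
    // => Array([null])
    ```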
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #3443 from adrian-wang/div and squashes the following commits:
    
    2e98677 [Daoyuan Wang] fix code gen for divide 0
    85c28ba [Daoyuan Wang] temp
    36236a5 [Daoyuan Wang] add test cases
    6f5716f [Daoyuan Wang] fix comments
    cee92bd [Daoyuan Wang] avoid evaluation 2 times
    22ecd9a [Daoyuan Wang] fix style
    cf28c58 [Daoyuan Wang] divide fix
    2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
    
    (cherry picked from commit f6df609)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    adrian-wang authored and marmbrus committed Dec 2, 2014
    Commit 97dc238
  8. [SPARK-4670] [SQL] wrong symbol for bitwise not

    We should use `~` instead of `-` for bitwise NOT.
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes #3528 from adrian-wang/symbol and squashes the following commits:
    
    affd4ad [Daoyuan Wang] fix code gen test case
    56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
    f55fbae [Daoyuan Wang] wrong symbol for bitwise not
    
    (cherry picked from commit 1f5ddf1)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    adrian-wang authored and marmbrus committed Dec 2, 2014
    Commit adc5d6f
  9. [SPARK-4695][SQL] Get result using executeCollect

    Use `executeCollect` to collect the result, because executeCollect is a custom implementation of collect in Spark SQL that performs better than the RDD's collect.
    
    Author: wangfei <wangfei1@huawei.com>
    
    Closes #3547 from scwf/executeCollect and squashes the following commits:
    
    a5ab68e [wangfei] Revert "adding debug info"
    a60d680 [wangfei] fix test failure
    0db7ce8 [wangfei] adding debug info
    184c594 [wangfei] using executeCollect instead collect
    
    (cherry picked from commit 3ae0cda)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    scwf authored and marmbrus committed Dec 2, 2014
    Commit 658fe8f

Commits on Dec 3, 2014

  1. Commit 5e026a3
  2. [SPARK-4672][GraphX]Perform checkpoint() on PartitionsRDD to shorten the lineage
    
    The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
    
    Iterative GraphX applications always have long lineages, while calling checkpoint() on EdgeRDD and VertexRDD themselves cannot shorten the lineage. In contrast, if we perform checkpoint() on their PartitionsRDD, the long lineage can be cut off. Moreover, existing operations such as cache() in this code are performed on the PartitionsRDD, so checkpoint() should do the same. More details and explanation can be found in the JIRA.
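
    A sketch of the resulting shape of the fix (simplified; the real change lives inside the GraphX impl classes):

    ```scala
    // Inside VertexRDDImpl / EdgeRDDImpl: delegate checkpoint() to the
    // backing partitionsRDD, so the lineage is truncated where the data
    // actually lives, mirroring how cache() is already handled.
    override def checkpoint(): Unit = {
      partitionsRDD.checkpoint()
    }
    ```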
    
    Author: JerryLead <JerryLead@163.com>
    Author: Lijie Xu <csxulijie@gmail.com>
    
    Closes #3549 from JerryLead/my_graphX_checkpoint and squashes the following commits:
    
    d1aa8d8 [JerryLead] Perform checkpoint() on PartitionsRDD not VertexRDD and EdgeRDD themselves
    ff08ed4 [JerryLead] Merge branch 'master' of https://github.com/apache/spark
    c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
    52799e3 [Lijie Xu] Merge pull request #1 from apache/master
    
    (cherry picked from commit fc0a147)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    JerryLead authored and ankurdave committed Dec 3, 2014
    Commit f1859fc
  3. [SPARK-4672][GraphX]Non-transient PartitionsRDDs will lead to StackOverflow error
    
    The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
    
    In a nutshell, if `val partitionsRDD` in EdgeRDDImpl and VertexRDDImpl is non-transient, the serialization chain can become very long in iterative algorithms and finally lead to a StackOverflow error. More details and explanation can be found in the JIRA.
    
    Author: JerryLead <JerryLead@163.com>
    Author: Lijie Xu <csxulijie@gmail.com>
    
    Closes #3544 from JerryLead/my_graphX and squashes the following commits:
    
    628f33c [JerryLead] set PartitionsRDD to be transient in EdgeRDDImpl and VertexRDDImpl
    c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
    52799e3 [Lijie Xu] Merge pull request #1 from apache/master
    
    (cherry picked from commit 17c162f)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    JerryLead authored and ankurdave committed Dec 3, 2014
    Commit 528cce8
  4. [SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain
    
    The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
    
    The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a `$outer` that references EdgeRDD/VertexRDD, which causes the task's serialization chain to become very long in iterative GraphX applications. As a result, a StackOverflow error will occur. If we set `f = null` in `clearDependencies()`, checkpoint() can cut off the long serialization chain. More details and explanation can be found in the JIRA.
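
    A sketch of the fix (simplified; member names follow the description above, and `f` must be a var for this to compile):

    ```scala
    // Nulling out the zip function f releases its $outer reference to the
    // enclosing RDD, so checkpointing truly cuts the serialization chain.
    override def clearDependencies(): Unit = {
      super.clearDependencies()
      rdd1 = null
      rdd2 = null
      f = null
    }
    ```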
    
    Author: JerryLead <JerryLead@163.com>
    Author: Lijie Xu <csxulijie@gmail.com>
    
    Closes #3545 from JerryLead/my_core and squashes the following commits:
    
    f7faea5 [JerryLead] checkpoint() should clear the f to avoid StackOverflow error
    c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
    52799e3 [Lijie Xu] Merge pull request #1 from apache/master
    
    (cherry picked from commit 77be8b9)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    JerryLead authored and ankurdave committed Dec 3, 2014
    Commit 667f7ff
  5. [SPARK-4710] [mllib] Eliminate MLlib compilation warnings

    Renamed StreamingKMeans to StreamingKMeansExample to avoid warning about name conflict with StreamingKMeans class.
    
    Added import to DecisionTreeRunner to eliminate warning.
    
    CC: mengxr
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #3568 from jkbradley/ml-compilation-warnings and squashes the following commits:
    
    64d6bc4 [Joseph K. Bradley] Updated DecisionTreeRunner.scala and StreamingKMeans.scala to eliminate compilation warnings, including renaming StreamingKMeans to StreamingKMeansExample.
    
    (cherry picked from commit 4ac2151)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Dec 3, 2014
    Commit fb14bfd
  6. [SPARK-4708][MLLib] Make k-means run two/three times faster with dense/sparse samples
    
    Note that the usage of `breezeSquaredDistance` in
    `org.apache.spark.mllib.util.MLUtils.fastSquaredDistance`
    is in the critical path, and `breezeSquaredDistance` is slow.
    We should replace it with our own implementation (see the sketch after the benchmarks below).
    
    Here is the benchmark against mnist8m dataset.
    
    Before
    DenseVector: 70.04secs
    SparseVector: 59.05secs
    
    With this PR
    DenseVector: 30.58secs
    SparseVector: 21.14secs
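
    A minimal dense-vector sketch of the identity being exploited (illustrative; the real MLUtils code adds precision guards):

    ```scala
    // ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a dot b): with the norms
    // precomputed, one fused dot-product loop replaces the slow
    // element-wise Breeze traversal.
    def fastSquaredDistance(a: Array[Double], normA: Double,
                            b: Array[Double], normB: Double): Double = {
      require(a.length == b.length)
      var dot = 0.0
      var i = 0
      while (i < a.length) {
        dot += a(i) * b(i)
        i += 1
      }
      normA * normA + normB * normB - 2.0 * dot
    }
    ```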
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3565 from dbtsai/kmean and squashes the following commits:
    
    08bc068 [DB Tsai] restyle
    de24662 [DB Tsai] address feedback
    b185a77 [DB Tsai] cleanup
    4554ddd [DB Tsai] first commit
    
    (cherry picked from commit 7fc49ed)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Dec 3, 2014
    Commit 8ff7a28
  7. [SPARK-4717][MLlib] Optimize BLAS library to avoid dereferencing multiple times in loops
    
    Keep a local reference to the `values` and `indices` arrays of the `Vector` object
    so the JVM can locate the values with a single operation (see the sketch below).
    See `SPARK-4581` for a similar optimization and the bytecode analysis.
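
    A sketch of the pattern on a sparse-dense dot product (simplified; the real change applies it throughout MLlib's BLAS wrapper):

    ```scala
    import org.apache.spark.mllib.linalg.{DenseVector, SparseVector}

    def dot(x: SparseVector, y: DenseVector): Double = {
      val xValues = x.values    // load each field once into a local...
      val xIndices = x.indices  // ...so the loop body avoids repeated
      val yValues = y.values    // getfield + array dereferences
      var sum = 0.0
      var k = 0
      while (k < xIndices.length) {
        sum += xValues(k) * yValues(xIndices(k))
        k += 1
      }
      sum
    }
    ```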
    
    Author: DB Tsai <dbtsai@alpinenow.com>
    
    Closes #3577 from dbtsai/blasopt and squashes the following commits:
    
    62d38c4 [DB Tsai] formating
    0316cef [DB Tsai] first commit
    
    (cherry picked from commit d005429)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    DB Tsai authored and mengxr committed Dec 3, 2014
    Commit b63e941
  8. SPARK-2624 add datanucleus jars to the container in yarn-cluster

    If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add them to the container.
    
    This patch modifies the yarn deployment class to copy all `datanucleus-*` jars found in `[spark-home]/libs` to the container.
    
    Author: Jim Lim <jim@quixey.com>
    
    Closes #3238 from jimjh/SPARK-2624 and squashes the following commits:
    
    3633071 [Jim Lim] SPARK-2624 update documentation and comments
    fe95125 [Jim Lim] SPARK-2624 keep java imports together
    6c31fe0 [Jim Lim] SPARK-2624 update documentation
    6690fbf [Jim Lim] SPARK-2624 add tests
    d28d8e9 [Jim Lim] SPARK-2624 add spark.yarn.datanucleus.dir option
    84e6cba [Jim Lim] SPARK-2624 add datanucleus jars to the container in yarn-cluster
    Jim Lim authored and Andrew Or committed Dec 3, 2014
    Commit 163fd78
  9. [SPARK-4701] Typo in sbt/sbt

    Fixed a typo.
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes #3560 from tsudukim/feature/SPARK-4701 and squashes the following commits:
    
    ed2a3f1 [Masayoshi TSUZUKI] Another whitespace position error.
    1af3a35 [Masayoshi TSUZUKI] [SPARK-4701] Typo in sbt/sbt
    
    (cherry picked from commit 96786e3)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tsudukim authored and Andrew Or committed Dec 3, 2014
    Commit 614e686
  10. [SPARK-4715][Core] Make sure tryToAcquire won't return a negative value

    ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`.
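
    A sketch of the guard (variable names illustrative): the grant is clamped at zero so a thread already over its fair share can never be handed a negative amount.

    ```scala
    // curMem: bytes this thread already holds; maxFairShare: its current cap.
    def grantable(requested: Long, curMem: Long, maxFairShare: Long): Long = {
      val headroom = math.max(0L, maxFairShare - curMem)  // never negative
      math.min(requested, headroom)
    }
    ```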
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits:
    
    a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value
    
    (cherry picked from commit edd3cd4)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    zsxwing authored and Andrew Or committed Dec 3, 2014
    Commit 1ee65b4
  11. [SPARK-4642] Add description about spark.yarn.queue to running-on-YARN document.
    
    Added a description of this parameter:
    - spark.yarn.queue
    
    Modified the description of the default value of this parameter:
    - spark.yarn.submit.file.replication
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes #3500 from tsudukim/feature/SPARK-4642 and squashes the following commits:
    
    ce99655 [Masayoshi TSUZUKI] better gramatically.
    21cf624 [Masayoshi TSUZUKI] Removed intentionally undocumented properties.
    88cac9b [Masayoshi TSUZUKI] [SPARK-4642] Documents about running-on-YARN needs update
    
    (cherry picked from commit 692f493)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tsudukim authored and Andrew Or committed Dec 3, 2014
    Commit 4a71e08
  12. [HOT FIX] [YARN] Check whether /lib exists before listing its files

    This is caused by a975dc3
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3589 from andrewor14/yarn-hot-fix and squashes the following commits:
    
    a4fad5f [Andrew Or] Check whether lib directory exists before listing its files
    
    (cherry picked from commit 90ec643)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Dec 3, 2014
    Commit 38cb2c3
  13. [SPARK-4552][SQL] Avoid exception when reading empty parquet data through Hive
    
    This is a very small fix that catches one specific exception and returns an empty table.  #3441 will address this in a more principled way.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:
    
    2781d9f [Michael Armbrust] Handle empty lists for newParquet
    04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive
    
    (cherry picked from commit 513ef82)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Dec 3, 2014
    Commit 4793197
  14. [SPARK-4498][core] Don't transition ExecutorInfo to RUNNING until Driver adds Executor
    
    The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged message to the master. Otherwise, appInfo.resetRetryCount() is never called and failing executors will eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting.
    
    Author: Mark Hamstra <markhamstra@gmail.com>
    
    Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits:
    
    8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
    markhamstra authored and JoshRosen committed Dec 3, 2014
    Commit 6b6b779

Commits on Dec 4, 2014

  1. [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.
    
    cc aarondav kayousterhout pwendell
    
    This should go into 1.2?
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #3579 from rxin/SPARK-4085 and squashes the following commits:
    
    255b4fd [Reynold Xin] Updated test.
    f9814d9 [Reynold Xin] Code review feedback.
    2afaf35 [Reynold Xin] [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.
    
    (cherry picked from commit 1826372)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    rxin authored and pwendell committed Dec 4, 2014
    Commit fe28ee2
  2. [SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer
    
    I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section).
    
    CC: mengxr
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #3569 from jkbradley/lr-doc and squashes the following commits:
    
    654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization
    5035ad0 [Joseph K. Bradley] updated based on review
    94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice on choosing an optimization method
    
    (cherry picked from commit 27ab0b8)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Dec 4, 2014
    Commit 4259ca8
  3. [SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix
    
    Major changes:
    * Added programming guide sections for tree ensembles
    * Added examples for tree ensembles
    * Updated DecisionTree programming guide with more info on parameters
    * **API change**: Standardized the tree parameter for the number of classes (for classification)
    
    Minor changes:
    * Updated decision tree documentation
    * Updated existing tree and tree ensemble examples
     * Use train/test split, and compute test error instead of training error.
     * Fixed decision_tree_runner.py to actually use the number of classes it computes from data. (small bug fix)
    
    Note: I know this is a lot of lines, but most is covered by:
    * Programming guide sections for gradient boosting and random forests.  (The changes are probably best viewed by generating the docs locally.)
    * New examples (which were copied from the programming guide)
    * The "numClasses" renaming
    
    I have run all examples and relevant unit tests.
    
    CC: mengxr manishamde codedeft
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
    
    Closes #3461 from jkbradley/ensemble-docs and squashes the following commits:
    
    70a75f3 [Joseph K. Bradley] updated forest vs boosting comparison
    d1de753 [Joseph K. Bradley] Added note about toString and toDebugString for DecisionTree to migration guide
    8e87f8f [Joseph K. Bradley] Combined GBT and RandomForest guides into one ensembles guide
    6fab846 [Joseph K. Bradley] small fixes based on review
    b9f8576 [Joseph K. Bradley] updated decision tree doc
    375204c [Joseph K. Bradley] fixed python style
    2b60b6e [Joseph K. Bradley] merged Java RandomForest examples into 1 file.  added header.  Fixed small bug in same example in the programming guide.
    706d332 [Joseph K. Bradley] updated python DT runner to print full model if it is small
    c76c823 [Joseph K. Bradley] added migration guide for mllib
    abe5ed7 [Joseph K. Bradley] added examples for random forest in Java and Python to examples folder
    07fc11d [Joseph K. Bradley] Renamed numClassesForClassification to numClasses everywhere in trees and ensembles. This is a breaking API change, but it was necessary to correct an API inconsistency in Spark 1.1 (where Python DecisionTree used numClasses but Scala used numClassesForClassification).
    cdfdfbc [Joseph K. Bradley] added examples for GBT
    6372a2b [Joseph K. Bradley] updated decision tree examples to use random split.  tested all of them.
    ad3e695 [Joseph K. Bradley] added gbt and random forest to programming guide.  still need to update their examples
    
    (cherry picked from commit 657a888)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Dec 4, 2014
    Commit 9880bb4
  4. [Release] Correctly translate contributor names in release notes

    This commit involves three main changes:
    
    (1) It separates the translation of contributor names from the
    generation of the contributors list. This is largely motivated
    by the Github API limit; even if we exceed this limit, we should
    at least be able to proceed manually as before. This is why the
    translation logic is abstracted into its own script
    translate-contributors.py.
    
    (2) When we look for candidate replacements for invalid author
    names, we should look for the assignees of the associated JIRAs
    too. As a result, the intermediate file must keep track of these.
    
    (3) This provides an interactive mode with which the user can
    sit at the terminal and manually pick the candidate replacement
    that he/she thinks makes the most sense. As before, there is a
    non-interactive mode that picks the first candidate that the
    script considers "valid."
    
    TODO: We should have a known_contributors file that stores
    known mappings so we don't have to go through all of this
    translation every time. This is also valuable because some
    contributors simply cannot be automatically translated.
    
    Conflicts:
    	.gitignore
    Andrew Or committed Dec 4, 2014
    Commit f9e1f89
  5. [SPARK-4685] Include all spark.ml and spark.mllib packages in JavaDoc's MLlib group
    
    This is #3554 from Lewuathe except that I put both `spark.ml` and `spark.mllib` in the group `MLlib`.
    
    Closes #3554
    
    jkbradley
    
    Author: lewuathe <lewuathe@me.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3598 from mengxr/Lewuathe-modify-javadoc-setting and squashes the following commits:
    
    184609a [Xiangrui Meng] merge spark.ml and spark.mllib into the same group in javadoc
    f7535e6 [lewuathe] [SPARK-4685] Update JavaDoc settings to include spark.ml and all spark.mllib subpackages in the right sections
    
    (cherry picked from commit 20bfea4)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    Lewuathe authored and mengxr committed Dec 4, 2014
    Commit 2605acb
  6. [SQL] Minor: Avoid calling Seq#size in a loop

    Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal.
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #3593 from aarondav/seq-opt and squashes the following commits:
    
    962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop
    
    (cherry picked from commit c6c7165)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    aarondav authored and rxin committed Dec 4, 2014
    Commit dec838b
  7. [docs] Fix outdated comment in tuning guide

    When you use the SPARK_JAVA_OPTS env variable, Spark complains:
    
    ```
    SPARK_JAVA_OPTS was detected (set to ' -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ').
    This is deprecated in Spark 1.0+.
    
    Please instead use:
     - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
     - ./spark-submit with --driver-java-options to set -X options for a driver
     - spark.executor.extraJavaOptions to set -X options for executors
     - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
    ```
    
    This updates the docs to redirect the user to the relevant part of the configuration docs.
    
    CC: mengxr  but please CC someone else as needed
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #3592 from jkbradley/tuning-doc and squashes the following commits:
    
    0760ce1 [Joseph K. Bradley] fixed outdated comment in tuning guide
    
    (cherry picked from commit 529439b)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    jkbradley authored and rxin committed Dec 4, 2014
    Commit bf720ef
  8. [SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes

    Documentation:
    * Added ml-guide.md, linked from mllib-guide.md
    * Updated mllib-guide.md with small section pointing to ml-guide.md
    
    Examples:
    * CrossValidatorExample
    * SimpleParamsExample
    * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md)
    
    Bug fixes:
    * PipelineModel: did not use ParamMaps correctly
    * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!)
    
    CC: mengxr shivaram etrain. Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete.
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    Author: jkbradley <joseph.kurata.bradley@gmail.com>
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3588 from jkbradley/ml-package-docs and squashes the following commits:
    
    d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit).  updated examples for CV and Params for spark.ml
    c38469c [Joseph K. Bradley] Updated ml-guide with CV examples
    99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params.  Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold.
    ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs
    3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype
    41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version.  CrossValidatorExample not working yet.  Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
    
    (cherry picked from commit 469a6e5)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Dec 4, 2014
    Commit 266a814
  9. [FIX][DOC] Fix broken links in ml-guide.md

    and some minor changes in ScalaDoc.
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3601 from mengxr/SPARK-4575-fix and squashes the following commits:
    
    c559768 [Xiangrui Meng] minor code update
    ce94da8 [Xiangrui Meng] Java Bean -> JavaBean
    0b5c182 [Xiangrui Meng] fix links in ml-guide
    
    (cherry picked from commit 7e758d7)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    mengxr committed Dec 4, 2014
    Commit 34fdca0
  10. [SPARK-4683][SQL] Add a beeline.cmd to run on Windows

    Tested locally with a Win7 VM. Connected to a Spark SQL Thrift server instance running on Mac OS X with the following command line:
    
    ```
    bin\beeline.cmd -u jdbc:hive2://10.0.2.2:10000 -n lian
    ```
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3599 from liancheng/beeline.cmd and squashes the following commits:
    
    79092e7 [Cheng Lian] Windows script for BeeLine
    
    (cherry picked from commit 28c7aca)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    liancheng authored and pwendell committed Dec 4, 2014
    Commit 2fbe488
  11. Revert "HOTFIX: Rolling back incorrect version change"

    This reverts commit 3a4609e.
    pwendell committed Dec 4, 2014
    Commit: 2c6e287
  12. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit 00316cc.
    pwendell committed Dec 4, 2014
    Commit: 701019b
  13. Revert "Preparing Spark release v1.2.0-rc1"

    This reverts commit 1056e9e.
    pwendell committed Dec 4, 2014
    Commit: 078894c
  14. [SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes
    
    In yarn-cluster and standalone-cluster modes, we don't know where the driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf, this will lead to errors when the driver launches.
    
    This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.
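    
    As a rough sketch of the approach (illustrative only, not the actual SparkSubmit code; `deployMode` stands in for the value derived from `--deploy-mode`):
    
    ```
    import org.apache.spark.SparkConf
    
    object DropDriverHost {
      // In cluster deploy modes the driver's host is unknown until the cluster
      // manager launches it, so a value inherited from the submitting machine
      // must be dropped before the driver starts.
      def prepare(conf: SparkConf, deployMode: String): SparkConf = {
        if (deployMode == "cluster") {
          conf.remove("spark.driver.host")
        }
        conf
      }
    }
    ```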
    
    Author: WangTaoTheTonic <barneystinson@aliyun.com>
    Author: WangTao <barneystinson@aliyun.com>
    
    Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:
    
    ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
    02c4e49 [WangTao] add comment
    32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
    667cf24 [WangTaoTheTonic] document fix
    ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
    2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode
    
    (cherry picked from commit 8106b1e)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    WangTaoTheTonic authored and JoshRosen committed Dec 4, 2014
    Commit: d9aee07
  15. [HOTFIX] Fixing two issues with the release script.

    1. The version replacement was still producing some false changes.
    2. Uploads now go specifically to the staging repo.
    
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #3608 from pwendell/release-script and squashes the following commits:
    
    3c63294 [Patrick Wendell] Fixing two issues with the release script:
    
    (cherry picked from commit 8dae26f)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    pwendell committed Dec 4, 2014
    Commit: ead01b6
  16. Preparing Spark release v1.2.0-rc2

    Commit: 2b72c56
  17. Preparing development version 1.2.1-SNAPSHOT

    Commit: bc05df8
  18. [SPARK-4745] Fix get_existing_cluster() function with multiple security groups
    
    The current get_existing_cluster() function would only find an instance belonging to a cluster if the instance's security groups exactly equaled cluster_name + "-master" (or "-slaves"). This fix allows for multiple security groups by checking whether the cluster_name + "-master" security group is in the list of groups for a particular instance.
    
    Author: alexdebrie <alexdebrie1@gmail.com>
    
    Closes #3596 from alexdebrie/master and squashes the following commits:
    
    9d51232 [alexdebrie] Fix get_existing_cluster() function with multiple security groups
    
    (cherry picked from commit 794f3ae)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    alexdebrie authored and JoshRosen committed Dec 4, 2014
    Commit: a00d0aa
  19. [SPARK-4459] Change groupBy type parameter from K to U

    Please see https://issues.apache.org/jira/browse/SPARK-4459
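    
    A minimal usage sketch (illustrative only; the type parameter is inferred from the key function's result type):
    
    ```
    import org.apache.spark.{SparkConf, SparkContext}
    
    object GroupByDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("groupBy-demo").setMaster("local[*]"))
        val nums = sc.parallelize(1 to 10)
        // The renamed parameter (U) is the result type of the key function:
        val byParity = nums.groupBy((n: Int) => n % 2 == 0) // RDD[(Boolean, Iterable[Int])]
        byParity.collect().foreach(println)
        sc.stop()
      }
    }
    ```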
    
    Author: Saldanha <saldaal1@phusca-l24858.wlan.na.novartis.net>
    
    Closes #3327 from alokito/master and squashes the following commits:
    
    54b1095 [Saldanha] [SPARK-4459] changed type parameter for keyBy from K to U
    d5f73c3 [Saldanha] [SPARK-4459] added keyBy test
    316ad77 [Saldanha] SPARK-4459 changed type parameter for groupBy from K to U.
    62ddd4b [Saldanha] SPARK-4459 added failing unit test
    (cherry picked from commit 743a889)
    
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Saldanha authored and JoshRosen committed Dec 4, 2014
    Commit: 0d159de
  20. [SPARK-4652][DOCS] Add docs about spark-git-repo option

    There might be cases when a WIP Spark version needs to be run
    on an EC2 cluster. To make it easier to set up this type of cluster,
    add a description of the --spark-git-repo option to the EC2 documentation.
    
    Author: lewuathe <lewuathe@me.com>
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3513 from Lewuathe/doc-for-development-spark-cluster and squashes the following commits:
    
    6dae8ee [lewuathe] Wrap consistent with other descriptions
    cfaf9be [lewuathe] Add docs about spark-git-repo option
    
    (Editing / cleanup by Josh Rosen)
    
    (cherry picked from commit ab8177d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Lewuathe authored and JoshRosen committed Dec 4, 2014
    Commit: f5c5647

Commits on Dec 5, 2014

  1. [SPARK-4421] Wrong link in spark-standalone.html

    Modified the link of building Spark.
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes #3279 from tsudukim/feature/SPARK-4421 and squashes the following commits:
    
    56e31c1 [Masayoshi TSUZUKI] Modified the link of building Spark.
    
    (cherry picked from commit ddfc09c)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    tsudukim authored and JoshRosen committed Dec 5, 2014
    Commit: b905e11
  2. Fix typo in Spark SQL docs.

    Author: Andy Konwinski <andykonwinski@gmail.com>
    
    Closes #3611 from andyk/patch-3 and squashes the following commits:
    
    7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
    
    (cherry picked from commit 15cf3b0)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    andyk authored and JoshRosen committed Dec 5, 2014
    Commit: 63b1bc1
  3. [SPARK-4464] Description about configuration options needs to be modified in docs.
    
    Added description about -h and -host.
    Modified description about -i and -ip which are now deprecated.
    Added description about --properties-file.
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes #3329 from tsudukim/feature/SPARK-4464 and squashes the following commits:
    
    6c07caf [Masayoshi TSUZUKI] [SPARK-4464] Description about configuration options need to be modified in docs.
    
    (cherry picked from commit ca37903)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    tsudukim authored and JoshRosen committed Dec 5, 2014
    Commit: 6c43631
  4. Revert "[HOT FIX] [YARN] Check whether /lib exists before listing its files"
    
    This reverts commit 38cb2c3.
    Andrew Or committed Dec 5, 2014
    Commit: 325babe
  5. Revert "SPARK-2624 add datanucleus jars to the container in yarn-cluster"
    
    This reverts commit a975dc3.
    Andrew Or committed Dec 5, 2014
    Commit: a8d8077
  6. [SPARK-4753][SQL] Use catalyst for partition pruning in newParquet.

    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3613 from marmbrus/parquetPartitionPruning and squashes the following commits:
    
    4f138f8 [Michael Armbrust] Use catalyst for partition pruning in newParquet.
    
    (cherry picked from commit f5801e8)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    marmbrus authored and pwendell committed Dec 5, 2014
    Commit: d12ea49
  7. [SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server

    Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly).
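    
    A minimal sketch of the resulting defaults (illustrative, not the Thrift server's actual startup code); explicitly set values still win:
    
    ```
    import org.apache.spark.SparkConf
    
    object ThriftServerDefaults {
      // Apply the Kryo defaults only when the user has not set them explicitly.
      def withKryoDefaults(conf: SparkConf): SparkConf = {
        if (!conf.contains("spark.serializer")) {
          conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        }
        if (!conf.contains("spark.kryo.referenceTracking")) {
          conf.set("spark.kryo.referenceTracking", "false")
        }
        conf
      }
    }
    ```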
    
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3621 from liancheng/kryo-by-default and squashes the following commits:
    
    70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server
    
    (cherry picked from commit 6f61e1f)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    liancheng authored and pwendell committed Dec 5, 2014
    Commit: e8d8077
  8. Streaming doc: do you mean inadvertently?

    Author: CrazyJvm <crazyjvm@gmail.com>
    
    Closes #3620 from CrazyJvm/streaming-foreachRDD and squashes the following commits:
    
    b72886b [CrazyJvm] do you mean inadvertently?
    
    (cherry picked from commit 6eb1b6f)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    CrazyJvm authored and rxin committed Dec 5, 2014
    Commit: 11446a6

Commits on Dec 6, 2014

  1. [SPARK-3623][GraphX] GraphX should support the checkpoint operation

    Author: GuoQiang Li <witgo@qq.com>
    
    Closes #2631 from witgo/SPARK-3623 and squashes the following commits:
    
    a70c500 [GuoQiang Li] Remove java related
    4d1e249 [GuoQiang Li] Add comments
    e682724 [GuoQiang Li] Graph should support the checkpoint operation
    
    (cherry picked from commit e895e0c)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    witgo authored and ankurdave committed Dec 6, 2014
    Commit: 27d9f13

Commits on Dec 8, 2014

  1. [SPARK-4646] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
    
    This patch replaces a native quick sorter with Sorter(TimSort) in Spark.
    It achieved performance gains of ~8% in my quick experiments.
    
    Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
    
    Closes #3507 from maropu/TimSortInEdgePartitionBuilderSpike and squashes the following commits:
    
    8d4e5d2 [Takeshi Yamamuro] Remove a wildcard import
    3527e00 [Takeshi Yamamuro] Replace Scala.util.Sorting.quickSort with Sorter(TimSort) in Spark
    
    (cherry picked from commit 2e6b736)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    maropu authored and ankurdave committed Dec 8, 2014
    Commit: a4ae7c8
  2. [SPARK-4620] Add unpersist in Graph and GraphImpl

    Add an interface to uncache both vertices and edges of Graph/GraphImpl.
    This interface is useful when iterative graph operations build a new graph in each iteration, and the vertices and edges of previous iterations are no longer needed in later iterations.
    
    Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
    
    This patch had conflicts when merged, resolved by
    Committer: Ankur Dave <ankurdave@gmail.com>
    
    Closes #3476 from maropu/UnpersistInGraphSpike and squashes the following commits:
    
    77a006a [Takeshi Yamamuro] Add unpersist in Graph and GraphImpl
    
    (cherry picked from commit 8817fc7)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
    maropu authored and ankurdave committed Dec 8, 2014
    Commit: 6b9e8b0
  3. [SPARK-4774] [SQL] Makes HiveFromSpark more portable

    HiveFromSpark read the kv1.txt file from SPARK_HOME/examples/src/main/resources/kv1.txt, which assumed
    you had a source tree checked out. Now we copy kv1.txt to a temporary file and delete it when
    the JVM shuts down. This allows us to run this example outside of a Spark source tree.
    
    Author: Kostas Sakellis <kostas@cloudera.com>
    
    Closes #3628 from ksakellis/kostas-spark-4774 and squashes the following commits:
    
    6770f83 [Kostas Sakellis] [SPARK-4774] [SQL] Makes HiveFromSpark more portable
    
    (cherry picked from commit d6a972b)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    Kostas Sakellis authored and marmbrus committed Dec 8, 2014
    Commit: 9ed5641

Commits on Dec 9, 2014

  1. SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
    
    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes #3624 from sryza/sandy-spark-4770 and squashes the following commits:
    
    bd81a3a [Sandy Ryza] SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
    
    (cherry picked from commit cda94d1)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    sryza authored and JoshRosen committed Dec 9, 2014
    Commit: f416032
  2. [SPARK-4769] [SQL] CTAS does not work when reading from temporary tables

    This is the code refactoring and follow-ups for #2570.
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes #3336 from chenghao-intel/createtbl and squashes the following commits:
    
    3563142 [Cheng Hao] remove the unused variable
    e215187 [Cheng Hao] eliminate the compiling warning
    4f97f14 [Cheng Hao] fix bug in unittest
    5d58812 [Cheng Hao] revert the API changes
    b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS
    
    (cherry picked from commit 51b1fe1)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    chenghao-intel authored and marmbrus committed Dec 9, 2014
    Commit: 31a6d4f
  3. [SPARK-4785][SQL] Initialize Hive UDFs on the driver and serialize them with a wrapper
    
    Different from Hive 0.12.0, in Hive 0.13.1 UDF/UDAF/UDTF (aka Hive function) objects should only be initialized once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with Kryo or XML serializer. Several utility ser/de methods are provided in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created in Hive. Spark Kryo serializer wasn't used because there's no available SparkConf instance.
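    
    A conceptual sketch of the round trip (using Kryo directly; this is not Hive's `Utilities` API):
    
    ```
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
    
    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.io.{Input, Output}
    
    object KryoRoundTrip {
      // Serialize an already-initialized (possibly non-java.io.Serializable)
      // function object on the driver and rebuild it on an executor.
      def roundTrip[T](obj: T, cls: Class[T]): T = {
        val kryo = new Kryo()
        val bytes = new ByteArrayOutputStream()
        val out = new Output(bytes)
        kryo.writeObject(out, obj)
        out.close()
        kryo.readObject(new Input(new ByteArrayInputStream(bytes.toByteArray)), cls)
      }
    }
    ```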
    
    Author: Cheng Hao <hao.cheng@intel.com>
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:
    
    8e13756 [Cheng Hao] Update the comment
    74466a3 [Cheng Hao] refactor as feedbacks
    396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
    e9c3212 [Cheng Hao] update the comment
    19cbd46 [Cheng Hao] support udf instance ser/de after initialization
    
    (cherry picked from commit 383c555)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    chenghao-intel authored and marmbrus committed Dec 9, 2014
    Commit: e686742
  4. [SPARK-4765] Make GC time always shown in UI.

    This commit removes the GC time for each task from the set of
    optional, additional metrics, and instead always shows it for
    each task.
    
    cc pwendell
    
    Author: Kay Ousterhout <kayousterhout@gmail.com>
    
    Closes #3622 from kayousterhout/gc_time and squashes the following commits:
    
    15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark]
    e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI.
    
    (cherry picked from commit 1f51106)
    Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
    kayousterhout committed Dec 9, 2014
    Commit: 5a3a3cc

Commits on Dec 10, 2014

  1. SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable

    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits:
    
    cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable
    
    (cherry picked from commit 5e4c06f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    sryza authored and JoshRosen committed Dec 10, 2014
    Commit: 51da2c5
  2. SPARK-4805 [CORE] BlockTransferMessage.toByteArray() trips assertion

    Allocate enough room for the type byte as well as the message, to avoid tripping the assertion about the buffer's capacity.
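    
    A simplified sketch of the fix (names illustrative, not the actual network-common code):
    
    ```
    import java.nio.ByteBuffer
    
    object EncodeWithTypeTag {
      // Reserve one extra byte for the message-type tag in addition to the
      // encoded message body, so the capacity assertion cannot trip.
      def toByteArray(msgType: Byte, bodyLength: Int, writeBody: ByteBuffer => Unit): Array[Byte] = {
        val buf = ByteBuffer.allocate(1 + bodyLength)
        buf.put(msgType)
        writeBody(buf)
        assert(!buf.hasRemaining, "Expected the buffer to be exactly full")
        buf.array()
      }
    }
    ```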
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3650 from srowen/SPARK-4805 and squashes the following commits:
    
    9e1d502 [Sean Owen] Allocate enough room for type byte as well as message, to avoid tripping assertion about capacity of the buffer
    
    (cherry picked from commit d8f84f2)
    Signed-off-by: Aaron Davidson <aaron@databricks.com>
    srowen authored and aarondav committed Dec 10, 2014
    Commit: b0d64e5
  3. [SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.
    
    It's been reported that when the number of disks is large and the number of nodes is small, Netty network throughput is low compared with NIO. We suspect the problem is that only a small number of disks are utilized to serve shuffle files at any given point, due to connection reuse. This patch adds a new config parameter to specify the number of concurrent connections between two peer nodes, defaulting to 2.
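    
    If I recall the key correctly, the new parameter can be raised like this (a sketch; the key name should be verified against the configuration docs):
    
    ```
    import org.apache.spark.SparkConf
    
    object MoreConnections {
      // Open more parallel connections per peer so that more disks serve
      // shuffle blocks concurrently (the default under this patch is 2).
      val conf = new SparkConf().set("spark.shuffle.io.numConnectionsPerPeer", "4")
    }
    ```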
    
    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #3625 from rxin/SPARK-4740 and squashes the following commits:
    
    ad4241a [Reynold Xin] Updated javadoc.
    f33c72b [Reynold Xin] Code review feedback.
    0fefabb [Reynold Xin] Use double check in synchronization.
    41dfcb2 [Reynold Xin] Added test case.
    9076b4a [Reynold Xin] Fixed two NPEs.
    3e1306c [Reynold Xin] Minor style fix.
    4f21673 [Reynold Xin] [SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.
    
    (cherry picked from commit 2b9b726)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    rxin committed Dec 10, 2014
    Commit: 441ec34
  4. Config updates for the new shuffle transport.

    Author: Reynold Xin <rxin@databricks.com>
    
    Closes #3657 from rxin/conf-update and squashes the following commits:
    
    7370eab [Reynold Xin] Config updates for the new shuffle transport.
    
    (cherry picked from commit 9bd9334)
    Signed-off-by: Aaron Davidson <aaron@databricks.com>
    rxin authored and aarondav committed Dec 10, 2014
    Commit: 5e5d8f4
  5. [Minor] Use <sup> tag for help icon in web UI page header

    This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing.
    
    Before:
    
    ![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png)
    
    After:
    
    ![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png)
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits:
    
    bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header.
    
    (cherry picked from commit f79c1cf)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 10, 2014
    Commit: ff6f59b
  6. Revert "Preparing development version 1.2.1-SNAPSHOT"

    This reverts commit bc05df8.
    pwendell committed Dec 10, 2014
    Commit: a4d4a97
  7. Revert "Preparing Spark release v1.2.0-rc2"

    This reverts commit 2b72c56.
    pwendell committed Dec 10, 2014
    Commit: e4f20bd
  8. Commit: a428c44
  9. Commit: d70c729
  10. [SPARK-4771][Docs] Document standalone cluster supervise mode

    tdas, it looks like streaming already refers to the supervise mode. The link from there is broken, though.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3627 from andrewor14/document-supervise and squashes the following commits:
    
    9ca0908 [Andrew Or] Wording changes
    2b55ed2 [Andrew Or] Document standalone cluster supervise mode
    Andrew Or committed Dec 10, 2014
    Commit: 1da1937
  11. SPARK-3526 Add section about data locality to the tuning guide

    cc kayousterhout
    
    I have a few outstanding questions from compiling this documentation:
    - What's the difference between NO_PREF and ANY?  I understand the implications of the ordering but don't know what an example of each would be
    - Why is NO_PREF ahead of RACK_LOCAL?  I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other.  Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better?
    - Will there be a datacenter-local locality level in the future?  Apache Cassandra for example has this level
    
    Author: Andrew Ash <andrew@andrewash.com>
    
    Closes #2519 from ash211/SPARK-3526 and squashes the following commits:
    
    44cff28 [Andrew Ash] Link to spark.locality parameters rather than copying the list
    6d5d966 [Andrew Ash] Stay focused on Spark, no astronaut architecture mumbo-jumbo
    20e0e31 [Andrew Ash] SPARK-3526 Add section about data locality to the tuning guide
    
    (cherry picked from commit 652b781)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    ash211 authored and pwendell committed Dec 10, 2014
    Commit: 1eb3ec5

Commits on Dec 11, 2014

  1. [SPARK-4806] Streaming doc update for 1.2

    Important updates to the streaming programming guide
    - Make the fault-tolerance properties easier to understand, with information about write ahead logs
    - Update the information about deploying the spark streaming app with information about Driver HA
    - Update Receiver guide to discuss reliable vs unreliable receivers.
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Author: Josh Rosen <joshrosen@databricks.com>
    Author: Josh Rosen <rosenville@gmail.com>
    
    Closes #3653 from tdas/streaming-doc-update-1.2 and squashes the following commits:
    
    f53154a [Tathagata Das] Addressed Josh's comments.
    ce299e4 [Tathagata Das] Minor update.
    ca19078 [Tathagata Das] Minor change
    f746951 [Tathagata Das] Mentioned performance problem with WAL
    7787209 [Tathagata Das] Merge branch 'streaming-doc-update-1.2' of github.com:tdas/spark into streaming-doc-update-1.2
    2184729 [Tathagata Das] Updated Kafka and Flume guides with reliability information.
    2f3178c [Tathagata Das] Added more information about writing reliable receivers in the custom receiver guide.
    91aa5aa [Tathagata Das] Improved API Docs menu
    5707581 [Tathagata Das] Added Pythn API badge
    b9c8c24 [Tathagata Das] Merge pull request #26 from JoshRosen/streaming-programming-guide
    b8c8382 [Josh Rosen] minor fixes
    a4ef126 [Josh Rosen] Restructure parts of the fault-tolerance section to read a bit nicer when skipping over the headings
    65f66cd [Josh Rosen] Fix broken link to fault-tolerance semantics section.
    f015397 [Josh Rosen] Minor grammar / pluralization fixes.
    3019f3a [Josh Rosen] Fix minor Markdown formatting issues
    aa8bb87 [Tathagata Das] Small update.
    195852c [Tathagata Das] Updated based on Josh's comments, updated receiver reliability and deploying section, and also updated configuration.
    17b99fb [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-doc-update-1.2
    a0217c0 [Tathagata Das] Changed Deploying menu layout
    67fcffc [Tathagata Das] Added cluster mode + supervise example to submitting application guide.
    e45453b [Tathagata Das] Update streaming guide, added deploying section.
    192c7a7 [Tathagata Das] Added more info about Python API, and rewrote the checkpointing section.
    
    (cherry picked from commit b004150)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    tdas committed Dec 11, 2014
    Commit: c3b0713

Commits on Dec 12, 2014

  1. [SPARK-4825] [SQL] CTAS fails to resolve when created using saveAsTable

    Fix a bug with queries like:
    ```
      test("save join to table") {
        val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
        sql("CREATE TABLE test1 (key INT, value STRING)")
        testData.insertInto("test1")
        sql("CREATE TABLE test2 (key INT, value STRING)")
        testData.insertInto("test2")
        testData.insertInto("test2")
        sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
        checkAnswer(
          table("test"),
          sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
      }
    ```
    
    Author: Cheng Hao <hao.cheng@intel.com>
    
    Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
    
    e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
    e004895 [Cheng Hao] fix bug
    
    (cherry picked from commit 0abbff2)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    chenghao-intel authored and marmbrus committed Dec 12, 2014
    Commit: c82e99d

Commits on Dec 14, 2014

  1. fixed spelling errors in documentation

    changed "form" to "from" in 3 documentation entries for Kafka integration
    
    Author: Peter Klipfel <peter@klipfel.me>
    
    Closes #3691 from peterklipfel/master and squashes the following commits:
    
    0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation
    
    (cherry picked from commit 2a2983f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    peterklipfel authored and JoshRosen committed Dec 14, 2014
    Commit: 6eec4bc

Commits on Dec 15, 2014

  1. Commit: 2ec78a1
  2. [SPARK-4826] Fix generation of temp file names in WAL tests

    This PR should fix SPARK-4826, an issue where a bug in how we generate temp. file names was causing spurious test failures in the write ahead log suites.
    
    Closes #3695.
    Closes #3701.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3704 from JoshRosen/SPARK-4826 and squashes the following commits:
    
    f2307f5 [Josh Rosen] Use Spark Utils class for directory creation/deletion
    a693ddb [Josh Rosen] remove unused Random import
    b275e41 [Josh Rosen] Move creation of temp. dir to beforeEach/afterEach.
    9362919 [Josh Rosen] [SPARK-4826] Fix bug in generation of temp file names. in WAL suites.
    86c1944 [Josh Rosen] Revert "HOTFIX: Disabling failing block manager test"
    
    (cherry picked from commit f6b8591)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 15, 2014
    Commit: c5a9ae6
  3. [SPARK-4668] Fix some documentation typos.

    Author: Ryan Williams <ryan.blake.williams@gmail.com>
    
    Closes #3523 from ryan-williams/tweaks and squashes the following commits:
    
    d2eddaa [Ryan Williams] code review feedback
    ce27fc1 [Ryan Williams] CoGroupedRDD comment nit
    c6cfad9 [Ryan Williams] remove unnecessary if statement
    b74ea35 [Ryan Williams] comment fix
    b0221f0 [Ryan Williams] fix a gendered pronoun
    c71ffed [Ryan Williams] use names on a few boolean parameters
    89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager
    e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io
    83e8358 [Ryan Williams] fix pom.xml typo
    dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md
    
    (cherry picked from commit 8176b7a)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    
    Conflicts:
    	pom.xml
    ryan-williams authored and pwendell committed Dec 15, 2014
    Commit: ec19175

Commits on Dec 16, 2014

  1. [Minor][Core] fix comments in MapOutputTracker

    Using driver and executor in the comments of `MapOutputTracker` is clearer.
    
    Author: wangfei <wangfei1@huawei.com>
    
    Closes #3700 from scwf/commentFix and squashes the following commits:
    
    aa68524 [wangfei] master and worker should be driver and executor
    
    (cherry picked from commit 5c24759)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    scwf authored and JoshRosen committed Dec 16, 2014
    Commit: f1f27ec
  2. SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger
    
    This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3692 from srowen/SPARK-4814 and squashes the following commits:
    
    caca704 [Sean Owen] Disable assertions just for Hive
    f71e783 [Sean Owen] Enable assertions for SBT and Maven build
    
    (cherry picked from commit 81112e4)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    srowen authored and JoshRosen committed Dec 16, 2014
    Commit: 6bd8a96
  3. [DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line

    * This commit hopes to avoid the confusion I faced when trying
      to submit a regular, valid multi-line JSON file (see the sketch
      below); also see
    
      http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
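    
    A small sketch of the constraint (file path and contents are illustrative):
    
    ```
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    
    object JsonPerLine {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("json-demo").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        // people.json must hold one complete JSON object per line, e.g.:
        //   {"name": "Michael"}
        //   {"name": "Andy", "age": 30}
        // A single object pretty-printed across several lines will fail to parse.
        val people = sqlContext.jsonFile("people.json")
        people.printSchema()
        sc.stop()
      }
    }
    ```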
    
    Author: Peter Vandenabeele <peter@vandenabeele.com>
    
    Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:
    
    1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
    6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
    fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line
    
    (cherry picked from commit 1a9e35e)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    petervandenabeele authored and marmbrus committed Dec 16, 2014
    Commit: 4f9916f
  4. [SPARK-4847][SQL] Fix "extraStrategies cannot take effect in SQLContext" issue
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3698 from jerryshao/SPARK-4847 and squashes the following commits:
    
    4741130 [jerryshao] Make later added extraStrategies effect when calling strategies
    
    (cherry picked from commit dc8280d)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    jerryshao authored and marmbrus committed Dec 16, 2014
    Commit: 1b6fc23

Commits on Dec 17, 2014

  1. [Release] Major improvements to generate contributors script

    This commit introduces several major improvements to the script
    that generates the contributors list for release notes, notably:
    
    (1) Use release tags instead of a range of commits. Across branches,
    commits are not actually strictly two-dimensional, and so it is not
    sufficient to specify a start hash and an end hash. Otherwise, we
    end up counting commits that were already merged in an older branch.
    
    (2) Match PR numbers in addition to commit hashes. This is related
    to the first point in that if a PR is already merged in an older
    minor release tag, it should be filtered out here. This requires us
    to do some intelligent regex parsing on the commit description in
    addition to just relying on the GitHub API.
    
    (3) Relax author validity check. The old code fails on a name that
    has many middle names, for instance. The test was just too strict.
    
    (4) Use GitHub authentication. This allows us to make far more
    requests through the GitHub API than before (5000 as opposed to 60
    per hour).
    
    (5) Translate from Github username, not commit author name. This is
    important because the commit author name is not always configured
    correctly by the user. For instance, the username "falaki" used to
    resolve to just "Hossein", which was treated as a github username
    and translated to something else that is completely arbitrary.
    
    (6) Add an option to use the untranslated name. If there is not
    a satisfactory candidate to replace the untranslated name with,
    at least allow the user to not translate it.
    Andrew Or committed Dec 17, 2014
    Commit: 0fb0047
  2. [Release] Cache known author translations locally

    This bypasses unnecessary calls to the Github and JIRA API.
    Additionally, having a local cache allows us to remember names
    that we had to manually discover ourselves.
    Andrew Or committed Dec 17, 2014
    Commit: 8a69ed3
  3. [Release] Update contributors list format and sort it

    Additionally, we now warn the user when a duplicate author name
    arises, in which case he/she needs to resolve it manually.
    Andrew Or committed Dec 17, 2014
    Commit: beb75ac
  4. [HOTFIX] Fix RAT exclusion for known_translations file

    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3719 from JoshRosen/rat-fix and squashes the following commits:
    
    1542886 [Josh Rosen] [HOTFIX] Fix RAT exclusion for known_translations file
    
    (cherry picked from commit 3d0c37b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 17, 2014
    Commit: b5919d1
  5. [SPARK-4595][Core] Fix MetricsServlet not work issue

    The `MetricsServlet` handler should be added to the web UI after it has been initialized by `MetricsSystem`; otherwise the servlet handler cannot be attached.
    
    Author: Saisai Shao <saisai.shao@intel.com>
    Author: Josh Rosen <joshrosen@databricks.com>
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3444 from jerryshao/SPARK-4595 and squashes the following commits:
    
    434d17e [Saisai Shao] Merge pull request #10 from JoshRosen/metrics-system-cleanup
    87a2292 [Josh Rosen] Guard against misuse of MetricsSystem methods.
    f779fe0 [jerryshao] Fix MetricsServlet not work issue
    
    (cherry picked from commit cf50631)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    jerryshao authored and JoshRosen committed Dec 17, 2014
    Commit: 2f00a29
  6. [SPARK-4764] Ensure that files are fetched atomically

    tempFile is created in the same directory as targetFile, so that the
    move from tempFile to targetFile is always atomic.
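    
    A sketch of the idea (simplified; names are illustrative, not the actual Utils code):
    
    ```
    import java.io.{File, IOException}
    
    object AtomicFetch {
      // Download into a temp file in the *same* directory as the destination, so
      // the final rename stays within one filesystem and is therefore atomic.
      def fetchAtomically(targetFile: File)(download: File => Unit): Unit = {
        val tempFile = File.createTempFile("fetch", ".tmp", targetFile.getParentFile)
        download(tempFile)
        if (!tempFile.renameTo(targetFile)) {
          tempFile.delete()
          throw new IOException(s"Failed to rename $tempFile to $targetFile")
        }
      }
    }
    ```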
    
    Author: Christophe Préaud <christophe.preaud@kelkoo.com>
    
    Closes #2855 from preaudc/master and squashes the following commits:
    
    9ba89ca [Christophe Préaud] Ensure that files are fetched atomically
    54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
    c6a5590 [Christophe Préaud] Revert commit 8ea871f
    7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
    8ea871f [Christophe Préaud] Ensure that files are fetched atomically
    
    (cherry picked from commit ab2abcb)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Christophe Préaud authored and JoshRosen committed Dec 17, 2014
    Commit: e1d839e
  7. [SPARK-4750] Dynamic allocation - synchronize kills

    Simple omission on my part.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3612 from andrewor14/dynamic-allocation-synchronization and squashes the following commits:
    
    1f03b60 [Andrew Or] Synchronize kills
    
    (cherry picked from commit 65f929d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Andrew Or authored and JoshRosen committed Dec 17, 2014
    Commit: 7ecf30e
  8. SPARK-3926 [CORE] Reopened: result of JavaRDD collectAsMap() is not serializable
    
    My original 'fix' didn't actually fix anything. Now there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper` -- I went with the latter.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3587 from srowen/SPARK-3926 and squashes the following commits:
    
    8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE
    7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test
    
    (cherry picked from commit e829bfa)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    srowen authored and JoshRosen committed Dec 17, 2014
    Commit: 26dfac6
  9. [SPARK-4691][shuffle] Restructure a few lines in shuffle code

    In HashShuffleReader.scala and HashShuffleWriter.scala, there is no need to test `dep.aggregator.isEmpty` again once `dep.aggregator.isDefined` has already been checked.
    
    In SortShuffleWriter.scala, isn't `dep.aggregator.isEmpty` more elegant than `!dep.aggregator.isDefined`?
    
    Author: maji2014 <maji3@asiainfo.com>
    
    Closes #3553 from maji2014/spark-4691 and squashes the following commits:
    
    bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala
    10d0cf0 [maji2014] change a elegant way
    d8f52dc [maji2014] code optimization for judgement
    
    (cherry picked from commit b310744)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    maji2014 authored and JoshRosen committed Dec 17, 2014
    Commit: 51081e4
  10. [SPARK-4714] BlockManager.dropFromMemory() should check whether block has been removed after synchronizing on BlockInfo instance.
    
    After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represents may already have been removed.
    
    The three methods have the same logic to get the `info` lock:
    ```
       info = blockInfo.get(id)
       if (info != null) {
         info.synchronized {
           // do something
         }
       }
    ```
    
    So there is a chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread that entered `info.synchronized` first.
    
    The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
    But in `dropFromMemory` it may be problematic, since it may drop to the disk store block data that has already been removed, and this invokes data store operations that are not designed to handle missing blocks.
    
    This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
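    
    A simplified sketch of the added guard (the real logic lives in BlockManager; the map type here is illustrative):
    
    ```
    import java.util.concurrent.ConcurrentHashMap
    
    object DropFromMemorySketch {
      private val blockInfo = new ConcurrentHashMap[String, AnyRef]()
    
      def dropFromMemory(id: String): Unit = {
        val info = blockInfo.get(id)
        if (info != null) {
          info.synchronized {
            // Re-check after acquiring the lock: a racing thread may have removed
            // the block between the first lookup and entering this section.
            if (blockInfo.get(id) != null) {
              // ... still registered: safe to drop the block's data here ...
            }
          }
        }
      }
    }
    ```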
    
    Author: hushan[胡珊] <hushan@xiaomi.com>
    
    Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
    
    edb989d [hushan[胡珊]] Refine code style and comments position
    55fa4ba [hushan[胡珊]] refine code
    e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
    
    (cherry picked from commit 30dca92)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    suyanNone authored and JoshRosen committed Dec 17, 2014
    Commit: 0ebbccb
  11. [SPARK-4772] Clear local copies of accumulators as soon as we're done with them
    
    Accumulators keep thread-local copies of themselves.  These copies were only cleared at the beginning of a task.  This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.
    
    This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks.  It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they were using at the time should be freed up.
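    
    A minimal sketch of the pattern (simplified; the real change lives in Accumulators and the task runner):
    
    ```
    object TaskLocalAccums {
      // Per-thread registry of accumulator copies; a ThreadLocal lets the entries
      // die with the thread instead of staying pinned on the worker forever.
      private val localAccums = new ThreadLocal[scala.collection.mutable.Map[Long, Any]] {
        override def initialValue(): scala.collection.mutable.Map[Long, Any] =
          scala.collection.mutable.Map.empty
      }
    
      def runTask(task: () => Unit): Unit = {
        try {
          task()
        } finally {
          localAccums.get().clear() // free the copies as soon as the task finishes
        }
      }
    }
    ```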
    
    Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
    
    Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:
    
    a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
    b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark.
    537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
    39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them
    
    (cherry picked from commit 94b377f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Nathan Kronenfeld authored and JoshRosen committed Dec 17, 2014
    Commit: e635168
  12. SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions

    This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.
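    
    A usage sketch: after this change, function arguments like the three below are passed through the closure cleaner before being shipped to executors:
    
    ```
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._
    
    object CombineByKeyDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("cbk-demo").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
        // createCombiner / mergeValue / mergeCombiners are now cleaned closures.
        val sums = pairs.combineByKey(
          (v: Int) => v,
          (c: Int, v: Int) => c + v,
          (c1: Int, c2: Int) => c1 + c2)
        sums.collect().foreach(println)
        sc.stop()
      }
    }
    ```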
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes #3690 from srowen/SPARK-785 and squashes the following commits:
    
    8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey
    
    (cherry picked from commit 2a28bc6)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    srowen authored and JoshRosen committed Dec 17, 2014
    Commit: 76c88c6
  13. [SPARK-4841] fix zip with textFile()

    UTF8Deserializer cannot be used in BatchedSerializer, so always use PickleSerializer() when changing batchSize in zip().
    
    Also, if two RDDs already have the same batch size, they no longer need to be re-serialized.
    
    Author: Davies Liu <davies@databricks.com>
    
    Closes #3706 from davies/fix_4841 and squashes the following commits:
    
    20ce3a3 [Davies Liu] fix bug in _reserialize()
    e3ebf7c [Davies Liu] add comment
    379d2c8 [Davies Liu] fix zip with textFile()
    
    (cherry picked from commit c246b95)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Davies Liu authored and JoshRosen committed Dec 17, 2014
    Commit: 0429ec3
  14. [SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc

    + small doc edit
    + include edit to make IntelliJ happy
    
    CC: davies  mengxr
    
    Note to davies  -- this does not fix the "WARNING: Literal block expected; none found." warnings since that seems to involve spacing which IntelliJ does not like.  (Those warnings occur when generating the Python docs.)
    
    Author: Joseph K. Bradley <joseph@databricks.com>
    
    Closes #3669 from jkbradley/python-warnings and squashes the following commits:
    
    4587868 [Joseph K. Bradley] fixed warning
    8cb073c [Joseph K. Bradley] Updated based on davies recommendation
    c51eca4 [Joseph K. Bradley] Updated rst file for pyspark.mllib.rand doc.  Small doc edit.  Small include edit to make IntelliJ happy.
    
    (cherry picked from commit affc3f4)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    jkbradley authored and mengxr committed Dec 17, 2014
    Commit: f305e7d

Commits on Dec 18, 2014

  1. Add mesos specific configurations into doc

    Author: Timothy Chen <tnachen@gmail.com>
    
    Closes #3349 from tnachen/mesos_doc and squashes the following commits:
    
    737ef49 [Timothy Chen] Add TOC
    5ca546a [Timothy Chen] Update description around cores requested.
    26283a5 [Timothy Chen] Add mesos specific configurations into doc
    
    (cherry picked from commit d9956f8)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    tnachen authored and rxin committed Dec 18, 2014
    Commit: 19efa5b
  2. HOTFIX: Changing doc color

    pwendell committed Dec 18, 2014
    Commit: ef5c236
  3. [SPARK-4880] remove spark.locality.wait in Analytics

    spark.locality.wait was set to 100000 in examples/graphx/Analytics.scala.
    This should be left to the user.
    
    Author: Ernest <earneyzxl@gmail.com>
    
    Closes #3730 from Earne/SPARK-4880 and squashes the following commits:
    
    d79ed04 [Ernest] remove spark.locality.wait in Analytics
    
    (cherry picked from commit a7ed6f3)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    Earne authored and rxin committed Dec 18, 2014
    Commit: e7f9dd5

Commits on Dec 19, 2014

  1. [SPARK-4884]: Improve Partition docs

    Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html
    This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884
    
    Author: Madhu Siddalingaiah <madhu@madhu.com>
    
    Closes #3722 from msiddalingaiah/master and squashes the following commits:
    
    79e679f [Madhu Siddalingaiah] [DOC]: improve documentation
    51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
    332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
    cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
    
    (cherry picked from commit d5a596d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    msiddalingaiah authored and JoshRosen committed Dec 19, 2014
    Configuration menu
    Copy the full SHA
    61c9b89 View commit details
    Browse the repository at this point in the history
  2. [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
    
    This is used in NioBlockTransferService here:
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66
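    
    For reference, a usage sketch of the key both transfer services should honor:
    
    ```
    import org.apache.spark.SparkConf
    
    object BlockManagerPort {
      // Both NIO and (after this fix) Netty transfer services bind to this port.
      val conf = new SparkConf().set("spark.blockManager.port", "50002")
    }
    ```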
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #3688 from aarondav/SPARK-4837 and squashes the following commits:
    
    ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
    
    (cherry picked from commit 105293a)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    aarondav authored and JoshRosen committed Dec 19, 2014
    Configuration menu
    Copy the full SHA
    075b399 View commit details
    Browse the repository at this point in the history
  3. [SPARK-4754] Refactor SparkContext into ExecutorAllocationClient

    This is such that the `ExecutorAllocationManager` does not take in the `SparkContext` with all of its dependencies as an argument. This prevents future developers of this class to tie down this class further with the `SparkContext`, which has really become quite a monstrous object.
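    
    A sketch of the narrowed dependency (method names illustrative, not necessarily the exact trait):
    
    ```
    // The allocation manager only needs these capabilities, so it can depend on
    // this small trait instead of the whole SparkContext.
    trait ExecutorAllocationClient {
      def requestExecutors(numAdditionalExecutors: Int): Boolean
      def killExecutors(executorIds: Seq[String]): Boolean
    }
    ```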
    
    cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits:
    
    187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc
    59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc
    347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient
    
    (cherry picked from commit 9804a75)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/SparkContext.scala
    Andrew Or committed Dec 19, 2014
    Commit: ca37639
  4. SPARK-3428. TaskMetrics for running tasks is missing GC time metrics

    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits:
    
    cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
    
    (cherry picked from commit 283263f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    sryza authored and JoshRosen committed Dec 19, 2014
    Commit: fd7bb9d
  5. [SPARK-4889] update history server example cmds

    Author: Ryan Williams <ryan.blake.williams@gmail.com>
    
    Closes #3736 from ryan-williams/hist and squashes the following commits:
    
    421d8ff [Ryan Williams] add another random typo fix
    76d6a4c [Ryan Williams] remove hdfs example
    a2d0f82 [Ryan Williams] code review feedback
    9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds
    
    (cherry picked from commit cdb2c64)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    ryan-williams authored and Andrew Or committed Dec 19, 2014
    Commit: 6aa88cc
  6. [SPARK-4896] don’t redundantly overwrite executor JAR deps

    Author: Ryan Williams <ryan.blake.williams@gmail.com>
    
    Closes #2848 from ryan-williams/fetch-file and squashes the following commits:
    
    c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently
    8e39c16 [Ryan Williams] code review feedback
    788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
    
    (cherry picked from commit 7981f96)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    ryan-williams authored and JoshRosen committed Dec 19, 2014
    Commit: f930fe8

Commits on Dec 20, 2014

  1. change signature of example to match released code

    The signature of registerKryoClasses is actually Array[Class[_]], not Seq.
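    
    The corrected call shape, for reference:
    
    ```
    import org.apache.spark.SparkConf
    
    object KryoRegistration {
      val conf = new SparkConf()
        // registerKryoClasses takes Array[Class[_]], not a Seq:
        .registerKryoClasses(Array(classOf[String], classOf[Array[Int]]))
    }
    ```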
    
    Author: Eran Medan <ehrann.mehdan@gmail.com>
    
    Closes #3747 from eranation/patch-1 and squashes the following commits:
    
    ee9885d [Eran Medan] change signature of example to match released code
    eranation authored and Andrew Or committed Dec 20, 2014
    Commit: 4da1039
  2. SPARK-2641: Passing num executors to spark arguments from properties file
    
    Since we can set Spark executor memory and executor cores via the properties file, we should also be allowed to set the number of executor instances there.
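    
    A simplified sketch of the fallback (names illustrative; the real change is in SparkSubmitArguments):
    
    ```
    object ResolveNumExecutors {
      // Fall back to the properties file when --num-executors was not given,
      // mirroring how executor memory and cores are already resolved.
      def resolve(cliValue: Option[String], props: Map[String, String]): Option[String] =
        cliValue.orElse(props.get("spark.executor.instances"))
    }
    ```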
    
    Author: Kanwaljit Singh <kanwaljit.singh@guavus.com>
    
    Closes #1657 from kjsingh/branch-1.0 and squashes the following commits:
    
    d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
    Kanwaljit Singh authored and Andrew Or committed Dec 20, 2014
    a1a1361
  3. [SPARK-4140] Document dynamic allocation

    Once the external shuffle service is also documented, the dynamic allocation section will link to it. Let me know if dynamic allocation as a whole should be moved to its own page; I personally think the organization might be cleaner that way.
    
    This patch builds on top of oza's work in #3689.
    
    aarondav pwendell
    
    Author: Andrew Or <andrew@databricks.com>
    Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@gmail.com>
    
    Closes #3731 from andrewor14/document-dynamic-allocation and squashes the following commits:
    
    1281447 [Andrew Or] Address a few comments
    b9843f2 [Andrew Or] Document the configs as well
    246fb44 [Andrew Or] Merge branch 'SPARK-4839' of github.com:oza/spark into document-dynamic-allocation
    8c64004 [Andrew Or] Add documentation for dynamic allocation (without configs)
    6827b56 [Tsuyoshi Ozawa] Fixing a documentation of spark.dynamicAllocation.enabled.
    53cff58 [Tsuyoshi Ozawa] Adding a documentation about dynamic resource allocation.
    
    (cherry picked from commit 15c03e1)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Dec 20, 2014
    96d5b00
  4. [Minor] Build Failed: value defaultProperties not found

    The Maven build failed with "value defaultProperties not found". Maybe related to this PR:
    1d64812
    andrewor14, can you look at this problem?
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits:
    
    8e2917c [huangzhaowei] Build Failed: value defaultProperties not found
    
    (cherry picked from commit a764960)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    SaintBacchus authored and JoshRosen committed Dec 20, 2014
    4346a2b

Commits on Dec 22, 2014

  1. [SPARK-2075][Core] Make the compiler generate the same byte code for Hadoop 1.+ and Hadoop 2.+
    
    `NullWritable` is a `Comparable` rather than a `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it and will generate different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the compiler will generate the same code.
    
    I used the following commands to confirm the generated byte code is the same.
    ```
    mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
    javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
    
    mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
    javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
    
    diff ~/hadoop1.txt ~/hadoop2.txt
    ```
    
    However, the compiler will still generate different code for classes that call methods of `JobContext/TaskAttemptContext`. `JobContext/TaskAttemptContext` is a class in Hadoop 1.+, so calling its methods uses `invokevirtual`, while it is an interface in Hadoop 2.+, which uses `invokeinterface`.
    
    To fix it, we can use reflection to call `JobContext/TaskAttemptContext.getConfiguration`.
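    
    A minimal Scala sketch of that reflection approach (the helper name is illustrative; `getConfiguration` is the Hadoop API's method):
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.JobContext
    
    // Look the method up on the runtime class, so the call site compiles to a
    // plain Method.invoke instead of invokevirtual/invokeinterface, which differ
    // between Hadoop 1.+ (JobContext is a class) and Hadoop 2.+ (an interface).
    def getConfigurationFromJobContext(context: JobContext): Configuration = {
      val method = context.getClass.getMethod("getConfiguration")
      method.invoke(context).asInstanceOf[Configuration]
    }
    ```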
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits:
    
    39d9df2 [zsxwing] Fix the code style
    e4ad8b5 [zsxwing] Use null for the implicit Ordering
    734bac9 [zsxwing] Explicitly set the implicit parameters
    ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration
    fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD
    
    (cherry picked from commit 6ee6aa7)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zsxwing authored and rxin committed Dec 22, 2014
    665653d
  2. [SPARK-2075][Core] backport for branch-1.2

    backport #3740 for branch-1.2
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3758 from zsxwing/SPARK-2075-branch-1.2 and squashes the following commits:
    
    b57d440 [zsxwing] SPARK-2075 backport for branch-1.2
    zsxwing authored and rxin committed Dec 22, 2014
    b896963
  3. [SPARK-4915][YARN] Fix classname to be specified for external shuffle service.
    
    Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@lab.ntt.co.jp>
    
    Closes #3757 from oza/SPARK-4915 and squashes the following commits:
    
    3b0d6d6 [Tsuyoshi Ozawa] Fix classname to be specified for external shuffle service.
    
    (cherry picked from commit 96606f6)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Tsuyoshi Ozawa authored and Andrew Or committed Dec 22, 2014
    31d42c4
  4. [SPARK-4883][Shuffle] Add a name to the directoryCleaner thread

    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3734 from zsxwing/SPARK-4883 and squashes the following commits:
    
    e6f2b61 [zsxwing] Fix the name
    cc74727 [zsxwing] Add a name to the directoryCleaner thread
    
    (cherry picked from commit 8773705)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    zsxwing authored and Andrew Or committed Dec 22, 2014
    70e69ef
  5. [Minor] Improve some code in BroadcastTest for short

    Use

        val arr1 = (0 until num).toArray

    instead of

        val arr1 = new Array[Int](num)
        for (i <- 0 until arr1.length) {
          arr1(i) = i
        }

    for brevity.
    
    Author: carlmartin <carlmartinmax@gmail.com>
    
    Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits:
    
    43adb70 [carlmartin] Improve some code in BroadcastTest for short
    SaintBacchus authored and Andrew Or committed Dec 22, 2014
    c7396b5
  6. [SPARK-4864] Add documentation to Netty-based configs

    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #3713 from aarondav/netty-configs and squashes the following commits:
    
    8a8b373 [Aaron Davidson] Address Patrick's comments
    3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs
    
    (cherry picked from commit fbca6b6)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    aarondav authored and pwendell committed Dec 22, 2014
    4b2bded
  7. [SPARK-4920][UI]:current spark version in UI is not striking.

    It is not easy to see the Spark version at a glance. We can keep the same style as the Spark website.
    
    ![spark_version](https://cloud.githubusercontent.com/assets/7402327/5527025/1c8c721c-8a35-11e4-8d6a-2734f3c6bdf8.jpg)
    
    Author: genmao.ygm <genmao.ygm@alibaba-inc.com>
    
    Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits:
    
    0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking.
    
    (cherry picked from commit de9d7d2)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    uncleGen authored and Andrew Or committed Dec 22, 2014
    a8a8e0e
  8. [SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join

    In Scala, `map` and `flatMap` on an `Iterable` eagerly copy its contents into a new `Seq`, whereas on an `Iterator` they are lazy. For example,
    ```Scala
      val iterable = Seq(1, 2, 3).map(v => {
        println(v)
        v
      })
      println("Iterable map done")
    
      val iterator = Seq(1, 2, 3).iterator.map(v => {
        println(v)
        v
      })
      println("Iterator map done")
    ```
    which outputs
    ```
    1
    2
    3
    Iterable map done
    Iterator map done
    ```
    So we should use 'iterator' to reduce memory consumed by join.
    
    Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits:
    
    48ee7b9 [zsxwing] Remove the explicit types
    95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join
    
    (cherry picked from commit c233ab3)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    zsxwing authored and JoshRosen committed Dec 22, 2014
    58e3702

Commits on Dec 23, 2014

  1. [Docs] Minor typo fixes

    Author: Nicholas Chammas <nicholas.chammas@gmail.com>
    
    Closes #3772 from nchammas/patch-1 and squashes the following commits:
    
    b7d9083 [Nicholas Chammas] [Docs] Minor typo fixes
    
    (cherry picked from commit 0e532cc)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    nchammas authored and pwendell committed Dec 23, 2014
    f86fe08
  2. [SPARK-4931][Yarn][Docs] Fix the format of running-on-yarn.md

    Currently, the formatting of the log4j section in running-on-yarn.md is a bit messy.
    
    ![running-on-yarn](https://cloud.githubusercontent.com/assets/1000778/5535248/204c4b64-8ab4-11e4-83c3-b4722ea0ad9d.png)
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3774 from zsxwing/SPARK-4931 and squashes the following commits:
    
    4a5f853 [zsxwing] Fix the format of running-on-yarn.md
    
    (cherry picked from commit 2d215ae)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    zsxwing authored and JoshRosen committed Dec 23, 2014
    9fb86b8
  3. [SPARK-4834] [standalone] Clean up application files after app finishes.

    Commit 7aacb7b added support for sharing downloaded files among multiple
    executors of the same app. That works great in Yarn, since the app's directory
    is cleaned up after the app is done.
    
    But Spark standalone mode didn't do that, so the lock/cache files created
    by that change were left around and could eventually fill up the disk hosting
    /tmp.
    
    To solve that, create app-specific directories under the local dirs when
    launching executors. Multiple executors launched by the same Worker will
    use the same app directories, so they should be able to share the downloaded
    files. When the application finishes, a new message is sent to all workers
    telling them the application has finished; once that message has been received,
    and all executors registered for the application shut down, then those
    directories will be cleaned up by the Worker.
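    
    A hypothetical sketch of that bookkeeping (message and class names are illustrative, not the patch's actual code):
    
    ```scala
    import java.io.File
    import scala.collection.mutable
    
    case class ApplicationFinished(appId: String)  // sent by the Master to every Worker
    
    class AppDirCleaner(workDir: File) {
      private val finishedApps = mutable.Set[String]()
      private val liveExecutors = mutable.Map[String, Int]().withDefaultValue(0)
    
      def executorStarted(appId: String): Unit = liveExecutors(appId) += 1
    
      def executorExited(appId: String): Unit = {
        liveExecutors(appId) -= 1
        maybeClean(appId)
      }
    
      def applicationFinished(appId: String): Unit = {
        finishedApps += appId
        maybeClean(appId)
      }
    
      // Delete the app directory only once the app is done AND all of its
      // executors on this Worker have shut down.
      private def maybeClean(appId: String): Unit = {
        if (finishedApps(appId) && liveExecutors(appId) <= 0) {
          def delete(f: File): Unit = {
            if (f.isDirectory) f.listFiles().foreach(delete)
            f.delete()
          }
          delete(new File(workDir, appId))
        }
      }
    }
    ```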
    
    Note: Unit testing this is hard (if even possible), since local-cluster mode
    doesn't seem to leave the Master/Worker daemons running long enough after
    `sc.stop()` is called for the clean up protocol to take effect.
    
    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes #3705 from vanzin/SPARK-4834 and squashes the following commits:
    
    b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization.
    50eb4b9 [Marcelo Vanzin] Review feedback.
    c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes.
    
    (cherry picked from commit dd15536)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Marcelo Vanzin authored and JoshRosen committed Dec 23, 2014
    ec11ffd
  4. [SPARK-4932] Add help comments in Analytics

    Trivial modifications for usability.
    
    Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
    
    Closes #3775 from maropu/AddHelpCommentInAnalytics and squashes the following commits:
    
    fbea8f5 [Takeshi Yamamuro] Add help comments in Analytics
    
    (cherry picked from commit 9c251c5)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    maropu authored and JoshRosen committed Dec 23, 2014
    e74ce14
  5. [SPARK-4914][Build] Cleans lib_managed before compiling with Hive 0.13.1

    This PR tries to fix the Hive tests failure encountered in PR #3157 by cleaning `lib_managed` before building assembly jar against Hive 0.13.1 in `dev/run-tests`. Otherwise two sets of datanucleus jars would be left in `lib_managed` and may mess up class paths while executing Hive test suites. Please refer to [this thread] [1] for details. A clean build would be even safer, but we only clean `lib_managed` here to save build time.
    
    This PR also takes the chance to clean up some minor typos and formatting issues in the comments.
    
    [1]: #3157 (comment)
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #3756 from liancheng/clean-lib-managed and squashes the following commits:
    
    e2bd21d [Cheng Lian] Adds lib_managed to clean set
    c9f2f3e [Cheng Lian] Cleans lib_managed before compiling with Hive 0.13.1
    
    (cherry picked from commit 395b771)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    liancheng authored and JoshRosen committed Dec 23, 2014
    7b5ba85
  6. [SPARK-4730][YARN] Warn against deprecated YARN settings

    See https://issues.apache.org/jira/browse/SPARK-4730.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes #3590 from andrewor14/yarn-settings and squashes the following commits:
    
    36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings
    dcd1316 [Andrew Or] Warn against deprecated YARN settings
    
    (cherry picked from commit 27c5399)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Andrew Or authored and JoshRosen committed Dec 23, 2014
    6a46cc3
  7. [SPARK-4802] [streaming] Remove receiverInfo once receiver is de-registered
    
    Once the streaming receiver is de-registered at the executor, the `ReceiverTrackerActor` needs to
    remove the corresponding receiverInfo entry from the `receiverInfo` map in `ReceiverTracker`.
    
    Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>
    
    Closes #3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:
    
    6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
    3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered
    
    (cherry picked from commit 10d69e9)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    ilayaperumalg authored and tdas committed Dec 23, 2014
    01adf45
  8. [SPARK-4671][Streaming] Do not replicate streaming block when WAL is enabled
    
    Currently a streaming block is replicated when a replicating storage level is set. Since the WAL already provides fault tolerance, this replication is needless and hurts the throughput of the streaming application.
    
    Hi tdas, as we discussed about this issue, I fixed it with this implementation. I'm not sure whether this is the way you want it; would you mind taking a look? Thanks a lot.
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3534 from jerryshao/SPARK-4671 and squashes the following commits:
    
    500b456 [jerryshao] Do not replicate streaming block when WAL is enabled
    
    (cherry picked from commit 3f5f4cc)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    jerryshao authored and tdas committed Dec 23, 2014
    aa78c23

Commits on Dec 24, 2014

  1. [SPARK-4606] Send EOF to child JVM when there's no more data to read.

    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes #3460 from vanzin/SPARK-4606 and squashes the following commits:
    
    031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read.
    
    (cherry picked from commit 7e2deb7)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Marcelo Vanzin authored and JoshRosen committed Dec 24, 2014
    1a4e2ba

Commits on Dec 25, 2014

  1. [SPARK-4873][Streaming] Use Future.zip instead of Future.flatMap(for-loop) in WriteAheadLogBasedBlockHandler
    
    Use `Future.zip` instead of `Future.flatMap` (for-loop). `zip` implies the two Futures run concurrently, while `flatMap` means one Future depends on the other.
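    
    A small sketch of the difference (the two write functions are hypothetical stand-ins for the block-manager and WAL writes):
    
    ```scala
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._
    
    def storeInBlockManager(): Future[Unit] = Future { /* write block to memory/disk */ }
    def writeToWriteAheadLog(): Future[Unit] = Future { /* append block to the WAL */ }
    
    // flatMap (what a for-comprehension desugars to) starts the second write
    // only after the first one completes:
    val sequential = storeInBlockManager().flatMap(_ => writeToWriteAheadLog())
    
    // zip creates both futures up front, so the two writes run concurrently:
    val concurrent = storeInBlockManager().zip(writeToWriteAheadLog())
    Await.result(concurrent, 10.seconds)
    ```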
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits:
    
    46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop)
    
    (cherry picked from commit b4d0db8)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 25, 2014
    17d6f54
  2. Fix "Building Spark With Maven" link in README.md

    Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html)
    
    Author: Denny Lee <denny.g.lee@gmail.com>
    
    Closes #3802 from dennyglee/patch-1 and squashes the following commits:
    
    15f601a [Denny Lee] Update README.md
    
    (cherry picked from commit 08b18c7)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    dennyglee authored and JoshRosen committed Dec 25, 2014
    475ab6e

Commits on Dec 26, 2014

  1. [SPARK-4537][Streaming] Expand StreamingSource to add more metrics

    Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting.
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits:
    
    00f5f7f [jerryshao] Change the code style and add totalProcessedRecords
    44721a6 [jerryshao] Further address the comments
    c097ddc [jerryshao] Address the comments
    02dd44f [jerryshao] Fix the addressed comments
    c7a9376 [jerryshao] Expand StreamingSource to add more metrics
    
    (cherry picked from commit f205fe4)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    jerryshao authored and tdas committed Dec 26, 2014
    acf5c63
  2. SPARK-4971: Fix typo in BlockGenerator comment

    Author: CodingCat <zhunansjtu@gmail.com>
    
    Closes #3807 from CodingCat/new_branch and squashes the following commits:
    
    5167f01 [CodingCat] fix typo in the comment
    
    (cherry picked from commit fda4331)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    CodingCat authored and JoshRosen committed Dec 26, 2014
    391080b

Commits on Dec 27, 2014

  1. [SPARK-3787][BUILD] Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version
    
    This PR is another solution to the following problem: when we build with sbt with a profile for Hadoop but without a property for the Hadoop version, like:

        sbt/sbt -Phadoop-2.2 assembly

    the jar name always uses the default version (1.0.4).

    When we build with maven under the same conditions, the default version of each profile is used.
    For instance, if we build like:

        mvn -Phadoop-2.2 package

    the jar name uses hadoop2.2.0, the default version of the hadoop-2.2 profile.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes #3046 from sarutak/fix-assembly-jarname-2 and squashes the following commits:
    
    41ef90e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname-2
    50c8676 [Kousuke Saruta] Merge branch 'fix-assembly-jarname-2' of github.com:sarutak/spark into fix-assembly-jarname-2
    52a1cd2 [Kousuke Saruta] Fixed comflicts
    dd30768 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname2
    f1c90bb [Kousuke Saruta] Fixed SparkBuild.scala in order to read `hadoop.version` property from pom.xml
    af6b100 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    c81806b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    ad1f96e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    b2318eb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    5fc1259 [Kousuke Saruta] Fixed typo.
    eebbb7d [Kousuke Saruta] Fixed wrong jar name
    sarutak authored and pwendell committed Dec 27, 2014
    2e0af87
  2. HOTFIX: Slight tweak on previous commit.

    Meant to merge this in when committing SPARK-3787.
    pwendell committed Dec 27, 2014
    3c4acac
  3. [SPARK-4952][Core] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
    
    Author: GuoQiang Li <witgo@qq.com>
    
    Closes #3788 from witgo/SPARK-4952 and squashes the following commits:
    
    d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
    
    (cherry picked from commit 080ceb7)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    witgo authored and pwendell committed Dec 27, 2014
    23d64cf

Commits on Dec 29, 2014

  1. [SPARK-4966][YARN] The MemoryOverhead value is not set correctly

    Author: meiyoula <1039320815@qq.com>
    
    Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:
    
    5a780fc [meiyoula] Update ClientArguments.scala
    
    (cherry picked from commit 14fa87b)
    Signed-off-by: Thomas Graves <tgraves@apache.org>
    XuTingjun authored and tgravescs committed Dec 29, 2014
    2cd446a
  2. [SPARK-4982][DOC] spark.ui.retainedJobs description is wrong in Spark UI configuration guide
    
    Author: wangxiaojing <u9jing@gmail.com>
    
    Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits:
    
    fe2ad5f [wangxiaojing] change stages to jobs
    
    (cherry picked from commit 6645e52)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    wangxiaojing authored and JoshRosen committed Dec 29, 2014
    7604666
  3. SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
    
    takeOrdered should skip the reduce step in case the mapped RDDs have no partitions. This prevents the exception below, reproduced by running a query such as:

    SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;

    Error trace:
    java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
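    
    A simplified sketch of the guard (not the exact patch; the helper name is illustrative):
    
    ```scala
    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    
    def takeOrderedSafely[T: Ordering: ClassTag](rdd: RDD[T], num: Int): Array[T] = {
      val perPartition = rdd.mapPartitions { it =>
        Iterator.single(it.toArray.sorted.take(num))  // top num of each partition
      }
      if (perPartition.partitions.length == 0) {
        Array.empty[T]  // nothing to reduce: mappers returned no partitions
      } else {
        perPartition.reduce((a, b) => (a ++ b).sorted.take(num))
      }
    }
    ```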
    
    Author: Yash Datta <Yash.Datta@guavus.com>
    
    Closes #3830 from saucam/fix_takeorder and squashes the following commits:
    
    5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
    
    (cherry picked from commit 9bc0df6)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    Yash Datta authored and rxin committed Dec 29, 2014
    e81c869

Commits on Dec 30, 2014

  1. [SPARK-4920][UI] add version on master and worker page for standalone mode
    
    Author: Zhang, Liye <liye.zhang@intel.com>
    
    Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits:
    
    3bb7e0d [Zhang, Liye] add version on master and worker page
    
    (cherry picked from commit 9077e72)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    liyezhang556520 authored and JoshRosen committed Dec 30, 2014
    e20d632
  2. [SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works with KryoSerializer
    
    This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used.  The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer.
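    
    A sketch of what such a registration looks like (PythonBroadcast is private[spark], so the class is passed in here; the real code lives inside Spark's KryoSerializer):
    
    ```scala
    import com.esotericsoftware.kryo.Kryo
    import com.esotericsoftware.kryo.serializers.{JavaSerializer => KryoJavaSerializer}
    
    // Falling back to Java serialization lets PythonBroadcast's custom
    // writeObject/readObject logic keep working under Kryo.
    def registerPythonBroadcast(kryo: Kryo, pythonBroadcastClass: Class[_]): Unit = {
      kryo.register(pythonBroadcastClass, new KryoJavaSerializer())
    }
    ```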
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits:
    
    0466c7a [Josh Rosen] Register PythonBroadcast with Kryo.
    d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug.
    069d8a7 [Josh Rosen] Add failing test for SPARK-4882
    
    (cherry picked from commit efa80a5)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 30, 2014
    42809db
  3. [SPARK-4908][SQL] Prevent multiple concurrent hive native commands

    This is just a quick fix that locks when calling `runHive`.  If we can find a way to avoid the error without a global lock that would be better.
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #3834 from marmbrus/hiveConcurrency and squashes the following commits:
    
    bf25300 [Michael Armbrust] prevent multiple concurrent hive native commands
    
    (cherry picked from commit 480bd1d)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    marmbrus committed Dec 30, 2014
    cde8a31
  4. [SPARK-4386] Improve performance when writing Parquet files

    Convert the type of RowWriteSupport.attributes to Array.

    Analysis of the performance of writing very wide tables shows that time is spent predominantly in the apply method on the attributes var. The type of attributes was previously LinearSeqOptimized, whose apply is O(N), making the whole write O(N squared).

    Measurements on a 575-column table showed this change made a 6x improvement in write times.
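    
    An illustration of the complexity difference (numbers chosen to match the 575-column case):
    
    ```scala
    // List.apply(i) walks i cons cells, so indexing every column of a row is
    // O(N^2) per row; Array.apply(i) is a constant-time indexed load.
    val attributesAsList: Seq[String]    = List.tabulate(575)(i => s"col_$i")
    val attributesAsArray: Array[String] = attributesAsList.toArray
    
    val viaList  = (0 until attributesAsList.length).map(i => attributesAsList(i))   // O(N^2)
    val viaArray = (0 until attributesAsArray.length).map(i => attributesAsArray(i)) // O(N)
    ```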
    
    Author: Michael Davies <Michael.BellDavies@gmail.com>
    
    Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:
    
    892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
    
    (cherry picked from commit 7425bec)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
    MickDavies authored and marmbrus committed Dec 30, 2014
    7a24541
  5. [SPARK-4813][Streaming] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    
    Rewrote `ContextWaiter` using `Condition`, because it provides `awaitNanos`, a convenient API for timeouts.
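    
    A minimal sketch of the standard guard against spurious wakeups (simplified, not the exact patch): re-check the predicate in a loop and carry the remaining timeout across wakeups, which `Condition.awaitNanos` makes easy by returning the nanoseconds left.
    
    ```scala
    import java.util.concurrent.TimeUnit
    import java.util.concurrent.locks.ReentrantLock
    
    class ContextWaiterSketch {
      private val lock = new ReentrantLock()
      private val condition = lock.newCondition()
      private var stopped = false  // the predicate, guarded by `lock`
    
      def notifyStop(): Unit = {
        lock.lock()
        try { stopped = true; condition.signalAll() } finally { lock.unlock() }
      }
    
      /** Returns true if stopped, false if the timeout elapsed first. */
      def waitForStopOrTimeout(timeoutMs: Long): Boolean = {
        lock.lock()
        try {
          var nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
          while (!stopped && nanos > 0) {  // loop: a wakeup may be spurious
            nanos = condition.awaitNanos(nanos)
          }
          stopped
        } finally {
          lock.unlock()
        }
      }
    }
    ```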
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes #3661 from zsxwing/SPARK-4813 and squashes the following commits:
    
    52247f5 [zsxwing] Add explicit unit type
    be42bcf [zsxwing] Update as per review suggestion
    e06bd4f [zsxwing] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    
    (cherry picked from commit 6a89782)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 30, 2014
    edc96d8

Commits on Dec 31, 2014

  1. [SPARK-1010] Clean up uses of System.setProperty in unit tests

    Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.
    
    This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).
    
    For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and automatically restores them on test completion / failure. See the block comment at the top of the ResetSystemProperties class for more details.
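    
    A minimal sketch of what such a mixin can look like, assuming ScalaTest's BeforeAndAfterEach (the real class may differ in detail):
    
    ```scala
    import java.util.Properties
    import org.scalatest.{BeforeAndAfterEach, Suite}
    
    trait ResetSystemPropertiesSketch extends BeforeAndAfterEach { this: Suite =>
      private var oldProperties: Properties = _
    
      override def beforeEach(): Unit = {
        // Take a real copy (not `new Properties(parent)`, which only sets defaults).
        val snapshot = new Properties()
        snapshot.putAll(System.getProperties)
        oldProperties = snapshot
        super.beforeEach()
      }
    
      override def afterEach(): Unit = {
        try super.afterEach() finally System.setProperties(oldProperties)
      }
    }
    ```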
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:
    
    0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
    3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
    4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
    4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
    0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
    7a3d224 [Josh Rosen] Fix trait ordering
    3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
    bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
    655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
    3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
    cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
    8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
    633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
    25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
    1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
    dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
    b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
    e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
    5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
    0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
    c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
    51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
    60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
    14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
    628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
    9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
    4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
    
    (cherry picked from commit 352ed6b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 31, 2014
    ad3dc81
  2. [SPARK-4298][Core] The spark-submit cannot read Main-Class from Manifest.
    
    Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.
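    
    A sketch of the normalization described above (the helper name is illustrative):
    
    ```scala
    import java.net.URI
    import java.util.jar.JarFile
    
    def mainClassFromJar(primaryResource: String): Option[String] = {
      // "file:/path/to/app.jar" -> "/path/to/app.jar", which JarFile expects
      val jarPath = new URI(primaryResource).getPath
      for {
        manifest  <- Option(new JarFile(jarPath).getManifest)
        mainClass <- Option(manifest.getMainAttributes.getValue("Main-Class"))
      } yield mainClass
    }
    ```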
    
    Author: Brennon York <brennon.york@capitalone.com>
    
    Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:
    
    5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
    14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
    c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
    8d20936 [Brennon York] updated to reset the error message back to the default
    a043039 [Brennon York] updated to split the uri and jar vals
    8da7cbf [Brennon York] fixes SPARK-4298
    
    (cherry picked from commit 8e14c5e)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Brennon York authored and JoshRosen committed Dec 31, 2014
    7c9c25b
  3. [HOTFIX] Disable Spark UI in SparkSubmitSuite tests

    This should fix a major cause of build breaks when running many parallel tests.
    JoshRosen committed Dec 31, 2014
    076de46
  4. [SPARK-4790][STREAMING] Fix ReceivedBlockTrackerSuite to wait for old files to get deleted before continuing.
    
    Since the deletes happen asynchronously, the getFileStatus call might throw an exception in older HDFS
    versions if the delete occurs between the time listFiles is called on the directory and getFileStatus is
    called on one of its files.
    
    This PR addresses this by adding an option to delete the files synchronously and then waiting for the deletion to
    complete before proceeding.
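    
    A sketch of that option (names are illustrative, not the patch's actual API): deletion still runs on a background thread, but a caller may block until it finishes so a later getFileStatus cannot race with a half-completed delete.
    
    ```scala
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._
    
    def cleanUpOldLogs(deleteFiles: () => Unit, waitForCompletion: Boolean)
                      (implicit ec: ExecutionContext): Unit = {
      val deletion = Future { deleteFiles() }
      if (waitForCompletion) {
        Await.ready(deletion, 1.minute)  // tests pass true; the hot path passes false
      }
    }
    ```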
    
    Author: Hari Shreedharan <hshreedharan@apache.org>
    
    Closes #3726 from harishreedharan/spark-4790 and squashes the following commits:
    
    bbbacd1 [Hari Shreedharan] Call cleanUpOldLogs only once in the tests.
    3255f17 [Hari Shreedharan] Add test for async deletion. Remove method from ReceiverTracker that does not take waitForCompletion.
    e4c83ec [Hari Shreedharan] Making waitForCompletion a mandatory param. Remove eventually from WALSuite since the cleanup method returns only after all files are deleted.
    af00fd1 [Hari Shreedharan] [SPARK-4790][STREAMING] Fix ReceivedBlockTrackerSuite waits for old files to get deleted before continuing.
    
    (cherry picked from commit 3610d3c)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    harishreedharan authored and tdas committed Dec 31, 2014
    bd70ff9
  5. [SPARK-5028][Streaming] Add total received and processed records metrics to Streaming UI
    
    This is follow-up work for [SPARK-4537](https://issues.apache.org/jira/browse/SPARK-4537), adding the total received records and processed records metrics back to the UI.
    
    ![screenshot](https://dl.dropboxusercontent.com/u/19230832/screenshot.png)
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes #3852 from jerryshao/SPARK-5028 and squashes the following commits:
    
    c8c4877 [jerryshao] Add total received and processed metrics to Streaming UI
    
    (cherry picked from commit fdc2aa4)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    jerryshao authored and tdas committed Dec 31, 2014
    14dbd83

Commits on Jan 1, 2015

  1. [SPARK-5035] [Streaming] ReceiverMessage trait should extend Serializable
    
    Spark Streaming's ReceiverMessage trait should extend Serializable in order to fix a subtle bug that only occurs when running on a real cluster:
    
    If you attempt to send a fire-and-forget message to a remote Akka actor and that message cannot be serialized, then this seems to lead to more-or-less silent failures. As an optimization, Akka skips message serialization for messages sent within the same JVM. As a result, Spark's unit tests will never fail due to non-serializable Akka messages, but these will cause mostly-silent failures when running on a real cluster.
    
    Before this patch, here was the code for ReceiverMessage:
    
    ```
    /** Messages sent to the NetworkReceiver. */
    private[streaming] sealed trait ReceiverMessage
    private[streaming] object StopReceiver extends ReceiverMessage
    ```
    
    Since ReceiverMessage does not extend Serializable and StopReceiver is a regular `object`, not a `case object`, StopReceiver will throw serialization errors. As a result, graceful receiver shutdown is broken on real clusters (and local-cluster mode) but works in local modes. If you want to reproduce this, try running the word count example from the Streaming Programming Guide in the Spark shell:
    
    ```
    import org.apache.spark._
    import org.apache.spark.streaming._
    import org.apache.spark.streaming.StreamingContext._
    val ssc = new StreamingContext(sc, Seconds(10))
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    import org.apache.spark.streaming.StreamingContext._
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()
    Thread.sleep(10000)
    ssc.stop(true, true)
    ```
    
    Prior to this patch, this would work correctly in local mode but fail when running against a real cluster (it would report that some receivers were not shut down).
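    
    The patch itself is presumably the one-line change the title implies, i.e. something like:
    
    ```scala
    /** Messages sent to the NetworkReceiver. */
    private[streaming] sealed trait ReceiverMessage extends Serializable
    private[streaming] object StopReceiver extends ReceiverMessage
    ```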
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3857 from JoshRosen/SPARK-5035 and squashes the following commits:
    
    71d0eae [Josh Rosen] [SPARK-5035] ReceiverMessage trait should extend Serializable.
    
    (cherry picked from commit fe6efac)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    JoshRosen authored and tdas committed Jan 1, 2015
    434ea00
  2. [HOTFIX] Bind web UI to ephemeral port in DriverSuite

    The job launched by DriverSuite should bind the web UI to an ephemeral port, since it looks like port contention in this test has caused a large number of Jenkins failures when many builds are started simultaneously.  Our tests already disable the web UI, but this doesn't affect subprocesses launched by our tests.  In this case, I've opted to bind to an ephemeral port instead of disabling the UI because disabling features in this test may mask its ability to catch certain bugs.
    
    See also: e24d3a9
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3873 from JoshRosen/driversuite-webui-port and squashes the following commits:
    
    48cd05c [Josh Rosen] [HOTFIX] Bind web UI to ephemeral port in DriverSuite.
    
    (cherry picked from commit 0128398)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Jan 1, 2015
    da9a4b9

Commits on Jan 2, 2015

  1. Fixed typos in streaming-kafka-integration.md

    Changed projrect to project :)
    
    Author: Akhil Das <akhld@darktech.ca>
    
    Closes #3876 from akhld/patch-1 and squashes the following commits:
    
    e0cf9ef [Akhil Das] Fixed typos in streaming-kafka-integration.md
    Akhil Das authored and tdas committed Jan 2, 2015
    33f0b14

Commits on Jan 4, 2015

  1. [SPARK-5058] Updated broken links

    Updated the broken link pointing to the KafkaWordCount example to the correct one.
    
    Author: sigmoidanalytics <mayur@sigmoidanalytics.com>
    
    Closes #3877 from sigmoidanalytics/patch-1 and squashes the following commits:
    
    3e19b31 [sigmoidanalytics] Updated broken links
    
    (cherry picked from commit 342612b)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    sigmoidanalytics authored and tdas committed Jan 4, 2015
    93617dd
  2. [SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs

    Author: Dale <tigerquoll@outlook.com>
    
    Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits:
    
    5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block.
    2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.
    
    (cherry picked from commit 3fddc94)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    tigerquoll authored and JoshRosen committed Jan 4, 2015
    9dbb62e

Commits on Jan 5, 2015

  1. [SPARK-4631] unit test for MQTT

    Please review the unit test for MQTT
    
    Author: bilna <bilnap@am.amrita.edu>
    Author: Bilna P <bilna.p@gmail.com>
    
    Closes #3844 from Bilna/master and squashes the following commits:
    
    acea3a3 [bilna] Adding dependency with scope test
    28681fa [bilna] Merge remote-tracking branch 'upstream/master'
    fac3904 [bilna] Correction in Indentation and coding style
    ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
    4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
    04503cf [bilna] Added embedded broker service for mqtt test
    89d804e [bilna] Merge remote-tracking branch 'upstream/master'
    fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
    4b58094 [Bilna P] Update MQTTStreamSuite.scala
    b1ac4ad [bilna] Added BeforeAndAfter
    5f6bfd2 [bilna] Added BeforeAndAfter
    e8b6623 [Bilna P] Update MQTTStreamSuite.scala
    5ca6691 [Bilna P] Update MQTTStreamSuite.scala
    8616495 [bilna] [SPARK-4631] unit test for MQTT
    
    (cherry picked from commit e767d7d)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    bilna authored and tdas committed Jan 5, 2015
    67e2eb6
  2. [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs

    This patch disables output spec. validation for jobs launched through Spark Streaming, since this interferes with checkpoint recovery.
    
    Hadoop OutputFormats have a `checkOutputSpecs` method which performs certain checks prior to writing output, such as checking whether the output directory already exists.  SPARK-1100 added checks for FileOutputFormat, SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just FileOutputFormat.
    
    In Spark Streaming, we might have to re-process a batch during checkpoint recovery, so `save` actions may be called multiple times.  In addition to `DStream`'s own save actions, users might use `transform` or `foreachRDD` and call the `RDD` and `PairRDD` save actions.  When output spec. validation is enabled, the second calls to these actions will fail due to existing output.
    
    This patch automatically disables output spec. validation for jobs submitted by the Spark Streaming scheduler.  This is done by using Scala's `DynamicVariable` to propagate the bypass setting without having to mutate SparkConf or introduce a global variable.
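    
    A sketch of the DynamicVariable pattern described above (object and method names are illustrative):
    
    ```scala
    import scala.util.DynamicVariable
    
    object OutputSpecValidation {
      val disabled = new DynamicVariable[Boolean](false)
    
      // The save path consults both the conf setting and the dynamic bypass.
      def shouldValidate(enabledInConf: Boolean): Boolean =
        enabledInConf && !disabled.value
    }
    
    // Scheduler side: everything run inside this block (on the same thread)
    // sees disabled.value == true, including the RDD save actions it triggers.
    OutputSpecValidation.disabled.withValue(true) {
      // rdd.saveAsHadoopFile(...) would skip checkOutputSpecs here
    }
    ```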
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits:
    
    36eaf35 [Josh Rosen] Add comment explaining use of transform() in test.
    6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform()
    7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; update conf. guide
    bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not refer to Spark Streaming.
    e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic.
    762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs.
    
    (cherry picked from commit 939ba1f)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    JoshRosen authored and tdas committed Jan 5, 2015
    a0bb88e
  3. [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all.
    
    - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask
    - See the Jira Issue for more details.
    
    Author: Jongyoul Lee <jongyoul@gmail.com>
    
    Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits:
    
    46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed unused import
    3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed comments and adjusted indentations
    2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask
    
    (cherry picked from commit 1c0e7ce)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    jongyoul authored and JoshRosen committed Jan 5, 2015
    f979205
  4. [SPARK-5089][PYSPARK][MLLIB] Fix vector convert

    This is a small change addressing a potentially significant bug in how PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark should upcast to float64, but this wasn't actually happening. As a result, non-float64 arrays would be silently parsed inappropriately during SerDe, yielding erroneous results when running, for example, KMeans.
    
    The PR includes the fix, as well as a new test for the correct conversion behavior.
    
    davies
    
    Author: freeman <the.freeman.lab@gmail.com>
    
    Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits:
    
    764db47 [freeman] Add a test for proper conversion behavior
    704f97e [freeman] Return array after changing type
    
    (cherry picked from commit 6c6f325)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    freeman-lab authored and mengxr committed Jan 5, 2015
    cf55a2b

Commits on Jan 6, 2015

  1. [HOTFIX] Add missing SparkContext._ import to fix 1.2 build.

    This fixes a build break caused by a0bb88e
    JoshRosen committed Jan 6, 2015
    db83acb

Commits on Jan 7, 2015

  1. [YARN][SPARK-4929] Bug fix: fix the yarn-client code to support HA

    Currently, yarn-client exits directly when an HA change happens, no matter how many times the AM should retry.
    The reason may be that the default final status only considered sys.exit, so yarn-client HA cannot benefit from it.
    We should therefore distinguish the default final status between client and cluster mode, because the SUCCEEDED status may cause HA to fail in client mode, while UNDEFINED may trigger the error reporter in cluster mode when sys.exit is used.
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes #3771 from SaintBacchus/YarnHA and squashes the following commits:
    
    c02bfcc [huangzhaowei] Improve the comment of the funciton 'getDefaultFinalStatus'
    0e69924 [huangzhaowei] Bug fix: fix the yarn-client code to support HA
    
    (cherry picked from commit 5fde661)
    Signed-off-by: Thomas Graves <tgraves@apache.org>
    SaintBacchus authored and tgravescs committed Jan 7, 2015
    7a4be0b
  2. [SPARK-5132][Core] Correct stage Attempt Id key in stageInfoFromJson

    SPARK-5132: stageInfoToJson writes the key "Stage Attempt Id", but stageInfoFromJson reads "Attempt Id".
    
    Author: hushan[胡珊] <hushan@xiaomi.com>
    
    Closes #3932 from suyanNone/json-stage and squashes the following commits:
    
    41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson
    
    (cherry picked from commit d345ebe)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    suyanNone authored and JoshRosen committed Jan 7, 2015
    1770c51