forked from apache/spark
SKIPME merged Apache branch-1.6 #126
Merged
…y when Jenkins load is high We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#10110 from brkyvz/batch-wal-test-fix. (cherry picked from commit 6fd9e70) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
This PR:
1. Suppresses all known warnings.
2. Cleans up test cases and fixes some errors in them.
3. Fixes errors in HiveContext-related test cases. These test cases were not actually run previously due to a bug in creating TestHiveContext.
4. Supports 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'.
5. Makes sure the default Hadoop file system is local when running test cases.
6. Turns warnings into errors.

Author: Sun Rui <rui.sun@intel.com> Closes apache#10030 from sun-rui/SPARK-12034. (cherry picked from commit 39d677c) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Currently, the current line is not cleared by Ctrl-C.

After this patch:

```
>>> asdfasdf^C
Traceback (most recent call last):
  File "~/spark/python/pyspark/context.py", line 225, in signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt
```

It's still worse than 1.5 (and before).

Author: Davies Liu <davies@databricks.com> Closes apache#10134 from davies/fix_cltrc. (cherry picked from commit ef3f047) Signed-off-by: Davies Liu <davies.liu@gmail.com>
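As a rough illustration of the handler named in the traceback above, here is a minimal sketch of a SIGINT handler that runs some cleanup and then raises `KeyboardInterrupt`. The `cancel_all_jobs` function and the `cancelled` list are hypothetical stand-ins, not the PySpark implementation.

```python
import signal
import threading

cancelled = []  # records cleanup calls so the effect is observable


def cancel_all_jobs():
    """Hypothetical stand-in for whatever cleanup the application needs."""
    cancelled.append(True)


def signal_handler(signum, frame):
    # Run the cleanup first, then raise the interrupt so it propagates
    # to whatever code was running when Ctrl-C arrived.
    cancel_all_jobs()
    raise KeyboardInterrupt()


# signal.signal may only be called from the main thread.
if threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGINT, signal_handler)
```

Raising from inside the handler is what produces a traceback ending in `signal_handler`, as in the transcript above.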
…ner not present The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004). While apache#9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there is still a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes apache#9988 from tdas/SPARK-11932. (cherry picked from commit 5d80d8c) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
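The defensive check described above can be pictured with a toy model. This is only a sketch: `Partitioned`, `partition_by`, and `ensure_partitioned` are hypothetical names, and no shuffle is performed.

```python
class Partitioned:
    """Hypothetical stand-in for a keyed dataset that may carry a partitioner."""

    def __init__(self, records, partitioner=None):
        self.records = list(records)    # (key, value) pairs
        self.partitioner = partitioner  # function key -> partition index, or None

    def partition_by(self, partitioner):
        # A real system would shuffle records here; the toy only attaches
        # the partitioner to a new object.
        return Partitioned(self.records, partitioner)


def ensure_partitioned(prev_state, partitioner):
    """Repartition prev_state only when it carries no partitioner."""
    if prev_state.partitioner is None:
        return prev_state.partition_by(partitioner)
    return prev_state


def by_hash(key):
    return hash(key) % 2


recovered = Partitioned([("a", 1), ("b", 2)])  # partitioner lost on recovery
fixed = ensure_partitioned(recovered, by_hash)
```

State that already has a partitioner passes through untouched, so the extra work is only paid on the recovery path.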
https://issues.apache.org/jira/browse/SPARK-11963 Author: Xusen Yin <yinxusen@gmail.com> Closes apache#9962 from yinxusen/SPARK-11963. (cherry picked from commit 871e85d) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…cala doc In SPARK-11946 the API for pivot was changed a bit and its doc was updated, but the doc changes were not made for the Python API. This PR updates the Python doc to be consistent. Author: Andrew Ray <ray.andrew@gmail.com> Closes apache#10176 from aray/sql-pivot-python-doc. (cherry picked from commit 36282f7) Signed-off-by: Yin Huai <yhuai@databricks.com>
Switched from using SQLContext constructor to using getOrCreate, mainly in model save/load methods. This covers all instances in spark.mllib. There were no uses of the constructor in spark.ml. CC: mengxr yhuai Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#10161 from jkbradley/mllib-sqlcontext-fix. (cherry picked from commit 3e7e05f) Signed-off-by: Xiangrui Meng <meng@databricks.com>
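The difference between calling a constructor and calling a get-or-create accessor can be sketched generically. The `Context` class below is hypothetical and is not the Spark `SQLContext` API; it only illustrates reusing one shared instance instead of building a new one on every call.

```python
import threading


class Context:
    """Hypothetical stand-in for an expensive, shareable context object."""

    _lock = threading.Lock()
    _instance = None

    def __init__(self, name):
        self.name = name

    @classmethod
    def get_or_create(cls, name):
        # Reuse the existing instance if there is one; otherwise build it once.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(name)
            return cls._instance


a = Context.get_or_create("first")
b = Context.get_or_create("second")  # returns the instance created above
```

With this pattern, code such as a save/load helper picks up whatever context the caller already has rather than creating a second one.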
…ing include_example Made a new patch containing only the markdown examples moved to the example folder. Only three Java code files were not shifted, since they contained compilation errors. These classes are: 1) StandardScale 2) NormalizerExample 3) VectorIndexer Author: Xusen Yin <yinxusen@gmail.com> Author: somideshmukh <somilde@us.ibm.com> Closes apache#10002 from somideshmukh/SomilBranch1.33. (cherry picked from commit 78209b0) Signed-off-by: Xiangrui Meng <meng@databricks.com>
Add since annotation to ml.classification Author: Takahashi Hiroshi <takahashi.hiroshi@lab.ntt.co.jp> Closes apache#8534 from taishi-oss/issue10259. (cherry picked from commit 7d05a62) Signed-off-by: Xiangrui Meng <meng@databricks.com>
…mple code Add the `SQLTransformer` user guide and example code, and make the Scala API doc clearer. Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#10006 from yanboliang/spark-11958. (cherry picked from commit 4a39b5a) Signed-off-by: Xiangrui Meng <meng@databricks.com>
…means Value Author: cody koeninger <cody@koeninger.org> Closes apache#10132 from koeninger/SPARK-12103. (cherry picked from commit 48a9804) Signed-off-by: Sean Owen <sowen@cloudera.com>
Author: Jeff Zhang <zjffdu@apache.org> Closes apache#10172 from zjffdu/SPARK-12166. (cherry picked from commit 7081291) Signed-off-by: Sean Owen <sowen@cloudera.com>
This reverts PR apache#10002, commit 78209b0. The original PR wasn't tested on Jenkins before being merged. Author: Cheng Lian <lian@databricks.com> Closes apache#10200 from liancheng/revert-pr-10002. (cherry picked from commit da2012a) Signed-off-by: Cheng Lian <lian@databricks.com>
Fix commons-collection group ID to commons-collections for version 3.x Patches earlier PR at apache#9731 Author: Sean Owen <sowen@cloudera.com> Closes apache#10198 from srowen/SPARK-11652.2. (cherry picked from commit e3735ce) Signed-off-by: Sean Owen <sowen@cloudera.com>
Checked with Hive: greatest/least should cast their children to the tightest common type, i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5), decimal(5, 10)) => error`. Author: Wenchen Fan <wenchen@databricks.com> Closes apache#10196 from cloud-fan/type-coercion. (cherry picked from commit 381f17b) Signed-off-by: Michael Armbrust <michael@databricks.com>
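The three cases listed above can be reproduced with a toy model of the rule. This is a simplified sketch, not Spark's type-coercion code: types are plain strings or tuples, the numeric ordering is an assumption, and the real rules cover more cases.

```python
# Toy type model: plain strings for simple types, tuples for decimals.
# The ordering below is an assumed widening order for plain numeric types.
NUMERIC_PRECEDENCE = ["byte", "short", "int", "long", "float", "double"]


def tightest_common_type(t1, t2):
    """Return the common type of t1 and t2, or None when there is none."""
    if t1 == t2:
        return t1
    if t1 in NUMERIC_PRECEDENCE and t2 in NUMERIC_PRECEDENCE:
        # The wider of two plain numeric types wins.
        return max(t1, t2, key=NUMERIC_PRECEDENCE.index)
    # e.g. int vs string, or decimals with different precision/scale
    return None


def coerce_children(types):
    """Fold the rule over all argument types; raise if any step fails."""
    common = types[0]
    for t in types[1:]:
        common = tightest_common_type(common, t)
        if common is None:
            raise TypeError(f"no tightest common type for {types}")
    return common
```

For example, `coerce_children(["int", "long"])` yields `"long"`, while `["int", "string"]` and `[("decimal", 10, 5), ("decimal", 5, 10)]` both raise `TypeError`.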
This PR is to add three more data types into Encoder, including `BigDecimal`, `Date` and `Timestamp`. marmbrus cloud-fan rxin Could you take a quick look at these three types? Not sure if it can be merged to 1.6. Thank you very much! Author: gatorsmile <gatorsmile@gmail.com> Closes apache#10188 from gatorsmile/dataTypesinEncoder. (cherry picked from commit c0b13d5) Signed-off-by: Michael Armbrust <michael@databricks.com>
… APIs This PR contains the following updates:
- Created a new private variable `boundTEncoder` that can be shared by multiple functions: `RDD`, `select` and `collect`.
- Replaced all uses of `queryExecution.analyzed` with the function call `logicalPlan`.
- Fixed a few API comments that used wrong class names (e.g., `DataFrame`) or parameter names (e.g., `n`).
- Fixed a few API descriptions that were wrong (e.g., `mapPartitions`).

marmbrus rxin cloud-fan Could you take a look and check if they are appropriate? Thank you!

Author: gatorsmile <gatorsmile@gmail.com> Closes apache#10184 from gatorsmile/datasetClean. (cherry picked from commit 5d96a71) Signed-off-by: Michael Armbrust <michael@databricks.com>
jira: https://issues.apache.org/jira/browse/SPARK-10393 Since the logic of the text processing part has been moved to ML estimators/transformers, replace the related code in LDA Example with the ML pipeline. Author: Yuhao Yang <hhbyyh@gmail.com> Author: yuhaoyang <yuhao@zhanglipings-iMac.local> Closes apache#8551 from hhbyyh/ldaExUpdate. (cherry picked from commit 872a2ee) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…unction Delays application of ResolvePivot until all aggregates are resolved to prevent problems with UnresolvedFunction and adds unit test Author: Andrew Ray <ray.andrew@gmail.com> Closes apache#10202 from aray/sql-pivot-unresolved-function. (cherry picked from commit 4bcb894) Signed-off-by: Yin Huai <yhuai@databricks.com>
jira: https://issues.apache.org/jira/browse/SPARK-11605

Check Java compatibility for MLlib for this release.

Fixes:
1. `StreamingTest.registerStream` needs a Java-friendly interface.
2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and `GradientBoostedTreesModel.updatePredictionError` have Java compatibility issues. Mark them as `developerAPI`.

TBD: [updated] no fix for now per discussion. In `org.apache.spark.mllib.classification.LogisticRegressionModel`, `public scala.Option<java.lang.Object> getThreshold();` has the wrong return type for Java invocation. `SVMModel` has a similar issue. Yet adding a `scala.Option<java.util.Double> getThreshold()` would result in an overloading error due to the same function signature, and adding a new function with a different name does not seem necessary.

cc jkbradley feynmanliang

Author: Yuhao Yang <hhbyyh@gmail.com> Closes apache#10102 from hhbyyh/javaAPI. (cherry picked from commit 5cb4695) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python. Author: BenFradet <benjamin.fradet@gmail.com> Closes apache#10166 from BenFradet/SPARK-12159. (cherry picked from commit 06746b3) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
This patch tightens them to `private[memory]`. Author: Andrew Or <andrew@databricks.com> Closes apache#10182 from andrewor14/memory-visibility. (cherry picked from commit 9494521) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Author: Michael Armbrust <michael@databricks.com> Closes apache#10060 from marmbrus/docs. (cherry picked from commit 3959489) Signed-off-by: Michael Armbrust <michael@databricks.com>
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested. <img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png"> Author: Timothy Hunter <timhunter@databricks.com> Closes apache#10207 from thunterdb/spark-8517. (cherry picked from commit 765c67f) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…columns in RegressionEvaluator felixcheung , mengxr Just added a message to require() Author: Dominik Dahlem <dominik.dahlem@gmail.combination> Closes apache#9598 from dahlem/ddahlem_regression_evaluator_double_predictions_message_04112015. (cherry picked from commit a0046e3) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…throw Buffer underflow exception

Jira: https://issues.apache.org/jira/browse/SPARK-12222

Deserializing a RoaringBitmap using the Kryo serializer throws a Buffer underflow exception:

```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```

This is caused by a bug in Kryo's `Input.skip(long count)` (EsotericSoftware/kryo#119), which we call in `KryoInputDataInputBridge`. Instead of upgrading Kryo, this PR bypasses Kryo's `Input.skip(long count)` by directly calling another `skip` method in Kryo's Input.java (https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. it writes the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method. For more detail, see apache#9748 (comment).

Author: Fei Wang <wangfei1@huawei.com> Closes apache#10213 from scwf/patch-1. (cherry picked from commit 3934562) Signed-off-by: Davies Liu <davies.liu@gmail.com>
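The general shape of such a workaround, skipping a large count by repeatedly calling a smaller, trusted skip primitive, might look like the sketch below. It does not use the Kryo API: `skip_int`, `skip_long`, and `MAX_CHUNK` are hypothetical names, and a Python byte stream stands in for the input buffer.

```python
import io

MAX_CHUNK = 2**31 - 1  # assumed upper bound for a single int-sized skip


def skip_int(stream, count):
    """Hypothetical int-sized primitive: advance by exactly `count` bytes."""
    data = stream.read(count)
    if len(data) < count:
        raise EOFError("buffer underflow")


def skip_long(stream, count, max_chunk=MAX_CHUNK):
    """Skip `count` bytes by calling the int-sized primitive in bounded steps."""
    remaining = count
    while remaining > 0:
        step = min(max_chunk, remaining)
        skip_int(stream, step)
        remaining -= step
    return count


stream = io.BytesIO(bytes(range(10)))
skipped = skip_long(stream, 7, max_chunk=3)  # proceeds in steps of 3, 3, 1
```

Bounding each step keeps every call within the range the smaller primitive handles correctly, whatever the total count.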
Author: uncleGen <hustyugm@gmail.com> Closes apache#10023 from uncleGen/1.6-bugfix. (cherry picked from commit a113216) Signed-off-by: Sean Owen <sowen@cloudera.com>
Currently word2vec has the window size hard-coded at 5, but some users may want different sizes (for example, when using n-gram input or similar). The user request comes from http://stackoverflow.com/questions/32231975/spark-word2vec-window-size . Author: Holden Karau <holden@us.ibm.com> Author: Holden Karau <holden@pigscanfly.ca> Closes apache#8513 from holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size. (cherry picked from commit 22b9a87) Signed-off-by: Sean Owen <sowen@cloudera.com>
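To make the effect of the window size concrete, here is a minimal sketch of how a fixed context window selects (center, context) word pairs. It is an illustration only, not the MLlib implementation; `context_pairs` is a hypothetical helper, and word2vec implementations may also vary the effective window per position.

```python
def context_pairs(tokens, window=5):
    """Yield (center, context) pairs for words within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


pairs = list(context_pairs(["a", "b", "c"], window=1))
# pairs == [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
```

A larger window pairs each word with more distant neighbours, which is why inputs such as n-grams may call for a different setting than plain word sequences.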
markhamstra added a commit that referenced this pull request on Dec 9, 2015: SKIPME merged Apache branch-1.6
No description provided.