Sr/new master #414

Merged (218 commits, Oct 5, 2018)

Commits (218)
3d6b68b
[SPARK-25313][SQL] Fix regression in FileFormatWriter output names
gengliangwang Sep 6, 2018
0a5a49a
[SPARK-25337][SQL][TEST] `runSparkSubmit` should provide non-testing mode
dongjoon-hyun Sep 6, 2018
d749d03
[SPARK-25252][SQL] Support arrays of any types by to_json
MaxGekk Sep 6, 2018
64c314e
[SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String
mgaido91 Sep 6, 2018
f5817d8
[SPARK-25108][SQL] Fix the show method to display the wide character …
xuejianbest Sep 6, 2018
7ef6d1d
[SPARK-25328][PYTHON] Add an example for having two columns as the gr…
HyukjinKwon Sep 6, 2018
3b6591b
[SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws seria…
shahidki31 Sep 6, 2018
c84bc40
[SPARK-25072][PYSPARK] Forbid extra value for custom Row
Sep 6, 2018
27d3b0a
[SPARK-25222][K8S] Improve container status logging
rvesse Sep 6, 2018
da6fa38
[SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S
rvesse Sep 6, 2018
1b1711e
[SPARK-25208][SQL][FOLLOW-UP] Reduce code size.
ueshin Sep 7, 2018
b0ada7d
[SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3
wangyum Sep 7, 2018
4e3365b
[SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles ignore minPart…
srowen Sep 7, 2018
ed249db
[SPARK-25237][SQL] Remove updateBytesReadWithFileSize in FileScanRDD
Sep 7, 2018
6d7bc5a
[SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test c…
dilipbiswal Sep 7, 2018
f96a8bf
[SPARK-12321][SQL][FOLLOW-UP] Add tests for fromString
gatorsmile Sep 7, 2018
473f2fb
[SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test for CTAS
fjh100456 Sep 7, 2018
22a46ca
[SPARK-25270] lint-python: Add flake8 to find syntax errors and undef…
Sep 7, 2018
458f501
[MINOR][SS] Fix kafka-0-10-sql trivials
dongjinleekr Sep 7, 2018
9241e1e
[SPARK-23429][CORE] Add executor memory metrics to heartbeat and expo…
edwinalu Sep 7, 2018
01c3dfa
[MINOR][SQL] Add a debug log when a SQL text is used for a view
HyukjinKwon Sep 8, 2018
08c02e6
[SPARK-25345][ML] Deprecate public APIs from ImageSchema
WeichenXu123 Sep 8, 2018
26f74b7
[SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in …
dongjoon-hyun Sep 8, 2018
78981ef
[SPARK-20636] Add new optimization rule to transpose adjacent Window …
ptkool Sep 8, 2018
1cfda44
[SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S
ifilonenko Sep 9, 2018
0b9ccd5
Revert [SPARK-10399] [SPARK-23879] [SPARK-23762] [SPARK-25317]
gatorsmile Sep 9, 2018
88a930d
[MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeasure` method
WeichenXu123 Sep 9, 2018
77c9964
[SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result
wangyum Sep 9, 2018
a0aed47
[SPARK-25175][SQL] Field resolution should fail if there is ambiguity…
seancxmao Sep 10, 2018
f8b4d5a
[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output sch…
wangyum Sep 10, 2018
e7853dc
[SPARK-24999][SQL] Reduce unnecessary 'new' memory operations
heary-cao Sep 10, 2018
6f65178
[SPARK-24849][SPARK-24911][SQL][FOLLOW-UP] Converting a value of Stru…
gatorsmile Sep 10, 2018
12e3e9f
[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical …
mgaido91 Sep 10, 2018
da5685b
[SPARK-23672][PYTHON] Document support for nested return types in sca…
holdenk Sep 10, 2018
0736e72
[SPARK-25371][SQL] struct() should allow being called with 0 args
mgaido91 Sep 11, 2018
0e680dc
[SPARK-25278][SQL][FOLLOWUP] remove the hack in ProgressReporter
cloud-fan Sep 11, 2018
c9cb393
[SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty string being par…
mmolimar Sep 11, 2018
77579aa
[SPARK-25389][SQL] INSERT OVERWRITE DIRECTORY STORED AS should preven…
dongjoon-hyun Sep 11, 2018
bcb9a8c
[SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of con…
gerashegalov Sep 11, 2018
14f3ad2
[SPARK-24889][CORE] Update block info when unpersist rdds
viirya Sep 11, 2018
9d9601a
[INFRA] Close stale PRs.
Sep 11, 2018
cfbdd6a
[SPARK-25398] Minor bugs from comparing unrelated types
srowen Sep 11, 2018
97d4afa
Revert "[SPARK-23820][CORE] Enable use of long form of callsite in logs"
srowen Sep 11, 2018
9f5c5b4
[SPARK-25399][SS] Continuous processing state should not affect micro…
mukulmurthy Sep 11, 2018
79cc597
[SPARK-25402][SQL] Null handling in BooleanSimplification
gatorsmile Sep 12, 2018
2f42239
[SPARK-25352][SQL] Perform ordered global limit when limit number is …
viirya Sep 12, 2018
3030b82
[SPARK-25363][SQL] Fix schema pruning in where clause by ignoring unn…
viirya Sep 12, 2018
ab25c96
[SPARK-23820][CORE] Enable use of long form of callsite in logs
michaelmior Sep 13, 2018
083c944
[SPARK-25387][SQL] Fix for NPE caused by bad CSV input
MaxGekk Sep 13, 2018
6dc5921
[SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump more informa…
LantaoJin Sep 13, 2018
08c76b5
[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4
srowen Sep 13, 2018
8b702e1
[SPARK-25415][SQL] Make plan change log in RuleExecutor configurable …
maryannxue Sep 13, 2018
3e75a9f
[SPARK-25295][K8S] Fix executor names collision
Sep 13, 2018
5b761c5
[SPARK-25352][SQL][FOLLOWUP] Add helper method and address style issue
viirya Sep 13, 2018
45c4ebc
[SPARK-25170][DOC] Add list and short description of Spark Executor T…
LucaCanali Sep 13, 2018
a7e5aa6
[SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to…
mallman Sep 13, 2018
f60cd7c
[SPARK-25338][TEST] Ensure to call super.beforeAll() and super.afterA…
kiszk Sep 13, 2018
9deddbb
[SPARK-25400][CORE][TEST] Increase test timeouts
squito Sep 13, 2018
a81ef9e
[SPARK-25418][SQL] The metadata of DataSource table should not includ…
ueshin Sep 14, 2018
9c25d7f
[SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the form…
ueshin Sep 14, 2018
9bb798f
[SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0
Sep 15, 2018
be454a7
Revert "[SPARK-25431][SQL][EXAMPLES] Fix function examples and unify …
ueshin Sep 15, 2018
5ebef33
[SPARK-25426][SQL] Remove the duplicate fallback logic in UnsafeProje…
maropu Sep 15, 2018
bb2f069
[SPARK-25436] Bump master branch version to 2.5.0-SNAPSHOT
gatorsmile Sep 15, 2018
e06da95
[SPARK-25425][SQL] Extra options should override session options in D…
MaxGekk Sep 16, 2018
fefaa3c
[SPARK-25438][SQL][TEST] Fix FilterPushdownBenchmark to use the same …
dongjoon-hyun Sep 16, 2018
02c2963
[SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c…
npoggi Sep 16, 2018
bfcf742
[SPARK-24418][FOLLOWUP][DOC] Update docs to show Scala 2.11.12
dongjoon-hyun Sep 16, 2018
a1dd782
[MINOR][DOCS] Axe deprecated doc refs
Sep 16, 2018
538e047
[SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAppendOnlyMapSu…
dongjoon-hyun Sep 17, 2018
b66e14d
[SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile name in relea…
jerryshao Sep 17, 2018
619c949
[SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for loadt…
sujith71955 Sep 17, 2018
0dd61ec
[SPARK-25427][SQL][TEST] Add BloomFilter creation test cases
dongjoon-hyun Sep 17, 2018
8cf6fd1
[SPARK-25431][SQL][EXAMPLES] Fix function examples and the example re…
ueshin Sep 17, 2018
30aa37f
[SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and sp…
srowen Sep 17, 2018
4b9542e
[SPARK-25423][SQL] Output "dataFilters" in DataSourceScanExec.metadata
wangyum Sep 17, 2018
553af22
[SPARK-16323][SQL] Add IntegralDivide expression
mgaido91 Sep 17, 2018
58419b9
[PYSPARK] Updates to pyspark broadcast
squito Aug 14, 2018
8f5a5a9
[PYSPARK][SQL] Updates to RowQueue
squito Sep 6, 2018
a97001d
[CORE] Updates to remote cache reads
squito Aug 22, 2018
0f1413e
[SPARK-25443][BUILD] fix issues when building docs with release scrip…
cloud-fan Sep 18, 2018
acc6452
[SPARK-25444][SQL] Refactor GenArrayData.genCodeToCreateArrayData method
kiszk Sep 18, 2018
ba838fe
[SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CU…
jamesthomp Sep 18, 2018
1c0423b
[SPARK-25445][BUILD] the release script should be able to publish a s…
cloud-fan Sep 18, 2018
182da81
[SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JDK8
wangyum Sep 18, 2018
123f004
[SPARK-25291][K8S] Fixing Flakiness of Executor Pod tests
ifilonenko Sep 18, 2018
a6f37b0
[SPARK-25456][SQL][TEST] Fix PythonForeachWriterSuite
squito Sep 18, 2018
497f00f
[SPARK-23200] Reset Kubernetes-specific config on Checkpoint restore
ssaavedra Sep 19, 2018
6c7db7f
[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema
rxin Sep 19, 2018
4193c76
[SPARK-24626] Add statistics prefix to parallelFileListingInStatsComp…
rxin Sep 19, 2018
5534a3a
[SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for…
gengliangwang Sep 19, 2018
12b1e91
[SPARK-25358][SQL] MutableProjection supports fallback to an interpre…
maropu Sep 19, 2018
a71f6a1
[SPARK-25414][SS][TEST] make it clear that the numRows metrics should…
cloud-fan Sep 19, 2018
cb1b55c
Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
dongjoon-hyun Sep 19, 2018
6f681d4
[SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different s…
WeichenXu123 Sep 19, 2018
90e3955
[SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Pyt…
BryanCutler Sep 20, 2018
936c920
[SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicro…
rxin Sep 20, 2018
8aae49a
[SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries be…
mgaido91 Sep 20, 2018
47d6e80
[SPARK-25457][SQL] IntegralDivide returns data type of the operands
mgaido91 Sep 20, 2018
76399d7
[SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.e…
rxin Sep 20, 2018
95b177c
[SPARK-23648][R][SQL] Adds more types for hint in SparkR
huaxingao Sep 20, 2018
0e31a6f
[SPARK-25339][TEST] Refactor FilterPushdownBenchmark
wangyum Sep 20, 2018
7ff5386
[MINOR][PYTHON][TEST] Use collect() instead of show() to make the out…
HyukjinKwon Sep 20, 2018
89671a2
Revert [SPARK-19355][SPARK-25352]
viirya Sep 20, 2018
edf5cc6
[SPARK-25460][SS] DataSourceV2: SS sources do not respect SessionConf…
HyukjinKwon Sep 20, 2018
67f2c6a
[SPARK-25417][SQL] ArrayContains function may return incorrect result…
dilipbiswal Sep 20, 2018
88e7e87
[MINOR][PYTHON] Use a helper in `PythonUtils` instead of direct acces…
HyukjinKwon Sep 20, 2018
88446b6
[SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId …
maryannxue Sep 20, 2018
a86f841
[SPARK-25381][SQL] Stratified sampling by Column argument
MaxGekk Sep 20, 2018
2f51e72
[SPARK-24918][CORE] Executor Plugin API
NiharS Sep 20, 2018
4d114fc
[SPARK-25366][SQL] Zstd and brotli CompressionCodec are not supported…
10110346 Sep 20, 2018
77e5244
[SPARK-25472][SS] Don't have legitimate stops of streams cause stream…
brkyvz Sep 20, 2018
950ab79
[SPARK-24777][SQL] Add write benchmark for AVRO
gengliangwang Sep 21, 2018
5d25e15
Revert "[SPARK-23715][SQL] the input of to/from_utc_timestamp can not…
gatorsmile Sep 21, 2018
596af21
[SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0.10
rednaxelafx Sep 21, 2018
1f4ca6f
[SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmark
seancxmao Sep 21, 2018
fb3276a
[SPARK-25384][SQL] Clarify fromJsonForceNullableSchema will be remove…
rxin Sep 21, 2018
411ecc3
[SPARK-23549][SQL] Rename config spark.sql.legacy.compareDateTimestam…
rxin Sep 21, 2018
2c9d8f5
[SPARK-25469][SQL] Eval methods of Concat, Reverse and ElementAt shou…
mn-mikke Sep 21, 2018
ff601cf
[SPARK-24355] Spark external shuffle server improvement to better han…
Sep 21, 2018
d25f425
[SPARK-25499][TEST] Refactor BenchmarkBase and Benchmark
gengliangwang Sep 21, 2018
4a11209
[SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation sho…
rxin Sep 21, 2018
40edab2
[SPARK-25321][ML] Fix local LDA model constructor
WeichenXu123 Sep 21, 2018
6ca87eb
[SPARK-25465][TEST] Refactor Parquet test suites in project Hive
gengliangwang Sep 22, 2018
0fbba76
[MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm
gatorsmile Sep 23, 2018
a72d118
[SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python …
HyukjinKwon Sep 23, 2018
9bf04d8
[SPARK-25489][ML][TEST] Refactor UDTSerializationBenchmark
seancxmao Sep 23, 2018
d522a56
[SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpperCase
wangyum Sep 24, 2018
c79072a
[SPARK-25478][SQL][TEST] Refactor CompressionSchemeBenchmark to use m…
wangyum Sep 24, 2018
c3b4a94
[SPARKR] Match pyspark features in SparkR communication protocol
HyukjinKwon Sep 24, 2018
804515f
[SPARK-21318][SQL] Improve exception message thrown by `lookupFunction`
stanzhai Sep 24, 2018
bb49661
[SPARK-25416][SQL] ArrayPosition function may return incorrect result…
dilipbiswal Sep 24, 2018
3ce2e00
[SPARK-25502][CORE][WEBUI] Empty Page when page number exceeds the re…
shahidki31 Sep 24, 2018
2c9ffda
[BUILD] Closes stale PR
HyukjinKwon Sep 24, 2018
615792d
[SPARK-25503][CORE][WEBUI] Total task message in stage page is ambiguous
shahidki31 Sep 25, 2018
7d8f5b6
[SPARK-25519][SQL] ArrayRemove function may return incorrect result w…
dilipbiswal Sep 25, 2018
9cbd001
[SPARK-23907][SQL] Revert regr_* functions entirely
rxin Sep 25, 2018
04db035
[SPARK-25486][TEST] Refactor SortBenchmark to use main method
yucai Sep 25, 2018
66d2987
[SPARK-25495][SS] FetchedData.reset should reset all fields
zsxwing Sep 25, 2018
9bb3a0c
[SPARK-25422][CORE] Don't memory map blocks streamed to disk.
squito Sep 26, 2018
8c2edf4
[SPARK-24324][PYTHON][FOLLOW-UP] Rename the Conf to spark.sql.legacy.…
gatorsmile Sep 26, 2018
cb77a66
[SPARK-21291][R] add R partitionBy API in DataFrame
huaxingao Sep 26, 2018
473d0d8
[SPARK-25514][SQL] Generating pretty JSON by to_json
MaxGekk Sep 26, 2018
81cbcca
[SPARK-25534][SQL] Make `SQLHelper` trait
dongjoon-hyun Sep 26, 2018
b39e228
[SPARK-25541][SQL] CaseInsensitiveMap should be serializable after '-…
gengliangwang Sep 26, 2018
44a7174
[SPARK-25379][SQL] Improve AttributeSet and ColumnPruning performance
mgaido91 Sep 26, 2018
cf5c9c4
[SPARK-20937][DOCS] Describe spark.sql.parquet.writeLegacyFormat prop…
seancxmao Sep 26, 2018
a2ac5a7
[SPARK-25509][CORE] Windows doesn't support POSIX permissions
Sep 26, 2018
bd2ae85
[SPARK-25318] Add exception handling when wrapping the input stream d…
Sep 26, 2018
e702fb1
[SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS …
rxin Sep 26, 2018
5ee2166
[SPARK-25533][CORE][WEBUI] AppSummary should hold the information abo…
shahidki31 Sep 26, 2018
51540c2
[SPARK-25372][YARN][K8S] Deprecate and generalize keytab / principal …
ifilonenko Sep 27, 2018
d0990e3
[SPARK-25454][SQL] add a new config for picking minimum precision for…
cloud-fan Sep 27, 2018
c3c45cb
[SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark behave as the…
ueshin Sep 27, 2018
9063b17
[SPARK-25481][SQL][TEST] Refactor ColumnarBatchBenchmark to use main …
yucai Sep 27, 2018
5def10e
[SPARK-25536][CORE] metric value for METRIC_OUTPUT_RECORDS_WRITTEN is…
shahidki31 Sep 27, 2018
ee214ef
[SPARK-25525][SQL][PYSPARK] Do not update conf for existing SparkCont…
ueshin Sep 27, 2018
8b72799
[SPARK-25468][WEBUI] Highlight current page index in the spark UI
Sep 27, 2018
f309b28
[SPARK-25485][SQL][TEST] Refactor UnsafeProjectionBenchmark to use ma…
yucai Sep 27, 2018
ff87613
[SPARK-23715][SQL][DOC] improve document for from/to_utc_timestamp
cloud-fan Sep 27, 2018
d03e0af
[SPARK-25522][SQL] Improve type promotion for input arguments of elem…
dilipbiswal Sep 27, 2018
2a8cbfd
[SPARK-25314][SQL] Fix Python UDF accessing attributes from both side…
xuanyuanking Sep 27, 2018
86a2450
[SPARK-25551][SQL] Remove unused InSubquery expression
mgaido91 Sep 27, 2018
dd8f6b1
[SPARK-25541][SQL][FOLLOWUP] Remove overriding filterKeys in CaseInse…
gengliangwang Sep 27, 2018
f856fe4
[SPARK-21436][CORE] Take advantage of known partitioner for distinct …
holdenk Sep 27, 2018
a1adde5
[SPARK-24341][SQL][FOLLOWUP] remove duplicated error checking
cloud-fan Sep 27, 2018
5fd22d0
[SPARK-25546][CORE] Don't cache value of EVENT_LOG_CALLSITE_LONG_FORM.
Sep 27, 2018
3b7395f
[SPARK-25459][SQL] Add viewOriginalText back to CatalogTable
Sep 28, 2018
e120a38
[SPARK-25505][SQL] The output order of grouping columns in Pivot is d…
maryannxue Sep 28, 2018
0b33f08
[SPARK-23285][DOC][FOLLOWUP] Fix missing markup tag
dongjoon-hyun Sep 28, 2018
b7d8034
[SPARK-25542][CORE][TEST] Move flaky test in OpenHashMapSuite to Open…
viirya Sep 28, 2018
7deef7a
[SPARK-25458][SQL] Support FOR ALL COLUMNS in ANALYZE TABLE
dilipbiswal Sep 28, 2018
a281465
[SPARK-25429][SQL] Use Set instead of Array to improve lookup perform…
wangyum Sep 28, 2018
9362c5c
[SPARK-25449][CORE] Heartbeat shouldn't include accumulators for zero…
mukulmurthy Sep 28, 2018
5d726b8
[SPARK-25559][SQL] Remove the unsupported predicates in Parquet when …
dbtsai Sep 29, 2018
e99ba8d
[SPARK-25262][DOC][FOLLOWUP] Fix missing markup tag
dongjoon-hyun Sep 29, 2018
1e43783
[SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCata…
dongjoon-hyun Sep 29, 2018
1007cae
[SPARK-25447][SQL] Support JSON options by schema_of_json()
MaxGekk Sep 29, 2018
dcb9a97
[SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table
viirya Sep 29, 2018
623c2ec
[SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java
MaxGekk Sep 29, 2018
f246813
[SPARK-25508][SQL][TEST] Refactor OrcReadBenchmark to use main method
yucai Sep 29, 2018
f4b1380
[SPARK-25572][SPARKR] test only if not cran
felixcheung Sep 29, 2018
b6b8a66
[SPARK-25568][CORE] Continue to update the remaining accumulators whe…
zsxwing Sep 30, 2018
a2f502c
[SPARK-25565][BUILD] Add scalastyle rule to check add Locale.ROOT to …
HyukjinKwon Sep 30, 2018
40e6ed8
[CORE][MINOR] Fix obvious error and compiling for Scala 2.12.7
da-liii Sep 30, 2018
4da541a
[SPARK-25543][K8S] Print debug message iff execIdsRemovedInThisRound …
ScrapCodes Sep 30, 2018
fb8f4c0
[SPARK-25505][SQL][FOLLOWUP] Fix for attributes cosmetically differen…
mgaido91 Oct 1, 2018
21f0b73
[SPARK-25453][SQL][TEST][.FFFFFFFFF] OracleIntegrationSuite IllegalAr…
seancxmao Oct 1, 2018
30f5d0f
[SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests
alex7c4 Oct 1, 2018
b96fd44
[SPARK-25476][SPARK-25510][TEST] Refactor AggregateBenchmark and add …
wangyum Oct 1, 2018
a802c69
[SPARK-18364][YARN] Expose metrics for YarnShuffleService
mareksimunek Oct 1, 2018
3422fc0
[SPARK-25575][WEBUI][SQL] SQL tab in the spark UI support hide tables…
shahidki31 Oct 1, 2018
5114db5
[SPARK-25578][BUILD] Update to Scala 2.12.7
srowen Oct 2, 2018
7187663
[SPARK-25583][DOC] Add history-server related configuration in the do…
shahidki31 Oct 2, 2018
9bf397c
[SPARK-25592] Setting version to 3.0.0-SNAPSHOT
gatorsmile Oct 2, 2018
7b4e94f
[SPARK-25581][SQL] Rename method `benchmark` as `runBenchmarkSuite` i…
gengliangwang Oct 2, 2018
d6be46e
[SPARK-24530][FOLLOWUP] run Sphinx with python 3 in docker
cloud-fan Oct 2, 2018
928d073
[SPARK-25595] Ignore corrupt Avro files if flag IGNORE_CORRUPT_FILES …
gengliangwang Oct 3, 2018
1a5d83b
[SPARK-25589][SQL][TEST] Add BloomFilterBenchmark
dongjoon-hyun Oct 3, 2018
56741c3
[SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to use main method
wangyum Oct 3, 2018
d7ae36a
[SPARK-25538][SQL] Zero-out all bytes when writing decimal
mgaido91 Oct 3, 2018
075dd62
[SPARK-25586][CORE] Remove outer objects from logdebug statements in …
ankuriitg Oct 3, 2018
6b87bfa
Merge branch 'master' of github.com:apache/spark into sr/new-master
sjrand Oct 3, 2018
79dd4c9
[SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs …
HyukjinKwon Oct 4, 2018
927e527
[SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs …
HyukjinKwon Oct 4, 2018
71c24aa
[SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not consume the i…
cloud-fan Oct 4, 2018
95ae209
[SPARK-25479][TEST] Refactor DatasetBenchmark to use main method
wangyum Oct 4, 2018
3ae4f07
[SPARK-17159][STREAM] Significant speed up for running spark streamin…
ScrapCodes Oct 5, 2018
85a9359
[SPARK-25609][TESTS] Reduce time of test for SPARK-22226
mgaido91 Oct 5, 2018
f27d96b
[SPARK-25606][TEST] Reduce DateExpressionsSuite test time costs in Je…
wangyum Oct 5, 2018
8113b9c
[SPARK-25605][TESTS] Run cast string to timestamp tests for a subset …
mgaido91 Oct 5, 2018
44c1e1a
[SPARK-25408] Move to mode ideomatic Java8
Oct 5, 2018
5ae20cf
Revert "[SPARK-25408] Move to mode ideomatic Java8"
cloud-fan Oct 5, 2018
4597007
[SPARK-25521][SQL] Job id showing null in the logs when insert into c…
sujith71955 Oct 5, 2018
20c66ef
resolve conflicts
sjrand Oct 4, 2018
ab1650d
[SPARK-24601] Update Jackson to 2.9.6
Oct 5, 2018
44e04a3
merge in master again to get jackson fix
sjrand Oct 5, 2018
14ff6fd
resolve conflict in pom after second master merge
sjrand Oct 5, 2018
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -259,7 +259,7 @@ jobs:

run-spark-docker-gradle-plugin-tests:
<<: *test-defaults
resource_class: small
resource_class: medium
steps:
- *checkout-code
- setup_remote_docker
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 2.4.0
Version: 3.0.0
Title: R Frontend for Apache Spark
Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
30 changes: 26 additions & 4 deletions R/pkg/R/DataFrame.R
@@ -503,7 +503,6 @@ setMethod("createOrReplaceTempView",
#' @param x A SparkDataFrame
#' @param tableName A character vector containing the name of the table
#'
#' @family SparkDataFrame functions
#' @seealso \link{createOrReplaceTempView}
#' @rdname registerTempTable-deprecated
#' @name registerTempTable
@@ -2955,6 +2954,9 @@ setMethod("exceptAll",
#' @param source a name for external data source.
#' @param mode one of 'append', 'overwrite', 'error', 'errorifexists', 'ignore'
#' save mode (it is 'error' by default)
#' @param partitionBy a name or a list of names of columns to partition the output by on the file
#' system. If specified, the output is laid out on the file system similar
#' to Hive's partitioning scheme.
#' @param ... additional argument(s) passed to the method.
#'
#' @family SparkDataFrame functions
@@ -2966,13 +2968,13 @@
#' sparkR.session()
#' path <- "path/to/file.json"
#' df <- read.json(path)
#' write.df(df, "myfile", "parquet", "overwrite")
#' write.df(df, "myfile", "parquet", "overwrite", partitionBy = c("col1", "col2"))
#' saveDF(df, parquetPath2, "parquet", mode = "append", mergeSchema = TRUE)
#' }
#' @note write.df since 1.4.0
setMethod("write.df",
signature(df = "SparkDataFrame"),
function(df, path = NULL, source = NULL, mode = "error", ...) {
function(df, path = NULL, source = NULL, mode = "error", partitionBy = NULL, ...) {
if (!is.null(path) && !is.character(path)) {
stop("path should be character, NULL or omitted.")
}
@@ -2986,8 +2988,18 @@ setMethod("write.df",
if (is.null(source)) {
source <- getDefaultSqlSource()
}
cols <- NULL
if (!is.null(partitionBy)) {
if (!all(sapply(partitionBy, function(c) is.character(c)))) {
stop("All partitionBy column names should be characters.")
}
cols <- as.list(partitionBy)
}
write <- callJMethod(df@sdf, "write")
write <- callJMethod(write, "format", source)
if (!is.null(cols)) {
write <- callJMethod(write, "partitionBy", cols)
}
write <- setWriteOptions(write, path = path, mode = mode, ...)
write <- handledCallJMethod(write, "save")
})
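The hunk above wires the new partitionBy argument of write.df() through to the JVM DataFrameWriter: column names are validated in R and then forwarded to the writer's partitionBy(). A minimal round-trip sketch follows; the "key" column and the temporary output path are illustrative assumptions, not part of this change.

# Assumes an active SparkR session; "key" and outPath are placeholders.
df <- createDataFrame(data.frame(key = c(1L, 2L), value = c("a", "b")))
outPath <- tempfile(pattern = "partitioned-out")
write.df(df, outPath, "parquet", mode = "overwrite", partitionBy = "key")
# Each distinct key value is written to its own key=<value> subdirectory, Hive-style.
readBack <- read.df(outPath, "parquet")
head(readBack)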
@@ -3986,7 +3998,17 @@
signature(x = "SparkDataFrame", name = "character"),
function(x, name, ...) {
parameters <- list(...)
stopifnot(all(sapply(parameters, is.character)))
if (!all(sapply(parameters, function(y) {
if (is.character(y) || is.numeric(y)) {
TRUE
} else if (is.list(y)) {
all(sapply(y, function(z) { is.character(z) || is.numeric(z) }))
} else {
FALSE
}
}))) {
stop("sql hint should be character, numeric, or list with character or numeric.")
}
jdf <- callJMethod(x@sdf, "hint", name, parameters)
dataFrame(jdf)
})
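With the relaxed validation above, hint() now accepts character, numeric, or list parameters instead of characters only. A brief usage sketch, with purely illustrative hint names and values:

# Assumes an active SparkR session; "my_hint" and its parameters are placeholders.
df <- sql("SELECT * FROM range(10)")
hinted <- hint(df, "my_hint", 1.25, "tag", list("a", 2))
explain(hinted, TRUE)  # the hint and its parameters appear in the parsed logical plan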
1 change: 0 additions & 1 deletion R/pkg/R/catalog.R
@@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) {
#' @param ... additional named parameters as options for the data source.
#' @return A SparkDataFrame.
#' @rdname createTable
#' @seealso \link{createExternalTable}
#' @examples
#'\dontrun{
#' sparkR.session()
43 changes: 31 additions & 12 deletions R/pkg/R/context.R
@@ -167,18 +167,30 @@ parallelize <- function(sc, coll, numSlices = 1) {
# 2-tuples of raws
serializedSlices <- lapply(slices, serialize, connection = NULL)

# The PRC backend cannot handle arguments larger than 2GB (INT_MAX)
# The RPC backend cannot handle arguments larger than 2GB (INT_MAX)
# If serialized data is safely less than that threshold we send it over the PRC channel.
# Otherwise, we write it to a file and send the file name
if (objectSize < sizeLimit) {
jrdd <- callJStatic("org.apache.spark.api.r.RRDD", "createRDDFromArray", sc, serializedSlices)
} else {
fileName <- writeToTempFile(serializedSlices)
jrdd <- tryCatch(callJStatic(
"org.apache.spark.api.r.RRDD", "createRDDFromFile", sc, fileName, as.integer(numSlices)),
finally = {
file.remove(fileName)
})
if (callJStatic("org.apache.spark.api.r.RUtils", "getEncryptionEnabled", sc)) {
# the length of slices here is the parallelism to use in the jvm's sc.parallelize()
parallelism <- as.integer(numSlices)
jserver <- newJObject("org.apache.spark.api.r.RParallelizeServer", sc, parallelism)
authSecret <- callJMethod(jserver, "secret")
port <- callJMethod(jserver, "port")
conn <- socketConnection(port = port, blocking = TRUE, open = "wb", timeout = 1500)
doServerAuth(conn, authSecret)
writeToConnection(serializedSlices, conn)
jrdd <- callJMethod(jserver, "getResult")
} else {
fileName <- writeToTempFile(serializedSlices)
jrdd <- tryCatch(callJStatic(
"org.apache.spark.api.r.RRDD", "createRDDFromFile", sc, fileName, as.integer(numSlices)),
finally = {
file.remove(fileName)
})
}
}

RDD(jrdd, "byte")
@@ -194,14 +206,21 @@ getMaxAllocationLimit <- function(sc) {
))
}

writeToConnection <- function(serializedSlices, conn) {
tryCatch({
for (slice in serializedSlices) {
writeBin(as.integer(length(slice)), conn, endian = "big")
writeBin(slice, conn, endian = "big")
}
}, finally = {
close(conn)
})
}

writeToTempFile <- function(serializedSlices) {
fileName <- tempfile()
conn <- file(fileName, "wb")
for (slice in serializedSlices) {
writeBin(as.integer(length(slice)), conn, endian = "big")
writeBin(slice, conn, endian = "big")
}
close(conn)
writeToConnection(serializedSlices, conn)
fileName
}
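The writeToConnection() helper above frames each serialized slice as a 4-byte big-endian length followed by the raw bytes, and the same framing now backs both the temp-file path and the new encrypted socket path. A minimal sketch (not part of this diff) of how a consumer could read such a stream back:

# Hypothetical reader for the length-prefixed framing produced by writeToConnection().
readSlicesFromConnection <- function(conn) {
  slices <- list()
  repeat {
    len <- readBin(conn, what = "integer", n = 1, endian = "big")
    if (length(len) == 0) break                      # end of stream
    slices[[length(slices) + 1]] <- readBin(conn, what = "raw", n = len)
  }
  slices
}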

35 changes: 25 additions & 10 deletions R/pkg/R/functions.R
@@ -198,8 +198,9 @@ NULL
#' }
#' @param ... additional argument(s). In \code{to_json} and \code{from_json}, this contains
#' additional named properties to control how it is converted, accepts the same
#' options as the JSON data source. In \code{arrays_zip}, this contains additional
#' Columns of arrays to be merged.
#' options as the JSON data source. Additionally \code{to_json} supports the "pretty"
#' option which enables pretty JSON generation. In \code{arrays_zip}, this contains
#' additional Columns of arrays to be merged.
#' @name column_collection_functions
#' @rdname column_collection_functions
#' @family collection functions
@@ -1699,8 +1700,8 @@ setMethod("to_date",
})

#' @details
#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
#' a \code{mapType} or array of \code{mapType} into a Column of JSON string.
#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType}
#' or an \code{arrayType} into a Column of JSON string.
#' Resolving the Column can fail if an unsupported type is encountered.
#'
#' @rdname column_collection_functions
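A brief usage sketch of the to_json() behavior documented above, assuming an active SparkR session: arrays are converted to JSON arrays, and the "pretty" option is forwarded like the other JSON options.

# Illustrative only; the column name is a placeholder.
df <- sql("SELECT array(1, 2, 3) AS arr")
head(select(df, to_json(df$arr)))                    # "[1,2,3]"
head(select(df, to_json(df$arr, pretty = "true")))   # indented, multi-line JSON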
@@ -2203,9 +2204,16 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
})

#' @details
#' \code{from_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
#' time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1'
#' would yield '2017-07-14 03:40:00.0'.
#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
#' timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
#' timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
#' the given timezone.
#' This function may return confusing result if the input is a string with timezone, e.g.
#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
#' timestamp according to the timezone in the string, and finally display the result by converting
#' the timestamp to string according to the session local timezone.
#'
#' @rdname column_datetime_diff_functions
#'
@@ -2261,9 +2269,16 @@ setMethod("next_day", signature(y = "Column", x = "character"),
})

#' @details
#' \code{to_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
#' time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1'
#' would yield '2017-07-14 01:40:00.0'.
#' \code{to_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
#' timestamp in the given timezone, and renders that timestamp as a timestamp in UTC.
#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
#' timezone-agnostic. So in Spark this function just shift the timestamp value from the given
#' timezone to UTC timezone.
#' This function may return confusing result if the input is a string with timezone, e.g.
#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
#' timestamp according to the timezone in the string, and finally display the result by converting
#' the timestamp to string according to the session local timezone.
#'
#' @rdname column_datetime_diff_functions
#' @aliases to_utc_timestamp to_utc_timestamp,Column,character-method
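A small worked example of the shifting semantics described above, assuming an active session whose local time zone is UTC; the literal timestamps are illustrative only.

# from_utc_timestamp treats the value as UTC and renders it in the given zone;
# to_utc_timestamp treats it as local to the given zone and renders it in UTC.
df <- createDataFrame(data.frame(t = as.POSIXct("2017-07-14 02:40:00", tz = "UTC")))
head(select(df, from_utc_timestamp(df$t, "GMT+1")))  # 2017-07-14 03:40:00
head(select(df, to_utc_timestamp(df$t, "GMT+1")))    # 2017-07-14 01:40:00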
2 changes: 2 additions & 0 deletions R/pkg/R/sparkR.R
@@ -626,6 +626,8 @@ sparkConfToSubmitOps[["spark.driver.extraLibraryPath"]] <- "--driver-library-pat
sparkConfToSubmitOps[["spark.master"]] <- "--master"
sparkConfToSubmitOps[["spark.yarn.keytab"]] <- "--keytab"
sparkConfToSubmitOps[["spark.yarn.principal"]] <- "--principal"
sparkConfToSubmitOps[["spark.kerberos.keytab"]] <- "--keytab"
sparkConfToSubmitOps[["spark.kerberos.principal"]] <- "--principal"
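With the two mappings added above, the generalized spark.kerberos.* keys translate to the same spark-submit flags as the older spark.yarn.* keys. A hedged sketch of supplying them through sparkR.session(); the keytab path and principal are placeholders.

sparkR.session(sparkConfig = list(
  spark.kerberos.keytab = "/etc/security/keytabs/user.keytab",  # placeholder path
  spark.kerberos.principal = "user@EXAMPLE.COM"                 # placeholder principal
))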


# Utility function that returns Spark Submit arguments as a string
32 changes: 32 additions & 0 deletions R/pkg/tests/fulltests/test_Serde.R
@@ -124,3 +124,35 @@ test_that("SerDe of list of lists", {
})

sparkR.session.stop()

# Note that this test should be at the end of tests since the configruations used here are not
# specific to sessions, and the Spark context is restarted.
test_that("createDataFrame large objects", {
for (encryptionEnabled in list("true", "false")) {
# To simulate a large object scenario, we set spark.r.maxAllocationLimit to a smaller value
conf <- list(spark.r.maxAllocationLimit = "100",
spark.io.encryption.enabled = encryptionEnabled)

suppressWarnings(sparkR.session(master = sparkRTestMaster,
sparkConfig = conf,
enableHiveSupport = FALSE))

sc <- getSparkContext()
actual <- callJStatic("org.apache.spark.api.r.RUtils", "getEncryptionEnabled", sc)
expected <- as.logical(encryptionEnabled)
expect_equal(actual, expected)

tryCatch({
# suppress warnings from dot in the field names. See also SPARK-21536.
df <- suppressWarnings(createDataFrame(iris, numPartitions = 3))
expect_equal(getNumPartitions(df), 3)
expect_equal(dim(df), dim(iris))

df <- createDataFrame(cars, numPartitions = 3)
expect_equal(collect(df), cars)
},
finally = {
sparkR.stop()
})
}
})
38 changes: 26 additions & 12 deletions R/pkg/tests/fulltests/test_sparkSQL.R
@@ -316,18 +316,6 @@ test_that("create DataFrame from RDD", {
unsetHiveContext()
})

test_that("createDataFrame uses files for large objects", {
# To simulate a large file scenario, we set spark.r.maxAllocationLimit to a smaller value
conf <- callJMethod(sparkSession, "conf")
callJMethod(conf, "set", "spark.r.maxAllocationLimit", "100")
df <- suppressWarnings(createDataFrame(iris, numPartitions = 3))
expect_equal(getNumPartitions(df), 3)

# Resetting the conf back to default value
callJMethod(conf, "set", "spark.r.maxAllocationLimit", toString(.Machine$integer.max / 10))
expect_equal(dim(df), dim(iris))
})

test_that("read/write csv as DataFrame", {
if (windows_with_hadoop()) {
csvPath <- tempfile(pattern = "sparkr-test", fileext = ".csv")
@@ -1686,6 +1674,15 @@ test_that("column functions", {
expect_true(any(apply(s, 1, function(x) { x[[1]]$age == 16 })))
}

# Test to_json() supports arrays of primitive types and arrays
df <- sql("SELECT array(19, 42, 70) as age")
j <- collect(select(df, alias(to_json(df$age), "json")))
expect_equal(j[order(j$json), ][1], "[19,42,70]")

df <- sql("SELECT array(array(1, 2), array(3, 4)) as matrix")
j <- collect(select(df, alias(to_json(df$matrix), "json")))
expect_equal(j[order(j$json), ][1], "[[1,2],[3,4]]")

# passing option
df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
schema2 <- structType(structField("date", "date"))
@@ -2410,6 +2407,15 @@ test_that("join(), crossJoin() and merge() on a DataFrame", {
expect_true(any(grepl("BroadcastHashJoin", execution_plan_broadcast)))
})

test_that("test hint", {
df <- sql("SELECT * FROM range(10e10)")
hintList <- list("hint2", "hint3", "hint4")
execution_plan_hint <- capture.output(
explain(hint(df, "hint1", 1.23456, "aaaaaaaaaa", hintList), TRUE)
)
expect_true(any(grepl("1.23456, aaaaaaaaaa", execution_plan_hint)))
})

test_that("toJSON() on DataFrame", {
df <- as.DataFrame(cars)
df_json <- toJSON(df)
@@ -2695,8 +2701,16 @@ test_that("read/write text files", {
expect_equal(colnames(df2), c("value"))
expect_equal(count(df2), count(df) * 2)

df3 <- createDataFrame(list(list(1L, "1"), list(2L, "2"), list(1L, "1"), list(2L, "2")),
schema = c("key", "value"))
textPath3 <- tempfile(pattern = "textPath3", fileext = ".txt")
write.df(df3, textPath3, "text", mode = "overwrite", partitionBy = "key")
df4 <- read.df(textPath3, "text")
expect_equal(count(df3), count(df4))

unlink(textPath)
unlink(textPath2)
unlink(textPath3)
})

test_that("read/write text files - compression option", {