State schema tws #12

Closed
wants to merge 925 commits into from
Changes from all commits
925 commits
82b4ad2
[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.…
cloud-fan Jun 7, 2024
9491292
[SPARK-48548][BUILD] Add LICENSE/NOTICE for spark-core with shaded de…
yaooqinn Jun 7, 2024
b7d9c31
Revert "[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTable…
yaooqinn Jun 7, 2024
87b0f59
[SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsu…
zhengruifeng Jun 7, 2024
d81b1e3
[SPARK-48559][SQL] Fetch globalTempDatabase name directly without inv…
willwwt Jun 7, 2024
8911d59
[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.…
panbingkun Jun 7, 2024
201df0d
[MINOR][PYTHON][TESTS] Move a test out of parity tests
zhengruifeng Jun 7, 2024
24bce72
[SPARK-48012][SQL] SPJ: Support Transfrom Expressions for One Side Sh…
szehon-ho Jun 9, 2024
d9394ee
[SPARK-48560][SS][PYTHON] Make StreamingQueryListener.spark settable
HyukjinKwon Jun 9, 2024
1901669
[SPARK-48564][PYTHON][CONNECT] Propagate cached schema in set operations
zhengruifeng Jun 10, 2024
61fd936
[SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCAS…
uros-db Jun 10, 2024
3857a9d
[SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU…
uros-db Jun 10, 2024
ec6db63
[SPARK-48569][SS][CONNECT] Handle edge cases in query.name
WweiL Jun 10, 2024
5a2f374
[SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets
n-young-db Jun 10, 2024
3fe6abd
[SPARK-48563][BUILD] Upgrade `pickle` to 1.5
LuciferYang Jun 11, 2024
1e4750e
[SPARK-47500][PYTHON][CONNECT][FOLLOWUP] Restore error message for `D…
zhengruifeng Jun 11, 2024
53d65fd
[SPARK-48565][UI] Fix thread dump display in UI
pan3793 Jun 11, 2024
452c1b6
[SPARK-48551][SQL] Perf improvement for escapePathName
yaooqinn Jun 11, 2024
df4156a
[SPARK-48372][SPARK-45716][PYTHON][FOLLOW-UP] Remove unused helper me…
zhengruifeng Jun 11, 2024
583ab05
[SPARK-47415][SQL] Add collation support for Levenshtein expression
uros-db Jun 11, 2024
224ba16
[SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTE…
nikolamand-db Jun 11, 2024
aad6771
[SPARK-48576][SQL] Rename UTF8_BINARY_LCASE to UTF8_LCASE
uros-db Jun 11, 2024
6107836
[SPARK-48576][SQL][FOLLOWUP] Rename UTF8_BINARY_LCASE to UTF8_LCASE
uros-db Jun 11, 2024
82a84ed
[SPARK-46937][SQL] Revert "[] Improve concurrency performance for Fun…
cloud-fan Jun 11, 2024
72df3cb
[SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test
LuciferYang Jun 12, 2024
334816a
[SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark
eason-yuchen-liu Jun 12, 2024
8870efc
[SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26
wayneguow Jun 12, 2024
da81d8e
[SPARK-48584][SQL] Perf improvement for unescapePathName
yaooqinn Jun 12, 2024
a3625a9
[SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-…
LuciferYang Jun 12, 2024
b5e1b79
[SPARK-48596][SQL] Perf improvement for calculating hex string for long
yaooqinn Jun 12, 2024
2d0b122
[SPARK-48594][PYTHON][CONNECT] Rename `parent` field to `child` in `C…
zhengruifeng Jun 12, 2024
d1d29c9
[SPARK-48598][PYTHON][CONNECT] Propagate cached schema in dataframe o…
zhengruifeng Jun 12, 2024
0bbd049
[SPARK-48591][PYTHON] Simplify the if-else branches with `F.lit`
zhengruifeng Jun 12, 2024
c059c84
[SPARK-48421][SQL] SPJ: Add documentation
szehon-ho Jun 12, 2024
3988548
[SPARK-48593][PYTHON][CONNECT] Fix the string representation of lambd…
zhengruifeng Jun 12, 2024
fd045c9
[SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of `…
wayneguow Jun 13, 2024
ea2bca7
[SPARK-48602][SQL] Make csv generator support different output style …
yaooqinn Jun 13, 2024
78fd4e3
[SPARK-48584][SQL][FOLLOWUP] Improve the unescapePathName
beliefer Jun 13, 2024
b8c7aee
[SPARK-48609][BUILD] Upgrade `scala-xml` to 2.3.0
panbingkun Jun 13, 2024
bdcb79f
[SPARK-48543][SS] Track state row validation failures using explicit …
anishshri-db Jun 13, 2024
08e741b
[SPARK-48604][SQL] Replace deprecated `new ArrowType.Decimal(precisio…
wayneguow Jun 13, 2024
be154a3
[SPARK-48622][SQL] get SQLConf once when resolving column names
andrewxue-db Jun 14, 2024
70bdcc9
[MINOR][DOCS] Fix metrics info of shuffle service
Jun 14, 2024
0b214f1
[MINOR][DOCS][TESTS] Update repo name and link from `parquet-mr` to `…
wayneguow Jun 14, 2024
75fff90
[SPARK-45685][SQL][FOLLOWUP] Add handling for `Stream` where `LazyLis…
LuciferYang Jun 14, 2024
157b1e3
[SPARK-48612][SQL][SS] Cleanup deprecated api usage related to common…
LuciferYang Jun 14, 2024
3831886
[SPARK-48625][BUILD] Upgrade `mssql-jdbc` to 12.6.2.jre11
wayneguow Jun 14, 2024
878de00
[SPARK-48626][CORE] Change the scope of object LogKeys as private in …
gengliangwang Jun 14, 2024
dd8b05f
[SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` …
wayneguow Jun 14, 2024
2d2bedf
[SPARK-48056][CONNECT][FOLLOW-UP] Scala Client re-execute plan if a S…
changgyoopark-db Jun 14, 2024
aa4bfb0
Revert "[SPARK-48591][PYTHON] Simplify the if-else branches with `F.l…
zhengruifeng Jun 14, 2024
8ee8aba
[SPARK-48621][SQL] Fix Like simplification in Optimizer for collated …
uros-db Jun 14, 2024
0775ea7
[SPARK-48611][CORE] Log TID for input split in HadoopRDD and NewHadoo…
pan3793 Jun 15, 2024
347f9c6
[SPARK-48302][PYTHON] Preserve nulls in map columns in PyArrow Tables
ianmcook Jun 16, 2024
c09039a
[SPARK-48597][SQL] Introduce a marker for isStreaming property in tex…
HeartSaVioR Jun 17, 2024
9881e0a
[SPARK-47777] fix python streaming data source connect test
chaoqin-li1123 Jun 17, 2024
8f662fc
[SPARK-48555][SQL][PYTHON][CONNECT] Support using Columns as paramete…
Jun 17, 2024
33a9c5d
[MINOR][PYTHON][DOCS] Fix pyspark.sql.functions.reduce docstring typo
kaashif Jun 17, 2024
257a788
[SPARK-48615][SQL] Perf improvement for parsing hex string
yaooqinn Jun 17, 2024
42cd961
[SPARK-48587][VARIANT] Avoid storage amplification when accessing a s…
cashmand Jun 17, 2024
0a4b112
[SPARK-48633][BUILD] Upgrade `scalacheck` to 1.18.0
wayneguow Jun 17, 2024
71475f7
[SPARK-48577][SQL] Invalid UTF-8 byte sequence replacement
uros-db Jun 17, 2024
0c16624
[SPARK-48627][SQL] Perf improvement for binary to to HEX_DISCRETE string
yaooqinn Jun 17, 2024
90d302a
[SPARK-48557][SQL] Support scalar subquery with group-by on column eq…
jchen5 Jun 17, 2024
d3da240
[SPARK-48610][SQL] refactor: use auxiliary idMap instead of OP_ID_TAG
liuzqt Jun 17, 2024
e00d26f
[SPARK-48600][SQL] Fix FrameLessOffsetWindowFunction expressions impl…
mihailom-db Jun 17, 2024
d3455df
[SPARK-48572][SQL] Fix DateSub, DateAdd, WindowTime, TimeWindow and S…
mihailom-db Jun 17, 2024
9ef092f
[SPARK-48641][BUILD] Upgrade `curator` to 5.7.0
wayneguow Jun 17, 2024
8fdd85f
[SPARK-48603][TEST] Update *ParquetReadSchemaSuite to cover type wide…
pan3793 Jun 17, 2024
66d8a29
[SPARK-47577][SPARK-47579] Correct misleading usage of log key TASK_ID
gengliangwang Jun 17, 2024
0864bbe
[SPARK-48566][PYTHON] Fix bug where partition indices are incorrect w…
dtenedor Jun 17, 2024
00a96bb
[SPARK-48642][CORE] False SparkOutOfMemoryError caused by killing tas…
pan3793 Jun 17, 2024
d8a24b7
[SPARK-48645][BUILD] Upgrade Maven to 3.9.8
dongjoon-hyun Jun 17, 2024
042804a
[SPARK-48567][SS] StreamingQuery.lastProgress should return the actua…
WweiL Jun 17, 2024
f0b7cfa
[SPARK-48497][PYTHON][DOCS] Add an example for Python data source wri…
allisonwang-db Jun 17, 2024
e265c60
[SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeReso…
JacobZheng0927 Jun 18, 2024
738acd1
[SPARK-48648][PYTHON][CONNECT] Make SparkConnectClient.tags properly …
HyukjinKwon Jun 18, 2024
05c87e5
[SPARK-48644][SQL] Do a length check and throw COLLECTION_SIZE_LIMIT_…
yaooqinn Jun 18, 2024
c5809b6
[SPARK-48647][PYTHON][CONNECT] Refine the error message for `YearMont…
zhengruifeng Jun 18, 2024
a3feffd
[SPARK-48585][SQL] Make `built-in` JdbcDialect's method `classifyExce…
panbingkun Jun 18, 2024
9898e9d
[SPARK-48342][SQL] Introduction of SQL Scripting Parser
davidm-db Jun 18, 2024
58701d8
[SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable
cloud-fan Jun 18, 2024
80bba44
[SPARK-48459][CONNECT][PYTHON] Implement DataFrameQueryContext in Spa…
HyukjinKwon Jun 18, 2024
47ffe40
[SPARK-48646][PYTHON] Refine Python data source API docstring and typ…
allisonwang-db Jun 19, 2024
6ee7c25
[SPARK-48634][PYTHON][CONNECT] Avoid statically initialize threadpool…
HyukjinKwon Jun 19, 2024
5e28e95
[SPARK-48649][SQL] Add "ignoreInvalidPartitionPaths" and "spark.sql.f…
sadikovi Jun 19, 2024
878dd6a
[SPARK-48601][SQL] Give a more user friendly error message when setti…
stevomitric Jun 19, 2024
b77caf7
[SPARK-48651][DOC] Configuring different JDK for Spark on YARN
pan3793 Jun 19, 2024
1e868b2
Revert "[SPARK-48554][INFRA] Use R 4.4.0 in `windows` R GitHub Action…
HyukjinKwon Jun 19, 2024
d067fc6
Revert "[SPARK-48567][SS] StreamingQuery.lastProgress should return t…
HyukjinKwon Jun 19, 2024
b0e2cb5
[SPARK-48623][CORE] Structured Logging Migrations
asl3 Jun 19, 2024
2fe0692
[SPARK-48466][SQL] Create dedicated node for EmptyRelation in AQE
liuzqt Jun 19, 2024
3ac31b1
[SPARK-48574][SQL] Fix support for StructTypes with collations
mihailom-db Jun 19, 2024
484e7ac
[SPARK-48472][SQL] Enable reflect expressions with collated strings
mihailoale-db Jun 19, 2024
5458763
[SPARK-48541][CORE] Add a new exit code for executors killed by TaskR…
bozhang2820 Jun 19, 2024
5d6e9dd
[SPARK-47986][CONNECT][FOLLOW-UP] Unable to create a new session when…
changgyoopark-db Jun 19, 2024
248fd4c
[SPARK-48342][FOLLOWUP][SQL] Remove unnecessary import in AstBuilder
davidm-db Jun 20, 2024
9eadb2c
[SPARK-48634][PYTHON][CONNECT][FOLLOW-UP] Do not make a request if th…
HyukjinKwon Jun 20, 2024
0d9f8a1
[SPARK-48479][SQL] Support creating scalar and table SQL UDFs in parser
allisonwang-db Jun 20, 2024
692d869
[SPARK-48591][PYTHON] Add a helper function to simplify `Column.py`
zhengruifeng Jun 20, 2024
955349f
[SPARK-48620][PYTHON] Fix internal raw data leak in `YearMonthInterva…
zhengruifeng Jun 20, 2024
714699b
[SPARK-47911][SQL][FOLLOWUP] Enable binary format tests in ThriftServ…
yaooqinn Jun 20, 2024
904d4dd
[SPARK-48635][SQL] Assign classes to join type errors and as-of join …
wayneguow Jun 21, 2024
b0b02b2
[SPARK-48653][PYTHON] Fix invalid Python data source error class refe…
allisonwang-db Jun 21, 2024
e68b8ca
[SPARK-48677][BUILD] Upgrade `scalafmt` to 3.8.2
panbingkun Jun 21, 2024
6eb7978
[SQL][TEST] Re-run collation benchmark
uros-db Jun 21, 2024
d8099a2
[SPARK-48479][SQL][FOLLOWUP] Consolidate createOrReplaceTableColType …
allisonwang-db Jun 21, 2024
67c7187
[SPARK-48661][BUILD] Upgrade `RoaringBitmap` to 1.1.0
wayneguow Jun 21, 2024
f077759
[SPARK-48631][CORE][TEST] Fix test "error during accessing host local…
bozhang2820 Jun 21, 2024
62bad53
[SPARK-48672][DOC] Update Jakarta Servlet reference in security page
pan3793 Jun 21, 2024
b99bb00
[SPARK-48630][INFRA] Make `merge_spark_pr` keep the format of revert PR
zhengruifeng Jun 21, 2024
3469ec6
[SPARK-48656][CORE] Do a length check and throw COLLECTION_SIZE_LIMIT…
wayneguow Jun 21, 2024
cd8bf11
[SPARK-48659][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. SET TBLPROPE…
panbingkun Jun 21, 2024
fdabe08
[SPARK-48490][CORE][FOLLOWUP] Properly process escape sequences
gengliangwang Jun 21, 2024
b5d0d07
[SPARK-48662][SQL] Fix StructsToXml expression with collations
mihailom-db Jun 21, 2024
97d3add
Revert "Revert "[SPARK-48554][INFRA] Use R 4.4.0 in `windows` R GitHu…
HyukjinKwon Jun 21, 2024
32861e0
[SPARK-48684][INFRA] Print related JIRA summary before proceeding merge
yaooqinn Jun 21, 2024
f0563ef
[SPARK-47172][CORE] Add support for AES-GCM for RPC encryption
sweisdb Jun 21, 2024
0bc38ac
[SPARK-48675][SQL] Fix cache table with collated column
nikolamand-db Jun 21, 2024
9414211
[SPARK-48490][CORE][TESTS][FOLLOWUP] Add some UT for the Windows path…
panbingkun Jun 21, 2024
7e5a461
[SPARK-48655][SQL] SPJ: Add tests for shuffle skipping for aggregate …
szehon-ho Jun 21, 2024
b1677a4
[SPARK-48545][SQL] Create to_avro and from_avro SQL functions to matc…
dtenedor Jun 21, 2024
c8d75c1
[SPARK-48620][PYTHON][FOLLOW-UP] Correct the error message for `Calen…
zhengruifeng Jun 22, 2024
84d278c
[MINOR] Fix some typos in `error-states.json`
wayneguow Jun 23, 2024
4b37eb8
[SPARK-48678][CORE] Performance optimizations for SparkConf.get(Confi…
JoshRosen Jun 23, 2024
e972dae
[SPARK-48688][SQL] Return reasonable error when calling SQL to_avro a…
dtenedor Jun 23, 2024
88cc153
[SPARK-48650][PYTHON] Display correct call site from IPython Notebook
itholic Jun 24, 2024
31fa9d8
[SQL][TEST][FOLLOWUP] Re-run collation benchmark (NonASCII)
uros-db Jun 24, 2024
4663b84
[SPARK-47681][FOLLOWUP] Fix schema_of_variant for float inputs
chenhao-db Jun 24, 2024
a7dc020
[SPARK-48681][SQL] Use ICU in Lower/Upper expressions for UTF8_BINARY…
uros-db Jun 24, 2024
8b16196
[SPARK-48680][SQL][DOCS] Add missing Java APIs and language-specific …
yaooqinn Jun 24, 2024
e459674
[SPARK-48683][SQL] Fix schema evolution with `df.mergeInto` losing `w…
xupefei Jun 24, 2024
09cb592
[SPARK-48639][CONNECT][PYTHON] Add Origin to Relation.RelationCommon
HyukjinKwon Jun 24, 2024
8e02a64
[SPARK-48695][PYTHON] `TimestampNTZType.fromInternal` not use the dep…
zhengruifeng Jun 24, 2024
fb5697d
[SPARK-48658][SQL] Encode/Decode functions report coding errors inste…
yaooqinn Jun 24, 2024
2ac2710
[SPARK-48702][INFRA] Fix `Python CodeGen check`
panbingkun Jun 25, 2024
5112e58
[SPARK-48692][BUILD] Upgrade `rocksdbjni` to 9.2.1
panbingkun Jun 25, 2024
d47f34f
[SPARK-48629] Migrate the residual code to structured logging framework
panbingkun Jun 25, 2024
b49479b
[SPARK-48704][INFRA] Update `build_sparkr_window.yml` to use `windows…
panbingkun Jun 25, 2024
51f1103
[SPARK-48686][SQL] Improve performance of ParserUtils.unescapeSQLString
JoshRosen Jun 25, 2024
8c4ca7e
[SPARK-48693][SQL] Simplify and unify toString of Invoke and StaticIn…
yaooqinn Jun 25, 2024
068be4b
[SPARK-48578][SQL] add UTF8 string validation related functions
uros-db Jun 25, 2024
5928908
[SPARK-48466][SQL][FOLLOWUP] Fix missing pattern match in EmptyRelati…
liuzqt Jun 25, 2024
9d4abaf
[SPARK-48638][CONNECT] Add ExecutionInfo support for DataFrame
grundprinzip Jun 25, 2024
ebacb91
[SPARK-48718][SQL] Handle and fix the case when deserializer in cogro…
anchovYu Jun 26, 2024
c459afb
[SPARK-48573][SQL] Upgrade ICU version
mihailom-db Jun 26, 2024
169346c
[SPARK-48578][SQL][FOLLOWUP] Fix `dev/scalastyle` error for `Expressi…
panbingkun Jun 26, 2024
07cbba6
[SPARK-48706][PYTHON] Python UDF in higher order functions should not…
HyukjinKwon Jun 26, 2024
ee3a612
[SPARK-48705][PYTHON] Explicitly use worker_main when it starts with …
HyukjinKwon Jun 26, 2024
bb21861
[SPARK-48059][CORE][FOLLOWUP] Fix bug for SparkLogger
panbingkun Jun 26, 2024
ec0ee86
[SPARK-48699][SQL] Refine collation API
uros-db Jun 26, 2024
7a1608b
[SPARK-48670][SQL] Providing suggestion as part of error message when…
dbatomic Jun 26, 2024
0fc5b0b
[SPARK-47353][SQL] Enable collation support for the Mode expression
GideonPotok Jun 26, 2024
4cf5450
[SPARK-48699][SQL][FOLLOWUP] Refine collation API
uros-db Jun 26, 2024
313479c
[SPARK-48713][SQL] Add index range check for UnsafeRow.pointTo when b…
Ngone51 Jun 26, 2024
e23d69b
[SPARK-48709][SQL] Fix varchar type resolution mismatch for DataSourc…
wangyum Jun 26, 2024
a474b88
[SPARK-48724][SQL][TESTS] Fix incorrect conf settings of `ignoreCorru…
wayneguow Jun 26, 2024
a50b30d
[SPARK-48687][SS] Add change to perform state schema validation and u…
anishshri-db Jun 26, 2024
7f5f96c
[SPARK-48691][BUILD] Upgrade scalatest related dependencies to the 3.…
wayneguow Jun 26, 2024
b47c614
writing schema
ericm-db Jun 25, 2024
c238e70
commenting
ericm-db Jun 25, 2024
acd8504
removing TODO
ericm-db Jun 26, 2024
de30c7a
tws tests pass
ericm-db Jun 26, 2024
e9d4fcc
added test case, serializing list of columnFamilyMetadata instead of …
ericm-db Jun 26, 2024
998a019
adding test case
ericm-db Jun 26, 2024
062f955
comment
ericm-db Jun 26, 2024
723f23c
adding purging logic
ericm-db Jun 26, 2024
32e73d0
added test case for purging
ericm-db Jun 26, 2024
1581264
[SPARK-48717][PYTHON][SS] Catch ForeachBatch py4j InterruptedExceptio…
WweiL Jun 27, 2024
906af78
[SPARK-48578][SQL][FOLLOWUP] add UTF8 string validation related funct…
uros-db Jun 27, 2024
b5f76b1
[SPARK-48723][INFRA] Run `git cherry-pick --abort` if backporting is …
yaooqinn Jun 27, 2024
14272e8
[SPARK-48721][SQL][DOCS] Fix the doc of `decode` function in SQL API …
yaooqinn Jun 27, 2024
ec7dde7
[SPARK-42610][CONNECT][FOLLOWUP] Add some test cases for `SQLImplicit…
wayneguow Jun 27, 2024
ea0cd01
[SPARK-39627][SQL][FOLLOWUP] Cleanup deprecated api usage related to …
wayneguow Jun 27, 2024
7c7c196
[SPARK-48712][SQL] Perf Improvement for encode with empty values or U…
yaooqinn Jun 27, 2024
48f39b8
[SPARK-48729][SQL] Add a UserDefinedFunction interface to represent a…
allisonwang-db Jun 27, 2024
2e31572
Revert "[SPARK-48639][CONNECT][PYTHON] Add Origin to Relation.Relatio…
HyukjinKwon Jun 27, 2024
b154623
[MINOR][TESTS] Always remove spark.master in ReusedConnectTestCase
HyukjinKwon Jun 27, 2024
58d1a89
[SPARK-48555][PYTHON][FOLLOW-UP] Simplify the support of `Any` parame…
zhengruifeng Jun 27, 2024
ff2d177
[MINOR][DOCS] Make pivot doctest deterministic
HyukjinKwon Jun 27, 2024
5943905
[MINOR][PYTHON][TESTS] Remove duplicate schema checking
HyukjinKwon Jun 27, 2024
642c4bb
[SPARK-48639][CONNECT][PYTHON] Add Origin to RelationCommon
HyukjinKwon Jun 27, 2024
5b53c6c
[MINOR][PYTHON][DOCS] Fix indents in function API references
zhengruifeng Jun 27, 2024
c788d12
[SPARK-48733][PYTHON][TESTS] Do not test SET command in Python UDTF test
HyukjinKwon Jun 27, 2024
7bf9119
[SPARK-48734][PYTHON][TESTS] Separate local cluster test from test_ar…
HyukjinKwon Jun 27, 2024
d89aad3
[SPARK-47927][SQL][FOLLOWUP] fix ScalaUDF output nullability
cloud-fan Jun 27, 2024
b11608c
[SPARK-48428][SQL] Fix IllegalStateException in NestedColumnAliasing
eejbyfeldt Jun 27, 2024
b5a55e4
[SPARK-46957][CORE] Decommission migrated shuffle files should be abl…
Ngone51 Jun 27, 2024
40ad829
[SPARK-48586][SS] Remove lock acquisition in doMaintenance() by makin…
riyaverm-db Jun 27, 2024
baf461b
[SPARK-48708][CORE] Remove three unnecessary type registrations from …
LuciferYang Jun 27, 2024
df13ca0
[SPARK-48735][SQL] Performance Improvement for BIN function
yaooqinn Jun 27, 2024
1cdd5fa
[SPARK-48736][PYTHON] Support infra fro additional includes for Pytho…
grundprinzip Jun 27, 2024
2c9eb1c
[SPARK-48682][SQL] Use ICU in InitCap expression for UTF8_BINARY strings
uros-db Jun 28, 2024
6304484
[SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE co…
uros-db Jun 28, 2024
8cd095f
[SPARK-48738][SQL] Correct since version for built-in func alias `ran…
wayneguow Jun 28, 2024
3bf7de0
[SPARK-48668][SQL] Support ALTER NAMESPACE ... UNSET PROPERTIES in v2
panbingkun Jun 28, 2024
2eeebef
[SPARK-46957][CORE][FOLLOW-UP] Use Collections.emptyMap for Java comp…
Ngone51 Jun 28, 2024
9141aa4
[SPARK-48744][CORE] Log entry should be constructed only once
gengliangwang Jun 28, 2024
a2c4be0
[SPARK-47233][CONNECT][SS][FOLLOW-UP] Add eventually for terminated e…
HyukjinKwon Jun 28, 2024
69f3a9b
[SPARK-48586][SS][FOLLOWUP] RocksDB and RocksDBFileManager code style…
riyaverm-db Jun 28, 2024
fc98ccd
[SPARK-48746][PYTHON][SS][TESTS] Avoid using global temp view in fore…
HyukjinKwon Jun 28, 2024
80277ee
[SPARK-48745][INFRA][PYTHON][TESTS] Remove unnecessary installation `…
panbingkun Jun 28, 2024
4e57f06
[SPARK-48307][SQL][FOLLOWUP] not-inlined CTE references sibling shoul…
cloud-fan Jun 28, 2024
6bfeb09
[SPARK-48757][CORE] Make `IndexShuffleBlockResolver` have explicit co…
dongjoon-hyun Jun 28, 2024
f49418b
[SPARK-48751][INFRA][PYTHON][TESTS] Re-balance `pyspark-pandas-connec…
panbingkun Jun 30, 2024
0487d78
[SPARK-48748][SQL] Cache numChars in UTF8String
uros-db Jul 1, 2024
a84a6a4
[SPARK-48749][SQL] Simplify UnaryPositive and eliminate its Catalyst …
yaooqinn Jul 1, 2024
f70ce13
[SPARK-48638][INFRA][FOLLOW-UP] Add graphviz into CI to run the relat…
HyukjinKwon Jul 1, 2024
399980e
[MINOR][DOCS] Fix the type hints of `functions.first(..., ignorenulls…
HyukjinKwon Jul 1, 2024
bc16b24
[SPARK-48765][DEPLOY] Enhance default value evaluation for SPARK_IDEN…
pan3793 Jul 1, 2024
cd1f1af
[SPARK-48747][SQL] Add code point iterator to UTF8String
uros-db Jul 1, 2024
48eb4d5
[SPARK-48737] Perf improvement during analysis - Create exception onl…
urosstan-db Jul 1, 2024
703b076
[SPARK-48697][SQL] Add collation aware string filters
stefankandic Jul 1, 2024
bab129d
combining rules
ericm-db Jul 1, 2024
9b94439
passing in encoders to columnfamilyschema
ericm-db Jul 1, 2024
5c29d8d
[SPARK-48768][PYTHON][CONNECT] Should not cache `explain`
zhengruifeng Jul 1, 2024
5ac7c9b
[SPARK-48766][PYTHON] Document the behavior difference of `extraction…
zhengruifeng Jul 1, 2024
6768eea
Feedback
ericm-db Jul 2, 2024
afb5e39
added base class
ericm-db Jul 2, 2024
d515740
rebase
ericm-db Jul 2, 2024
a1288a4
refactors
ericm-db Jul 2, 2024
a670585
comment
ericm-db Jul 2, 2024
9304223
[SPARK-44718][FOLLOWUP][DOCS] Avoid using ConfigEntry in spark.sql.co…
yaooqinn Jul 2, 2024
8a5f4e0
[SPARK-48759][SQL] Add migration doc for CREATE TABLE AS SELECT behav…
asl3 Jul 2, 2024
4ee37ed
[SPARK-48764][PYTHON] Filtering out IPython-related frames from user …
itholic Jul 2, 2024
353c2da
feedback, creating ColumnFamilySchemaFactory
ericm-db Jul 2, 2024
f471bfe
adding version in tws
ericm-db Jul 2, 2024
db9e1ac
[SPARK-48177][BUILD] Upgrade `Apache Parquet` to 1.14.1
Fokko Jul 2, 2024
49fece8
added override modifier
ericm-db Jul 2, 2024
7ad352f
feedback
ericm-db Jul 2, 2024
ee0d306
[SPARK-48589][SQL][SS] Add option snapshotStartBatchId and snapshotPa…
eason-yuchen-liu Jul 2, 2024
efe2e74
Merge branch 'master' into state-schema-tws
ericm-db Jul 2, 2024
3ea5d29
using tempdir
ericm-db Jul 2, 2024
5eee250
using map instead of list
ericm-db Jul 2, 2024
dce9968
removing purging
ericm-db Jul 2, 2024
9b1a7b2
tests pass
ericm-db Jul 3, 2024
f2857b9
combining rules
ericm-db Jul 3, 2024
4337016
removing metadataCacheEnabled
ericm-db Jul 3, 2024
dfb122f
removing COLUMN_FAMILY_SCHEMA_VERSION
ericm-db Jul 3, 2024
21af247
reverting hdfs metadata log changes
ericm-db Jul 3, 2024
3a36d06
feedback
ericm-db Jul 4, 2024
1b6ea1a
removing unused imports
ericm-db Jul 4, 2024
1df1e5d
line
ericm-db Jul 4, 2024
25b7b80
case match
ericm-db Jul 8, 2024
ede3136
feedback
ericm-db Jul 9, 2024
0a7945e
adding todos
ericm-db Jul 9, 2024
c1a041d
adding PR link
ericm-db Jul 9, 2024
feb4c01
removing batchId as dir
ericm-db Jul 9, 2024
c24490a
removing links
ericm-db Jul 10, 2024
b250aa4
sparkthrowablesuite
ericm-db Jul 10, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
6 changes: 2 additions & 4 deletions .github/workflows/benchmark.yml
@@ -69,12 +69,11 @@ jobs:
# In order to get diff files
with:
fetch-depth: 0
- name: Cache Scala, SBT and Maven
- name: Cache SBT and Maven
uses: actions/cache@v4
with:
path: |
build/apache-maven-*
build/scala-*
build/*.jar
~/.sbt
key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
@@ -137,12 +136,11 @@ jobs:
# In order to get diff files
with:
fetch-depth: 0
- name: Cache Scala, SBT and Maven
- name: Cache SBT and Maven
uses: actions/cache@v4
with:
path: |
build/apache-maven-*
build/scala-*
build/*.jar
~/.sbt
key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
279 changes: 166 additions & 113 deletions .github/workflows/build_and_test.yml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion .github/workflows/build_branch34.yml
@@ -43,9 +43,9 @@ jobs:
jobs: >-
{
"build": "true",
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true",
"k8s-integration-tests": "true",
"lint" : "true"
}
@@ -17,22 +17,29 @@
# under the License.
#

name: Cancelling Duplicates
name: "Build / Python-only (branch-3.4)"

on:
workflow_run:
workflows:
- 'Build'
types: ['requested']
schedule:
- cron: '0 9 * * *'

jobs:
cancel-duplicate-workflow-runs:
name: "Cancel duplicate workflow runs"
runs-on: ubuntu-latest
steps:
- uses: potiuk/cancel-workflow-runs@4723494a065d162f8e9efd071b98e0126e00f866 # @master
name: "Cancel duplicate workflow runs"
with:
cancelMode: allDuplicates
token: ${{ secrets.GITHUB_TOKEN }}
sourceRunId: ${{ github.event.workflow_run.id }}
skipEventTypes: '["push", "schedule"]'
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 8
branch: branch-3.4
hadoop: hadoop3
envs: >-
{
"PYTHON_TO_TEST": ""
}
jobs: >-
{
"pyspark": "true",
"pyspark-pandas": "true"
}
2 changes: 1 addition & 1 deletion .github/workflows/build_branch35.yml
@@ -43,9 +43,9 @@ jobs:
jobs: >-
{
"build": "true",
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true",
"k8s-integration-tests": "true",
"lint" : "true"
}
45 changes: 45 additions & 0 deletions .github/workflows/build_branch35_python.yml
@@ -0,0 +1,45 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Python-only (branch-3.5)"

on:
schedule:
- cron: '0 11 * * *'

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 8
branch: branch-3.5
hadoop: hadoop3
envs: >-
{
"PYTHON_TO_TEST": ""
}
jobs: >-
{
"pyspark": "true",
"pyspark-pandas": "true"
}
6 changes: 5 additions & 1 deletion .github/workflows/build_java21.yml
@@ -46,5 +46,9 @@ jobs:
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true"
"docker-integration-tests": "true",
"yarn": "true",
"k8s-integration-tests": "true",
"buf": "true",
"ui": "true"
}
2 changes: 1 addition & 1 deletion .github/workflows/build_maven_java21_macos14.yml
@@ -21,7 +21,7 @@ name: "Build / Maven (master, Scala 2.13, Hadoop 3, JDK 21, macos-14)"

on:
schedule:
- cron: '0 20 * * *'
- cron: '0 20 */2 * *'

jobs:
run-build:
@@ -17,11 +17,11 @@
# under the License.
#

name: "Build / ANSI (master, Hadoop 3, JDK 17, Scala 2.13)"
name: "Build / Non-ANSI (master, Hadoop 3, JDK 17, Scala 2.13)"

on:
schedule:
- cron: '0 1,13 * * *'
- cron: '0 1 * * *'

jobs:
run-build:
@@ -36,13 +36,15 @@ jobs:
hadoop: hadoop3
envs: >-
{
"SPARK_ANSI_SQL_MODE": "true",
"SPARK_ANSI_SQL_MODE": "false",
}
jobs: >-
{
"build": "true",
"docs": "true",
"pyspark": "true",
"sparkr": "true",
"tpcds-1g": "true",
"docker-integration-tests": "true"
"docker-integration-tests": "true",
"yarn": "true"
}
45 changes: 45 additions & 0 deletions .github/workflows/build_python_3.10.yml
@@ -0,0 +1,45 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Python-only (master, Python 3.10)"

on:
schedule:
- cron: '0 17 * * *'

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 17
branch: master
hadoop: hadoop3
envs: >-
{
"PYTHON_TO_TEST": "python3.10"
}
jobs: >-
{
"pyspark": "true",
"pyspark-pandas": "true"
}
45 changes: 45 additions & 0 deletions .github/workflows/build_python_3.12.yml
@@ -0,0 +1,45 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: "Build / Python-only (master, Python 3.12)"

on:
schedule:
- cron: '0 19 * * *'

jobs:
run-build:
permissions:
packages: write
name: Run
uses: ./.github/workflows/build_and_test.yml
if: github.repository == 'apache/spark'
with:
java: 17
branch: master
hadoop: hadoop3
envs: >-
{
"PYTHON_TO_TEST": "python3.12"
}
jobs: >-
{
"pyspark": "true",
"pyspark-pandas": "true"
}
47 changes: 33 additions & 14 deletions .github/workflows/build_python_connect.yml
@@ -33,12 +33,11 @@ jobs:
steps:
- name: Checkout Spark repository
uses: actions/checkout@v4
- name: Cache Scala, SBT and Maven
- name: Cache SBT and Maven
uses: actions/cache@v4
with:
path: |
build/apache-maven-*
build/scala-*
build/*.jar
~/.sbt
key: build-spark-connect-python-only-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
@@ -63,16 +62,16 @@
architecture: x64
- name: Build Spark
run: |
./build/sbt -Phive test:package
./build/sbt -Phive Test/package
- name: Install pure Python package (pyspark-connect)
env:
SPARK_TESTING: 1
run: |
cd python
python packaging/connect/setup.py sdist
cd dist
pip install pyspark-connect-*.tar.gz
pip install 'six==1.16.0' 'pandas<=2.2.2' scipy 'plotly>=4.8' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' torch torchvision torcheval deepspeed unittest-xml-reporting
pip install pyspark*connect-*.tar.gz
pip install 'six==1.16.0' 'pandas<=2.2.2' scipy 'plotly>=4.8' 'mlflow>=2.8.1' coverage matplotlib openpyxl 'memory-profiler>=0.61.0' 'scikit-learn>=1.3.2' 'graphviz==0.20.3' torch torchvision torcheval deepspeed unittest-xml-reporting
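The widened glob `pyspark*connect-*.tar.gz` accounts for newer setuptools applying PEP 625 file-name normalization, which writes the sdist as `pyspark_connect-<ver>.tar.gz` instead of `pyspark-connect-<ver>.tar.gz`; the pattern matches both spellings. A sketch with made-up version strings:

```shell
# Older setuptools writes pyspark-connect-<ver>.tar.gz; newer versions apply
# PEP 625 normalization and write pyspark_connect-<ver>.tar.gz.
# The glob pyspark*connect-*.tar.gz matches both. Version strings are made up.
workdir=$(mktemp -d)
cd "$workdir"
touch pyspark-connect-4.0.0.dev0.tar.gz pyspark_connect-4.0.0.dev0.tar.gz
matches=$(ls pyspark*connect-*.tar.gz | wc -l)
echo "$matches"
```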
- name: Run tests
env:
SPARK_TESTING: 1
@@ -81,26 +80,46 @@
# Make less noisy
cp conf/log4j2.properties.template conf/log4j2.properties
sed -i 's/rootLogger.level = info/rootLogger.level = warn/g' conf/log4j2.properties
# Start a Spark Connect server
PYTHONPATH="python/lib/pyspark.zip:python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH" ./sbin/start-connect-server.sh --driver-java-options "-Dlog4j.configurationFile=file:$GITHUB_WORKSPACE/conf/log4j2.properties" --jars `find connector/connect/server/target -name spark-connect*SNAPSHOT.jar`
# Run the Python workers that contain pyspark.core once up front. They will be reused.
python -c "from pyspark.sql import SparkSession; _ = SparkSession.builder.remote('sc://localhost').getOrCreate().range(100).repartition(100).mapInPandas(lambda x: x, 'id INT').collect()"

# Start a Spark Connect server for local
PYTHONPATH="python/lib/pyspark.zip:python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH" ./sbin/start-connect-server.sh \
--driver-java-options "-Dlog4j.configurationFile=file:$GITHUB_WORKSPACE/conf/log4j2.properties" \
--jars "`find connector/connect/server/target -name spark-connect-*SNAPSHOT.jar`,`find connector/protobuf/target -name spark-protobuf-*SNAPSHOT.jar`,`find connector/avro/target -name spark-avro*SNAPSHOT.jar`"
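`--jars` takes a comma-separated list, so each `find` result above is spliced in via command substitution with a literal comma between them. A sketch of the same list construction using `$(...)` (the workflow uses backticks; the directory layout and jar names here are stand-ins for the real build tree):

```shell
# Build a comma-separated --jars list from several find results.
# Directory layout and jar names are hypothetical stand-ins.
root=$(mktemp -d)
mkdir -p "$root/server/target" "$root/protobuf/target"
touch "$root/server/target/spark-connect-4.0.0-SNAPSHOT.jar"
touch "$root/protobuf/target/spark-protobuf-4.0.0-SNAPSHOT.jar"
jars="$(find "$root/server/target" -name 'spark-connect-*SNAPSHOT.jar'),$(find "$root/protobuf/target" -name 'spark-protobuf-*SNAPSHOT.jar')"
echo "$jars"
```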

# Remove Py4J and PySpark zipped library to make sure there is no JVM connection
rm python/lib/*
rm -r python/pyspark
mv python/lib lib.back
mv python/pyspark pyspark.back

# Several catalog-related tests must run sequentially, e.g., writing a table in a listener.
./python/run-tests --parallelism=1 --python-executables=python3 --modules pyspark-connect,pyspark-ml-connect
# None of the Pandas API on Spark tests depend on each other, so run them in parallel
./python/run-tests --parallelism=4 --python-executables=python3 --modules pyspark-pandas-connect-part0,pyspark-pandas-connect-part1,pyspark-pandas-connect-part2,pyspark-pandas-connect-part3

# Stop Spark Connect server.
./sbin/stop-connect-server.sh
mv lib.back python/lib
mv pyspark.back python/pyspark
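The rename dance above hides the JVM-backed `python/lib` and `python/pyspark` so that only the pure `pyspark-connect` install is importable during the test run, then restores them afterwards. A minimal sketch of that hide/run/restore pattern in a throwaway directory:

```shell
# Hide the JVM-backed packages, run pure-Python tests, then restore them.
# Uses a scratch directory; the real workflow does this in the repo root.
work=$(mktemp -d)
cd "$work"
mkdir -p python/lib python/pyspark
mv python/lib lib.back
mv python/pyspark pyspark.back
[ ! -d python/lib ] && [ ! -d python/pyspark ] && echo "pyspark hidden"
# ... run the pure-Python tests here ...
mv lib.back python/lib
mv pyspark.back python/pyspark
```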

# Start a Spark Connect server for local-cluster
PYTHONPATH="python/lib/pyspark.zip:python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH" ./sbin/start-connect-server.sh \
--master "local-cluster[2, 4, 1024]" \
--driver-java-options "-Dlog4j.configurationFile=file:$GITHUB_WORKSPACE/conf/log4j2.properties" \
--jars "`find connector/connect/server/target -name spark-connect-*SNAPSHOT.jar`,`find connector/protobuf/target -name spark-protobuf-*SNAPSHOT.jar`,`find connector/avro/target -name spark-avro*SNAPSHOT.jar`"

# Remove Py4J and PySpark zipped library to make sure there is no JVM connection
mv python/lib lib.back
mv python/pyspark pyspark.back

./python/run-tests --parallelism=1 --python-executables=python3 --testnames "pyspark.resource.tests.test_connect_resources,pyspark.sql.tests.connect.client.test_artifact,pyspark.sql.tests.connect.client.test_artifact_localcluster,pyspark.sql.tests.connect.test_resources"
- name: Upload test results to report
if: always()
uses: actions/upload-artifact@v4
with:
name: test-results-spark-connect-python-only
path: "**/target/test-reports/*.xml"
- name: Upload unit tests log files
if: failure()
- name: Upload Spark Connect server log file
if: ${{ !success() }}
uses: actions/upload-artifact@v4
with:
name: unit-tests-log-spark-connect-python-only
path: "**/target/unit-tests.log"
path: logs/*.out