remove useless imports #12
Closing and re-opening to trigger a test.
Hey @runzhliu, thank you for your PR. Give us a few days to polish up our infrastructure to properly test PRs, ensure style checks, etc. We will get back to this PR as soon as possible. Thanks for your patience.
Hi @tdas, happy to get your reply. I'm really looking forward to the development of this project!
Apologies for taking so long to get to this; we were still working out our process for merging changes. LGTM, I will merge this change.
There is currently no code style check that catches unused imports, so this change removes all of the unused imports.
Closes #12
Closes #5260 from mukulmurthy/6020wsjz.
Lead-authored-by: runzhliu <runzhliu@163.com>
Co-authored-by: Mukul Murthy <mukul.murthy@databricks.com>
Co-authored-by: Mukul Murthy <38224594+mukulmurthy@users.noreply.github.com>
Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com>
GitOrigin-RevId: af35df3c7e881336641b182743c92a2aedfb1d6e
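The commit message above notes that nothing in the build flagged unused imports. As a rough sketch only, and not something taken from this repository's actual build definition, a Scala 2.12 project built with sbt could ask the compiler to report them. The file name `build.sbt` and the decision to leave the stricter setting commented out are assumptions made for illustration.

```scala
// build.sbt (illustrative sketch; not this project's real build file)

// Ask scalac to warn about imports that are never used.
// `-Ywarn-unused:imports` is the Scala 2.12 spelling of this option.
scalacOptions += "-Ywarn-unused:imports"

// Optional and stricter: promote every compiler warning to an error,
// so an unused import fails compilation. Left commented out because it
// also affects unrelated warnings such as deprecations.
// scalacOptions += "-Xfatal-warnings"
```

With a setting like this, `build/sbt compile` would be expected to list unused imports as warnings, which makes cleanups of this kind easier to catch during review.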
commit f462089fe53f95c38089b1804f904bb2918b87f2 Author: Wesley Hoffman <wesleyhoffman109@gmail.com> Date: Tue Sep 10 11:39:51 2019 -0700 Squashed commit of the following: commit 5026b73dc7073f3de0215252f9a549671db915d6 Author: Pranav Anand <anandpranavv@gmail.com> Date: Fri Sep 6 20:12:10 2019 +0000 [SC-20615] Refactor tests to make it easier to extend them Split the utility methods into a separate trait so that they can be reused in other tests. The existing tests should still work, no behavior should have been changed Author: Pranav Anand <anandpranavv@gmail.com> GitOrigin-RevId: e430beaf69ebfd870fd955a5b8bc1c4cec4c6bb9 commit e025ac1de374ce652849a8498fc36476844c96e2 Author: Pranav Anand <anandpranavv@gmail.com> Date: Fri Sep 6 10:58:14 2019 -0700 [SC-20730] Log invalid DeltaOptions - Adds a verification method to `DeltaOptions` which checks if given options are valid or not in which case it usage logs them - Adds tests which check different ways users may be able to pass in incorrect DeltaOptions and asserts whether they are logged correctly in `DeltaLogSuiteEdge` - Test case has both a unit test as well as an "end to end" test where writing to and from delta are tested Closes #5949 from pranavanand123/options-checker-delta. Authored-by: Pranav Anand <anandpranavv@gmail.com> Signed-off-by: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: 37d6c5f5453eed9fab2e1647940a8df91db4fc75 commit 5ba1a0dcc2aab37cb27a16d5e3231e2dfa0fa692 Author: Burak Yavuz <brkyvz@gmail.com> Date: Fri Sep 6 16:40:17 2019 +0000 [SC-20947] Always make the output attributes nullable when writing to Delta Even though we change the schema as nullable when writing to Delta, the output attributes may remain not-null. In such cases, when writing to Parquet, we still want to keep the attributes as nullable, to avoid potential corruption with Parquet. This unfortunately bloats certain parquet file sizes, but the changes in the tests suggest that we weren't doing the right thing in the first place. 
Regression test Author: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: ee2941458fd1ba1ba62e911e8b1987f219dce10e commit 103f5cb8937182c69b09b855587579dd91067c9b Author: Pranav Anand <anandpranavv@gmail.com> Date: Thu Sep 5 22:51:41 2019 +0000 [SC-18153] Minor refactoring in the exceptions Changed exception to take table names and paths. Author: Pranav Anand <anandpranavv@gmail.com> GitOrigin-RevId: 6aa9d068c332c0cb782f67bac0ef8d9125fb20e7 commit 55ca0054a7ba01a05acda5d8edc961f7a2ce9cdc Author: Yishuang Lu <luystu@gmail.com> Date: Wed Sep 4 18:12:49 2019 +0000 [DELTA-OSS-EXTERNAL] Remove all unused imports in the code Remove all unused imports in the code Signed-off-by: Yishuang Lu <luystu@gmail.com> Closes delta-io/delta#135 Author: lys0716 <luystu@gmail.com> Author: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 5732e6337d9a66dad1f95cf60dd4bbab34924325 commit 20840668e18a89f85568d1d8e025f0de5d440394 Author: dmatrix <dmatrix@comcast.net> Date: Tue Sep 3 23:34:23 2019 +0000 [DELTA-OSS-EXTERNAL] found minor typos and usage; fixed it Minor edits in usage in the README.md. Closes delta-io/delta#57 Author: dmatrix <dmatrix@comcast.net> #6265 is resolved by zsxwing/sk5qnz6j. GitOrigin-RevId: 40f1c948fbb0c493f4f24c5117e5105e40d50942 commit 06e33df20a724ed5f4e76e6746b5c4bccf674b1d Author: Shixiong Zhu <zsxwing@gmail.com> Date: Tue Sep 3 10:12:29 2019 -0700 [SC-21941][WARMFIX] Log cleanup should delete the checksum file for version 0 Right now as we are listing from the delta file name from version 0, it will not return the checkpoint file for version 0. This is usually fine since we don't checkpoint version 0. However, technically, we can create a checkpoint for version 0, so it's better to also handle it by using the checkpoint file name to list. 
Authored-by: Shixiong Zhu <zsxwing@gmail.com> Signed-off-by: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 335904ef1738aebd94d0e30684956c675576f266 commit 64905d8ccd2013a1b77118b1f9004aa1fe352372 Author: Wesley Hoffman <wesleyhoffman109@gmail.com> Date: Thu Aug 29 21:02:57 2019 +0000 [DELTA-OSS-EXTERNAL] Document features not supported in OSS Delta fixes #77 Closes delta-io/delta#129 Author: Wesley Hoffman <SpaceRangerWes@users.noreply.github.com> GitOrigin-RevId: 4e7513968ec154ac629a61c23e996b03e83d63de commit 68aec53866b4159acf3318149f59a6c884ce0b08 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Mon Aug 26 16:52:54 2019 -0700 [DELTA-OSS-EXTERNAL] Add MIMA - Scala Binary Incompatability Check to sbt This PR adds MIMA to the build process of Delta Lake. During build process we fetch the latest release of Delta Lake and check if the new change would break any binary compatibility with the previous versions. Note: `sbt test` would trigger MIMA check however they are not triggered with `sbt testOnly` Closes delta-io/delta#137 Closes #6171 from rahulsmahadev/720b5kn1. Authored-by: Rahul Shivu Mahadev <51690557+rahulsmahadev@users.noreply.github.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: e5bbd0dc17fad44331176d0d17a427dfb2dcc830 commit 778cc6923bb3985122062f7ce4af4cd58a15f849 Author: Jungtaek Lim <kabhwan@gmail.com> Date: Mon Aug 26 19:50:22 2019 +0000 [DELTA-OSS-EXTERNAL] Try to delete leaked CRC file in HDFSLogStore due to HADOOP-16255 Due to [HADOOP-16255](https://issues.apache.org/jira/browse/HADOOP-16255), `fc.rename` doesn't correctly rename CRC file of source file if filesystem is descendant of `ChecksumFs` (specifically `LocalFs`), which makes HDFSLogStore leak CRC files of temp files. This patch will try to delete CRC file of source file when renaming, but just do as a "best-effort" since it's OK to leak some CRC file instead of let write fail. Also added verification logic to check any leaked CRC files. 
Closes delta-io/delta#139 Author: Jungtaek Lim <kabhwan@gmail.com> #6165 is resolved by zsxwing/l406oky1. GitOrigin-RevId: 2abafc268b3f0407115378d218c5b7a11118e200 commit ee1770714072874f295d6ce3205849b702c5eda2 Author: Jose Torres <joseph.torres@databricks.com> Date: Fri Aug 23 20:09:55 2019 +0000 [SC-19523][DELTA] Move DeltaSource offset forward even if there are no AddFiles Currently, only commits with data move the DeltaSource offset forward. So if there are many no-data commits over a long period of time, the retention period will eventually hit the old commit that the DeltaSource is at, causing it to fail. new unit test Author: Jose Torres <joseph.torres@databricks.com> GitOrigin-RevId: 2c69af0f16a186b52742e4bd4e21ef4ee1597230 commit ad72d485c6c481c870ed80e0b52e922223721af6 Author: Yishuang Lu <luystu@gmail.com> Date: Wed Aug 21 13:36:52 2019 -0700 [DELTA-OSS-EXTERNAL] Fix typos in the code Fix typos in the code Signed-off-by: Yishuang Lu <luystugmail.com> Closes delta-io/delta#132 Closes #6105 from mukulmurthy/czjxk36z. Lead-authored-by: Yishuang Lu <luystu@gmail.com> Co-authored-by: Mukul Murthy <38224594+mukulmurthy@users.noreply.github.com> Co-authored-by: lys0716 <luystu@gmail.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: a3f46c85c09f7ca33482beb45020720e8dde1e4f commit 5adb8d1f43b2c921a589e0edacaac52e8af44724 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Tue Aug 20 11:46:47 2019 -0700 [SC-21403][DELTA] Describe History Scala API test cleanup Refactored the DescribeDeltaHistorySuite to remove an unnecessary DeltaLog creation. Closes #6081 from rahulsmahadev/describe_history_ss. 
Authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: aa3126ab362df2a5e46ae14c2ff530e93ecaee1b commit d3b5e42f955d45e71ad2f20253800e37eadee7b6 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Tue Aug 20 18:30:03 2019 +0000 [DELTA-OSS-EXTERNAL] Hide package private types and methods in javadoc Add `-P:genjavadoc:strictVisibility=true` to scalac options in order to hide package private types and methods in javadoc. Manually ran `build/sbt clean unidoc` and verified the generated javadoc doesn't show package private methods of `DeltaTable`. Closes delta-io/delta#130 Author: Shixiong Zhu <zsxwing@gmail.com> #6079 is resolved by zsxwing/nar2ifi2. GitOrigin-RevId: 085060f712eaee15379e8366358913cac64d441d commit 94572bfe1387766d8b9d2d9d5f613d4d888ab7a6 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Tue Aug 20 10:45:13 2019 +0000 [SC-20935] Add DeltaLogging to DeltaMergeBuilder Add DeltaLogging to DeltaMergeBuilder to allow tracking metrics about merges. Author: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 4298da297e73ec916ef5d87ac9b96f6f5a03f518 commit 07f1e6f6be377a10e4257c6d9e44c8c0bc557d54 Author: Terry Kim <yuminkim@gmail.com> Date: Tue Aug 13 15:58:37 2019 +0000 [DELTA-OSS-EXTERNAL] Fix build warnings and remove unnecessary imports `build/sbt compile` shows: ``` [info] Compiling 72 Scala sources to /tmp/delta/target/scala-2.12/classes... 
[warn] /tmp/delta/src/main/scala/org/apache/spark/sql/delta/PreprocessTableMerge.scala:21: imported `DeltaErrors' is permanently hidden by definition of object DeltaErrors in package delta [warn] import org.apache.spark.sql.delta.{DeltaErrors, DeltaFullTable} [warn] ^ [warn] /tmp/delta/src/main/scala/org/apache/spark/sql/delta/PreprocessTableMerge.scala:21: imported `DeltaFullTable' is permanently hidden by definition of object DeltaFullTable in package delta [warn] import org.apache.spark.sql.delta.{DeltaErrors, DeltaFullTable} [warn] ^ [warn] /tmp/delta/src/main/scala/org/apache/spark/sql/delta/PreprocessTableUpdate.scala:19: imported `DeltaErrors' is permanently hidden by definition of object DeltaErrors in package delta [warn] import org.apache.spark.sql.delta.{DeltaErrors, DeltaFullTable} [warn] ^ [warn] /tmp/delta/src/main/scala/org/apache/spark/sql/delta/PreprocessTableUpdate.scala:19: imported `DeltaFullTable' is permanently hidden by definition of object DeltaFullTable in package delta [warn] import org.apache.spark.sql.delta.{DeltaErrors, DeltaFullTable} [warn] ^ [warn] /tmp/delta/src/main/scala/org/apache/spark/sql/delta/UpdateExpressionsSupport.scala:19: imported `DeltaErrors' is permanently hidden by definition of object DeltaErrors in package delta [warn] import org.apache.spark.sql.delta.DeltaErrors [warn] ^ [warn] there were two deprecation warnings; re-run with -deprecation for details [warn] 6 warnings found ``` Remove unnecessary imports in some other files as well. Closes delta-io/delta#120 Author: Mukul Murthy <mukul.murthy@databricks.com> Author: Terry Kim <yuminkim@gmail.com> GitOrigin-RevId: c9a40e1063abf96aada8a6a6b19ac04425b799bd commit 6cb6406483d4fbd06ca0f791b1da8d748466ffe3 Author: Jose Torres <joseph.torres@databricks.com> Date: Tue Aug 13 14:58:19 2019 +0000 [SC-20682][DELTA] Save partition schema in PreparedDeltaFileIndex PreparedDeltaFileIndex right now recomputes the partition schema from snapshot every time. 
There's no need for this, and it ends up meaning time travel creates a whole new snapshot for each partition (since the time travel snapshot isn't the most recent one in the Delta log). Added a file operations test. The number of listed paths in the provided test case goes down from 309 to 9. Author: Jose Torres <joseph.torres@databricks.com> GitOrigin-RevId: e78974ac4122f72833c874f156c975a5bd022156 commit 1b8b376f43b2d56b54cf9e78694c749754e2ff9f Author: Mukul Murthy <mukul.murthy@gmail.com> Date: Tue Aug 13 09:34:31 2019 -0700 [DELTA-REFACTOR] Cleanup and merge a couple import statements Authored-by: Mukul Murthy <mukul.murthy@databricks.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: 8d34431be055121705c8462a65969e0af67fcd35 commit 36bc2f111f3a761534d6beda8067e08f6cc84c21 Author: Burak Yavuz <brkyvz@gmail.com> Date: Mon Aug 12 23:25:30 2019 +0000 [DELTA-OSS-EXTERNAL] Reduce file listing parallelism for tests This should make Vacuum tests a lot faster. It's running 10,000 individual Spark jobs right now due to the very high setting of file parallelism setting. cc @rahulsmahadev Closes delta-io/delta#113 Author: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: 94dbb425368952bba633991b2c1a8045b8b53a4f commit 94c407c28a9b4f0bf8aeb6fe7180c23379afcc96 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Mon Aug 12 14:45:01 2019 -0700 [DELTA-OSS-EXTERNAL] Disable the automatic async log cleanup in DeltaRetentionSuite to make tests stable Sometimes tests in DeltaRetentionSuite fail because of the automatic async log cleanup. This PR just disables to make tests stable. 
Closes delta-io/delta#125 Authored-by: Shixiong Zhu <zsxwing@gmail.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: 6531f320ba97cc019b4c3a7b74b493a5f165ea8d commit ccda9e4a5acfae4e71a3e7ab43f45575a5900ea1 Author: Yucai Yu <yyu1@ebay.com> Date: Mon Aug 12 14:39:17 2019 -0700 [DELTA-OSS-EXTERNAL] Fix comments in DeltaLog.createRelation The return type of `createRelation` should be `BaseRelation` instead of `DataFrame`. Closes delta-io/delta#117 Authored-by: Yucai Yu <yyu1@ebay.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: a3678d988cf3799c9999d67e88e0534be43b4fca commit f5ed7dd5c534819d3bdfdbe7c1ccac5b685ce33b Author: lys0716 <luystu@gmail.com> Date: Mon Aug 12 17:37:32 2019 +0000 [DELTA-OSS-EXTERNAL] Update binaries version in README.md from 0.2.0 to 0.3.0 Update binaries version in README.md from 0.2.0 to 0.3.0 Signed-off-by: Yishuang Lu <luystu@gmail.com> Closes delta-io/delta#119 Author: lys0716 <luystu@gmail.com> GitOrigin-RevId: 5e5e541b89d0b46ac84b5b0ba3a28069cb2cb6cd commit d9749f7c26fa63c5f05fdb2ed4c7116bdd56a51d Author: liwensun <liwen.sun@databricks.com> Date: Fri Aug 9 11:17:50 2019 -0700 [SC-20980][DELTA] Bump oss build version to 0.3.1-SNAPSHOT ## What changes were proposed in this pull request? a.t.t Closes #5991 from liwensun/sc20980-oss-version. Authored-by: liwensun <liwen.sun@databricks.com> Signed-off-by: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: b4c5d1fdb50b6c49b5b07a54edbccfd210c58e97 commit b18b1775b9be6dcf855265268e3c29fdc6c9ed35 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Thu Aug 8 15:42:27 2019 -0700 [SC-20260][DELTA] Evolvability test for describe history command. ## What changes were proposed in this pull request? Added evolvability test for Describe History Scala API. Generated the resource files for OSS using `build/sbt "test:runMain org.apache.spark.sql.delta.EvolvabilitySuite src/test/resources/delta/delta-0.2.0`. on delta lake 0.2.0 release. 
## How was this patch tested? Closes #5929 from rahulsmahadev/evolvability_suite. Authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: adca1e06f3d59d99ed3841f2b7967fc4031ebc2b commit 2da5bcfc8449473c0f1d0e3eae4b9da3dc03a554 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Mon Aug 5 19:40:04 2019 +0000 [SC-20741]Remove DeltaTable.apply and add DeltaTableTestUtils to open it in tests only ## What changes were proposed in this pull request? This PR removes the public method `DeltaTable.apply` to avoid exposing an internal API. I made the constructor of `DeltaTable` package private and add `DeltaTableTestUtils` in tests to open it up in tests only. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5951 is resolved by zsxwing/SC-20741. GitOrigin-RevId: 679c9d245dacf5849c6fa2ed055e4b2bd2db0c70 commit 75439ff80157b7abbc50eb3ed5963f68e0503296 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Fri Aug 2 11:25:04 2019 -0700 [SC-20324] Refactored DeltaTable Refactored DeltaTable to take SparkSession and DeltaLog as parameters instead of DataFrame. Closes #5934 from rahulsmahadev/deltatable_refacotr. Authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: 663ab2070c58e900ca3dd73452696281cbff1fc7 commit 3100b6d87247946ec3d65686bee78c43f50c551b Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Thu Aug 1 16:41:11 2019 -0700 Setting version to 0.3.1-SNAPSHOT commit 21ba848dad637ed6e7e64f785397112b46770c15 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Thu Aug 1 16:39:41 2019 -0700 Setting version to 0.3.0 commit e75c8d14270a5f12fcf44cde05eb151575496d15 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Thu Aug 1 20:06:21 2019 +0000 Refactoring io.delta package to io.delta.tables Staging this PR for now. 
Changing the namespace `io.delta` to `io.delta.tables` No new tests added, re ran old tests. Author: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: ab6a5968a29f698426bea27db0f33f431da637ef commit df0393e66a614d993b1dbf4ebd02f5aeb69e6a12 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Wed Jul 31 23:14:46 2019 +0000 [SC-20413]Snapshot staleness should use the last update timestamp ## What changes were proposed in this pull request? Right now we use the timestamp of latest commit to check staleness. This is not great since if a table is old and doesn't have recent commits, we will always run `update` to check, which is wasting a list request. We can remember the timestamp of the last update, and use it to check staleness. Then we can save a list request when a table is not stale. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5904 is resolved by zsxwing/update-staleness. GitOrigin-RevId: 8f6939d7d84997ff0ec9490b3ec41db0519d450e commit 890158a020494948275c63e6966159246dbe25a3 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Wed Jul 31 20:04:30 2019 +0000 [SC-20377]Delta streaming source should check the latest protocol of a table ## What changes were proposed in this pull request? - Delta streaming source should verify if it's allowed to read a table when loading a json file. - Add tests to make sure protocolRead/Write is called with the right Protocol instance. Author: Shixiong Zhu <zsxwing@gmail.com> #5881 is resolved by zsxwing/protocol-fix. 
GitOrigin-RevId: fd64c4b528a1d482f2f3edede24cb93c3b1a54a0 commit 388f1f4aebb24d45df09995f3787f63b14175950 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Fri Jul 26 23:00:50 2019 +0000 Fixed comments in vacuum Minor changes Author: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: 3f40622903e5ee301af3be06c0cc91f4722a52f8 commit 46b995b56a8b15724980eade822df7a91c1bb224 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Fri Jul 26 22:58:05 2019 +0000 [DELTA-OSS-EXTERNAL] Refactored and improved API docs for DeltaTable operations - Moved the update() methods to DeltaTable class to fix java doc issues - Converted builder classes from case class to simple class because case classes have unnecessary public methods (e.g. productArity, etc.) that show up in the API docs. - Added details docs for update and merge Closes delta-io/delta#101 Author: Tathagata Das <tathagata.das1565@gmail.com> #5897 is resolved by tdas/dunpvr01. GitOrigin-RevId: cc7b9fe82c3182859ffa0ff0e7bc6942ffb02be7 commit e624d92d7cfa7669ef97e423a9c451de88f4e5ca Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Fri Jul 26 13:31:29 2019 -0700 [SC-19634] Add Describe History Scala APIs to OSS Delta Lake ## What changes were proposed in this pull request? Adding DescribeDeltaHistory history Scala API DescribeDeltaHistory on a DeltaTable would return a DataFrame with the commit info in reverse chronological order. The limit optional parameter specifies the last limit operations to fetch the History on. Sample usage : deltaTable = new DeltaTable(spark.table(tableName)) deltaTable.history(limit = 10) deltaTable.history() ## How was this patch tested? DeltaDescribeHistorySuite Closes #5852 from rahulsmahadev/describe-history. 
Authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: 071cb9cdcbd0e56e35eb9cbb727ee3c80fb57904 commit 96ec65407ba178a37ad7b389954faae2b0dd1d10 Author: Zhitong Yan <zhitong.yan@databricks.com> Date: Thu Jul 25 23:52:35 2019 +0000 [DELTA-OSS] Enable JUnit tests in Delta Lake ## What changes were proposed in this pull request? After this PR, Delta Lake will have the ability to run JUnit tests in SBT. ## How was this patch tested? by `JavaDeltaTableSuite.java` Closes delta-io/delta#97 Author: Zhitong Yan <zhitong.yan@databricks.com> #5654 is resolved by ZhitongDB/enable-java-test-in-oss. GitOrigin-RevId: 89f93948a8deda9c260df3e5265d440b70d45931 commit 43c30f6631b4922234b81357df0b518649e6dd30 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Thu Jul 25 22:41:53 2019 +0000 [SC-20256]Save one Snapshot when creating DeltaLog ## What changes were proposed in this pull request? Right now creating DeltaLog needs to create 2 Snapshots when the last checkpoint version is not the same as the latest version. We load the parquet checkpoint to create a `Snapshot`, then call `update` to pick up latest json files after the checkpoint. We can load the checkpoint and json files together to save one `Snapshot`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5865 is resolved by zsxwing/load. GitOrigin-RevId: 6ab2ce407f613ecf2cbcd5b6136443176470442d commit d6dd8bca125cebc2bda9f0d51c1321a652b8ab01 Author: Arul Ajmani <arulajmani@gmail.com> Date: Thu Jul 25 20:11:41 2019 +0000 [DELTA-OSS-EXTERNAL] Update analysisException to take all parameters AnalysisException has AnalysisException takes line, startPosition, and cause as constructor parameters in addition to message and plan -- this change updates analysisException to accept those as well. Closes delta-io/delta#102 Author: Arul Ajmani <arulajmani@gmail.com> #5885 is resolved by mukulmurthy/zw6c4nhx. 
GitOrigin-RevId: bb66861c1400f5e06afbb59877233190688e10e8 commit 73f4a64284f7e971be4a5238aa4b3f310360df85 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Thu Jul 25 03:39:31 2019 +0000 [SC-19632][DELTA] - VACUUM Scala API ## What changes were proposed in this pull request? Users can Vacuum a given DeltaTable with a given retention period. Vacuum would Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. Note: Vacuum would disable the ability to time travel beyond the retention period. Sample usage : deltaTable = new DeltaTable(spark.table(tableName)) deltaTable.vacuum(retentionHours = 13) deltaTable.vacuum() ## How was this patch tested? - DeltaVacuumSuite to test the functionality of Vacuum. Closes delta-io/delta#95 ORIGINAL_AUTHOR=rahulsmahadev rahul.mahadev@databricks.com Author: Rahul Mahadev <rahul.mahadev@databricks.com> #5741 is resolved by rahulsmahadev/vacuum. GitOrigin-RevId: 1fe58f3f8877fd82a3770e7d12e282ab0241b89a commit 39070b448e8d9d1c2c0b60a71c6d36b34bceb045 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Thu Jul 25 12:21:43 2019 -0700 Fix GitOrigin-RevId GitOrigin-RevId: ec7fecac856b4c2033f288cae787f0bb7f6b3149 commit 526982101e2aeade3e9e98968312403d42bbdc12 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Fri Jul 19 06:46:46 2019 +0000 Merge Scala API for DeltaTable This PR is for #42, after the change, delta table has the ability to merge some source table/query with optional condition and update/insert rules. 
This PR has been tested by MergeIntoScalaSuite.scala Closes delta-io/delta#96 Author: Zhitong Yan <zhitong.yan@databricks.com> Author: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 7d5b61738546c63460d717db24720843718fd3d5 commit 824989c61458856e771770322e95e055e6ff09f9 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Fri Jul 19 01:03:49 2019 +0000 [SC-19637] [Delta] Add nested data support to Update Scala/Java APIs in OSS ## What changes were proposed in this pull request? Supporting nested data in Update required explicitly resolving dotted column names in the analysis phase. This was done by DBR's Analyzer which explicitly handled `UpdateTable` logical plan. But in OSS, Apache Spark's Analyzer does not do this. Hence we need to explicitly resolve nested columns in OSS and throw errors if they dont resolve. Here are the changes - Moved all the resolution code from DBR's analyzer to UpdateTable so that it can be invoked directly. - Updated DeltaTableOperations.executeUpdate to explicitly resolve expressions to extract the nested name parts. - To resolve references of expressions from outside the Analyzer (without writing a rule or invoking ResolveReferences rule directly), I stuck each expression in a fake LogicalPlan and invoked the Analyzer to resolve it. This keeps the dependency on internal APIs at a minimum. ## How was this patch tested? Moved nested data tests from UpdateScalaSuite to UpdateSuiteBase to ensure they run in OSS. Author: Tathagata Das <tathagata.das1565@gmail.com> #5794 is resolved by tdas/SC-19637. 
GitOrigin-RevId: 2bc7adf42a1862200af32bb3de7818690983528a commit f6e28a34e247c3cdff5a9eb6a137eed25732eb19 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Tue Jul 16 22:30:20 2019 +0000 [DELTA-OSS-EXTERNAL] [DOCS] Added scripts to generate api docs and API stability annotations - Used Spark annotations to annotate public APIs regarding their stability - Copied Spark scripts to patch the generated docs files to make the annotations visible - Additional `:: Evolving ::` is necessary for these scripts to dynamically add badges for the annotation. Without this snippet + patching, generated Java docs do not show annotations. - The `generate_api_docs()` script does the following. 1. Generates Scala and Java docs with `sbt unidoc` 1. Patches the docs' html, js, and css to dynamically inject badges 1. Copies the docs to a standard location `docs/_site/api/` which will be useful for publishing the API docs in the Delta docs. - Moved public methods in DeltaTableOperations into DeltaTable because Java docs generated by unidoc incorrectly handles inherited methods in some cases. In this case, it was showing `delete` as static methods. **Java docs** ![image](https://user-images.githubusercontent.com/663212/61239363-8fb86180-a6f3-11e9-866d-0a2852f6be74.png) **Scala docs** ![image](https://user-images.githubusercontent.com/663212/61254509-893ce080-a719-11e9-98ff-c8239cc2ca74.png) Closes delta-io/delta#93 Author: Tathagata Das <tdas@databricks.com> Author: Tathagata Das <tathagata.das1565@gmail.com> #5779 is resolved by tdas/myt2d8a3. GitOrigin-RevId: a433a0ffdab0eeb3471f4a4c8be66316bd5d828e commit 5bf990d1a4ce4d429212118fa2d6008d982820bf Author: ZhitongDB <zhitong.yan@databricks.com> Date: Mon Jul 15 18:32:51 2019 +0000 [SC-19217] Add Update Scala API to DBR + OSS ## What changes were proposed in this pull request? In this PR, added the update operations for DeltaTable. After the change, DeltaTable can perform a update operation based on the condition specified. 
## How was this patch tested? by `UpdateScalaSuite` Closes delta-io/delta#86 Author: Zhitong Yan <zhitong.yan@databricks.com> Author: Tathagata Das <tathagata.das1565@gmail.com> #5458 is resolved by ZhitongDB/update-scala-api. GitOrigin-RevId: 54517b5b64a2a6d14185d8109cd21d71a095f230 commit 3f8e7541e894dcb06b8a3b315ea2617c9e977925 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Thu Jul 11 23:27:36 2019 +0000 [SC-19619]Remove unused codes and change the default Scala version to 2.12 - Remove unused codes - Change the default Scala version to 2.12 Author: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 8a31c6df41417c6fff84d8e4026d2e821532b9c2 commit f5b84b6ddaa1bbfd6f51bf1e176da6d44dcd1905 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Thu Jul 11 01:38:48 2019 +0000 [DELTA-OSS-EXTERNAL] Added ability to generate scala and java docs Added sbt-unidoc to generate Scala and Java docs. To generate both docs, just run `build/sbt unidoc` - Any classes not directly in `io.delta` will be ignored from the docs. - Unidoc will be generated when testing so that we can verify that docs are never broken. The overhead of generating the docs is just a few seconds, so does not add much to test times. Closes delta-io/delta#88 Author: Tathagata Das <tathagata.das1565@gmail.com> #5735 is resolved by tdas/5jt6u51s. GitOrigin-RevId: ebf7aff73e50cb762be74fd78d9adc5825ab14cf commit 635520855284a887747c6c9668454d25676fa97d Author: Jose Torres <joseph.torres@databricks.com> Date: Wed Jul 3 11:12:36 2019 -0700 [SC-15200][DELTA] Make SaveAsTable create the table with an empty partitioned dataframe. ## What changes were proposed in this pull request? There's a special case to skip over making a commit if no data files were written. This bug indicates we don't want that special case. ## How was this patch tested? new unit test Closes #5634 from jose-torres/fixempty. 
Authored-by: Jose Torres <joseph.torres@databricks.com> Signed-off-by: Jose Torres <joseph.torres@databricks.com> GitOrigin-RevId: 4e0cef36df7138e7aa032edc50ecd3b443a08519 commit f22a7e2eed5249ad8841bdb6bc0087aa40399e95 Author: liwensun <liwen.sun@databricks.com> Date: Wed Jul 3 05:27:34 2019 +0000 [SC-19357][DELTA][WARMFIX]Fix performance regression on small tables PR #5500 is a bug fix regarding DeltaLog cache polluting spark sessions, but it introduced a regression on small table optimization for Delta. This PR fixes this regression by caching the collected rows instead of having to compute them repeatedly. existing tests Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: da298b2ee21c1727998b34cf97a7d005384c47d4 commit 729ccb552184acda4095c17e0ebcaebc0bb1d83b Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Tue Jul 2 15:33:36 2019 -0700 Minor change Lead-authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Co-authored-by: Burak Yavuz <burak@databricks.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: d0b4eb75fa10727293dfca93d37bb0c41465206a commit 5483d5475d62d6b44e6c33a5afa10be211505284 Author: Zhitong Yan <zhitong.yan@databricks.com> Date: Tue Jul 2 11:29:42 2019 -0700 [SC-18879] Add Delete Scala API to DBR + OSS In this PR, added the delete operations for DeltaTable. After the change, DeltaTable can perform a delete operation based on the condition specified. By `DeleteScalaSuite` Closes delta-io/delta#75 Closes #5391 from ZhitongDB/delete-scala-api. 
Lead-authored-by: Zhitong Yan <zhitong.yan@databricks.com> Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com> Co-authored-by: ZhitongDB <50844714+ZhitongDB@users.noreply.github.com> Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: f692e0020a8a6d279a25d4a5106fb19ee105ec49 commit 07f53f4296b1154adda0b1022fa81bf828db9aa0 Author: Zhitong Yan <zhitong.yan@databricks.com> Date: Sun Jun 30 17:21:55 2019 -0700 [DELTA-OSS-EXTERNAL] Fix DeltaTable forPath to not accept non-delta table paths In this PR, I added exception handling for `forPath` method in `DeltaTable.scala`. After the change, Delta Table will make sure create instance just for "Delta Source", other sources will throw an exception. This PR is tested by `DeltaTableSuite.scala` Closes delta-io/delta#79 Closes #5599 from tdas/qy70ntky. Lead-authored-by: Zhitong Yan <zhitong.yan@databricks.com> Co-authored-by: Tathagata Das <tdas@databricks.com> Co-authored-by: ZhitongDB <50844714+ZhitongDB@users.noreply.github.com> Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: eb3a014714aacc1b97d165089e610186b1bb9f83 commit ab260c9a7ee912357a5516447001442863d4c4e2 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Fri Jun 28 03:38:41 2019 +0000 [DELTA-OSS-EXTERNAL] Add a system property for the DeltaLog cache size Add "delta.log.cacheSize" system property for the DeltaLog cache size, and set it to 10 in tests to reduce the memory footprint. Closes delta-io/delta#80 GitOrigin-RevId: cf9e0560df0cdc08ff7c1978963ff9584278d050 Author: Shixiong Zhu <zsxwing@gmail.com> #5595 is resolved by tdas/fa38kpiz. GitOrigin-RevId: 2c19cd19e491501fd42efff200e9d2b5ddcb9c22 commit adaee91212233ae5391b9bf4e069e680d1fae4da Author: Zhitong Yan <zhitong.yan@databricks.com> Date: Thu Jun 27 20:15:57 2019 +0000 [DELTA-OSS] Fix filterFiles() in OptimisticTransaction ## What changes were proposed in this pull request? 
In this PR, changed the implementation of `filterFiles` method in `OptimisticTransaction.scala`. Original one may return empty matched file list based on some prediction, if there is some partition specified in delta table. ## How was this patch tested? By `OptimisticTransactionSuite.scala` zhitong.yan@databricks.com Author: Zhitong Yan <zhitong.yan@databricks.com> #5534 is resolved by ZhitongDB/filter-files. GitOrigin-RevId: cac5cb2d198413f10305666eb1a6b66c490adb7e commit ffa2575f2596c0fec83b81076ec99aeff6f62671 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Wed Jun 26 23:17:43 2019 +0000 [SC-19257] Don't use DataSourceOptions in Delta ## What changes were proposed in this pull request? Apache Spark has removed DataSourceOptions in master and it will not be available in Spark 3.0. See https://github.com/apache/spark/commit/2a80a4cd39c7bcee44b6f6432769ca9fdba137e4 We should avoid using it in Delta. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5586 is resolved by zsxwing/SC-19257. GitOrigin-RevId: c4fe4d18d5029fffb2ad725ef8b8e9cd725b8bda commit 911267a0c6e06a81ea22b1f9d160eed9a02a4a74 Author: Burak Yavuz <brkyvz@gmail.com> Date: Tue Jun 25 23:55:29 2019 -0700 [SC-19019] Remove another unused configuration ## What changes were proposed in this pull request? Found another unused configuration... ## How was this patch tested? YOLO Closes #5564 from brkyvz/turnOnMQO2. 
Authored-by: Burak Yavuz <brkyvz@gmail.com> Signed-off-by: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: d189c033f4d52d9f8269fb0a47b672b380737fcb commit 5569557a8cce61f1bdaba84b76c91b835159ee53 Author: Jose Torres <joseph.torres@databricks.com> Date: Wed Jun 26 05:38:45 2019 +0000 [SC-18753] Minor code cleanup GitOrigin-RevId: 4e0463d89d3af39b150b76f02f6d83e68bd1dc5b commit a1e4ff82eb045cc7a4ffe4d9e88a378a19c769e9 Author: liwensun <liwen.sun@databricks.com> Date: Tue Jun 25 17:52:07 2019 -0700 [SC-14260][DELTA] Stop caching DFs in DeltaLog to prevent spark session pollution ## What changes were proposed in this pull request? TL;DR: This PR changes the Dataframe fields `state` and `withStats` from ` val` to `def`. This is to prevent these DFs, when cached as part of a `DeltaLog` instance, from polluting the active spark session. When these cached DFs are executed from a different session, the original session in which these DFs were created will become the active session (because Spark just sets the current DF's session as active session). This pollutes session-specific configs and libraries. By making these DFs a method call instead of a cacheable field, they will be created using the current active session every time they are called. Hopefully this change won't degrade the performance too much because these DFs can still be evaluated from underlying RDD cache instead of being computed from scratch. ## How was this patch tested? A regression test to make sure DeltaLog operations no long changes active sessions unexpectedly. Authored-by: liwensun <liwen.sun@databricks.com> Signed-off-by: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 406ae31b224f276921c9952b044866d6ee48406c commit 02fa7d94edf6181b7bfcb698896a252d866be405 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Tue Jun 25 22:01:07 2019 +0000 [SC-19030] Add a new API to invalid DeltaLog for a path Add a new API to invalid DeltaLog for a path. 
Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 9f34f1ab9fb81d9aa78bcc110f87b3c54a2099be commit e7efebc1cdc1e5e056bd1c6989b16f2660af7147 Author: Maxim Gekk <maxim.gekk@databricks.com> Date: Fri Jun 21 10:36:17 2019 +0000 [SC-18743] Minor code cleanup GitOrigin-RevId: 969d5fd29b01e916e2367ee0a59f48ec9122c1dc commit bbc7c981deaaf3a3171279177ef82384ea172886 Author: Liwen Sun <36902243+liwensun@users.noreply.github.com> Date: Thu Jun 20 13:44:21 2019 -0700 [DELTA-OSS-EXTERNAL] Update readme for 0.2.0 - update the latest version to 0.2.0 - point storage and concurrency control to docs. Closes delta-io/delta#74 Closes #5496 from liwensun/092oqefd. Authored-by: Liwen Sun <36902243+liwensun@users.noreply.github.com> Signed-off-by: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: a1ed89c626d374cd0c353d054754f0372969a302 commit 6b81231cecbedded552e5ab542fbcd358f8caf46 Author: Zhitong Yan <zhitong.yan@databricks.com> Date: Thu Jun 20 20:41:24 2019 +0000 Export delta table to oss ## What changes were proposed in this pull request? added support for export DeltaTable.scala to OSS. ## How was this patch tested? By DeltaTableSuite.scala zhitong.yan@databricks.com Author: Zhitong Yan <zhitong.yan@databricks.com> #5482 is resolved by ZhitongDB/export-delta-table-to-OSS. GitOrigin-RevId: bbad3c44d0e36707c660d29ee4d9e413e68eb988 commit f88fc36669831d29c6e860042b301d2782810e69 Author: liwensun <liwen.sun@databricks.com> Date: Tue Jun 18 11:34:37 2019 -0700 Setting version to 0.2.1-SNAPSHOT commit ae3daa85be7cfb574a83f8d73eb10920243e4014 Author: liwensun <liwen.sun@databricks.com> Date: Tue Jun 18 11:33:18 2019 -0700 Setting version to 0.2.0 commit d85727026506649e3f2716d46f72d1eeb2089acb Author: liwensun <liwen.sun@databricks.com> Date: Mon Jun 17 14:03:43 2019 -0700 [DELTA-OSS] Small edits on readme ## What changes were proposed in this pull request? small edits on readme ## How was this patch tested? existing tests. Closes #5455 from liwensun/readme-edits-0.2.0. 
Authored-by: liwensun <liwen.sun@databricks.com> Signed-off-by: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 4fccf5635133264662b632919e6f87cdbae7c54f commit 973e34a15a59d693a146a916343cb60fccf46507 Author: Liwen Sun <36902243+liwensun@users.noreply.github.com> Date: Mon Jun 17 11:53:15 2019 -0700 [DELTA-OSS-EXTERNAL] Update Readme for changes in 0.2.0 a.t.t. Closes delta-io/delta#67 Closes #5448 from liwensun/k2rge5ic. Lead-authored-by: liwensun <liwen.sun@databricks.com> Co-authored-by: Liwen Sun <36902243+liwensun@users.noreply.github.com> Signed-off-by: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 44b45f5ae869be13ba1fb2ddef67aecbb4adcc50 commit bf6efb6bc1f228f893333b0b84f79fc17ab496b2 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Fri Jun 14 21:12:56 2019 +0000 [SC-18796]Delta checkpoint should not fail because of FileAlreadyExistsException ## What changes were proposed in this pull request? When a stage gets retried, a zombie task may still run and write the checkpoint file. This will cause the new tasks in the retried stage fail because of FileAlreadyExistsException on **S3**. Since the zombie task actually writes the same checkpoint, we can just make the new task successful if the checkpoint file exists. I also added a TaskFailureListener to delete temp files when a task fails. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5401 is resolved by zsxwing/SC-18796. GitOrigin-RevId: a01ed042ab1f59c32b3e2620b2d56600420724a5 commit b6aa8c199fcbe7ef68c23513e72aa148de8c1d0e Author: Liwen Sun <36902243+liwensun@users.noreply.github.com> Date: Fri Jun 14 07:13:45 2019 +0000 [DELTA-OSS-EXTERNAL] Change the LogStore conf key name Right now the LogStore conf key is "delta.logStore.class", but this is not getting picked up because Spark requires spark conf to start with "spark.". So we change this to "spark.delta.logStore.class". 
Closes delta-io/delta#66 Author: Liwen Sun <36902243+liwensun@users.noreply.github.com> #5440 is resolved by liwensun/x310tcc3. GitOrigin-RevId: 3554916081e8be86bd8e25630f285f9e3809831f commit c3805a8fed12544c8450928ce839c84b0091306a Author: liwensun <liwen.sun@databricks.com> Date: Fri Jun 14 03:02:53 2019 +0000 [SC-17326][DELTA-OSS] Allow concurrent appends In Delta OSS: - Add `checkRetry` - Allow concurrent appends New unit tests Author: liwensun <liwen.sun@databricks.com> Author: Liwen Sun <liwen.sun@databricks.com> GitOrigin-RevId: b4cd65f15402263621021667d7259295ff951f9a commit b1bdec75b9e4df249c2c6df9ab02c675d92b2795 Author: Rahul Mahadev <rahul.mahadev@databricks.com> Date: Thu Jun 13 15:27:38 2019 -0700 [SC-18596][DELTA] ZORDERing on a partition column should throw a better error message ## What changes were proposed in this pull request? Changed the error message when Z-Ordering is done on a partitioned column. ## How was this patch tested? Unit test available with this commit. Closes #5421 from rahulsmahadev/SC-18596. Lead-authored-by: Rahul Mahadev <rahul.mahadev@databricks.com> Co-authored-by: rahulsmahadev <51690557+rahulsmahadev@users.noreply.github.com> Signed-off-by: Rahul Mahadev <rahul.mahadev@databricks.com> GitOrigin-RevId: d0f9b6913c1338396b666b19aa0cec65b22a0e8d commit b3b3ccf65eff6ccac6f5837192b46d4e31103d3e Author: Naoki Takezoe <takezoe@gmail.com> Date: Thu Jun 13 06:20:22 2019 +0000 [DELTA-OSS-EXTERNAL] Minor code enhancements Includes following fixes: - Use StandardCharsets.UTF_8 instead of “utf-8” - Remove unnecessary return statement - Fix typo in error message - Fix warnings Closes delta-io/delta#60 Author: Naoki Takezoe <takezoe@gmail.com> #5426 is resolved by zsxwing/rlcw8og1. GitOrigin-RevId: 20e1f065e38c1e73e621221a738e593ab87bfe01 commit 2d77fca9164dc01fcdd0b7704625de683f62a2a1 Author: feiwang <hzfeiwang@163.com> Date: Wed Jun 12 14:13:26 2019 -0700 [DELTA-OSS-EXTERNAL] Fix code style for the naming of variables. 
The naming of `checkpointMetaData` and `checkpointMetadata` is confusing. Closes delta-io/delta#64 Closes #5422 from zsxwing/2s78063f. Authored-by: feiwang <hzfeiwang@163.com> Signed-off-by: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 3d23e386387ab56391242630c477d9110c32521f commit 352eaec8b203c4b1186cb8b887fadd1efd806f57 Author: liwensun <liwen.sun@databricks.com> Date: Mon Jun 10 16:57:55 2019 -0700 [SC-18394][DELTA] Remove HDFSLogStore and other LogStore renames - Rename `FileSystemLogStore` to `HadoopFileSystemLogStore` - Rename `S3LogStore` to `S3SingleDriverLogStore` - Add a new `LocalLogStore` and make it the default LogStore implementation for DBR, while `HDFSLogStore` will be the default for OSS. Existing tests. Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 0ff96c6015a50b30855251b408fc331c15c05cd3 commit e380e5a6149447fe2cddcc4f03d9e2e851fbad41 Author: Burak Yavuz <burak@databricks.com> Date: Mon Jun 10 12:36:02 2019 -0700 [SC-18778] Clean-up some unused configurations ## What changes were proposed in this pull request? This PR cleans up some unused configurations. ## How was this patch tested? Existing tests Closes #5399 from brkyvz/obscureConfs. Lead-authored-by: Burak Yavuz <burak@databricks.com> Co-authored-by: Burak Yavuz <brkyvz@gmail.com> Signed-off-by: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: 63071a2c77a23e789f4944121a91b8f5c7d33d50 commit 7af3bf57477ffdfe70214f379a5f204915d21f74 Author: Burak Yavuz <brkyvz@gmail.com> Date: Fri Jun 7 11:05:45 2019 -0700 [SC-18732] Remove unused configs from DeltaSQLConf Clean up some unused configurations from DeltaSQLConf. Build Closes #5387 from brkyvz/aoRemove. 
Authored-by: Burak Yavuz <brkyvz@gmail.com> Signed-off-by: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: 428bef6a9485b1547aeba1eaace61cf204bf6369 commit f1c39c362f8bc0032cf9934dc045917ed5874c82 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Thu Jun 6 22:01:19 2019 +0000 [SC-18233][Delta] Fix logStore class config to not have "databricks" The log store class config is "spark.databricks.tahoe.logStore.class" which was not fixed for Delta Lake. So this PR changes the config to "delta.logStore.class". Existing unit tests and new unit tests. Author: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: bd2903849c01eb614848142635de93176636e8f5 commit db90371105daae2bdc6dc50dcabfa0fa4ed4c69d Author: liwensun <liwen.sun@databricks.com> Date: Tue Jun 4 13:55:39 2019 -0700 [SC-18267][DELTA] Cleanup import in LogStore suite as the title existing tests. Closes #5352 from liwensun/sc18267-mixin. Authored-by: liwensun <liwen.sun@databricks.com> Signed-off-by: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: f7a608b61073c15f2275fd7e8ca1a32032d1f6bc commit 27d86b08384c0eafb49099314669b2561b4cee3c Author: Shixiong Zhu <zsxwing@gmail.com> Date: Fri May 31 20:49:13 2019 +0000 [DELTA-OSS]Display test duration for Delta OSS tests ## What changes were proposed in this pull request? The whole build right now takes about 20 minutes. It's better to display the test duration so that we can find out which tests take too long. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxwing@gmail.com> #5326 is resolved by zsxwing/show-test-duration. GitOrigin-RevId: 42c65974a708cd3d3e221e2d698c78e4d11f06b0 commit c8169bd106e1a509d4138c5d1655457aa11179c5 Author: liwensun <liwen.sun@databricks.com> Date: Wed May 29 23:36:12 2019 +0000 [SC-18034][DELTA-OSS] S3 Support Add a S3 LogStore implementation. 
new unit tests Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 5071e09398fd7237d4ec3de4d3ed80103ec0371f commit ae4aa3c8450f548ddecb36f54f084a16b59c727e Author: liwensun <liwen.sun@databricks.com> Date: Wed May 29 00:19:55 2019 +0000 [SC-18133][DELTA] Add a `isPartialWriteVisible` interface in LogStore The writing of checkpoint files doesn't go through log store, so we add an interface `isPartialWriteVisible` to `LogStore` to let the out-of-band writers know whether to use rename or not. This is a temporary solution - ultimately it would be good to encapsulate this information within log store. Add a simple end to end test for different log store implementations. Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 5bf04c818d702ea24d28bed981f838d164a8d373 commit 43a7a7a1304a00cefdfc7dfcf1c98077ee2e54ef Author: Cheng Lian <lian@databricks.com> Date: Tue May 28 22:35:49 2019 +0000 [SPARK-24601][SPARK-27051][CHERRY-PICK] Update Jackson to 2.9.8 Existing tests. Author: Cheng Lian <lian@databricks.com> GitOrigin-RevId: 4b9428c7939dabd12f3380ef70f090eebe32e3e2 commit 95e4d0d8c9be5e7494d16b52dc3664e00a5de93b Author: Kaushal Prajapati <kaushal.prajapati3@gmail.com> Date: Tue May 28 19:35:27 2019 +0000 [DELTA-OSS-EXTERNAL] Delta conf fix Closes delta-io/delta#26 Author: Kaushal Prajapati <kaushal.prajapati3@gmail.com> #5288 is resolved by mukulmurthy/cktxuu1w. GitOrigin-RevId: 79c7e9018710fcd63a8a00dbe8d3c7272306cdf4 commit 6c34e7c49422a9ee21adf9f6b3811c3b9f8bf34b Author: Naoki Takezoe <takezoe@gmail.com> Date: Tue May 28 19:23:49 2019 +0000 [DELTA-OSS-EXTERNAL] Use Set.empty instead of Set() in DelayedCommitProtocol Closes delta-io/delta#56 Author: Naoki Takezoe <takezoe@gmail.com> #5289 is resolved by mukulmurthy/bx50c6g3. 
GitOrigin-RevId: 7f7068bbb417f2f11706fedc5f5d983c24bfc9f9 commit fcbe2e7fe002a5285a2a68fbf06390567c033dbc Author: Shixiong Zhu <zsxwing@gmail.com> Date: Tue May 28 07:13:56 2019 +0000 [SC-17993]Fix an issue when re-adding the same file If deleting a file and re-adding it back, a `Snapshot` may contain both `AddFile` and `RemoveFile` for this path. If this `Snapshot` is used to build a new `Snapshot`, as `input_file_name` for AddFile and `RemoveFile` is `null` and the order of them is not stable, InMemoryLogReplay may just delete this new added file if its `RemoveFile` is after `AddFile`. This PR updates InMemoryLogReplay to remove the tombstone when a file is added. This will ensure `InMemoryLogReplay` always output only one `FileAction` (either `AddFile` or `RemoveFile`) for each path. This PR also removes unused `stateSize` and `hadoopConf` from `InMemoryLogReplay`. The new unit test. Author: Shixiong Zhu <zsxwing@gmail.com> GitOrigin-RevId: 7c3e96c2d344d7c08383019ac2bbfda22f0f6119 commit d29a0e1a797be2b6911fc26cea92649b4ed11424 Author: Liang-Chi Hsieh <viirya@gmail.com> Date: Thu May 23 22:43:42 2019 +0000 [DELTA-OSS-EXTERNAL] Fix wrong data when recording event in protocolWrite In `protocolWrite`, it records event with wrong data `minReaderVersion`, currently. This patch goes to fix that. This also fixes few other typos and style. Closes delta-io/delta#45 Author: Liang-Chi Hsieh <viirya@gmail.com> #5262 is resolved by mukulmurthy/j98bt8j6. GitOrigin-RevId: 1d996b3aaed93a5657f811c90f032312e14411f0 commit e20843875da4ac35483c8261e88f8819158621b6 Author: Naoki Takezoe <takezoe@gmail.com> Date: Thu May 23 22:23:36 2019 +0000 [DELTA-OSS-EXTERNAL] Fix typo in an argument name of OptimisticTransactionImpl Closes delta-io/delta#55 Author: Naoki Takezoe <takezoe@gmail.com> #5261 is resolved by mukulmurthy/60bvtd9h. 
GitOrigin-RevId: 47b33a8841d10022985be3462866c04090cd8525 commit 35d0e0039b03351d00a3b672b96eef80c17105bc Author: runzhliu <runzhliu@163.com> Date: Thu May 23 15:09:25 2019 -0700 [DELTA-OSS-EXTERNAL] remove useless imports Obviously, there is currently no check for code styles including useless import, now I remove all of the useless imports. Closes delta-io/delta#12 Closes #5260 from mukulmurthy/6020wsjz. Lead-authored-by: runzhliu <runzhliu@163.com> Co-authored-by: Mukul Murthy <mukul.murthy@databricks.com> Co-authored-by: Mukul Murthy <38224594+mukulmurthy@users.noreply.github.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: af35df3c7e881336641b182743c92a2aedfb1d6e commit eed0dcc3695befbee060017140435e2e79f59190 Author: Shixiong Zhu <zsxwing@gmail.com> Date: Wed May 22 20:19:53 2019 +0000 [SC-17875]Parsing interval Delta config should be case-insensitive ## What changes were proposed in this pull request? Some Delta configs accept an interval string. However, they only accept lower case. This is inconvenient. This PR forks `CalendarInterval.fromCaseInsensitiveString` from https://github.com/apache/spark/pull/24619 and uses it to parse the interval string to support upper case. It also improves the error message when the input string is an invalid interval (Right now it just throws NPE). ## How was this patch tested? New unit test. Author: Shixiong Zhu <zsxwing@gmail.com> #5213 is resolved by zsxwing/SC-17875. GitOrigin-RevId: d4d9cc0c2684096812be5697d1c093f4b000ce51 commit a98550011591d78d55bd6e003ce455b9a9f7a232 Author: liwensun <liwen.sun@databricks.com> Date: Wed May 22 00:14:47 2019 +0000 [SC-18029][DELTA] Azure support This PR adds Azure support for Delta LogStore: - Create a class `FileSystemLogStore` with a default implementation for any underlying storage that implements hadoop FileSystem APIs (e.g., Azure and S3) - Create a `AzureLogStore` that extends `FileSystemLogStore` and relies on Azure's atomic rename. 
We will make a S3 implementation that extends `FileSystemLogStore` in a separate PR. Checkpoints should also use rename for AzureLogStore. We will do that in a follow up PR. New unit tests on basic operations. Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: a800eacf77ea7199b259d232e2040a9ee93ffd71 commit c28f7ce4c308ca8f0f9ad89e336da41e8d770f64 Author: Joe Ellis <ellis125@gmail.com> Date: Tue May 21 02:58:11 2019 -0700 [DELTA-OSS-EXTERNAL] Fix spelling Closes delta-io/delta#19 Closes #5235 from tdas/yaviiaq9. Authored-by: Joe Ellis <ellis125@gmail.com> Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 040df282385745dcbdaca50df6edac4c1b4fa4f7 commit 81601a1b025c4cc925d8e3b043fd8a046d59d6fc Author: Naoki Takezoe <takezoe@gmail.com> Date: Tue May 21 00:47:50 2019 -0700 [DELTA-OSS-EXTERNAL] Fix Scaladoc of DeltaLogging.scala DeltaLogging's Scaladoc says that underneath it uses `com.databricks.spark.util.UsageLogging`, but I guess it's `com.databricks.spark.util.DatabricksLogging`. Closes delta-io/delta#25 Closes #5230 from tdas/hdnle5x7. Authored-by: Naoki Takezoe <takezoe@gmail.com> Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 6cf2878e70baf7a4544d028dc7ee823854120499 commit e3b0b8ef4a1854414598e7f8a3e95c2af2fd98ce Author: merrily01 <maruilei@jd.com> Date: Tue May 21 00:39:38 2019 -0700 [DELTA-OSS-EXTERNAL] Fix scalastyle errors of DeltaLogging.scala ## What changes were proposed in this pull request? Fix scalastyle errors of DeltaLogging.scala. Change "// scalastyle:off on" to "// scalastyle:on println" to ensure the validity of grammar checks. Closes delta-io/delta#36 Closes #5225 from tdas/znzmjb67. 
Authored-by: merrily01 <maruilei@jd.com> Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 7b0826f7171a7c40704957116d456d6a42b67b23 commit 015597667d42184cb0b3efe6668b38aeb815f03c Author: liwensun <liwen.sun@databricks.com> Date: Sun May 19 19:59:45 2019 +0000 [ALL TESTS][SC-17838][DELTA] Migrate EvolvabilitySuite ## What changes were proposed in this pull request? Migrate EvolvabilitySuite ## How was this patch tested? New tests for OSS. Author: liwensun <liwen.sun@databricks.com> #5167 is resolved by liwensun/sc17838-evolvability-suite. GitOrigin-RevId: c2db307ef36041f53aba681559c21974f98c844e commit 22fe83946aae84b45916012cc7203720b5f6f131 Author: liwensun <liwen.sun@databricks.com> Date: Sat May 18 21:27:21 2019 +0000 [SC-17690][DELTA] Migrate DeltaSuite to OSS Port DeltaSuite to OSS Moving around existing tests Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 72cb23c9e308e4cd79c1b5f7620cf5812afc85df commit 97f132083b1442715fd33867619653dcf3a4abbe Author: Shixiong Zhu <zsxwing@gmail.com> Date: Fri May 17 23:02:12 2019 +0000 [DELTA-OSS]Remove PGP plugin for OSS Delta ## What changes were proposed in this pull request? We enabled automatic content signing to simply the release process. The PGP plugin is not needed any more, so just remove it. I also removed `bintrayReleaseOnPublish in ThisBuild := false`. Turning this off will make the release invisible until someone goes to Bintray and click a button. This was added basically for release testing and it's not needed now. ## How was this patch tested? Manually pushed a test release to Bintray and confirmed that Bintray signed files for us. Author: Shixiong Zhu <zsxwing@gmail.com> #5203 is resolved by zsxwing/remove-pgp-plugin. 
GitOrigin-RevId: a2428c2bd35bf119c0310625734c539384b4d0ba commit d1c55a74d0ddfbce02f783bd08e34a9acbde00bb Author: liwensun <liwen.sun@databricks.com> Date: Wed May 15 21:36:01 2019 +0000 [SC-17737][DELTA] DeltaRetentionSuite migration and cleanup ## What changes were proposed in this pull request? Migrate DeltaRetentionSuite, also some cleanup ## How was this patch tested? Existing tests. New tests for OSS. Author: liwensun <liwen.sun@databricks.com> #5165 is resolved by liwensun/sc17737-delta-retention-suite. GitOrigin-RevId: 45ed25679a9151e422e0a1b99b11db18a12be300 commit 21af841058f3aedbec93877f8385548c6cdf5289 Author: Mukul Murthy <mukul.murthy@databricks.com> Date: Wed May 15 12:44:30 2019 -0700 [SC-17535] DeltaSourceOffset should not write null json field ## What changes were proposed in this pull request? DeltaSourceOffset should not write null JSON field. An earlier change updated our serialization to always include null values meant we were writing out a `"json": null` in the offset. While this is not wrong by itself, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala expects that the serialized field is either not present or nonnull. We fix this by making the json in DeltaSourceOffset a def instead of a val so it doesn't get serialized as null. ## How was this patch tested? New unit test Closes #5084 from mukulmurthy/bug. Authored-by: Mukul Murthy <mukul.murthy@databricks.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: b9455563eaf6df087e5b165be7d1f5d673a131e6 commit 60dc435ead5e4d462f7b3ae72ca700bafcc88d6d Author: Burak Yavuz <brkyvz@gmail.com> Date: Tue May 14 16:12:20 2019 -0700 [SC-17862][DELTA] Fix time travel when the partitioning of a table changes ## What changes were proposed in this pull request? 
When the partitioning of a table changes with `mode("overwrite").option("overwriteSchema", true)`, we use the latest partitionSchema in the file index. This PR fixes that bug. ## How was this patch tested? Unit test Closes #5184 from brkyvz/ttPartition. Authored-by: Burak Yavuz <brkyvz@gmail.com> Signed-off-by: Burak Yavuz <brkyvz@gmail.com> GitOrigin-RevId: 4d10e57dd61b4fbb1a62d03be302fc404390d385 commit 1977fd2d4b35a9c5aab6c51c38db1e5e3830bb01 Author: liwensun <liwen.sun@databricks.com> Date: Mon May 13 07:17:53 2019 +0000 [SC-17689][DELTA] DeltaLogSuite migration and cleanup This PR cleans up and migrates DeltaLogSuite. Existing tests. N/A Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 9cda2cd5e50f98cb3547a3e93dc777c6f790fbd2 commit bcf4d92bb1668f42dff85da2ab1456bd38f45b13 Author: liwensun <liwen.sun@databricks.com> Date: Sun May 12 07:15:32 2019 +0000 [SC-17804][DELTA] Re-enable OSS tests Existing tests Author: liwensun <liwen.sun@databricks.com> GitOrigin-RevId: 38aaec6880cbe00cbd0438268968f02031bafb1d commit 6febef767b960d400f7200bf64f37c1abac4101e Author: Mukul Murthy <mukul.murthy@databricks.com> Date: Thu May 9 21:30:16 2019 -0700 [SC-17683][DELTA] Fix configs Lead-authored-by: Mukul Murthy <mukul.murthy@databricks.com> Co-authored-by: Adrian Ionescu <adrian@databricks.com> Signed-off-by: Mukul Murthy <mukul.murthy@databricks.com> GitOrigin-RevId: 06f2a3a1dddfcc345b5e591368e7346573698c6e commit 0e9ff7030a259b33ff02959e7434fdda3c459258 Author: Tathagata Das <tathagata.das1565@gmail.com> Date: Fri May 10 00:15:16 2019 +0000 [SC-17706] Remove config Author: Tathagata Das <tathagata.das1565@gmail.com> GitOrigin-RevId: 1ee42d89d826bec32cf4f12040201c11e…
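The [SC-17993] entry in the log above says `InMemoryLogReplay` was updated to remove the tombstone when a file is added, so that replay emits only one `FileAction` per path. A minimal sketch of that rule follows. The `AddFile`, `RemoveFile` and `SimpleLogReplay` types here are hypothetical, simplified stand-ins written for illustration; they are not the actual Delta classes and carry none of their real fields.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Delta's file actions (assumption:
// only the path matters for this illustration).
sealed trait FileAction { def path: String }
final case class AddFile(path: String) extends FileAction
final case class RemoveFile(path: String) extends FileAction

// Simplified replay state: at most one action survives per path.
final class SimpleLogReplay {
  private val activeFiles = mutable.Map.empty[String, AddFile]
  private val tombstones = mutable.Map.empty[String, RemoveFile]

  // Apply the actions in order; later actions win over earlier ones.
  def append(actions: Seq[FileAction]): Unit = actions.foreach {
    case add: AddFile =>
      activeFiles(add.path) = add
      // Re-adding a file clears its tombstone, so the path is not
      // reported as both added and removed.
      tombstones.remove(add.path)
    case remove: RemoveFile =>
      activeFiles.remove(remove.path)
      tombstones(remove.path) = remove
  }

  // Surviving actions, sorted by path for a stable order.
  def checkpoint: Seq[FileAction] = {
    val all: Seq[FileAction] = activeFiles.values.toSeq ++ tombstones.values.toSeq
    all.sortBy(_.path)
  }
}
```

With this sketch, appending `AddFile("part-0")`, then `RemoveFile("part-0")`, then `AddFile("part-0")` again leaves a single `AddFile` for that path and no tombstone.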
# This is the 1st commit message: flush # This is the commit message delta-io#2: flush # This is the commit message delta-io#3: First sane version without isRowDeleted # This is the commit message delta-io#4: Hack RowIndexMarkingFilters # This is the commit message delta-io#5: Add support for non-vectorized readers # This is the commit message delta-io#6: Metadata column fix # This is the commit message delta-io#7: Avoid non-deterministic UDF to filter deleted rows # This is the commit message delta-io#8: metadata with Expression ID # This is the commit message delta-io#9: Fix complex views issue # This is the commit message delta-io#10: Tests # This is the commit message delta-io#11: cleaning # This is the commit message delta-io#12: More tests and fixes # This is the commit message delta-io#13: Partial cleaning # This is the commit message delta-io#14: cleaning and improvements # This is the commit message delta-io#15: cleaning and improvements # This is the commit message delta-io#16: Clean RowIndexFilter