Skip to content

Commit

Permalink
[SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?
The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4.

This parser is based on the [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality is currently missing, the plan is to add this in follow-up PRs.

This PR is a work in progress, and work needs to be done in the following area's:

- [x] Error handling should be improved.
- [x] Documentation should be improved.
- [x] Multi-Insert needs to be tested.
- [ ] Naming and package locations.

### How was this patch tested?

Catalyst and SQL unit tests.

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes apache#11557 from hvanhovell/ngParser.
  • Loading branch information
hvanhovell authored and rxin committed Mar 28, 2016
1 parent 1528ff4 commit 600c0b6
Show file tree
Hide file tree
Showing 29 changed files with 4,127 additions and 66 deletions.
1 change: 1 addition & 0 deletions LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(BSD 3 Clause) netlib core (com.github.fommil.netlib:core:1.1.2 - https://github.com/fommil/netlib-java/core)
(BSD 3 Clause) JPMML-Model (org.jpmml:pmml-model:1.2.7 - https://github.com/jpmml/jpmml-model)
(BSD License) AntLR Parser Generator (antlr:antlr:2.7.7 - http://www.antlr.org/)
(BSD License) ANTLR 4.5.2-1 (org.antlr:antlr4:4.5.2-1 - http://wwww.antlr.org/)
(BSD licence) ANTLR ST4 4.0.4 (org.antlr:ST4:4.0.4 - http://www.stringtemplate.org)
(BSD licence) ANTLR StringTemplate (org.antlr:stringtemplate:3.2.1 - http://www.stringtemplate.org)
(BSD License) Javolution (javolution:javolution:5.5.1 - http://javolution.org)
Expand Down
1 change: 1 addition & 0 deletions dev/deps/spark-deps-hadoop-2.2
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.jar
antlr-runtime-3.5.2.jar
antlr4-runtime-4.5.2-1.jar
aopalliance-1.0.jar
apache-log4j-extras-1.2.17.jar
arpack_combined_all-0.1.jar
Expand Down
1 change: 1 addition & 0 deletions dev/deps/spark-deps-hadoop-2.3
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
antlr-runtime-3.5.2.jar
antlr4-runtime-4.5.2-1.jar
aopalliance-1.0.jar
apache-log4j-extras-1.2.17.jar
arpack_combined_all-0.1.jar
Expand Down
1 change: 1 addition & 0 deletions dev/deps/spark-deps-hadoop-2.4
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
antlr-runtime-3.5.2.jar
antlr4-runtime-4.5.2-1.jar
aopalliance-1.0.jar
apache-log4j-extras-1.2.17.jar
arpack_combined_all-0.1.jar
Expand Down
1 change: 1 addition & 0 deletions dev/deps/spark-deps-hadoop-2.6
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
antlr-runtime-3.5.2.jar
antlr4-runtime-4.5.2-1.jar
aopalliance-1.0.jar
apache-log4j-extras-1.2.17.jar
apacheds-i18n-2.0.0-M15.jar
Expand Down
1 change: 1 addition & 0 deletions dev/deps/spark-deps-hadoop-2.7
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
antlr-runtime-3.5.2.jar
antlr4-runtime-4.5.2-1.jar
aopalliance-1.0.jar
apache-log4j-extras-1.2.17.jar
apacheds-i18n-2.0.0-M15.jar
Expand Down
6 changes: 6 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,7 @@
<jsr305.version>1.3.9</jsr305.version>
<libthrift.version>0.9.2</libthrift.version>
<antlr.version>3.5.2</antlr.version>
<antlr4.version>4.5.2-1</antlr4.version>

<test.java.home>${java.home}</test.java.home>
<test.exclude.tags></test.exclude.tags>
Expand Down Expand Up @@ -1759,6 +1760,11 @@
<artifactId>antlr-runtime</artifactId>
<version>${antlr.version}</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
<version>${antlr4.version}</version>
</dependency>
</dependencies>
</dependencyManagement>

Expand Down
8 changes: 6 additions & 2 deletions project/SparkBuild.scala
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ import sbt._
import sbt.Classpaths.publishTask
import sbt.Keys._
import sbtunidoc.Plugin.UnidocKeys.unidocGenjavadocVersion
import com.simplytyped.Antlr4Plugin._
import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys}
import com.typesafe.tools.mima.plugin.MimaKeys

Expand Down Expand Up @@ -401,7 +402,10 @@ object OldDeps {
}

object Catalyst {
lazy val settings = Seq(
lazy val settings = antlr4Settings ++ Seq(
antlr4PackageName in Antlr4 := Some("org.apache.spark.sql.catalyst.parser.ng"),
antlr4GenListener in Antlr4 := true,
antlr4GenVisitor in Antlr4 := true,
// ANTLR code-generation step.
//
// This has been heavily inspired by com.github.stefri.sbt-antlr (0.5.3). It fixes a number of
Expand All @@ -414,7 +418,7 @@ object Catalyst {
"SparkSqlLexer.g",
"SparkSqlParser.g")
val sourceDir = (sourceDirectory in Compile).value / "antlr3"
val targetDir = (sourceManaged in Compile).value
val targetDir = (sourceManaged in Compile).value / "antlr3"

// Create default ANTLR Tool.
val antlr = new org.antlr.Tool
Expand Down
6 changes: 6 additions & 0 deletions project/plugins.sbt
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,9 @@ libraryDependencies += "org.ow2.asm" % "asm" % "5.0.3"
libraryDependencies += "org.ow2.asm" % "asm-commons" % "5.0.3"

libraryDependencies += "org.antlr" % "antlr" % "3.5.2"


// TODO I am not sure we want such a dep.
resolvers += "simplytyped" at "http://simplytyped.github.io/repo/releases"

addSbtPlugin("com.simplytyped" % "sbt-antlr4" % "0.7.10")
6 changes: 4 additions & 2 deletions python/pyspark/sql/tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@
from pyspark.tests import ReusedPySparkTestCase
from pyspark.sql.functions import UserDefinedFunction, sha2
from pyspark.sql.window import Window
from pyspark.sql.utils import AnalysisException, IllegalArgumentException
from pyspark.sql.utils import AnalysisException, ParseException, IllegalArgumentException


class UTCOffsetTimezone(datetime.tzinfo):
Expand Down Expand Up @@ -1130,7 +1130,9 @@ def test_replace(self):
def test_capture_analysis_exception(self):
self.assertRaises(AnalysisException, lambda: self.sqlCtx.sql("select abc"))
self.assertRaises(AnalysisException, lambda: self.df.selectExpr("a + b"))
self.assertRaises(AnalysisException, lambda: self.sqlCtx.sql("abc"))

def test_capture_parse_exception(self):
self.assertRaises(ParseException, lambda: self.sqlCtx.sql("abc"))

def test_capture_illegalargument_exception(self):
self.assertRaisesRegexp(IllegalArgumentException, "Setting negative mapred.reduce.tasks",
Expand Down
8 changes: 8 additions & 0 deletions python/pyspark/sql/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,12 @@ class AnalysisException(CapturedException):
"""


class ParseException(CapturedException):
"""
Failed to parse a SQL command.
"""


class IllegalArgumentException(CapturedException):
"""
Passed an illegal or inappropriate argument.
Expand All @@ -49,6 +55,8 @@ def deco(*a, **kw):
e.java_exception.getStackTrace()))
if s.startswith('org.apache.spark.sql.AnalysisException: '):
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
if s.startswith('org.apache.spark.sql.catalyst.parser.ng.ParseException: '):
raise ParseException(s.split(': ', 1)[1], stackTrace)
if s.startswith('java.lang.IllegalArgumentException: '):
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
raise
Expand Down
4 changes: 4 additions & 0 deletions sql/catalyst/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,10 @@
<groupId>org.antlr</groupId>
<artifactId>antlr-runtime</artifactId>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
Expand Down
Loading

0 comments on commit 600c0b6

Please sign in to comment.