Cherry picked the Hive Thrift server
liancheng committed Jul 24, 2014
1 parent a45d548 commit 2c4c539
Showing 43 changed files with 1,488 additions and 38 deletions.
5 changes: 5 additions & 0 deletions assembly/pom.xml
@@ -163,6 +163,11 @@
      <artifactId>spark-hive_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
    </dependency>
+   <dependency>
+     <groupId>org.apache.spark</groupId>
+     <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
+     <version>${project.version}</version>
+   </dependency>
  </dependencies>
</profile>
<profile>
2 changes: 1 addition & 1 deletion bagel/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-bagel_2.10</artifactId>
  <properties>
-    <sbt.project.name>bagel</sbt.project.name>
+    <sbt.project.name>bagel</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project Bagel</name>
45 changes: 45 additions & 0 deletions bin/beeline
@@ -0,0 +1,45 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Figure out where Spark is installed
FWDIR="$(cd `dirname $0`/..; pwd)"

# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ `command -v java` ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

# Compute classpath using external script
classpath_output=$($FWDIR/bin/compute-classpath.sh)
if [[ "$?" != "0" ]]; then
  echo "$classpath_output"
  exit 1
else
  CLASSPATH=$classpath_output
fi

CLASS="org.apache.hive.beeline.BeeLine"
exec "$RUNNER" -cp "$CLASSPATH" $CLASS "$@"
1 change: 1 addition & 0 deletions bin/compute-classpath.sh
@@ -52,6 +52,7 @@ if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  CLASSPATH="$CLASSPATH:$FWDIR/sql/catalyst/target/scala-$SCALA_VERSION/classes"
  CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
  CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
+ CLASSPATH="$CLASSPATH:$FWDIR/sql/hive-thriftserver/target/scala-$SCALA_VERSION/classes"
  CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
fi

24 changes: 24 additions & 0 deletions bin/spark-sql
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Figure out where Spark is installed
FWDIR="$(cd `dirname $0`/..; pwd)"

CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
$FWDIR/bin/spark-class $CLASS $@
2 changes: 1 addition & 1 deletion core/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <properties>
-    <sbt.project.name>core</sbt.project.name>
+    <sbt.project.name>core</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project Core</name>
41 changes: 39 additions & 2 deletions docs/sql-programming-guide.md
@@ -548,7 +548,6 @@ results = hiveContext.hql("FROM src SELECT key, value").collect()
</div>
</div>

-
# Writing Language-Integrated Relational Queries

**Language-Integrated queries are currently only supported in Scala.**
@@ -573,4 +572,42 @@ prefixed with a tick (`'`). Implicit conversions turn these symbols into expres
evaluated by the SQL execution engine. A full list of the functions supported can be found in the
[ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
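
For example (a minimal sketch, assuming a `people` SchemaRDD with `name` and `age` fields, as in
the examples earlier in this guide):

    // 'name and 'age are Scala symbols; implicit conversions turn them into expressions.
    val teenagers = people.where('age >= 13).where('age <= 19).select('name)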

-<!-- TODO: Include the table of operations here. -->
+<!-- TODO: Include the table of operations here. -->

## Running the Thrift JDBC server

The Thrift JDBC server implemented here corresponds to the
[`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) in
Hive 0.12. You can test the JDBC server with the beeline script that comes with either Spark or
Hive 0.12.

To start the JDBC server, run the following in the Spark directory:

./sbin/start-thriftserver.sh

By default, the server listens on port 10000. You can now use beeline to test the Thrift JDBC
server:

./bin/beeline

Connect to the JDBC server in beeline with:

beeline> !connect jdbc:hive2://localhost:10000

Beeline will ask you for a username and password. In non-secure mode, simply enter the username on
your machine and a blank password. For secure mode, please follow the instructions given in the
[beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).

Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.

You may also use the beeline script that comes with Hive.
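
Applications can also talk to the Thrift JDBC server programmatically. Below is a minimal sketch
in Scala, assuming the Hive 0.12 JDBC driver (the `org.apache.hive:hive-jdbc:0.12.0` artifact) and
its dependencies are on the client classpath; the `src` table is used purely as an illustration:

    import java.sql.DriverManager

    object ThriftServerClient {
      def main(args: Array[String]) {
        // Register the Hive JDBC driver before opening a connection.
        Class.forName("org.apache.hive.jdbc.HiveDriver")

        // In non-secure mode, pass your username and a blank password.
        val conn = DriverManager.getConnection(
          "jdbc:hive2://localhost:10000", System.getProperty("user.name"), "")
        try {
          val stmt = conn.createStatement()
          val rs = stmt.executeQuery("SELECT key, value FROM src LIMIT 10")
          while (rs.next()) {
            println(rs.getInt(1) + "\t" + rs.getString(2))
          }
        } finally {
          conn.close()
        }
      }
    }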

## Running the Spark SQL CLI

The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute
queries entered on the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC
server.

To start the Spark SQL CLI, run the following in the Spark directory:

./bin/spark-sql

Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
2 changes: 1 addition & 1 deletion examples/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-examples_2.10</artifactId>
  <properties>
-    <sbt.project.name>examples</sbt.project.name>
+    <sbt.project.name>examples</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project Examples</name>
2 changes: 1 addition & 1 deletion external/flume/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.10</artifactId>
  <properties>
-    <sbt.project.name>streaming-flume</sbt.project.name>
+    <sbt.project.name>streaming-flume</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project External Flume</name>
2 changes: 1 addition & 1 deletion external/kafka/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <properties>
-    <sbt.project.name>streaming-kafka</sbt.project.name>
+    <sbt.project.name>streaming-kafka</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project External Kafka</name>
2 changes: 1 addition & 1 deletion external/mqtt/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-mqtt_2.10</artifactId>
  <properties>
-    <sbt.project.name>streaming-mqtt</sbt.project.name>
+    <sbt.project.name>streaming-mqtt</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project External MQTT</name>
2 changes: 1 addition & 1 deletion external/twitter/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-twitter_2.10</artifactId>
  <properties>
-    <sbt.project.name>streaming-twitter</sbt.project.name>
+    <sbt.project.name>streaming-twitter</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project External Twitter</name>
2 changes: 1 addition & 1 deletion external/zeromq/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-zeromq_2.10</artifactId>
  <properties>
-    <sbt.project.name>streaming-zeromq</sbt.project.name>
+    <sbt.project.name>streaming-zeromq</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project External ZeroMQ</name>
2 changes: 1 addition & 1 deletion graphx/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-graphx_2.10</artifactId>
  <properties>
-    <sbt.project.name>graphx</sbt.project.name>
+    <sbt.project.name>graphx</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project GraphX</name>
2 changes: 1 addition & 1 deletion mllib/pom.xml
@@ -28,7 +28,7 @@
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.10</artifactId>
  <properties>
-    <sbt.project.name>mllib</sbt.project.name>
+    <sbt.project.name>mllib</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Spark Project ML Library</name>
7 changes: 4 additions & 3 deletions pom.xml
@@ -95,6 +95,7 @@
    <module>sql/catalyst</module>
    <module>sql/core</module>
    <module>sql/hive</module>
+   <module>sql/hive-thriftserver</module>
    <module>repl</module>
    <module>assembly</module>
    <module>external/twitter</module>
@@ -252,9 +253,9 @@
      <version>3.3.2</version>
    </dependency>
    <dependency>
-     <groupId>commons-codec</groupId>
-     <artifactId>commons-codec</artifactId>
-     <version>1.5</version>
+     <groupId>commons-codec</groupId>
+     <artifactId>commons-codec</artifactId>
+     <version>1.5</version>
    </dependency>
    <dependency>
      <groupId>com.google.code.findbugs</groupId>
14 changes: 7 additions & 7 deletions project/SparkBuild.scala
@@ -29,11 +29,11 @@ object BuildCommons {

  private val buildLocation = file(".").getAbsoluteFile.getParentFile

-  val allProjects@Seq(bagel, catalyst, core, graphx, hive, mllib, repl, spark, sql, streaming,
-    streamingFlume, streamingKafka, streamingMqtt, streamingTwitter, streamingZeromq) =
-    Seq("bagel", "catalyst", "core", "graphx", "hive", "mllib", "repl", "spark", "sql",
-      "streaming", "streaming-flume", "streaming-kafka", "streaming-mqtt", "streaming-twitter",
-      "streaming-zeromq").map(ProjectRef(buildLocation, _))
+  val allProjects@Seq(bagel, catalyst, core, graphx, hive, hiveThriftServer, mllib, repl, spark, sql,
+    streaming, streamingFlume, streamingKafka, streamingMqtt, streamingTwitter, streamingZeromq) =
+    Seq("bagel", "catalyst", "core", "graphx", "hive", "hive-thriftserver", "mllib", "repl",
+      "spark", "sql", "streaming", "streaming-flume", "streaming-kafka", "streaming-mqtt",
+      "streaming-twitter", "streaming-zeromq").map(ProjectRef(buildLocation, _))

  val optionallyEnabledProjects@Seq(yarn, yarnStable, yarnAlpha, java8Tests, sparkGangliaLgpl) =
    Seq("yarn", "yarn-stable", "yarn-alpha", "java8-tests", "ganglia-lgpl")
@@ -99,7 +99,7 @@ object SparkBuild extends PomBuild {
    Properties.envOrNone("SBT_MAVEN_PROPERTIES") match {
      case Some(v) =>
        v.split("(\\s+|,)").filterNot(_.isEmpty).map(_.split("=")).foreach(x => System.setProperty(x(0), x(1)))
-      case _ =>
+      case _ =>
    }

  override val userPropertiesMap = System.getProperties.toMap
@@ -157,7 +157,7 @@ object SparkBuild extends PomBuild {

  /* Enable Mima for all projects except spark, hive, catalyst, sql and repl */
  // TODO: Add Sql to mima checks
-  allProjects.filterNot(y => Seq(spark, sql, hive, catalyst, repl).exists(x => x == y)).
+  allProjects.filterNot(x => Seq(spark, sql, hive, catalyst, repl).contains(x)).
    foreach (x => enable(MimaBuild.mimaSettings(sparkHome, x))(x))

  /* Enable Assembly for all assembly projects */
24 changes: 24 additions & 0 deletions sbin/start-thriftserver.sh
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Figure out where Spark is installed
FWDIR="$(cd `dirname $0`/..; pwd)"

CLASS="org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
$FWDIR/bin/spark-class $CLASS $@
2 changes: 1 addition & 1 deletion sql/catalyst/pom.xml
@@ -32,7 +32,7 @@
  <name>Spark Project Catalyst</name>
  <url>http://spark.apache.org/</url>
  <properties>
-    <sbt.project.name>catalyst</sbt.project.name>
+    <sbt.project.name>catalyst</sbt.project.name>
  </properties>

  <dependencies>
3 changes: 1 addition & 2 deletions sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/commands.scala
@@ -43,8 +43,7 @@ case class NativeCommand(cmd: String) extends Command {
 */
case class SetCommand(key: Option[String], value: Option[String]) extends Command {
  override def output = Seq(
-    BoundReference(0, AttributeReference("key", StringType, nullable = false)()),
-    BoundReference(1, AttributeReference("value", StringType, nullable = false)()))
+    BoundReference(1, AttributeReference("", StringType, nullable = false)()))
}

/**
2 changes: 1 addition & 1 deletion sql/core/pom.xml
@@ -32,7 +32,7 @@
  <name>Spark Project SQL</name>
  <url>http://spark.apache.org/</url>
  <properties>
-    <sbt.project.name>sql</sbt.project.name>
+    <sbt.project.name>sql</sbt.project.name>
  </properties>

  <dependencies>
28 changes: 23 additions & 5 deletions sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala
@@ -46,26 +46,44 @@ case class SetCommand(
    @transient context: SQLContext)
  extends LeafNode with Command {

-  override protected[sql] lazy val sideEffectResult: Seq[(String, String)] = (key, value) match {
+  override protected[sql] lazy val sideEffectResult: Seq[String] = (key, value) match {
    // Set value for key k.
    case (Some(k), Some(v)) =>
      context.set(k, v)
-      Array(k -> v)
+      Array(s"$k=$v")

    // Query the value bound to key k.
    case (Some(k), _) =>
-      Array(k -> context.getOption(k).getOrElse("<undefined>"))
+      // TODO (lian) This is just a workaround to make the Simba ODBC driver work.
+      // Should remove this once we get the ODBC driver updated.
+      if (k == "-v") {
+        val hiveJars = Seq(
+          "hive-exec-0.12.0.jar",
+          "hive-service-0.12.0.jar",
+          "hive-common-0.12.0.jar",
+          "hive-hwi-0.12.0.jar",
+          "hive-0.12.0.jar").mkString(":")
+
+        Array(
+          "system:java.class.path=" + hiveJars,
+          "system:sun.java.command=shark.SharkServer2")
+      }
+      else {
+        Array(s"$k=${context.getOption(k).getOrElse("<undefined>")}")
+      }

    // Query all key-value pairs that are set in the SQLConf of the context.
    case (None, None) =>
-      context.getAll
+      context.getAll.map { case (k, v) =>
+        s"$k=$v"
+      }

    case _ =>
      throw new IllegalArgumentException()
  }

  def execute(): RDD[Row] = {
-    val rows = sideEffectResult.map { case (k, v) => new GenericRow(Array[Any](k, v)) }
+    val rows = sideEffectResult.map { line => new GenericRow(Array[Any](line)) }
    context.sparkContext.parallelize(rows, 1)
  }
(Diffs for the remaining changed files are not shown.)