Spatial search
Anastasios Zouzias edited this page Sep 17, 2016 · 1 revision
Some examples using ShapeLuceneRDD
We assume that you started the spark-shell using the script ./spark-shell-csv.sh (which loads the spark-csv package).
import org.zouzias.spark.lucenerdd.spatial.shape.ShapeLuceneRDD
import org.zouzias.spark.lucenerdd.spatial.shape._
import org.zouzias.spark.lucenerdd._
import org.zouzias.spark.lucenerdd.LuceneRDD
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "false").option("inferSchema", "true").option("delimiter", "\t").load("src/test/resources/spatial/CH.txt")
val swissCities = df.select("C0", "C1", "C5", "C4").map(row => ((row.getDouble(2), row.getDouble(3)), row.getString(1).toLowerCase()))
val shapes = ShapeLuceneRDD(swissCities)
shapes.count
The above should return the number of entries in the dataset.
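To make the shape of the `swissCities` records concrete, here is a minimal plain-Scala sketch of the same mapping applied to a single tab-delimited line, without Spark. It assumes the file follows the column positions used in the `select` call (column 1 is the name, column 4 the latitude, column 5 the longitude). The `GeoNamesLine` object, its `parse` helper, and the sample line are hypothetical and not part of spark-lucenerdd.

```scala
import scala.util.Try

object GeoNamesLine {
  // Hypothetical helper: turn one tab-delimited line into
  // ((longitude, latitude), lowercased name), mirroring the C5, C4, C1 selection.
  // Returns None when the line has too few columns or non-numeric coordinates.
  def parse(line: String): Option[((Double, Double), String)] = {
    val cols = line.split("\t", -1) // -1 keeps trailing empty columns
    if (cols.length < 6) None
    else Try(((cols(5).toDouble, cols(4).toDouble), cols(1).toLowerCase)).toOption
  }

  // Made-up sample line for illustration only (not copied from CH.txt).
  val sampleLine: String = "1\tBern\tBern\t\t46.948380\t7.433534"
}
```

Calling `GeoNamesLine.parse(GeoNamesLine.sampleLine)` yields a point in (longitude, latitude) order paired with the lowercased name, which is the record shape passed to `ShapeLuceneRDD`.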
Now, let's perform a KNN (k-nearest neighbors) search around Bern, given as (longitude, latitude) = (7.433534, 46.948380).
shapes.knnSearch( (7.433534, 46.948380), 10).foreach(println)
For a more human-friendly format, try
shapes.knnSearch( (7.433534, 46.948380), 20).flatMap(_.doc.textField("_1")).foreach(println)
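For intuition about what a k-nearest-neighbors query computes, the following is a brute-force sketch in plain Scala. It is not how ShapeLuceneRDD works internally (the library relies on Lucene's spatial indexing); it simply sorts all points by great-circle (haversine) distance and keeps the first k. The `BruteForceKnn` object, the Earth radius constant, and the sample points are assumptions made for illustration.

```scala
object BruteForceKnn {
  private val EarthRadiusKm = 6371.0 // assumed mean Earth radius

  // Great-circle distance in km between two (longitude, latitude) points in degrees.
  def haversineKm(a: (Double, Double), b: (Double, Double)): Double = {
    val (lon1, lat1) = a
    val (lon2, lat2) = b
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(h))
  }

  // Sort every point by distance to the query and keep the k closest.
  def knn[A](points: Seq[((Double, Double), A)],
             query: (Double, Double),
             k: Int): Seq[((Double, Double), A)] =
    points.sortBy { case (p, _) => haversineKm(p, query) }.take(k)

  // Hypothetical points at increasing distance from the query used in this page.
  val bern: (Double, Double) = (7.433534, 46.948380)
  val sample: Seq[((Double, Double), String)] = Seq(
    ((8.50, 47.40), "far"),
    ((7.44, 46.95), "near"),
    ((7.60, 47.00), "mid")
  )
}
```

For example, `BruteForceKnn.knn(BruteForceKnn.sample, BruteForceKnn.bern, 2)` keeps the two closest sample points. Sorting the full dataset costs O(n log n) per query, which is why an index-backed search is preferable at scale.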
Now, let's see how many entries our dataset has within 1 km of Bern. The third argument caps the number of results returned, so the count below can be at most 1000.
shapes.circleSearch( (7.433534, 46.948380), 1, 1000).size
shapes.circleSearch( (7.433534, 46.948380), 1, 10).foreach(println)
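The interaction between the radius and the result cap in the circle search can be sketched in plain Scala as follows. This is an illustration only, not the library's implementation: it swaps in a flat (equirectangular) distance approximation, which is adequate for a radius of about 1 km, and ordering the hits by distance is a choice of this sketch. The `CircleSketch` object and the sample points are hypothetical.

```scala
object CircleSketch {
  private val EarthRadiusKm = 6371.0 // assumed mean Earth radius

  // Equirectangular approximation of the distance in km between two
  // (longitude, latitude) points in degrees; reasonable for short distances.
  def approxDistanceKm(a: (Double, Double), b: (Double, Double)): Double = {
    val (lon1, lat1) = a
    val (lon2, lat2) = b
    val meanLat = math.toRadians((lat1 + lat2) / 2)
    val x = math.toRadians(lon2 - lon1) * math.cos(meanLat)
    val y = math.toRadians(lat2 - lat1)
    EarthRadiusKm * math.sqrt(x * x + y * y)
  }

  // Keep points within radiusKm of the center, closest first, at most k of them.
  def within[A](points: Seq[((Double, Double), A)],
                center: (Double, Double),
                radiusKm: Double,
                k: Int): Seq[((Double, Double), A)] =
    points
      .map { case entry @ (p, _) => (approxDistanceKm(p, center), entry) }
      .filter(_._1 <= radiusKm)
      .sortBy(_._1)
      .take(k)
      .map(_._2)

  // Hypothetical points: two inside a 1 km circle around the center, one outside.
  val center: (Double, Double) = (7.433534, 46.948380)
  val sample: Seq[((Double, Double), String)] = Seq(
    ((7.5000, 46.9484), "outside"),
    ((7.4400, 46.9500), "inside-b"),
    ((7.4340, 46.9490), "inside-a")
  )
}
```

With the sample data, `CircleSketch.within(CircleSketch.sample, CircleSketch.center, 1.0, 10)` keeps the two points inside the circle, while passing `k = 1` keeps only the closest one. A `.size` on the result therefore reflects the true number of entries in the circle only when the cap is larger than that number.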