
Full text search


A few examples of text search with LuceneRDD.

Download and install Apache Spark locally.

Set the SPARK_HOME environment variable to the directory where you extracted Spark. For example, with Spark 1.5.2 extracted in your home directory, do

HOME_DIR=`echo ~`
export SPARK_HOME=${HOME_DIR}/spark-1.5.2-bin-2.6.0
./spark-shell.sh # starts the Spark shell with the spark-lucenerdd JAR on the classpath

LuceneRDD is now available in the Spark shell. At the Spark shell prompt, type

scala> :load scripts/loadWords.scala

to instantiate a LuceneRDD[String] object containing the words from src/test/resources/words.txt.
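
For reference, here is a minimal sketch of what such a loader script could look like; the exact contents of scripts/loadWords.scala may differ. It assumes the implicit conversions provided by the org.zouzias.spark.lucenerdd package, which index each string as a Lucene document with a single text field named _1 (this is why the queries below use "_1" as the field name).

import org.zouzias.spark.lucenerdd._

// Read the bundled word list (one word per line) into an RDD[String]
val words = sc.textFile("src/test/resources/words.txt")

// Index the words: each string becomes a Lucene document with a single
// text field named "_1" (via the package's implicit conversions)
val luceneRDD = LuceneRDD(words)
luceneRDD.cache()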

Term query

To perform an exact term query (the third argument is the maximum number of hits to return), do

scala> val results = luceneRDD.termQuery("_1", "hello", 10)
scala> results.foreach(println)
SparkScoreDoc(12.393539,129848,0,Numeric fields:Text fields:_1:[hello])
...

Prefix query

To perform a prefix query, do

scala> val results = luceneRDD.prefixQuery("_1", "hel", 10)
scala> results.foreach(println)
SparkScoreDoc(1.0,129618,0,Numeric fields:Text fields:_1:[held])
SparkScoreDoc(1.0,129617,0,Numeric fields:Text fields:_1:[helcotic])
SparkScoreDoc(1.0,129616,0,Numeric fields:Text fields:_1:[helcosis])
...

Fuzzy query

To perform a fuzzy query with a maximum edit distance of 1, do

scala> val results = luceneRDD.fuzzyQuery("_1", "aba", 1)
scala> results.foreach(println)
SparkScoreDoc(7.155413,175248,0,Numeric fields:Text fields:_1:[yaba])
SparkScoreDoc(7.155413,33820,0,Numeric fields:Text fields:_1:[paba])
...
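
The same queries work against any RDD of strings, not only the bundled word list. The following is a minimal sketch under the same assumptions as above (implicit conversions from org.zouzias.spark.lucenerdd and the single text field named "_1"); the sample strings are made up for illustration:

scala> import org.zouzias.spark.lucenerdd._
scala> val myLuceneRDD = LuceneRDD(sc.parallelize(Seq("hello", "help", "held", "world")))
scala> val hits = myLuceneRDD.prefixQuery("_1", "hel", 10)
scala> hits.foreach(println)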