Solr Connector

Solr Connector was implemented using https://github.com/lucidworks/spark-solr. The Solr Connector just works on Solr Cloud. For all the options available for the connector check on this link.

To add Solr Almaren dependency to your sbt build:

libraryDependencies += "com.github.music-of-the-ainur" %% "solr-almaren" % "0.3.5-3.4"

To run in spark-shell:

spark-shell --master "local[*]" --packages "com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.4,com.github.music-of-the-ainur:solr-almaren_2.12:0.3.5-3.4"

Solr Connector is available in Maven Central repository.

version	Connector Artifact
Spark 3.4.x and scala 2.12	`com.github.music-of-the-ainur:solr-almaren_2.12:0.3.5-3.4`
Spark 3.3.x and scala 2.12	`com.github.music-of-the-ainur:solr-almaren_2.12:0.3.5-3.3`
Spark 3.2.x and scala 2.12	`com.github.music-of-the-ainur:solr-almaren_2.12:0.3.5-3.2`
Spark 3.1.x and scala 2.12	`com.github.music-of-the-ainur:solr-almaren_2.12:0.3.5-3.1`
Spark 2.4.x and scala 2.11	`com.github.music-of-the-ainur:solr-almaren_2.11:0.3.5-2.4`

Source and Target

Source

Parameteres

Parameters	Description
collection	collection name
ZookeeperHost(zkhost)	localhost:9983
options	Description(Value)
-----------------------	------------------------------------------------------------------------------
query	limits the rows you want to load into Spark("body_t:solr")
fields	specify a subset of fields("id,author_s,favorited_b")
filters	to apply filters on the values in documents("firstName:Sam,lastName:Powell")
rows	specify the number of rows to be displayed on the page(100)
max_rows	Limits the result set to a maximum number of rows(5000)
request_handler	Set the Solr request handler for queries("/export","/select")

Example

import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.solr.Solr.SolrImplicit

val almaren = Almaren("App Name")

almaren.builder.sourceSolr("collection","zkHost1:2181,zkHost2:2181",Map("field_names" -> "first_name,last_name","rows" -> 100))

almaren.builder.targetSolr("collection","zkHost1:2181,zkHost2:2181",options)

Target:

Parameters

Parameters	Description
collection	collection name
ZookeeperHost(zkhost)	localhost:9983
Savemode	SaveMode.ErrorIfExists
options	Description(Value)
-----------------------	----------------------------------------------------------------
soft_commit_secs	set soft_commit_sec(10 seconds)
commit_within	force commit to happen after specified time(5000 milliSeconds)
batch_size	number of documents sent in a HTTP call (1000)
gen_uniq_key	generating unique key for each document(true)
solr_field_types	specify field types for solr("rating:string,title:text_en")

Example

import com.github.music.of.the.ainur.almaren.solr.Solr.SolrImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import com.github.music.of.the.ainur.almaren.Almaren
import org.apache.spark.sql.SaveMode

val almaren = Almaren("App Name")

almaren.builder
    .sourceSql("""SELECT sha2(concat_ws("",array(*)),256) as id,*,current_timestamp from deputies""")
    .coalesce(30)
    .targetSolr("deputies","cloudera:2181,cloudera1:2181,cloudera2:2181/solr",Map("batch_size" -> "100000","commit_within" -> "10000"),SaveMode.Overwrite)
    .batch

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
.github/workflows		.github/workflows
project		project
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Solr Connector

Source and Target

Source

Parameteres

Example

Target:

Parameters

Example

About

Releases 7

Packages

Contributors 6

Languages

License

music-of-the-ainur/solr.almaren

Folders and files

Latest commit

History

Repository files navigation

Solr Connector

Source and Target

Source

Parameteres

Example

Target:

Parameters

Example

About

Resources

License

Stars

Watchers

Forks

Releases 7

Packages 0

Contributors 6

Languages

Packages