Skip to content

a collections of custom analyzers to tweak fulltext search in neo4j

Notifications You must be signed in to change notification settings

covidgraph/neo4j-additional-analyzers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

additional full text analyzers for Neo4j

The graph database Neo4j does support fulltext indexing out of the box. Part of this is fine grained configuration of analzyers for each index. There’s a good write up how to build your own custom analyzers. This project leverages couple of commonly used analyzers.

Installation

The analyzers are supposed to work with Neo4j >= 3.4. Clone this repo and use Maven to build a jar file:

mvn package
cp target/neo4j-additional-analyzers-<version>.jar <NEO4J_HOME>/plugins

and restart Neo4j.

Usage

Set up a fulltext index and specificy the custom index:

CALL db.index.fulltext.createNodeIndex("myindex", ["Articles"], ["abstract", "content"], {analyzer: "whitespace_lower"});
CALL db.index.fulltext.createNodeIndex("myindex", ["Articles"], ["abstract", "content"], {analyzer: "synonym"});

Reference of available analzyers

whitespace_lower

This analyzer combines the whitespace analyzer with German and English stopword lists and applies a toLower conversion as well. Due to usage of whitespace terms containing dashes, e.g. PNAS-108 as gene symbol or technical terms like x-270°C are treated as one atomic term.

synonym

Synonym analyzer is aware of gene symbols containing whitespace, e.g. cGK 1. Those multi-word terms will be mapping to a single word alternative using Lucene’s SynonymFilterFactory. Aside of this mapping, the synonym analyzer works exactly like whitespace_lower

About

a collections of custom analyzers to tweak fulltext search in neo4j

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages