A simple search engine implementation. Deployed at https://hnsearch.mnprt.me/.
Only individual words are stored as tokens, after stop words are removed using this stop word library. Stemming is done using the port-stemming library.
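A minimal sketch of the tokenization step described above, in Python. The stop word set and the `naive_stem` function are hypothetical stand-ins: the project uses dedicated libraries for both, and `naive_stem` is a crude suffix stripper, not a real stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop word list (the project uses a library).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for", "on", "with"}


def naive_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemming library."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase, split into words, drop stop words, then stem each word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]
```

For example, `tokenize("The indexing of stories and comments")` yields `["index", "stori", "comment"]` with this stand-in stemmer.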
Uses the official hn-api to find stories and their comments. Only stories with >= 5 points are indexed. The indexer's progress is persisted in the database, so it can be safely stopped and resumed later.
Only stories and their top-level comments are fetched and indexed.
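The fetch-and-resume behavior could look roughly like the Python sketch below. It is an illustration under several assumptions, not the project's actual indexer: it uses an in-process SQLite table named `indexed` (hypothetical) instead of the project's database, stores each story together with its top-level comments as one document, and takes the list of story ids as an argument. The item URL and the `type`, `score`, `kids`, `title`, `text`, and `deleted` fields follow the official Hacker News API.

```python
import json
import sqlite3
import urllib.request

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"
MIN_POINTS = 5  # only stories with at least this many points are indexed


def fetch_item(item_id):
    """Fetch one item (story or comment) from the official HN API."""
    with urllib.request.urlopen(HN_ITEM_URL.format(item_id)) as resp:
        return json.load(resp)


def should_index(item):
    """True for existing stories that meet the points threshold."""
    return bool(item) and item.get("type") == "story" and item.get("score", 0) >= MIN_POINTS


def index_stories(conn, story_ids, fetch=fetch_item):
    """Index stories and their top-level comments, skipping finished work."""
    conn.execute("CREATE TABLE IF NOT EXISTS indexed (id INTEGER PRIMARY KEY, body TEXT)")
    for story_id in story_ids:
        # Resume support: ids already stored by an earlier run are skipped.
        if conn.execute("SELECT 1 FROM indexed WHERE id = ?", (story_id,)).fetchone():
            continue
        story = fetch(story_id)
        if not should_index(story):
            continue
        parts = [story.get("title", ""), story.get("text", "")]
        # "kids" holds direct replies only, i.e. the top-level comments.
        for kid_id in story.get("kids", []):
            comment = fetch(kid_id)
            if comment and not comment.get("deleted"):
                parts.append(comment.get("text", ""))
        body = " ".join(p for p in parts if p)
        conn.execute("INSERT INTO indexed (id, body) VALUES (?, ?)", (story_id, body))
        conn.commit()  # commit per story so an interrupted run loses little work
```

Committing after each story is what makes stopping safe in this sketch: a restart re-checks the table and continues with the first id that is not yet stored.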
Search uses the BM25 algorithm to rank documents.
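For reference, here is a minimal Python sketch of BM25 scoring over already-tokenized documents. The parameter values `k1=1.5` and `b=0.75` and the smoothed IDF form `ln(1 + (N - n + 0.5) / (n + 0.5))` are common choices assumed here; the project may use different values or a different IDF variant.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document; docs is a list of token lists."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that contain none of the query terms score 0. Repeated occurrences of a term raise the score with diminishing returns, controlled by `k1`.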
Behavior can be configured using the .env file:
DB_USER="testuser"
DB_PASSWORD="testpassword"
DB_NAME="testdb"
DB_HOST="db"
DB_PORT="5432"
START_INDEX=false # whether to start indexing
MAX_ITEMS=8000 # max number of items to index (item = story + comment)
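As an illustration of how these values might be consumed, the Python sketch below reads them from the process environment. The function name `load_settings`, the dictionary keys, and the fallback defaults are assumptions made for this example; they are not taken from the project's code.

```python
import os


def load_settings(env=os.environ):
    """Read the settings shown above from an environment-like mapping."""
    return {
        "db_host": env.get("DB_HOST", "db"),
        "db_port": int(env.get("DB_PORT", "5432")),
        # Environment values are strings, so the boolean is parsed explicitly.
        "start_index": env.get("START_INDEX", "false").strip().lower() == "true",
        "max_items": int(env.get("MAX_ITEMS", "8000")),
    }
```

Passing a plain dict, for example `load_settings({"START_INDEX": "true"})`, makes the parsing easy to check without touching the real environment.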
Build and run with Docker Compose:
docker compose up --build