Search Engine for Books (Java, Apache Lucene, crawler4j, Apache Spark)

chanddu/Book-Search-Engine

author
Chandra Sekhar Guntupalli
Jul 25, 2018
d855458 · Jul 25, 2018

Book-Search-Engine

Search Engine for Books (Java, Apache Lucene, crawler4j, Apache Spark)

  • Crawled about 100,000 web pages with crawler4j and performed link analysis by implementing PageRank on the resulting web graph with Apache Spark’s GraphX.
  • Indexed the crawled documents with Apache Lucene and ranked the results for each query by a combination of PageRank and TF-IDF scores.
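
The two ranking signals above can be illustrated with a minimal sketch. It is not the project's code: it computes PageRank by plain power iteration in Java on a tiny in-memory graph rather than with Spark GraphX, and it assumes a damping factor of 0.85. The README does not say how PageRank and the TF-IDF score are combined, so the `combine` method, its max-normalization, and the weight `alpha` are hypothetical choices made only for illustration.

```java
import java.util.Arrays;

class RankSketch {

    /**
     * Power-iteration PageRank. outLinks[i] lists the pages that page i links to.
     * Rank held by pages without out-links is spread evenly over all pages.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i];
                } else {
                    // Each page splits its current rank evenly among its out-links.
                    double share = rank[i] / outLinks[i].length;
                    for (int j : outLinks[i]) {
                        next[j] += share;
                    }
                }
            }
            for (int j = 0; j < n; j++) {
                next[j] = (1 - damping) / n + damping * (next[j] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    /**
     * Hypothetical blend (an assumption, not the project's formula): a weighted
     * sum of the text score and PageRank, each divided by its maximum so that
     * both lie in [0, 1].
     */
    static double combine(double textScore, double maxTextScore,
                          double pageRank, double maxPageRank, double alpha) {
        double t = maxTextScore > 0 ? textScore / maxTextScore : 0.0;
        double p = maxPageRank > 0 ? pageRank / maxPageRank : 0.0;
        return alpha * t + (1 - alpha) * p;
    }

    public static void main(String[] args) {
        // Toy web graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
        int[][] graph = { {1, 2}, {2}, {0}, {2} };
        double[] rank = pageRank(graph, 0.85, 50);
        double maxRank = Arrays.stream(rank).max().getAsDouble();

        // Made-up text relevance scores standing in for Lucene's per-document scores.
        double[] textScore = { 1.2, 3.4, 0.8, 2.9 };
        double maxText = Arrays.stream(textScore).max().getAsDouble();

        for (int i = 0; i < graph.length; i++) {
            double blended = combine(textScore[i], maxText, rank[i], maxRank, 0.5);
            System.out.printf("page %d: pagerank=%.4f blended=%.4f%n", i, rank[i], blended);
        }
    }
}
```

In a setup like the one described, the text score would come from the Lucene search results and the PageRank values from the GraphX job; `alpha` controls how much weight the query-dependent score gets relative to the link-based one.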