This is a modified version of the Webpage-Similarity project. With the addition of 190 more Wikipedia pages, a more efficient method of data management is required. The main focus of this project is to create clusters and to use persistent data stores and extendible hashing for quick data retrieval.

DeclanGH/Webpage-Similarity-II

Webpage-Similarity-II

This is a modified version of the Webpage-Similarity project. With the addition of 190 more Wikipedia pages, a more efficient method of data storage is required. The main focus of this project is to integrate persistent data stores and switch the similarity metric to TF-IDF.
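As a rough illustration of what a TF-IDF similarity metric involves, here is a minimal sketch in Python. It is not taken from this repository: the function names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenizer, the smoothed IDF formula, and the sample strings are all assumptions made for the example, and the project's own implementation may differ.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: naive lowercase + whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up stand-ins for page text, only to exercise the functions.
pages = [
    "hash table bucket split",
    "hash table directory bucket",
    "roman empire history",
]
vectors = tf_idf_vectors(pages)
print(cosine_similarity(vectors[0], vectors[1]))  # pages sharing terms
print(cosine_similarity(vectors[0], vectors[2]))  # pages sharing no terms
```

Pages that share rare terms score closer to 1, and pages with no terms in common score 0, which is the property a similarity ranking over a page collection relies on.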
