This repository has been archived by the owner on Apr 4, 2022. It is now read-only.

pyIR: Collection of Information Retrieval algorithms

A collection of algorithms for querying a set of documents and returning the ones most relevant to the query.

The algorithms that have been implemented are:

  • Vector Space Model
  • Best Match 25 (BM25)
  • Unigram Language Model using Jelinek-Mercer Smoothing
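For orientation, here is a minimal, self-contained sketch of two of these ranking functions. It is not pyIR's own code or API: the function names `bm25_scores` and `jelinek_mercer_scores`, the parameter defaults (`k1=1.5`, `b=0.75`, `lam=0.5`), the whitespace tokenization, and the toy documents are all assumptions made for illustration. The BM25 variant shown uses the common `log(1 + ...)` form of IDF, and `lam` here weights the collection model (some texts use the opposite convention).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def jelinek_mercer_scores(query, docs, lam=0.5):
    """Return one query log-likelihood per document (hypothetical helper).

    Each term probability is a linear mix of the document model and the
    collection model; `lam` is the weight on the collection model.
    """
    coll = Counter()
    for d in docs:
        coll.update(d)
    coll_len = sum(coll.values())
    scores = []
    for d in docs:
        tf = Counter(d)
        log_p = 0.0
        for term in query:
            p_doc = tf[term] / len(d) if d else 0.0
            p_coll = coll[term] / coll_len
            p = (1 - lam) * p_doc + lam * p_coll
            if p > 0:  # terms unseen in the whole collection are skipped
                log_p += math.log(p)
        scores.append(log_p)
    return scores


# Toy corpus, tokenized by whitespace (an assumption, not pyIR's tokenizer).
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "information retrieval ranks documents".split(),
]
query = "cat mat".split()

bm25 = bm25_scores(query, docs)
jm = jelinek_mercer_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)
```

Both functions return scores in document order, so sorting the indices by score gives the ranking. In this toy corpus only the first document contains the exact query terms, so it ranks first under both models.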

Installation

To make sure you get the newest version, install it directly from GitHub with pip:

pip install git+ssh://git@github.com/hrwx/pyIR.git

TREC

The algorithms were implemented primarily to run evaluations using the TREC Cranfield collection. The TREC evaluation can be run from the evaluate.py file.
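As a rough illustration of what such an evaluation measures, the sketch below computes average precision for a single query from a ranked result list and a set of relevance judgments. Which metrics evaluate.py actually reports is not stated here, so the choice of metric, the function name `average_precision`, and the document ids are all assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query (hypothetical helper, not from evaluate.py)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


# Made-up ids: documents 1 and 5 are retrieved at ranks 2 and 4; 9 is missed.
ap = average_precision([3, 1, 7, 5], [1, 5, 9])
```

Averaging this value over all queries in a collection gives mean average precision (MAP), a common summary figure for ranked retrieval.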