Skip to content

Information Retrieval Project: This repository contains a suite of Python tools and a Jupyter notebook for simple data processing, inverted indexing, and boolean querying, complemented by detailed experimental analyses.

Notifications You must be signed in to change notification settings

Amir-Entezari/IR-boolean-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Information Retrieval System with Advanced Boolean Search

Overview

This project aims to develop an Information Retrieval (IR) system that supports both standard Boolean queries and proximity queries. The system is designed to handle a collection of text documents, building an Inverted Index and a Positional Index to facilitate efficient document retrieval.

Features

  • Boolean Queries: Supports AND, OR, and NOT operations.
  • Proximity Queries: Finds documents where terms appear within a specified distance from each other.
  • Inverted Index: Efficiently stores mappings from terms to the documents they appear in.
  • Positional Index: Tracks the positions of terms within documents for proximity queries.

Installation

To run this project, you will need to have Python installed along with the following libraries:

  • nltk
  • string

You can install the required libraries using pip:

pip install nltk

Description

The repository contains:

  • preprocessing.py: Script for data cleaning and preparation.
  • indexing.py: Utilities for data indexing.
  • querying.py: Implementation of various data querying methods.
  • experiments.ipynb: Jupyter notebook containing experimental analysis and results.

Installation

To get started with this project, clone this repository using:

git clone https://github.com/Amir-Entezari/IR-boolean-search.git

Install the required packages:

pip install -r requirements.txt

Usage

Here's how you can run the scripts:

For a detailed walkthrough, open the experiments.ipynb in Jupyter Notebook or JupyterLab:

jupyter notebook experiments.ipynb

Contributing

Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Information Retrieval Project: This repository contains a suite of Python tools and a Jupyter notebook for simple data processing, inverted indexing, and boolean querying, complemented by detailed experimental analyses.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published