Simple-Search-Engine

The Simple Search Engine project is a lightweight yet powerful tool designed to provide efficient text search capabilities. this project allows users to search through a corpus of documents and obtain relevant result. The Vector Space Model (VSM) is a crucial concept in Natural Language Processing (NLP) used to represent text data numerically in a high-dimensional space. This project implements a search engine using the VSM approach, allowing users to retrieve relevant information from a given corpus.

Description

This search engine project involves several key steps:

Step 0: Importing Corpus

The initial step involves reading text corpora from the local machine. The Python script utilizes the NLTK library for further processing.

Step 1: Preprocessing & Tokenizing

Text preprocessing is carried out to eliminate unnecessary tokens and simplify calculations. The NLTK library is employed for tasks such as tokenization, lemmatization, and stop-word removal.

Step 2: Creating our Dataset

The preprocessed data is organized into a CSV file, creating a structured dataset for subsequent analysis.

Step 3: Creating our Matrix

A term-document matrix is generated from the dataset, representing the frequency of terms in each document.

Step 5: Calculating Cosine Similarity

Cosine similarity is computed to measure the similarity between the input query and the documents in the corpus. The results are ranked based on similarity.

How to Use

Clone the repository to your local machine.
Install the necessary dependencies (NLTK, pandas).
Run the Python script to build the search engine.

Dependencies

NLTK
Pandas

Author

[Kiarash Rahmani]

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Corpus		Corpus
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
Simple Search engine.ipynb		Simple Search engine.ipynb
output.csv		output.csv
ranked_output.csv		ranked_output.csv
sparse_output_tf.csv		sparse_output_tf.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple-Search-Engine

Description

Step 0: Importing Corpus

Step 1: Preprocessing & Tokenizing

Step 2: Creating our Dataset

Step 3: Creating our Matrix

Step 5: Calculating Cosine Similarity

How to Use

Dependencies

Author

License

About

Releases 1

Packages

Languages

License

kiarashrahmani/Simple-Search-Engine

Folders and files

Latest commit

History

Repository files navigation

Simple-Search-Engine

Description

Step 0: Importing Corpus

Step 1: Preprocessing & Tokenizing

Step 2: Creating our Dataset

Step 3: Creating our Matrix

Step 5: Calculating Cosine Similarity

How to Use

Dependencies

Author

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages