File_Search_Server

Overview

The File_Search_Server is a project designed to implement and benchmark various file search algorithms to efficiently search for strings within large text files. This README provides a comprehensive guide to the project, including setup instructions, detailed descriptions of the implemented algorithms, how to run the benchmarks, and how to interpret the results.

Project Structure

The project is organized into the following directories and files:

File_Search_Server/ │ ├── README.md ├── requirements.txt ├── setup.sh ├── 200k.txt ├── server.key ├── server.csr ├── server.crt ├── server.py ├── client.py ├── config.ini ├── file-search_algorithms.py ├── benchmark_results.txt ├── test-suite_server.py ├── multiple-queries_simulation.py ├── speed_report.py └── benchmark_chart.png

Setup Instructions

Prerequisites

Ensure you have the following installed on your machine:

Python 3.7+
pip (Python package installer)

Installation

Setup project directory and the virtual environment
Install the required Python packages:

pip install -r requirements.txt

Usage Instructions

Running the Server

To start the server, run:

python server.py

Running the Client

To interact with the server, run:

python client.py

Running as a Linux Service

To run the File_Search_Server as a Linux daemon or service, follow these steps:

Create a systemd service file:

sudo nano /etc/systemd/system/File_Search_Server.service

Add the following content to the service file:

[Unit]
Description=AS Introductory Task File Search Server
After=network.target

[Service]
User=yourusername
WorkingDirectory=/path/to/File_Search_Server
ExecStart=/usr/bin/python3 /path/to/File_Search_Server/server.py
Restart=always

[Install]
WantedBy=multi-user.target

Replace /path/to/File_Search_Server with the actual path to the project directory and yourusername with your Linux username.

Reload the systemd daemon to recognize the new service:

sudo systemctl daemon-reload

Start the service:

sudo systemctl start File_Search_Server.service

Enable the service to start on boot:

sudo systemctl enable File_Search_Server.service

Check the status of the service:

sudo systemctl status File_Search_Server.service

The File_Search_Server should now be running as a Linux service.

You can stop the service with

sudo systemctl stop File_Search_Server.service

and restart it with

sudo systemctl restart File_Search_Server.service

Querying the server: Navigate to the project directory where the client.py script is located:

cd /path/to/File_Search_Server

Run the client script:

python client.py

Enter the string you want to search for in the 200k.txt file

Unit Testing

Running the test suite server

Run the test suite server script to run all the unit test cases:

pytest -vv test-suite_server.py

This will run all the test cases and give you a summary of the test session results.

Implemented Algorithms

The following file search algorithms are implemented in this project:

Naive Search:
- A straightforward approach that checks each position in the text for a match.
Binary Search:
- Requires the data to be sorted; it divides the search interval in half repeatedly.
Knuth-Morris-Pratt (KMP) Algorithm:
- Uses the preprocessing of the pattern to avoid unnecessary comparisons.
Boyer-Moore Algorithm:
- Skips sections of the text, making it efficient for large alphabets and long patterns.
Rabin-Karp Algorithm:
- Uses hashing to find any one of a set of pattern strings in a text.
Z Algorithm:
- Computes the Z array which is used for pattern matching in linear time.

Configuration

The config.ini file contains the following setting:

REREAD_ON_QUERY: If set to True, the data will be re-read from the file on every query. If False, the data will be read once and reused for all queries.

Running the Benchmarks

Benchmarking Search Algorithms

To benchmark the search algorithms, run the file-search_algorithms.py script:

python file-search_algorithms.py

This script will benchmark each algorithm on the generated test files and save the results to benchmark_results.txt.

Generating the Speed Report

Once you have the benchmark results, generate the speed report by running the speed_report.py script:

python speed_report.py

This will create two visual representations of the benchmark results, for when REREAD_ON_QUERY = True, saved as benchmark_chart_reread.png and for when REREAD_ON_QUERY = False, saved as benchmark_chart_no_reread.png.

Analyzing the Results

The benchmark results are saved in benchmark_results.txt and can be visualized using the generated charts, benchmark_chart_reread.png and benchmark_chart_no_reread.png. The results include

Execution time for each algorithm across different file sizes.
Comparative performance analysis of all algorithms with and without re-reading the file.

Performance Analysis

From the generated charts, key observations include:

Naive Search shows a linear increase in execution time with file size.
Binary Search performs well for smaller datasets but requires sorted data.
KMP Algorithm and Boyer-Moore Algorithm demonstrate efficient performance for large datasets.
Rabin-Karp Algorithm is effective for multiple pattern searches but less efficient for very large datasets.
Z Algorithm outperforms other algorithms with linear time complexity.
Impact of REREAD_ON_QUERY Analyzing the two charts can help in understanding how re-reading the file impacts the performance of each algorithm

Limitations and Recommendations

Identified Limitations

Memory Usage: Higher memory consumption for certain algorithms (e.g., Rabin-Karp).
Execution Time: Inefficiency of Naive Search and Binary Search for large datasets.
Scalability: Performance degradation with high concurrency or extremely large datasets.
File Re-reading Overhead: Additional overhead when REREAD_ON_QUERY is set to True, which might affect execution time.

Recommendations

Use Z Algorithm for Large Datasets: Implement the Z Algorithm for efficient performance.
Optimize Memory Usage: Optimize algorithms to reduce memory consumption.
Enhance Scalability: Implement load balancing and optimize server code for better scalability.
Assess Impact of REREAD_ON_QUERY: Choose the REREAD_ON_QUERY setting based on specific use cases to balance between overhead and performance.
Further Testing: Conduct additional tests with diverse data and query patterns.

Future Enhancements

Advanced Query Options: Support regex and fuzzy searching.
Real-Time Analytics: Monitor server performance and client query statistics in real-time.
Scalability Improvements: Explore distributed computing techniques.
Enhanced Benchmarking: Incorporate more detailed benchmarking to evaluate the impact of file re-reading and other parameters on performance.

Conclusion

The File_Search_Server project provides an efficient solution for searching strings in large text files. By benchmarking various algorithms, we have identified the most suitable algorithm for different scenarios. The project offers a robust framework for further enhancements and optimizations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

File_Search_Server

Overview

Table of Contents

Project Structure

Setup Instructions

Prerequisites

Installation

Usage Instructions

Running as a Linux Service

Unit Testing

Running the test suite server

Implemented Algorithms

Configuration

Running the Benchmarks

Benchmarking Search Algorithms

Generating the Speed Report

Analyzing the Results

Performance Analysis

Limitations and Recommendations

Identified Limitations

Recommendations

Future Enhancements

Conclusion

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
__pycache__		__pycache__
200k.txt		200k.txt
README.md		README.md
benchmark_chart_no_reread.png		benchmark_chart_no_reread.png
benchmark_chart_reread.png		benchmark_chart_reread.png
benchmark_results.txt		benchmark_results.txt
client.py		client.py
config.ini		config.ini
file-search_algorithms.py		file-search_algorithms.py
multiple-queries_simulation.py		multiple-queries_simulation.py
requirements.txt		requirements.txt
server.crt		server.crt
server.csr		server.csr
server.key		server.key
server.py		server.py
setup.sh		setup.sh
speed_report.py		speed_report.py
test-suite_server.py		test-suite_server.py

DavidOgalo/File_Search_Server

Folders and files

Latest commit

History

Repository files navigation

File_Search_Server

Overview

Table of Contents

Project Structure

Setup Instructions

Prerequisites

Installation

Usage Instructions

Running as a Linux Service

Unit Testing

Running the test suite server

Implemented Algorithms

Configuration

Running the Benchmarks

Benchmarking Search Algorithms

Generating the Speed Report

Analyzing the Results

Performance Analysis

Limitations and Recommendations

Identified Limitations

Recommendations

Future Enhancements

Conclusion

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages