This is a capstone project in Python to develop a complete full-text search tool that indexes multiple files using an inverted index and enables fast text search across the contents of all indexed files.
- Create in-memory indexes for all docx files present in a directory and its sub-directories
- Search for a word or a phrase in a file
- Search in multiple files
- Display search results: the files that contain the word and its frequency of occurrence in each file
- Sort search results with the most frequent occurrences at the top
- Search case-insensitively
- Search irrespective of punctuation
- Read files from the physical disk and index them, then search
- Provide a path and search within all directories and sub-directories for existing docx files
- Warn that files which are currently open may not have up-to-date search results
- Display a real-time list of currently open files to show at a glance which files are being modified
- Develop a UI to: provide the path of the text files, enter a phrase to search for, view the search results, and open a file from the results
- Store the files as well as the indexes in a database
- Index a file only if it is new or has been modified
- Save indexes in files or in a database and run periodic updates of the indexes
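
The core of the feature list (an in-memory inverted index, word and phrase search, case-insensitive and punctuation-independent matching, and results ranked by frequency) can be illustrated with a minimal sketch. It is not the project's actual implementation. The names `tokenize`, `build_index`, and `search` are hypothetical, and the sketch assumes the text of each docx file has already been extracted into a plain string by some other step.

```python
import re
from collections import Counter, defaultdict

# Runs of letters, digits, or underscores; punctuation never becomes part of a token.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def build_index(documents):
    """Map each token to {document name: [positions of that token]}.

    `documents` is assumed to be a dict of file name -> already extracted text.
    """
    index = defaultdict(dict)
    for name, text in documents.items():
        for position, token in enumerate(tokenize(text)):
            index[token].setdefault(name, []).append(position)
    return index


def search(index, query):
    """Return (document name, occurrence count) pairs, most frequent first."""
    terms = tokenize(query)
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents containing every query term can contain the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = Counter()
    for name in candidates:
        for start in index[terms[0]][name]:
            # The phrase matches when each later term sits at the next position.
            if all(start + offset in index[term][name]
                   for offset, term in enumerate(terms[1:], start=1)):
                hits[name] += 1
    # Sort by descending count; ties are broken by file name for stable output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

A single word is treated as a one-term phrase, so the same function serves both kinds of query. Storing token positions rather than plain counts is what makes phrase search possible, at the cost of a larger index.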
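
Finding docx files under a path and re-indexing only new or modified files could look roughly like the sketch below. It assumes that a file's modification time is a good enough change signal; a content hash would be a stricter alternative. The names `find_docx_files`, `files_needing_index`, and `seen_mtimes` are hypothetical and only serve the illustration.

```python
import os


def find_docx_files(root):
    """Yield paths of all .docx files under root, including sub-directories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".docx"):
                yield os.path.join(dirpath, filename)


def files_needing_index(root, seen_mtimes):
    """Return the files that are new or modified since the previous run.

    `seen_mtimes` maps path -> modification time recorded on the previous run
    and is updated in place, so it can be persisted between periodic updates.
    """
    stale = []
    for path in find_docx_files(root):
        mtime = os.path.getmtime(path)
        if seen_mtimes.get(path) != mtime:
            stale.append(path)
            seen_mtimes[path] = mtime
    return sorted(stale)
```

A periodic update would call `files_needing_index` on a schedule and pass only the returned paths to the indexer, leaving unchanged files untouched.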