KSP DATATHON 2024
Data Privacy in Law Enforcement
Welcome to the FIR Redactor, a tool built for the Karnataka Police Department to protect sensitive information in First Information Reports (FIRs). It uses a large language model, accessed through the OpenAI API, together with PDF processing libraries to identify and redact personal identifiers from FIR documents written in English or Kannada.
- Dual Language Support: Handles FIRs written in English or Kannada.
- AI-Powered Entity Extraction: Uses a large language model to extract personal identifiers such as names, addresses, and other sensitive information from FIR text.
- Automated Redaction: Uses PyMuPDF to redact the identified information in the PDF, blacking it out to maintain confidentiality.
- User-Friendly Interface: Provides an easy-to-navigate interface with tabs for Home, Login, Redaction, and information about the tool.
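The entity-extraction feature can be pictured as two parts: asking the model for a JSON object of identifiers, then parsing the reply into a structure a reviewer can check. The sketch below covers the prompt and the parsing only, so it runs without network access. The prompt wording, the category names (`names`, `addresses`, `phone_numbers`), and the helpers `build_prompt` and `parse_entities` are assumptions for illustration, not the tool's actual code.

```python
import json

# Hypothetical categories; the real tool may ask the model for different ones.
CATEGORIES = ("names", "addresses", "phone_numbers")

PROMPT_TEMPLATE = (
    "Extract personal identifiers from the FIR text below. "
    "Reply with a JSON object whose keys are {keys} and whose values "
    "are lists of exact strings copied from the text.\n\n{text}"
)


def build_prompt(text):
    """Fill the prompt template with the FIR text (English or Kannada)."""
    return PROMPT_TEMPLATE.format(keys=", ".join(CATEGORIES), text=text)


def parse_entities(reply):
    """Turn the model's JSON reply into {category: [unique strings]}.

    Malformed replies yield empty lists instead of raising, so the
    reviewer simply sees no suggestions for that document.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    entities = {}
    for category in CATEGORIES:
        values = data.get(category, [])
        if not isinstance(values, list):
            values = []
        unique = []
        for value in values:
            # Keep only non-empty strings, trimmed and de-duplicated in order.
            if isinstance(value, str) and value.strip() and value.strip() not in unique:
                unique.append(value.strip())
        entities[category] = unique
    return entities
```

Returning exact strings copied from the document matters here, because the redaction step later searches the PDF for those same strings.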
- Upload FIR Documents: Users can upload FIR PDFs through a simple file uploader. The application reads and extracts text from these documents for further processing.
- AI-Suggested Redactions: After analyzing the text, the application suggests redactions by identifying sensitive entities. These suggestions are presented in a structured format for user review.
- Redaction Execution: Users can input the specific text they wish to redact, and the tool will automatically find and redact these instances within the document. The redacted PDF is then saved securely.
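The redaction step described above (find every instance of the chosen text, black it out, save a new PDF) can be sketched with PyMuPDF roughly as follows. This is a minimal illustration assuming a recent PyMuPDF release; the function name `redact_terms` and its parameters are hypothetical, and the tool's own implementation may differ.

```python
def redact_terms(src_path, dst_path, terms):
    """Black out every occurrence of each term; return the number of hits."""
    # PyMuPDF is imported lazily so it is only required when redacting.
    import fitz

    hits = 0
    doc = fitz.open(src_path)
    try:
        for page in doc:
            for term in terms:
                # search_for returns one rectangle per match on this page.
                for rect in page.search_for(term):
                    page.add_redact_annot(rect, fill=(0, 0, 0))
                    hits += 1
            # Applying the redactions removes the underlying text,
            # rather than only drawing a box on top of it.
            page.apply_redactions()
        doc.save(dst_path)
    finally:
        doc.close()
    return hits
```

A call such as `redact_terms("fir.pdf", "fir_redacted.pdf", ["Ravi Kumar"])` writes a new file and leaves the original untouched. PyMuPDF's text search is case-insensitive, so reviewing the matches before applying them is worthwhile.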
The FIR Redactor ensures that the privacy of individuals involved in FIRs is protected by securely handling all documents and utilizing state-of-the-art redaction techniques. All data processing is performed with strict adherence to security protocols to prevent unauthorized access and maintain the integrity of sensitive information.
This tool is designed to assist the Karnataka Police Department in safeguarding personal information within FIRs, facilitating compliance with privacy regulations and enhancing the trust and confidence of the public.
- Python 3.7 or higher
- MongoDB
- Streamlit
- PyPDF2
- PyMuPDF
- OpenAI API key
- Clone the Repository
    git clone https://github.com/raju-2003/FIR-Redactor
- Install Dependencies
    pip install -r requirements.txt
- Set Up Environment Variables
  Create a .streamlit/secrets.toml file and add your OpenAI API key and MongoDB connection string:
    [secrets]
    openai = "your_openai_api_key"
    connection_string = "your_mongodb_connection_string"
- Run the Application
    streamlit run app.py
- app.py: The main application file containing the Streamlit interface and all functionalities.
- requirements.txt: A file containing all the dependencies required to run the application.
- generate_token(user_id): Generates a JWT token for user authentication.
- validate_token(token): Validates the JWT token.
- save_token(user_id, token, expiry): Saves the generated token to the MongoDB database.
- check_login(username, password): Verifies user credentials against the database.
- read_file(pdf_file): Reads the uploaded PDF file and extracts text.
- extract_entities(text): Uses the OpenAI API to extract sensitive entities from the text.
- search_replace(path, text): Redacts specified text from the PDF document and provides a downloadable redacted version.
- Login: Users must log in using their credentials to access the redaction features.
- Upload FIR Document: After logging in, users can upload FIR documents in PDF format.
- AI-Suggested Redactions: The application will provide AI-suggested redactions based on the uploaded document.
- Manual Redaction: Users can manually input text to be redacted and download the redacted PDF.
- Username: admin, Password: admin
- Username: raju, Password: raju