This is a basic web-based tool to de-identify PDFs using Django and the Python libraries pdfminer and pdfrw. To see the live demo, visit: http://samhaldia.pythonanywhere.com
This repository uses the following:
- Conda 4.0.5
- Django 2.1.2
- pdfrw>=0.4
- defusedxml
- pdfminer.six (for more information, see https://pdfminer-docs.readthedocs.io/pdfminer_index.html)
- chardet
- django-crispy-forms
Steps to set up the environment on Windows:
- Install Anaconda.
- Create a virtual environment for the web-based tool:
  - conda create --name deidentify python=3 # Create a new environment named deidentify with Python 3
  - activate deidentify # Activate the deidentify environment
  - conda list # List the packages installed in the current environment
- conda info --envs # List all environments created in Conda
- pip install django # Installs the latest Django version into the deidentify environment
- pip install "pdfrw>=0.4" # Quoted so the shell does not treat > as output redirection
- pip install defusedxml
- pip install pdfminer.six
- pip install chardet
- pip install django-crispy-forms # Third-party package for working with forms in Django
- Create a folder of your choice, e.g. DeIdentifyTool.
- Clone the repository inside the created folder.
- Create a MySQL database matching the configuration in the de_identify/settings.py file in the project folder of the repository.
- Run: python manage.py makemigrations pdf_deidentify (pdf_deidentify is the app name used when the app was created in this Django project)
- python manage.py migrate pdf_deidentify
- To use the admin interface, create a superuser: python manage.py createsuperuser
- Start the web tool: python manage.py runserver # The admin interface is available by appending /admin to the base URL
- MySQL is used as the database engine here.
- Any other database of your choice can be used instead.
- To use another database engine, configure settings.py accordingly.
- For more details, see https://docs.djangoproject.com/en/2.1/ref/settings/#databases
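
As a rough illustration of the notes above, a DATABASES setting for MySQL might look like the sketch below. The database name, user, host, port and the DEIDENTIFY_DB_PASSWORD environment variable are placeholders assumed for illustration, not values taken from this repository; the actual values are the ones in de_identify/settings.py.

```python
import os

# Hypothetical values for illustration only; replace them with your own.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "deidentify_db",               # assumed database name
        "USER": "deidentify_user",             # assumed database user
        # Assumed environment variable, so the password stays out of the code
        "PASSWORD": os.environ.get("DEIDENTIFY_DB_PASSWORD", ""),
        "HOST": "localhost",
        "PORT": "3306",                        # default MySQL port
    }
}

# Switching engines mostly means changing ENGINE and NAME, e.g. for SQLite:
# DATABASES = {"default": {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}}
```

Note that Django's MySQL backend also needs a database driver such as mysqlclient to be installed in the environment.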
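Linux machine:
- The steps to build this web-based tool are the same, except for the commands used to install Anaconda on Linux and any other platform-specific commands (for example, with this Conda version the environment is activated with source activate deidentify).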
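
On either platform, a small script along the following lines can confirm that the packages listed in this README are importable in the active environment before running the tool. This is only a sketch: the helper name find_missing and the mapping of pip names to import names are assumptions made for illustration, and the script is not part of the repository.

```python
import importlib.util

# Assumed mapping of pip package name -> importable module name.
# pdfminer.six is imported as "pdfminer" and django-crispy-forms as "crispy_forms".
REQUIRED_MODULES = {
    "django": "django",
    "pdfrw": "pdfrw",
    "defusedxml": "defusedxml",
    "pdfminer.six": "pdfminer",
    "chardet": "chardet",
    "django-crispy-forms": "crispy_forms",
}


def find_missing(modules):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the deidentify environment activated; any package it reports as missing can be installed with the matching pip install command from the steps above.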