AC215 2024 PrivaSee

Team Members Glo Umutoni, Shira Aronson, Aditi Raju, Sammi Zhu, Yeabsira Mohammed

Group Name PrivaSee

Project When deciding on a messaging app, for example, the average consumer is unlikely to read or understand the terms and conditions of multiple apps and choose among them accordingly. This project aims to bridge consumers’ knowledge gaps around data privacy by building an app that reviews terms and conditions agreements and tells users which aspects of their privacy they cede by using a given app or website. PrivaSEE lets users understand the implications for their data privacy and compare options in a way that aligns with their personal privacy priorities.

Project Organization

See below for the organizational structure of the project. The containers are elaborated upon (note that Dockerfiles, additional bash scripts, and pyenv files exist but have been omitted from the overview for brevity). Additional files can be found in the codebase directory.

├── midterm_presentation
├── notebooks
├── reports
└── src
    ├── api-service/
    │   ├── api/
    │   │   ├── routers
    │   │   ├── utils
    │   ├── service.py
    ├── datapipeline/
    │   ├── clean_data_for_recommendations.py
    │   ├── clean_data.py
    │   ├── create_gemini_tuning_datasets.py
    │   ├── create_vertexai_datasets.py
    │   ├── get_data_for_recommendations.py
    │   └── scraping_prototype.py
    ├── deployment/
    ├── frontend-react/
    │   ├── public/
    │   ├── src/
    │   │   ├── app/
    │   │   │   ├── about/
    │   │   │   ├── recommend/
    │   │   │   ├── summarize/
    │   │   │   ├── auth.js
    │   │   │   ├── global.css
    │   │   │   ├── layout.jsx
    │   │   │   ├── page.jsx
    │   │   ├── components/
    │   │   │   ├── auth/
    │   │   │   ├── chat/
    │   │   │   ├── home/
    │   │   │   ├── layout/
    │   │   ├── services/
    │   │   │   ├── Common.js/
    │   │   │   ├── DataService.js/
    ├── models
    │   ├── tests/
    │   ├── category_weights.csv
    │   ├── get_issues.py
    │   ├── modeling_functions.py
    │   ├── multi_class_model.py
    │   └── privacy_grader.py
    └── workflow
├── LICENSE
├── README.md

Prerequisites and Setup Instructions

Please see below for different methods to set up and run the application. The general packages used are also listed in requirements.txt for ease of comparison with the user's local package versions. However, installing them is optional, as the Dockerfile is configured via Pipfile to install the same dependencies.

Running Docker

To run the Dockerfile for either container, make sure you are in /src/desired-container:

Run the command bash docker-shell.sh
When it runs correctly, you should see output like that shown in the screenshot.

Running Project Locally

In src (optional, depending on local configuration):
```
pip install -r requirements.txt
```
In src/frontend-react:
```
npm install
npm run dev   
```

In src/api-service:

uvicorn api.service:app --reload --host 0.0.0.0 --port 9000

If issues arise, check that npm --version reports 10.8.3 and node --version reports v22.9.0.
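The version check above can be scripted. The following is a minimal sketch, assuming that the 22.9.0 figure refers to the Node.js version (Node prints it with a leading `v`); the helper names `normalize`, `installed_version`, and `check_versions` are hypothetical and not part of the repository.

```python
import shutil
import subprocess
from typing import Callable, Dict, Optional, Tuple

# Expected versions taken from the note above. Treating 22.9.0 as the
# Node.js version is an assumption.
EXPECTED = {"npm": "10.8.3", "node": "22.9.0"}


def normalize(raw: str) -> str:
    """Strip whitespace and a leading 'v' (node prints 'v22.9.0')."""
    return raw.strip().lstrip("v")


def installed_version(tool: str) -> Optional[str]:
    """Return the version the tool reports, or None if it is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return normalize(result.stdout)


def check_versions(
    expected: Dict[str, str] = EXPECTED,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each mismatching tool to a (found, wanted) pair."""
    mismatches = {}
    for tool, wanted in expected.items():
        found = lookup(tool)
        if found != wanted:
            mismatches[tool] = (found, wanted)
    return mismatches
```

Calling `check_versions()` with no arguments inspects the local machine and returns an empty dictionary when both tools match. The `lookup` parameter exists only so the comparison logic can be exercised without npm or Node installed.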

CI/CD Pipeline Implementation and Testing:

We implemented the CI/CD pipeline and testing through GitHub Actions. The workflow files for automated deployment can be found in .github/workflows and src/deployment. Please see the following screenshots for automated deployment verification.

Github Actions Overview
Deployment In Progress
Deployment Success

We wrote tests to cover model logic in src/models as well as the API endpoints for the summarize and recommend functionalities in src/api-service. Tests can be found in src/models/tests. Please see the following screenshots for testing verification.

Running Tests

Uploading Test Coverage

Test Coverage (71%)

As shown above, our tests cover 71% of our code, including every file related to model logic and every file related to API services. However, coverage can still be increased for specific functions: find_best_genre_match_with_gemini in recommend.py, get_grade in summarize.py, create_case_mappings and create_category_mappings in privacy_grader.py, and extract_text_from_pdf in process_pdf.py. Besides these functions, our tests cover the main functionality of recommend, summarize, privacy_grader, and process_pdf, in addition to model logic.
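One way to raise coverage for a function such as get_grade is a table-driven test that walks through boundary values. The sketch below is illustrative only: the `get_grade` defined here is a hypothetical stand-in (the real function's signature and grading cutoffs are not shown in this README), and it uses the standard-library `unittest` module rather than whatever test runner the project uses.

```python
import unittest


def get_grade(score: float) -> str:
    """Hypothetical stand-in: map a 0-100 privacy score to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


class GetGradeTests(unittest.TestCase):
    def test_boundaries(self):
        # Each pair is (input score, expected letter), chosen at the cutoffs.
        cases = [(100, "A"), (90, "A"), (89.9, "B"), (60, "D"), (59.9, "F"), (0, "F")]
        for score, expected in cases:
            with self.subTest(score=score):
                self.assertEqual(get_grade(score), expected)

    def test_out_of_range(self):
        with self.assertRaises(ValueError):
            get_grade(101)


def run_tests() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetGradeTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing at the cutoffs rather than at arbitrary mid-range values tends to exercise every branch with few cases, which is what moves a coverage figure.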

Deployment Instructions

Note: The following provides an overview of the setup steps. The .yml files and Dockerfiles can be found in src/deployment. For exact steps on what code to run, please visit here.

Deployment with Ansible (GCP Virtual Machine)

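Run these commands in order. The final command sets cluster_state=absent, which follows the Ansible convention for removing a resource, so run it only when you want to tear the instance down: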

ansible-playbook deploy-docker-images.yml -i inventory.yml
ansible-playbook deploy-create-instance.yml -i inventory.yml --extra-vars cluster_state=present
ansible-playbook deploy-provision-instance.yml -i inventory.yml
ansible-playbook deploy-setup-containers.yml -i inventory.yml
ansible-playbook deploy-setup-webserver.yml -i inventory.yml
ansible-playbook deploy-create-instance.yml -i inventory.yml --extra-vars cluster_state=absent
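The playbooks above can also be driven from a small script that stops at the first failure. This is a sketch under stated assumptions: it is run from the directory that holds the playbooks and inventory.yml, `ansible-playbook` is on the PATH, and the `cluster_state=absent` command is treated as a separate teardown step. The names `build_command` and `run_steps` are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

INVENTORY = "inventory.yml"

# (playbook, extra-vars) pairs copied from the commands listed above.
DEPLOY_STEPS: List[Tuple[str, Optional[str]]] = [
    ("deploy-docker-images.yml", None),
    ("deploy-create-instance.yml", "cluster_state=present"),
    ("deploy-provision-instance.yml", None),
    ("deploy-setup-containers.yml", None),
    ("deploy-setup-webserver.yml", None),
]

# Kept separate on the assumption that it removes the instance.
TEARDOWN_STEP: Tuple[str, Optional[str]] = (
    "deploy-create-instance.yml",
    "cluster_state=absent",
)


def build_command(playbook: str, extra_vars: Optional[str] = None) -> List[str]:
    """Build the argument list for one ansible-playbook invocation."""
    cmd = ["ansible-playbook", playbook, "-i", INVENTORY]
    if extra_vars:
        cmd += ["--extra-vars", extra_vars]
    return cmd


def run_steps(
    steps: Sequence[Tuple[str, Optional[str]]],
    runner: Callable = subprocess.run,
) -> None:
    """Run each step in order; check=True raises on a non-zero exit code."""
    for playbook, extra_vars in steps:
        runner(build_command(playbook, extra_vars), check=True)
```

`run_steps(DEPLOY_STEPS)` would deploy, and `run_steps([TEARDOWN_STEP])` would run the teardown command on its own. The `runner` parameter allows a dry run by passing a function that only records the commands.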

Deployment with Scaling (Kubernetes)

Run these commands:

ansible-playbook deploy-docker-images.yml -i inventory.yml
ansible-playbook deploy-k8s-cluster.yml -i inventory.yml --extra-vars cluster_state=present

See the screenshots below for what scaling verification should look like on GCP after completion:

Usage Details and Examples

A React app was built to identify privacy issues in terms and conditions using a trained Gemini model on the backend. The Homepage (shown below) showcases the functionalities of the application and serves as the guide to other pages.

There are two core functionalities:

`Summarize:`

Users choose any file from their device to upload.
Once the file is selected, users are given the option to upload it to the web application.
A loading bar shows the status of the upload. Once the file has loaded successfully, users can fetch a grade.
The returned data maps each privacy issue to one of 22 privacy components (e.g., privacy, governance) and includes the overall privacy grade. Bars show the counts for the various privacy components, and the tables describe the specific violations in more detail.
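To illustrate how per-component counts like those behind the bars could be derived, here is a minimal sketch. The record shape, the example issues, the weights (standing in for category_weights.csv), and the `weighted_penalty` function are all hypothetical; the README does not show the actual API response or how the overall grade is computed.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical issue records; the real API response shape is not shown here.
issues: List[Dict[str, str]] = [
    {"category": "privacy", "description": "Shares data with third parties"},
    {"category": "privacy", "description": "Tracks location in the background"},
    {"category": "governance", "description": "Terms may change without notice"},
]

# Hypothetical weights, standing in for category_weights.csv.
weights: Dict[str, float] = {"privacy": 3.0, "governance": 1.5}


def count_by_category(records: List[Dict[str, str]]) -> Counter:
    """Count how many issues fall under each privacy component."""
    return Counter(record["category"] for record in records)


def weighted_penalty(
    counts: Counter, category_weights: Dict[str, float], default_weight: float = 1.0
) -> float:
    """Sum issue counts scaled by their category weight (illustrative only)."""
    return sum(
        n * category_weights.get(category, default_weight)
        for category, n in counts.items()
    )


counts = count_by_category(issues)
penalty = weighted_penalty(counts, weights)
```

The counts feed a bar chart directly, while the individual records supply the descriptions for a table.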

`Recommend:`

Users can use the chatbox to ask for a genre or an app similar to another app, along with the privacy concerns they want the app to be aware of.
A loading spinner appears while the backend retrieves the response from the model.
The final results are displayed in a chat format so users can see which app the model recommended.

There is also an About page that further describes the goals of our web application.

Known Issues and Limitations

Model Robustness

Despite setting a consistent temperature parameter to control the randomness of outputs, the model’s responses are occasionally inconsistent across similar inputs. Further fine-tuning or additional constraints might be required to enhance reliability across all use cases.
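One possible additional constraint, not something the project currently implements, is to query the model several times and keep the most common answer. The sketch below assumes a hypothetical `ask_model` callable that takes a prompt and returns a short label such as a grade; it swaps in simple majority voting and makes no claim about how the Gemini model is actually called.

```python
from collections import Counter
from typing import Callable, Tuple


def majority_vote(
    ask_model: Callable[[str], str], prompt: str, runs: int = 5
) -> Tuple[str, float]:
    """Call the model `runs` times; return the top answer and its vote share."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [ask_model(prompt) for _ in range(runs)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / runs
```

A low vote share can be treated as a signal that the output is unstable for that input. The trade-off is that each extra run adds to the response time and cost.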

File Upload Time

The time required to upload a file is directly proportional to its size. Larger files, especially those exceeding several megabytes, can result in noticeable delays, potentially impacting user experience. Optimizations in file handling or upload infrastructure could help mitigate this issue in future iterations.

Variable Model Response Time

The response time for generating outputs from the model can vary depending on the complexity of the input query and server load. While most queries are resolved in a few seconds, users may occasionally experience delays (up to 15 seconds or more).
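A client can tolerate occasional slow responses by retrying with exponential backoff. This is a minimal sketch, assuming the model call is wrapped in a hypothetical zero-argument function `fn` that raises `TimeoutError` when it takes too long; the function name `call_with_retry` and the default delays are illustrative.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TimeoutError with delays of 1x, 2x, 4x base_delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[TimeoutError] = None
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError as err:
            last_error = err
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.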
