Lakera - ChatGPT Data Leak Protection

The Lakera Chrome Extension is a privacy guard that helps keep sensitive information out of your conversations with ChatGPT. Whether you are summarizing a document that contains personal information or accidentally pasting private data from your clipboard, the extension warns you before important data is disclosed.

You can install it in your Chrome browser from the Chrome Web Store: https://chrome.google.com/webstore/detail/lakera-chatgpt-data-leak/npdeilagbbimhnbbdjmagmedchnpjeid

Give it a go and leave us a review if you like it! Any feedback is welcome and appreciated!

Goals

  • Bring the AI industry to new security standards
  • Add a layer of security for frequent ChatGPT users by protecting their data from exposure
  • Offer a workable solution for companies that have restricted their employees from using ChatGPT at work
  • Open the source code so that everyone can contribute to a solution that runs locally in the user's browser without any remote connection

Main features

  • Seven predefined detectors of private data, all enabled by default: credit card numbers, names, email addresses, phone numbers, US addresses, US social security numbers, and secret keys
  • Any predefined detector can be switched on or off if it triggers too often for your use case (e.g. your prompts contain many names and you don't want to be notified about them). Click the extension icon, then the toggle next to the detector you want to switch.
  • Works in both GPT-3.5 and GPT-4 conversations

Project structure

  • dist: The built code to load into Chrome as an extension (created by the build).
  • patches: Patches applied to external libraries that the extension uses.
  • public: Static files and manifest.json, which configures the extension's permissions, the scripts it runs, the webpages it can access, etc.
  • src: Extension backend and frontend code, plus unit tests for the detectors.
  • vite, babel and jest: Tooling that compiles the TypeScript code into JavaScript (vite, babel) and runs the unit tests (jest).

Prerequisites

Setup & Watchmode build

npm install
npm run dev

Build

npm run build

Running the extension

Go to chrome://extensions/, enable Developer mode, click Load unpacked, and select the dist folder generated by the npm run build command.

License

The Lakera Chrome Extension source code is licensed under the GNU GPLv3, subject to its permissions, conditions and limitations.

How does it work?

Currently, all detectors rely on pattern matching. Each one uses regular expressions designed to catch most of the private data, within the supported categories, that someone could insert into a ChatGPT conversation.
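As a rough illustration of pattern-matching detection, the sketch below runs two deliberately simple detectors over a prompt. The `Detector` type, the `runDetectors` function and both patterns are hypothetical and are not taken from the extension's source, whose expressions are more thorough.

```typescript
// Hypothetical sketch: names and patterns are illustrative, not the extension's own.
interface Detector {
  name: string;
  pattern: RegExp; // needs the "g" flag so match() returns every hit
}

interface Detection {
  detector: string;
  matches: string[];
}

const detectors: Detector[] = [
  // Simplified email pattern: local part, "@", then a domain with at least one dot.
  { name: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g },
  // US social security number written as three, two and four digits with dashes.
  { name: "us_ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Run every active detector over the prompt and keep only those that matched.
function runDetectors(prompt: string, active: Detector[]): Detection[] {
  const found: Detection[] = [];
  for (const d of active) {
    const matches = prompt.match(d.pattern);
    if (matches !== null) {
      found.push({ detector: d.name, matches: matches.slice() });
    }
  }
  return found;
}
```

Passing a filtered list as `active` is one way a disabled detector could be skipped; an empty result means the prompt can go through unflagged.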

If any detector finds a match, the user receives a notification describing what was detected, so that they know what to edit in their prompt.

Users are given two choices: edit the prompt and resubmit, or submit regardless of what the extension detected. This design leaves the user in charge of deciding whether the context truly involves private data.

Text recognition and analysis

To notify users when part of their prompt contains sensitive information, the extension first locates the textarea on the ChatGPT conversation page where users type their prompts, along with the submit button. It then attaches event listeners to both elements for events such as "keydown", "input" and "click". Together, these listeners keep the extension badge up to date and detect whether the prompt contains private data.
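The listener wiring for the badge could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ListenerTarget` is a minimal stand-in for the textarea element, and `watchPrompt`, `readPrompt`, `countDetections` and `setBadge` are hypothetical names. In a Manifest V3 extension, `setBadge` would typically end up calling `chrome.action.setBadgeText` from the background script.

```typescript
// Minimal stand-in for a DOM element so the sketch also runs outside a browser.
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical wiring: recompute the badge on every "input" and "keydown" event.
function watchPrompt(
  textarea: ListenerTarget,
  readPrompt: () => string,
  countDetections: (text: string) => number,
  setBadge: (label: string) => void
): void {
  const refresh = (): void => {
    const count = countDetections(readPrompt());
    // An empty label clears the badge when nothing is detected.
    setBadge(count > 0 ? String(count) : "");
  };
  textarea.addEventListener("input", refresh);
  textarea.addEventListener("keydown", refresh);
}
```

Keeping the detection count and the badge setter as parameters means the same wiring can be exercised in unit tests without a browser.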

Whenever the user switches conversations, the textarea and the submit button are located again by a MutationObserver that watches the entire DOM for changes. The lookup is retried with a 500 ms delay, because the DOM might not be fully loaded on the first attempts.
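The retry logic can be sketched as a small helper. Only the 500 ms delay comes from the description above; `findWithRetry`, the `Schedule` type and the cap of 10 attempts are assumptions made for this example.

```typescript
// Hypothetical helper: retries a lookup until it succeeds or attempts run out.
type Schedule = (fn: () => void, delayMs: number) => void;

function findWithRetry<T>(
  lookup: () => T | null,
  onFound: (value: T) => void,
  schedule: Schedule,
  delayMs: number = 500,    // delay taken from the description above
  maxAttempts: number = 10  // assumed cap, not taken from the extension
): void {
  let attempt = 0;
  const tryOnce = (): void => {
    attempt += 1;
    const value = lookup();
    if (value !== null) {
      onFound(value);
    } else if (attempt < maxAttempts) {
      schedule(tryOnce, delayMs);
    }
  };
  tryOnce();
}

// In a content script this might be triggered from a MutationObserver, e.g.:
//   const relocate = () => findWithRetry(
//     () => document.querySelector("textarea"),
//     (el) => { /* attach listeners to el */ },
//     (fn, ms) => { setTimeout(fn, ms); }
//   );
//   new MutationObserver(relocate).observe(document.body, { childList: true, subtree: true });
```

Injecting the scheduler keeps the helper independent of the browser, so the retry behaviour can be tested synchronously.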

When the submit button is clicked, the input is passed to the extension and analyzed by all detectors. If any of them is triggered, the input is flagged to the user in a SweetAlert notification on the webpage; otherwise the user can safely proceed with the prompt. To keep things simple, the extension always blocks the submission when the button is clicked and then decides whether to let the prompt through or flag it.
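The "block first, decide afterwards" flow can be sketched as follows. The names `handleSubmit`, `CancellableEvent`, `DetectFn` and `SubmitActions` are hypothetical, and the dialog itself is left behind the `flag` callback rather than calling any particular notification library.

```typescript
// Stand-in for a DOM event: only the method this sketch needs.
interface CancellableEvent {
  preventDefault(): void;
}

// Returns the names of the detectors that were triggered by the prompt.
type DetectFn = (prompt: string) => string[];

interface SubmitActions {
  send: (prompt: string) => void;                 // let the prompt through
  flag: (prompt: string, hits: string[]) => void; // show the warning to the user
}

function handleSubmit(
  event: CancellableEvent,
  prompt: string,
  detect: DetectFn,
  actions: SubmitActions
): void {
  // Always block the native submission first.
  event.preventDefault();
  // Then decide whether to forward the prompt or flag it.
  const hits = detect(prompt);
  if (hits.length === 0) {
    actions.send(prompt);
  } else {
    actions.flag(prompt, hits);
  }
}
```

Under this design the "submit regardless" choice would simply call `send` from inside the warning dialog.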

Each detector looks for its own pattern, based on regular expressions. Some use external libraries designed specifically for that kind of input. They also apply a second layer of custom heuristics to better cover the inputs the extension might have to analyze. Everything runs fully locally, with no remote connection to any other party.
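To make the idea of a regular expression followed by a heuristic layer concrete, the sketch below pairs a loose credit-card pattern with the Luhn checksum. This is an assumption chosen for illustration: the source does not say which heuristics the extension uses, and `CARD_CANDIDATE`, `passesLuhn` and `findCardNumbers` are hypothetical names.

```typescript
// First layer: 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Second layer: Luhn checksum over a string that contains digits only.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Keep only the regex candidates that also pass the checksum.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(CARD_CANDIDATE);
  if (candidates === null) return [];
  const confirmed: string[] = [];
  for (const c of candidates) {
    const digits = c.replace(/[ -]/g, "");
    if (passesLuhn(digits)) confirmed.push(c);
  }
  return confirmed;
}
```

The checksum filters out digit runs such as order numbers that merely look like card numbers, which reduces false alarms from the regular expression alone.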

Tests

Every detector has its own unit tests in the src/__tests__/detectors_tests directory, each covering a range of examples.

Other tests cover prompt length: they check the prompt against a preset safe limit and against a support limit, beyond which the extension does not run because of performance constraints. You can find them in src/__tests__/character_limit_tests.
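A length check of this kind might look like the sketch below. The two threshold values, the `LengthStatus` type and `classifyPromptLength` are all hypothetical; the extension's real limits and what it does between the two limits are defined in its source, not here.

```typescript
// Hypothetical thresholds, chosen only for this example.
const SAFE_LIMIT = 5000;
const SUPPORT_LIMIT = 50000;

type LengthStatus = "ok" | "over_safe_limit" | "unsupported";

// Classify a prompt by length against the two limits.
function classifyPromptLength(
  prompt: string,
  safeLimit: number = SAFE_LIMIT,
  supportLimit: number = SUPPORT_LIMIT
): LengthStatus {
  if (prompt.length > supportLimit) return "unsupported"; // detectors are not run
  if (prompt.length > safeLimit) return "over_safe_limit";
  return "ok";
}
```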

Chrome extensions API

Some things have to run in the background to provide a smooth experience, and users need to be able to choose which detectors are turned on or off. We use the Chrome extensions API for both.

Message passing

We use the API to pass messages between three main entities: the content script, the background script and the popup script. These messages are sent securely, with no network exposure, because the three components communicate locally on your machine. Message passing is needed because the components depend on each other: information that lives in one of them is sometimes needed by another.
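One way to organize such messages is a typed union handled by a single pure function, as in the sketch below. The message names, `ExtensionState` and `handleMessage` are hypothetical and do not reflect the extension's actual message format. In a real extension the handler would be registered with `chrome.runtime.onMessage.addListener`, and the content and popup scripts would call `chrome.runtime.sendMessage`.

```typescript
// Hypothetical message shapes; the extension's real messages may differ.
type ExtensionMessage =
  | { type: "DETECTIONS_FOUND"; count: number }                      // from the content script
  | { type: "TOGGLE_DETECTOR"; detector: string; enabled: boolean }  // from the popup script
  | { type: "GET_STATE" };                                           // from either script

interface ExtensionState {
  totalDetections: number;
  enabled: { [detector: string]: boolean };
}

// Pure handler for the background script: returns the next state without mutating the old one.
function handleMessage(state: ExtensionState, msg: ExtensionMessage): ExtensionState {
  switch (msg.type) {
    case "DETECTIONS_FOUND":
      return { totalDetections: state.totalDetections + msg.count, enabled: state.enabled };
    case "TOGGLE_DETECTOR":
      return {
        totalDetections: state.totalDetections,
        enabled: { ...state.enabled, [msg.detector]: msg.enabled },
      };
    default:
      // GET_STATE changes nothing; the caller replies with the current state.
      return state;
  }
}
```

Keeping the handler free of `chrome.*` calls makes it easy to unit test; the listener around it only has to pass the message in and send the result back.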

Data store

We also use the API to store the total number of detections since the extension was installed. This count is synced across all devices on which the user is logged into Google Chrome with their Google account. On installation, all detectors are enabled by default. If the user changes the setting for any detector, the change is stored in the extension's synced storage space.
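The "everything enabled unless the user changed it" rule can be sketched as a function that fills in defaults for whatever was read from synced storage. The key names, the `StoredSettings` shape and `resolveSettings` are assumptions for this example; in a real extension the stored object would come from `chrome.storage.sync.get` and changes would be written back with `chrome.storage.sync.set`.

```typescript
// Hypothetical keys for the seven predefined detectors.
const DETECTOR_NAMES: string[] = [
  "credit_card", "name", "email", "phone_number", "us_address", "us_ssn", "secret_key",
];

// Assumed shape of what is kept in synced storage; both fields may be absent on first run.
interface StoredSettings {
  totalDetections?: number;
  disabledDetectors?: string[];
}

interface ResolvedSettings {
  totalDetections: number;
  enabled: { [name: string]: boolean };
}

// Fill in defaults: a zero counter and every detector enabled unless explicitly disabled.
function resolveSettings(stored: StoredSettings): ResolvedSettings {
  const disabled = stored.disabledDetectors || [];
  const enabled: { [name: string]: boolean } = {};
  for (const name of DETECTOR_NAMES) {
    enabled[name] = disabled.indexOf(name) === -1;
  }
  return { totalDetections: stored.totalDetections || 0, enabled: enabled };
}
```

Storing only the detectors the user switched off is a design choice of this sketch: a fresh install needs no stored data to get the default behaviour.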

Does it send any data outside?

The answer is NO. The Lakera Chrome Extension runs fully locally and does not send your private data to Lakera or to any third party outside your machine.

Built with ❤️ by Lakera