GitHub Action that analyzes text for PII entities with Microsoft's Presidio framework.
Insights Engineering
- path
  Description: Path to verify
  Required: false
  Default: "."
- configuration-file
  Description: Path to custom configuration file
  Required: false
  Default: "default"
- configuration-data
  Description: Configuration data as an inline YAML configuration
  Required: false
  Default: ""
- output
  Description: Format of output
  Required: false
  Default: "auto"
- publish
  Description: Publish result as a PR comment
  Required: false
  Default: "true"
- upload
  Description: Upload results as an artifact
  Required: false
  Default: "true"
- presidio-cli-version
  Description: Presidio CLI version
  Required: false
  Default: "latest"
- lang-models
  Description: List of additional language models to install
  Required: false
  Default: ""
- only-changed-files
  Description: Only run checks for changed files
  Required: false
  Default: false
The output format depends on the `output` parameter. The default format is `auto`.
Available formats:
- standard - standard output format
tests/conftest.py
34:58 0.85 PERSON
37:33 0.85 PERSON
- github - similar to the diff function in GitHub
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
- colored - standard output format but with colors
- parsable - easy to parse automatically
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
- auto - default format, switches automatically between these 2 modes:
  - github, if run on GitHub (environment variables `GITHUB_ACTIONS` and `GITHUB_WORKFLOW` are set)
  - colored, otherwise
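The `auto` rule described in the list above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the code presidio-cli actually runs: the function name `resolve_output_format` is hypothetical, and the sketch assumes that "set" means the variable is present in the environment, whatever its value.

```python
import os
from typing import Mapping, Optional

# Format names taken from the list of available formats.
VALID_FORMATS = {"standard", "github", "colored", "parsable", "auto"}


def resolve_output_format(requested: str = "auto",
                          env: Optional[Mapping[str, str]] = None) -> str:
    """Return the concrete format for a requested `output` value.

    Hypothetical helper: it mirrors the documented rule only.
    """
    if env is None:
        env = os.environ
    if requested not in VALID_FORMATS:
        raise ValueError(f"unknown output format: {requested!r}")
    if requested != "auto":
        # An explicit choice is used as-is.
        return requested
    # Assumption: "set" means the variable exists in the environment.
    on_github = "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env
    return "github" if on_github else "colored"


if __name__ == "__main__":
    # Resolves against the real environment of the current process.
    print(resolve_output_format("auto"))
```

Passing the environment as an argument keeps the rule easy to check locally without running inside a workflow.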
Presidio action uses presidio-cli, which is built on presidio-analyzer from the Microsoft Presidio framework, to check an application's code for undesirable types of data such as 'EMAIL_ADDRESS' or 'PHONE_NUMBER'.
For more information, see the full list of supported entities.
Example usage:
---
name: Presidio check
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  presidio-action:
    runs-on: ubuntu-latest
    name: Presidio check
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
        with:
          # fetch-depth: 0 is needed if you set `only-changed-files` to true
          # and if you are configuring this check to run on push events
          fetch-depth: 0
      - name: Produce the presidio report
        uses: insightsengineering/presidio-action@v1
        # all parameters below are optional
        with:
          # path - path to the project.
          # if the project does not have a specific 'my-project' path,
          # '.' (the current folder) is the default value
          path: "my-project"
          # configuration-file - path to a file with a specific configuration,
          # or one of the predefined files:
          # - default - `conf/default.yaml` file from the action repository, checks the default list of entities
          #   and ignores the content of the `.git` folder
          # - limited - `conf/limited.yaml` file from the action repository, checks only PERSON, EMAIL_ADDRESS and CREDIT_CARD
          #   and ignores the `.git` folder and *.cfg files
          configuration-file: "my-project/conf/my-presidio-config.yaml"
          # configuration-data - content of the configuration in raw YAML format.
          # Lets you provide your own configuration without adding a file to the project;
          # any value in this field will block usage of the configuration file
          configuration-data: |
            entities:
              - PERSON
            threshold: 0.9
          # output - specify one of the output formats
          output: "parsable"
          # only-changed-files - only run the check for files that were changed
          # NOTE: You must set fetch-depth: 0 in the actions/checkout@v3 step
          # for push events while this parameter is set to true
          only-changed-files: true
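With `output: "parsable"`, as in the workflow above, each finding is one JSON object per line, as in the parsable samples shown earlier. The following is a minimal sketch of how a downstream script might read such output. The helper names `parse_findings` and `count_by_entity` are hypothetical, and the sketch assumes that any line not starting with `{` (for example a file name, if the tool prints one) can be skipped.

```python
import json
from collections import Counter

# Two lines copied from the parsable-format sample in this README.
SAMPLE = """\
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
"""


def parse_findings(text):
    """Parse parsable-format output into a list of dicts."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: only lines that start with "{" are findings.
        if not line.startswith("{"):
            continue
        findings.append(json.loads(line))
    return findings


def count_by_entity(findings, min_score=0.0):
    """Count findings per entity type, keeping scores >= min_score."""
    return Counter(
        f["entity_type"] for f in findings if f["score"] >= min_score
    )


if __name__ == "__main__":
    findings = parse_findings(SAMPLE)
    for entity, count in count_by_entity(findings, min_score=0.8).items():
        print(f"{entity}: {count}")
```

In the samples shown earlier, the column reported by the standard format equals `start` + 1 (58 and 57, 33 and 32), which suggests zero-based offsets, although that is an inference from two samples rather than a documented guarantee.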
Example of comment added to the PR: