Initial setup (#1)
prepare initial version

Co-authored-by: Dinakar <26552821+dinakar29@users.noreply.github.com>
Co-authored-by: KlaudiaBB <51341892+KlaudiaBB@users.noreply.github.com>
3 people authored Feb 14, 2022
1 parent c40a8ae commit c98ae57
Showing 13 changed files with 488 additions and 1 deletion.
1 change: 1 addition & 0 deletions .github/workflows/CODEOWNERS
@@ -0,0 +1 @@
* @insightsengineering/idr
1 change: 1 addition & 0 deletions .github/workflows/ISSUE_TEMPLATE.md
@@ -0,0 +1 @@
<!-- Hello! Please describe your issue -->
1 change: 1 addition & 0 deletions .github/workflows/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1 @@
<!-- Thank you for your contribution! Please describe your PR -->
31 changes: 31 additions & 0 deletions .github/workflows/linter.yaml
@@ -0,0 +1,31 @@
---
name: SuperLinter

on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main

jobs:
  lint:
    name: Lint Code Base
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Lint Code Base
        uses: github/super-linter/slim@v4
        env:
          VALIDATE_ALL_CODEBASE: false
          DEFAULT_BRANCH: main
          VALIDATE_R: true
          VALIDATE_YAML: true
          VALIDATE_BASH_EXEC: true
          VALIDATE_DOCKERFILE_HADOLINT: true
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
100 changes: 100 additions & 0 deletions .github/workflows/test-action.yaml
@@ -0,0 +1,100 @@
---
name: Test 🧪

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  test-action:
    runs-on: ubuntu-latest
    name: Test action 🎬
    strategy:
      fail-fast: false
      matrix:
        reports:
          - path: "nothing.txt"
            configuration-file: "default"
            configuration-data: ""
            output: "auto"
            lang-models: ""
          - path: "fake_no.txt"
            configuration-file: "default"
            configuration-data: ""
            output: "auto"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "default"
            configuration-data: ""
            output: "auto"
            lang-models: ""
          - path: "nothing.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "auto"
            lang-models: ""
          - path: "fake_no.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "auto"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "standard"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "github"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "colored"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "limited"
            configuration-data: ""
            output: "parsable"
            lang-models: ""
          - path: "3_types.txt"
            configuration-file: "ignored-use-data-content"
            configuration-data: |2
              entities:
                - PERSON
            output: "auto"
            lang-models: ""
          - path: "."
            configuration-file: "limited"
            configuration-data: ""
            output: "github"
            lang-models: ""
          - path: "fake_no.txt"
            configuration-file: "none"
            configuration-data: |2
              language: en
              entities:
                - CREDIT_CARD
                - PERSON
            output: "parsable"
            lang-models: |
              pl_core_news_sm
              pl_core_news_lg
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2

      - name: Run test on ${{ matrix.reports.path }} 🏃‍♀️
        uses: ./
        with:
          path: ./fixtures/${{ matrix.reports.path }}
          configuration-file: ${{ matrix.reports.configuration-file }}
          configuration-data: ${{ matrix.reports.configuration-data }}
          output: ${{ matrix.reports.output }}
          lang-models: ${{ matrix.reports.lang-models }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@

.presidio-output.txt
162 changes: 161 additions & 1 deletion README.md
@@ -1,2 +1,162 @@
# presidio-action
# Presidio Action

A GitHub Action that analyzes text for PII entities using the Microsoft Presidio framework.

## Author

Insights Engineering

## Inputs

* `path`:

_Description_: Path to verify

_Required_: `false`

_Default_: "."

* `configuration-file`:

_Description_: Path to configuration file or predefined configuration (default, limited)

_Required_: `false`

_Default_: "default"

* `configuration-data`:

_Description_: Simple configuration data provided directly in YAML format

_Required_: `false`

_Default_: ""

* `output`:

_Description_: Output format

_Required_: `false`

_Default_: "auto"

* `publish`:

_Description_: Publish result in PR comment

_Required_: `false`

_Default_: "true"

* `upload`:

_Description_: Upload result as an artifact

_Required_: `false`

_Default_: "true"

* `presidio-cli-version`:

_Description_: Version of presidio-cli used by the action

_Required_: `false`

_Default_: "1.0.0"

## Outputs

The output depends on the `output` parameter. The default format is `auto`.

Available formats:

* standard - standard output format

```shell
tests/conftest.py
34:58 0.85 PERSON
37:33 0.85 PERSON
```

* github - similar to the diff view on GitHub

```shell
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
```

* colored - standard output format but with colors

* parsable - easy to parse automatically

```shell
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
```

* auto - default format, switches automatically between these two modes:
  * github, if run on GitHub (environment variables `GITHUB_ACTIONS` and `GITHUB_WORKFLOW` are set)
  * colored, otherwise
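
The `auto` rule above can be sketched in a few lines of Python. This is only an illustration, not the code presidio-cli uses: the function name `resolve_output_format` is hypothetical, and it assumes that "set" means the variable is present and non-empty.

```python
import os


def resolve_output_format(requested="auto", environ=None):
    """Return the concrete output format for a requested format name.

    Hypothetical helper: mirrors the README rule for `auto`, nothing more.
    """
    env = os.environ if environ is None else environ
    if requested != "auto":
        # Explicit formats (standard, github, colored, parsable) pass through.
        return requested
    # Assumption: a variable counts as "set" when it is present and non-empty.
    on_github = bool(env.get("GITHUB_ACTIONS")) and bool(env.get("GITHUB_WORKFLOW"))
    return "github" if on_github else "colored"
```

Passing the environment as a parameter keeps the rule easy to check locally without changing real environment variables.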

## How it works

Presidio action uses [presidio-cli](https://pypi.org/project/presidio-cli/),
which is based on presidio-analyzer from the [Microsoft Presidio framework](https://github.com/microsoft/presidio),
to check an application's code for undesirable types of data such as `EMAIL_ADDRESS` or `PHONE_NUMBER`.

For more information, see the full [list of supported entities](https://microsoft.github.io/presidio/supported_entities/).
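
Findings reported in the `parsable` format can be post-processed with a short script. The sketch below counts findings per entity type. It assumes every non-empty line is a single JSON object with the fields shown in the `parsable` sample in the Outputs section; `count_entities` and `min_score` are hypothetical names, not part of presidio-cli.

```python
import json
from collections import Counter


def count_entities(parsable_output, min_score=0.0):
    """Count findings per entity type in `parsable`-style output.

    Assumption: one JSON object per non-empty line, as in the README sample.
    """
    counts = Counter()
    for line in parsable_output.splitlines():
        line = line.strip()
        if not line:
            continue
        finding = json.loads(line)
        # Keep only findings at or above the requested confidence score.
        if finding["score"] >= min_score:
            counts[finding["entity_type"]] += 1
    return counts


# The two sample lines from the Outputs section.
sample = (
    '{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}\n'
    '{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}\n'
)
print(count_entities(sample))
```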

## Usage

Example usage:

```yaml
---
name: Presidio check

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  presidio-action:
    runs-on: ubuntu-latest
    name: Presidio check

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Produce the presidio report
        uses: insightsengineering/presidio-action@v1
        # all parameters below are optional
        with:
          # path - path to the project to check.
          # If the project has no specific path such as 'my-project',
          # the default value '.' (current folder) is used.
          path: "my-project"
          # configuration-file - path to a file with a specific configuration,
          # or the name of one of the predefined files:
          # - default - `conf/default.yaml` file from the action, checks the default list of entities
          #   and ignores the content of the `.git` folder
          # - limited - `conf/limited.yaml` file from the action, checks only PERSON, EMAIL_ADDRESS and CREDIT_CARD
          #   and ignores the `.git` folder and *.cfg files
          configuration-file: "my-project/conf/my-presidio-config.yaml"
          # configuration-data - configuration content in raw YAML format.
          # Lets you provide your own configuration without adding a file to the project.
          # Any value in this field blocks the use of the configuration file.
          configuration-data: |
            entities:
              - PERSON
          # output - one of the output formats
          output: "parsable"

```
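
The precedence between `configuration-data` and `configuration-file` described in the comments above can be sketched as follows. This illustrates the documented behaviour only and is not the action's actual implementation: `resolve_configuration` and `PREDEFINED` are hypothetical names, and treating whitespace-only data as empty is an assumption.

```python
# Predefined configuration names and the files they map to, per the comments above.
PREDEFINED = {
    "default": "conf/default.yaml",
    "limited": "conf/limited.yaml",
}


def resolve_configuration(configuration_file="default", configuration_data=""):
    """Return ("data", yaml_text) or ("file", path) for the given inputs.

    Hypothetical helper: any non-empty `configuration-data` wins over
    `configuration-file`.
    """
    # Assumption: whitespace-only data counts as "no data".
    if configuration_data.strip():
        return ("data", configuration_data)
    # Predefined names map to the action's own files; anything else is a path.
    return ("file", PREDEFINED.get(configuration_file, configuration_file))
```

With the values from the example workflow, the inline `entities` list would be used and `my-project/conf/my-presidio-config.yaml` would be ignored.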