Add DLP samples (BigQuery, DeID, RiskAnalysis) (#474)
* Add BigQuery samples + a few minor tweaks

* Update comments + fix failing test

* Sync w/codegen changes

* Add DeID samples

* Add DeID tests + remove infoTypes from DeID samples

* Remove unused option

* Add risk analysis samples

* Update README

* Add region tags + fix comment
Ace Nassri authored Oct 18, 2017
1 parent 606a9d3 commit 14d6460
Showing 13 changed files with 908 additions and 34 deletions.
1 change: 1 addition & 0 deletions dlp/.gitignore
@@ -0,0 +1 @@
**/*.result.png
67 changes: 65 additions & 2 deletions dlp/README.md
@@ -13,6 +13,8 @@ The [Data Loss Prevention API](https://cloud.google.com/dlp/docs/) provides prog
* [Inspect](#inspect)
* [Redact](#redact)
* [Metadata](#metadata)
* [DeID](#deid)
* [Risk Analysis](#risk-analysis)
* [Running the tests](#running-the-tests)

## Setup
@@ -47,6 +49,7 @@ Commands:
Prevention API and the promise pattern.
gcsFileEvent <bucketName> <fileName> Inspects a text file stored on Google Cloud Storage using the Data Loss
Prevention API and the event-handler pattern.
bigquery <datasetName> <tableName> Inspects a BigQuery table using the Data Loss Prevention API.
datastore <kind> Inspect a Datastore instance using the Data Loss Prevention API.
Options:
@@ -56,14 +59,15 @@ Options:
[default: "LIKELIHOOD_UNSPECIFIED"]
-f, --maxFindings [number] [default: 0]
-q, --includeQuote [boolean] [default: true]
-l, --languageCode [string] [default: "en-US"]
-t, --infoTypes [array] [default: []]
-t, --infoTypes [array] [default: ["PHONE_NUMBER","EMAIL_ADDRESS","CREDIT_CARD_NUMBER"]]
Examples:
node inspect.js string "My phone number is (123) 456-7890 and my email address is me@somedomain.com"
node inspect.js file resources/test.txt
node inspect.js gcsFilePromise my-bucket my-file.txt
node inspect.js gcsFileEvent my-bucket my-file.txt
node inspect.js bigquery my-dataset my-table
node inspect.js datastore my-datastore-kind
For more information, see https://cloud.google.com/dlp/docs. Optional flags are explained at
https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig
@@ -81,6 +85,7 @@ __Usage:__ `node redact.js --help`
```
Commands:
string <string> <replaceString> Redact sensitive data from a string using the Data Loss Prevention API.
image <filepath> <outputPath> Redact sensitive data from an image using the Data Loss Prevention API.
Options:
--help Show help [boolean]
@@ -91,6 +96,7 @@ Options:
Examples:
node redact.js string "My name is Gary" "REDACTED" -t US_MALE_NAME
node redact.js image resources/test.png redaction_result.png -t US_MALE_NAME
For more information, see https://cloud.google.com/dlp/docs. Optional flags are explained at
https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig
@@ -124,6 +130,63 @@ For more information, see https://cloud.google.com/dlp/docs
[metadata_2_docs]: https://cloud.google.com/dlp/docs
[metadata_2_code]: metadata.js

### DeID

View the [documentation][deid_3_docs] or the [source code][deid_3_code].

__Usage:__ `node deid.js --help`

```
Commands:
  mask <string>                        Deidentify sensitive data by masking it with a character.
  fpe <string> <wrappedKey> <keyName>  Deidentify sensitive data using Format Preserving Encryption (FPE).

Options:
  --help  Show help  [boolean]

Examples:
  node deid.js mask "My SSN is 372819127"
  node deid.js fpe "My SSN is 372819127" <YOUR_ENCRYPTED_AES_256_KEY> <YOUR_KEY_NAME>

For more information, see https://cloud.google.com/dlp/docs.
```

[deid_3_docs]: https://cloud.google.com/dlp/docs
[deid_3_code]: deid.js
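
The `mask` command accepts an optional masking character and an optional cap on how many characters to mask. The snippet below is a minimal local sketch of that behavior and does not call the DLP API. `maskMatch` is a hypothetical helper, and it assumes that masking runs left to right, that a cap of `0` masks every character, and that `*` stands in when no masking character is given.

```javascript
'use strict';

// Local illustration only: this does NOT call the DLP API.
// `maskMatch` is a hypothetical helper, not part of @google-cloud/dlp.
// Assumptions: masking proceeds left to right, a numberToMask of 0 (or an
// omitted value) masks the whole match, and '*' is the fallback character.
function maskMatch (match, maskingCharacter = '*', numberToMask = 0) {
  const limit = numberToMask > 0
    ? Math.min(numberToMask, match.length)
    : match.length;
  return maskingCharacter.repeat(limit) + match.slice(limit);
}

console.log(maskMatch('372819127', 'x', 5)); // xxxxx9127
console.log(maskMatch('372819127', 'x')); // xxxxxxxxx
```

In `deid.js` the same two settings are sent to the service as `maskingCharacter` and `numberToMask` inside `characterMaskConfig`, and the service decides which substrings count as sensitive.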

### Risk Analysis

View the [documentation][risk_4_docs] or the [source code][risk_4_code].

__Usage:__ `node risk.js --help`

```
Commands:
  numerical <datasetId> <tableId> <columnName>             Computes risk metrics of a column of numbers in a Google
                                                           BigQuery table.
  categorical <datasetId> <tableId> <columnName>           Computes risk metrics of a column of data in a Google
                                                           BigQuery table.
  kAnonymity <datasetId> <tableId> [quasiIdColumnNames..]  Computes the k-anonymity of a column set in a Google
                                                           BigQuery table.
  lDiversity <datasetId> <tableId> <sensitiveAttribute>    Computes the l-diversity of a column set in a Google
  [quasiIdColumnNames..]                                   BigQuery table.

Options:
  --help           Show help  [boolean]
  -p, --projectId  [string] [default: "nodejs-docs-samples"]

Examples:
  node risk.js numerical nhtsa_traffic_fatalities accident_2015 state_number -p bigquery-public-data
  node risk.js categorical nhtsa_traffic_fatalities accident_2015 state_name -p bigquery-public-data
  node risk.js kAnonymity nhtsa_traffic_fatalities accident_2015 state_number county -p bigquery-public-data
  node risk.js lDiversity nhtsa_traffic_fatalities accident_2015 city state_number county -p bigquery-public-data

For more information, see https://cloud.google.com/dlp/docs.
```

[risk_4_docs]: https://cloud.google.com/dlp/docs
[risk_4_code]: risk.js
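
For readers unfamiliar with the metrics behind `kAnonymity` and `lDiversity`, the snippet below is a small in-memory sketch of the conventional definitions. It does not call the DLP API or BigQuery. The function names and the sample rows are hypothetical, `lDiversity` here uses the "distinct values" variant, and the service may report richer per-class statistics than these single numbers.

```javascript
'use strict';

// Local illustration only: this does NOT call the DLP API or BigQuery.
// All function names and the sample rows below are hypothetical.

// Groups rows into equivalence classes: rows that share the same values
// for every quasi-identifier column land in the same class.
function groupByQuasiIds (rows, quasiIds) {
  const classes = new Map();
  for (const row of rows) {
    const key = JSON.stringify(quasiIds.map((col) => row[col]));
    if (!classes.has(key)) {
      classes.set(key, []);
    }
    classes.get(key).push(row);
  }
  return classes;
}

// k-anonymity: the size of the smallest equivalence class (0 if no rows).
function kAnonymity (rows, quasiIds) {
  const sizes = [...groupByQuasiIds(rows, quasiIds).values()]
    .map((cls) => cls.length);
  return sizes.length ? Math.min(...sizes) : 0;
}

// Distinct l-diversity: the smallest number of distinct sensitive values
// found in any equivalence class (0 if no rows).
function lDiversity (rows, quasiIds, sensitiveAttribute) {
  const counts = [...groupByQuasiIds(rows, quasiIds).values()]
    .map((cls) => new Set(cls.map((row) => row[sensitiveAttribute])).size);
  return counts.length ? Math.min(...counts) : 0;
}

// Made-up rows that reuse the column names from the examples above.
const rows = [
  { state_number: 1, county: 10, city: 'A' },
  { state_number: 1, county: 10, city: 'B' },
  { state_number: 2, county: 20, city: 'C' },
  { state_number: 2, county: 20, city: 'C' }
];

console.log(kAnonymity(rows, ['state_number', 'county'])); // 2
console.log(lDiversity(rows, ['state_number', 'county'], 'city')); // 1
```

Here every combination of `state_number` and `county` appears twice, so k is 2, while the second class contains only one distinct `city`, so l is 1.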

## Running the tests

1. Set the **GCLOUD_PROJECT** and **GOOGLE_APPLICATION_CREDENTIALS** environment variables.
163 changes: 163 additions & 0 deletions dlp/deid.js
@@ -0,0 +1,163 @@
/**
* Copyright 2017, Google, Inc.
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

'use strict';

function deidentifyWithMask (string, maskingCharacter, numberToMask) {
  // [START deidentify_masking]
  // Imports the Google Cloud Data Loss Prevention library
  const DLP = require('@google-cloud/dlp');

  // Instantiates a client
  const dlp = new DLP.DlpServiceClient();

  // The string to deidentify
  // const string = 'My SSN is 372819127';

  // (Optional) The maximum number of sensitive characters to mask in a match
  // If omitted from the request or set to 0, the API will mask any matching characters
  // const numberToMask = 5;

  // (Optional) The character to mask matching sensitive data with
  // const maskingCharacter = 'x';

  // Construct deidentification request
  const items = [{ type: 'text/plain', value: string }];
  const request = {
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [{
          primitiveTransformation: {
            characterMaskConfig: {
              maskingCharacter: maskingCharacter,
              numberToMask: numberToMask
            }
          }
        }]
      }
    },
    items: items
  };

  // Run deidentification request
  dlp.deidentifyContent(request)
    .then((response) => {
      const deidentifiedItems = response[0].items;
      console.log(deidentifiedItems[0].value);
    })
    .catch((err) => {
      console.log(`Error in deidentifyWithMask: ${err.message || err}`);
    });
  // [END deidentify_masking]
}

function deidentifyWithFpe (string, alphabet, keyName, wrappedKey) {
  // [START deidentify_fpe]
  // Imports the Google Cloud Data Loss Prevention library
  const DLP = require('@google-cloud/dlp');

  // Instantiates a client
  const dlp = new DLP.DlpServiceClient();

  // The string to deidentify
  // const string = 'My SSN is 372819127';

  // The set of characters to replace sensitive ones with
  // For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/deidentify#FfxCommonNativeAlphabet
  // const alphabet = 'ALPHA_NUMERIC';

  // The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
  // const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

  // The encrypted ('wrapped') AES-256 key to use
  // This key should be encrypted using the Cloud KMS key specified above
  // const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

  // Construct deidentification request
  const items = [{ type: 'text/plain', value: string }];
  const request = {
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [{
          primitiveTransformation: {
            cryptoReplaceFfxFpeConfig: {
              cryptoKey: {
                kmsWrapped: {
                  wrappedKey: wrappedKey,
                  cryptoKeyName: keyName
                }
              },
              commonAlphabet: alphabet
            }
          }
        }]
      }
    },
    items: items
  };

  // Run deidentification request
  dlp.deidentifyContent(request)
    .then((response) => {
      const deidentifiedItems = response[0].items;
      console.log(deidentifiedItems[0].value);
    })
    .catch((err) => {
      console.log(`Error in deidentifyWithFpe: ${err.message || err}`);
    });
  // [END deidentify_fpe]
}

const cli = require(`yargs`)
  .demand(1)
  .command(
    `mask <string>`,
    `Deidentify sensitive data by masking it with a character.`,
    {
      maskingCharacter: {
        type: 'string',
        alias: 'c',
        default: ''
      },
      numberToMask: {
        type: 'number',
        alias: 'n',
        default: 0
      }
    },
    (opts) => deidentifyWithMask(opts.string, opts.maskingCharacter, opts.numberToMask)
  )
  .command(
    `fpe <string> <wrappedKey> <keyName>`,
    `Deidentify sensitive data using Format Preserving Encryption (FPE).`,
    {
      alphabet: {
        type: 'string',
        alias: 'a',
        default: 'ALPHA_NUMERIC',
        choices: ['NUMERIC', 'HEXADECIMAL', 'UPPER_CASE_ALPHA_NUMERIC', 'ALPHA_NUMERIC']
      }
    },
    (opts) => deidentifyWithFpe(opts.string, opts.alphabet, opts.keyName, opts.wrappedKey)
  )
  .example(`node $0 mask "My SSN is 372819127"`)
  .example(`node $0 fpe "My SSN is 372819127" <YOUR_ENCRYPTED_AES_256_KEY> <YOUR_KEY_NAME>`)
  .wrap(120)
  .recommendCommands()
  .epilogue(`For more information, see https://cloud.google.com/dlp/docs.`);

if (module === require.main) {
  cli.help().strict().argv; // eslint-disable-line
}