wernicke

A redaction tool for structured data. Run wernicke with JSON on stdin, get redacted values out. Preserves structure and (to some extent) semantics. You might want this because you have test data where the actual values are sensitive. Because the changes are consistent within the data and the overall data structure is preserved, there is a better chance your data will stay suitable for testing, even though it's been scrubbed.

Most people run wernicke from a shell, so you either have json_producing_thing | wernicke or wernicke < some_file.json > redacted.json. EDN is also supported. See wernicke --help for additional information.
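If you would rather drive wernicke from a script than from a shell pipeline, something like the following might work. This is a minimal sketch, not part of wernicke itself: the redact helper and its command parameter are hypothetical names, and it assumes a wernicke binary is on your PATH and that JSON on stdin yields JSON on stdout, as described above.

```python
import json
import subprocess


def redact(data, command=("wernicke",)):
    """Serialize `data` to JSON, pipe it through `command`, and parse the result.

    `command` is a hypothetical parameter: it defaults to a `wernicke` binary
    on PATH, and can be swapped for another executable path if needed.
    """
    completed = subprocess.run(
        list(command),
        input=json.dumps(data),  # wernicke reads the document from stdin
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Assumption: the redacted document is written to stdout as JSON.
    return json.loads(completed.stdout)
```

Calling redact({"ip": "10.0.0.1"}) would then return a dictionary with the same layout and a different IP on every run.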

Example input	Example output
IPs, MAC addresses, timestamps, various AWS identifiers, and a few other types of strings are redacted to strings of the same type: IPs to IPs, SGs to SGs, et cetera. If these strings have an alphanumeric id, that id will have the same length.
{ "long_val": "ABBBAAAABBBBAAABBBAABB", "ip": "10.0.0.1", "mac": "ff:ff:ff:ff:ff:ff", "timestamp": "2017-01-01T12:34:56.000Z", "ec2": "ip-10-0-0-1.ec2.internal", "security_group": "sg-12345", "vpc": "vpc-abcdef", "aws_access_key": "AKIAXXXXXXXXXXXXXXXX", "aws_role_cred": "AROAYYYYYYYYYYYYYYYY" }	{ "long_val": "teyjdaeqEYGw18fRIt5vLo", "ip": "254.65.252.245", "mac": "aa:3e:91:ab:3b:3a", "timestamp": "2044-19-02T20:32:55.72Z", "ec2": "ip-207-255-185-237.ec2.internal", "security_group": "sg-887b8", "vpc": "vpc-a9d96a", "aws_access_key": "AKIAQ5E7IHRMOW7YABLS", "aws_role_cred": "AROA6QA7SQTM6YWS4F0H" }
Redaction happens in arbitrarily nested structures.
{ "a": { "b": [ "c", "d", { "e": "10.0.0.1" } ] } }	{ "a": { "b": [ "c", "d", { "e": "1.212.241.246" } ] } }
In addition to values in the tree, keys are also redacted, even nested ones.
{ "vpc-12345": { "sg-abcdef": { "instance_count": 5 } } }	{ "vpc-ec60f": { "sg-086fd3": { "instance_count": 5 } } }
Redaction also happens in the middle of strings.
{ "x": "i-abc123 is in sg-12345" }	{ "x": "i-26a1bf is in sg-77aff" }
The redacted values will change across runs (this is necessary to make redaction irreversible).
{ "ip": "10.0.0.1", "mac": "ff:ff:ff:ff:ff:ff" }	{ "ip": "246.220.253.214", "mac": "dc:08:90:75:e3:91" }
Redacted values _are_ consistent within runs. If the input contains the same value multiple times it will get redacted identically. This allows you to still do correlation in the result.
{ "ip": "10.0.0.1", "also_ip": "10.0.0.1" }	{ "ip": "247.226.167.9", "also_ip": "247.226.167.9" }

(These examples were pretty-printed for viewing comfort, but wernicke does not do that for you. Try jq.)
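The two properties shown in the examples (structure is preserved, and equal inputs get equal outputs within a run) can be checked after the fact. The sketch below is one hedged way to do that in a test suite. The function names are made up for illustration, and pairing dictionary entries by position assumes the output keeps the input's key order, which you should verify for your own data.

```python
def same_shape(original, redacted):
    """True if both trees have the same nesting, container sizes, and leaf types."""
    if isinstance(original, dict):
        return (
            isinstance(redacted, dict)
            and len(original) == len(redacted)
            # Keys may be redacted too, so pair entries by position, not by name.
            and all(same_shape(a, b)
                    for a, b in zip(original.values(), redacted.values()))
        )
    if isinstance(original, list):
        return (
            isinstance(redacted, list)
            and len(original) == len(redacted)
            and all(same_shape(a, b) for a, b in zip(original, redacted))
        )
    return type(original) is type(redacted)


def leaves(tree):
    """Yield the scalar values of a parsed JSON tree in document order."""
    if isinstance(tree, dict):
        for value in tree.values():
            yield from leaves(value)
    elif isinstance(tree, list):
        for value in tree:
            yield from leaves(value)
    else:
        yield tree


def is_consistent(original, redacted):
    """True if equal input leaves always map to equal output leaves."""
    seen = {}
    for before, after in zip(leaves(original), leaves(redacted)):
        key = (type(before), before)  # keep 1 and True apart
        if seen.setdefault(key, after) != after:
            return False
    return True
```

Both functions take already-parsed JSON (for example the result of json.load), so they work on pretty-printed and compact files alike.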

Installation

Download from https://github.com/latacora/wernicke/releases

Configuration

We try to do something reasonable for most use cases. If you have generally useful redactions, please consider contributing them. However, sometimes redaction behavior really does need to be configured. Pass an EDN literal on the command line like so: wernicke --config '{:some-rules "detailed below"}'.

Right now this requires a pretty extensive understanding of how wernicke works--we want to make this more accessible, though! If there's a specific thing you want to accomplish, feel free to write a ticket.

Adding extra rules

For example, to redact all numbers, add the following structure to your EDN:

{:extra-rules
  [{:name :numbers
    :type :regex
    :pattern "\\d*"}]}

The extra rules will be compiled before use, so e.g. you do not need to specify the parsed regex structure for this to work.

Disabling rules by name

Add the following structure to your EDN:

{:disabled-rules [:latacora.wernicke.patterns/arn-re]}

This still requires you to know what the rule names are. You can find these in latacora.wernicke.core/default-config.
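When the configuration is assembled by a script, getting the EDN quoting right by hand is fiddly. The following sketch builds a single EDN map from the two structures shown above. The helper names are hypothetical, only backslashes and double quotes are escaped, and it assumes both keys can live in the same map passed to --config.

```python
def edn_string(text):
    """Quote `text` as an EDN string literal (escapes backslashes and quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_config(extra_regex_rules=None, disabled_rules=()):
    """Return an EDN map literal intended for `wernicke --config`.

    extra_regex_rules: dict mapping a rule name to a regex pattern
    disabled_rules: rule names without the leading colon
    """
    parts = []
    if extra_regex_rules:
        rules = " ".join(
            "{:name :%s :type :regex :pattern %s}" % (name, edn_string(pattern))
            for name, pattern in extra_regex_rules.items()
        )
        parts.append(":extra-rules [%s]" % rules)
    if disabled_rules:
        names = " ".join(":" + name for name in disabled_rules)
        parts.append(":disabled-rules [%s]" % names)
    return "{" + " ".join(parts) + "}"
```

Passing the result as its own element of an argument list (for example ["wernicke", "--config", build_config(...)] with subprocess.run) avoids a second layer of shell quoting.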

Development

To run the project directly from a source checkout:

$ clj -m latacora.wernicke.cli

To run the project's tests:

$ clj -A:test

To build a native image:

$ clj -A:native-image

(This requires GraalVM to be installed with SubstrateVM, and the GRAAL_HOME environment variable to be set.)

Namesake

Named after Carl Wernicke, a German physician who did research on the brain. Wernicke's aphasia is a condition where patients demonstrate fluent speech with intact syntax but with nonsense words. This tool is kind of like that: the resulting structure is maintained but all the words are swapped out with (internally consistent) nonsense.

License

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
