The pipe mask is a powerful tool for masking data nested in complex, multi-level array structures.
Consider the following structure:
data.json
{
"organizations": [
{
"domain": "company.com",
"persons": [
{
"name": "leona",
"surname": "miller",
"email": ""
},
{
"name": "joe",
"surname": "davis",
"email": ""
}
]
},
{
"domain": "company.fr",
"persons": [
{
"name": "alain",
"surname": "mercier",
"email": ""
},
{
"name": "florian",
"surname": "legrand",
"email": ""
}
]
}
]
}
- organizations is an array of organization objects.
- each organization contains a field persons, which is an array of person objects.
How can the email field of each person be masked to the format {{.person.name}}.{{.person.surname}}@{{.domain}}?
The first idea that might come to mind is something like:
masking-wrong.yml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "organizations.persons.email"
    mask:
      # this Go template syntax refers to fields as if they were not inside nested arrays
      template: "{{.organizations.persons.name}}.{{.organizations.persons.surname}}@{{.organizations.domain}}"
Here is the result of applying the above configuration.
NOTE
All command lines are listed in demo.sh.
The command jq -c "." used below reformats an indented, multi-line JSON structure into a single line (JSON Lines).
oops!
$ cat data.json | jq -c "." | pimo -c masking-wrong.yml
template: template:1:16: executing "template" at <.organizations.persons.name>: can't evaluate field persons in type model.Entry
This error occurs because the templating syntax used by the template mask is different from the syntax used in the jsonpath property. PIMO handles arrays in selectors: given the path .organizations.persons.name, it recognizes that the fields .organizations[*].persons[*].name are to be masked (every name, of every person, in every organization).
The template mask, however, needs to know exactly which value to use, and it cannot determine it from the provided path, because this path does not point to a valid location in the structure: .organizations is an array, and an array has no persons field.
The second idea that might come to mind is to fix the template syntax.
In a Go template, an array element is accessed with the index function:
masking-alsowrong.yml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "organizations.persons.email"
    mask:
      # this Go template syntax refers to the single element at index 0 of each array
      # (and it's not very readable)
      template: "{{(index (index .organizations 0).persons 0).name}}.{{(index (index .organizations 0).persons 0).surname}}@{{(index .organizations 0).domain}}"
Here is the result of applying the above configuration.
uh?
$ cat data.json | jq -c "." | pimo -c masking-alsowrong.yml | jq
{
"organizations": [
{
"domain": "company.com",
"persons": [
{
"email": "leona.miller@company.com",
"name": "leona",
"surname": "miller"
},
{
"email": "leona.miller@company.com",
"name": "joe",
"surname": "davis"
}
]
},
{
"domain": "company.fr",
"persons": [
{
"email": "leona.miller@company.com",
"name": "alain",
"surname": "mercier"
},
{
"email": "leona.miller@company.com",
"name": "florian",
"surname": "legrand"
}
]
}
]
}
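The error is gone, but everyone now has the email leona.miller@company.com, which is not what we want: the indexes are hard-coded to 0, so the template always reads the first person of the first organization.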
The truth is that with the template mask alone (or any other mask except pipe), it is impossible to obtain the expected result. That is why the pipe mask was created.
The use case presented above is tackled in two steps.
The pipe mask can process the persons objects as an independent stream of JSON. Its content is another masking node defining a list of masks to apply.
masking-pipe-1.yml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "organizations.persons"
    mask:
      pipe:
        # starting here is the definition of another masking pipeline, which applies to the persons objects
        masking:
          - selector:
              jsonpath: "email"
            mask:
              # in the template, name and surname can be accessed directly
              template: "{{.name}}.{{.surname}}"
Here is the result of applying the above configuration.
result
$ cat data.json | jq -c "." | pimo -c masking-pipe-1.yml | jq
{
"organizations": [
{
"domain": "company.com",
"persons": [
{
"email": "leona.miller",
"name": "leona",
"surname": "miller"
},
{
"email": "joe.davis",
"name": "joe",
"surname": "davis"
}
]
},
{
"domain": "company.fr",
"persons": [
{
"email": "alain.mercier",
"name": "alain",
"surname": "mercier"
},
{
"email": "florian.legrand",
"name": "florian",
"surname": "legrand"
}
]
}
]
}
The name and surname parts are now correct. The next step is to handle the domain part.
The domain is not part of a person object; it is stored in the parent object (the organization).
The parent object can be accessed with the injectParent property of the pipe mask. The value of this property is the name of the field under which the parent is made available to the sub-pipeline.
masking-pipe-2.yml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "organizations.persons"
    mask:
      pipe:
        # the parent of the person will be injected during the processing of the sub-pipeline, under the path ".org"
        # the name "org" is an example, any valid identifier can be chosen
        injectParent: "org"
        masking:
          - selector:
              jsonpath: "email"
            mask:
              # now the template can read the value of the organization domain with .org.domain
              template: "{{.name}}.{{.surname}}@{{.org.domain}}"
Here is the result of applying the above configuration.
result
$ cat data.json | jq -c "." | pimo -c masking-pipe-2.yml | jq
{
"organizations": [
{
"domain": "company.com",
"persons": [
{
"email": "leona.miller@company.com",
"name": "leona",
"surname": "miller"
},
{
"email": "joe.davis@company.com",
"name": "joe",
"surname": "davis"
}
]
},
{
"domain": "company.fr",
"persons": [
{
"email": "alain.mercier@company.fr",
"name": "alain",
"surname": "mercier"
},
{
"email": "florian.legrand@company.fr",
"name": "florian",
"surname": "legrand"
}
]
}
]
}
The pipe mask also exposes the injectRoot property, which is similar to injectParent except that it injects the whole structure currently being processed.
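To contrast the two properties, the sketch below builds two views of the same person (alain, from company.fr): one with the parent injected under org, one with the whole document injected under root. It only models what a template could read in each case; the key names and the inject and mustRender helpers are hypothetical, chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// data.json, compacted (email fields omitted).
const data = `{"organizations":[
 {"domain":"company.com","persons":[{"name":"leona","surname":"miller"},{"name":"joe","surname":"davis"}]},
 {"domain":"company.fr","persons":[{"name":"alain","surname":"mercier"},{"name":"florian","surname":"legrand"}]}]}`

// loadDoc decodes the sample into generic maps and slices.
func loadDoc() map[string]any {
	var doc map[string]any
	if err := json.Unmarshal([]byte(data), &doc); err != nil {
		panic(err)
	}
	return doc
}

// inject returns a shallow copy of person with value added under key.
func inject(person map[string]any, key string, value any) map[string]any {
	view := make(map[string]any, len(person)+1)
	for k, v := range person {
		view[k] = v
	}
	view[key] = value
	return view
}

// mustRender executes a Go template with dot and panics on any error.
func mustRender(tmpl string, dot any) string {
	var sb strings.Builder
	if err := template.Must(template.New("template").Parse(tmpl)).Execute(&sb, dot); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	doc := loadDoc()
	orgFr := doc["organizations"].([]any)[1].(map[string]any)
	alain := orgFr["persons"].([]any)[0].(map[string]any)

	// Parent view: only alain's own organization is visible.
	fmt.Println(mustRender("{{.org.domain}}", inject(alain, "org", orgFr))) // company.fr
	// Root view: the whole document is visible, including other organizations.
	fmt.Println(mustRender("{{(index .root.organizations 0).domain}}", inject(alain, "root", doc))) // company.com
}
```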
The sub-pipeline definition can also be stored in another YAML file, referenced with the file property.
masking-root.yml
version: "1"
masking:
  - selector:
      jsonpath: "organizations.persons"
    mask:
      pipe:
        injectParent: "org"
        file: "masking-org.yml"
masking-org.yml
version: "1"
masking:
  - selector:
      jsonpath: "email"
    mask:
      template: "{{.name}}.{{.surname}}@{{.org.domain}}"
Pipes are compatible with caches.
If a cache must be shared by the main pipeline and all the sub-pipelines, it must be declared at the root of the configuration.
masking-cache.yml
version: "1"
seed: 42
masking:
  - selector:
      jsonpath: "age"
    cache: "age"
    mask:
      randomInt:
        min: 0
        max: 100
  - selector:
      jsonpath: "related"
    mask:
      pipe:
        masking:
          - selector:
              jsonpath: "age"
            cache: "age"
            mask:
              randomInt:
                min: 0
                max: 100
# declared here, the cache will be shared by all sub-pipelines and the main pipeline
caches:
  age: {}
data-cache.jsonl
{"age": 10, "related": [{"age":30}]}
{"age": 20, "related": [{"age":40}]}
{"age": 30, "related": [{"age":10}]}
{"age": 40, "related": [{"age":20}]}
Here is the result of applying the above configuration.
result
$ cat data-cache.jsonl | jq -c "." | pimo -c masking-cache.yml
{"age":91,"related":[{"age":55}]}
{"age":25,"related":[{"age":84}]}
{"age":55,"related":[{"age":91}]}
{"age":84,"related":[{"age":25}]}
NOTE
Pipes are currently NOT compatible with the fromCache mask.
Using fromCache inside a pipe is discouraged, and the results might be unexpected.
However, the approach presented just above, referencing the same cache in multiple places, gives the correct and expected result. The only difference from fromCache is that the mask must be duplicated in two locations.