-
-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: JSON Masking #1899
feat: JSON Masking #1899
Conversation
I just pushed a commit fixing datetime parsing in AvroSerializer as I realised it led to some flaky tests as well. Not related to my changes here, but a nice to have (just noticed it when checking the workflow results) |
c5bf417
to
84d5d58
Compare
@jamfor352 I will take time to review it and give you my feedback next week |
Hi, have been using this build locally and it's awesome - following the PR |
Thank you :) Glad to hear it's worked well for you! |
@jamfor352 can you please update the documentation as well ? https://github.com/tchiotludo/akhq/blob/dev/docs/docs/configuration/akhq.md#data-masking |
|
||
public Record maskRecord(Record record) { | ||
try { | ||
if(record.getValue().trim().startsWith("{") && record.getValue().trim().endsWith("}")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test for record.getValue() != null
first (for tombstones) to avoid generating an exception (even if this exception is catch just after)
|
||
public Record maskRecord(Record record) { | ||
try { | ||
if(record.getValue().trim().startsWith("{") && record.getValue().trim().endsWith("}")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test for record.getValue() != null first (for tombstones) to avoid generating an exception (even if this exception is catch just after)
JsonMaskingFilter foundFilter = null; | ||
for (JsonMaskingFilter filter : jsonMaskingFilters) { | ||
if (record.getTopic().getName().equalsIgnoreCase(filter.getTopic())) { | ||
foundFilter = filter; | ||
} | ||
} | ||
if (foundFilter != null) { | ||
return applyMasking(record, foundFilter.getKeys()); | ||
} else { | ||
return applyMasking(record, List.of()); | ||
} | ||
} else { | ||
return record; | ||
} | ||
} catch (Exception e) { | ||
LOG.error("Error masking record", e); | ||
return record; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve readability, what do you think of this ?
if (record.getValue() == null) {
return record;
}
try {
jsonMaskingFilters.stream()
.filter(filter -> record.getTopic().getName().equalsIgnoreCase(filter.getTopic()))
.findFirst()
.ifPresentOrElse(filter -> applyMasking(record, filter.getKeys()).getValue(),
() -> applyMasking(record, List.of()).getValue());
} catch (Exception e) {
LOG.error("Error masking record", e);
}
return record;
JsonMaskingFilter foundFilter = null; | ||
for (JsonMaskingFilter filter : jsonMaskingFilters) { | ||
if (record.getTopic().getName().equalsIgnoreCase(filter.getTopic())) { | ||
foundFilter = filter; | ||
} | ||
} | ||
if (foundFilter != null) { | ||
return applyMasking(record, foundFilter.getKeys()); | ||
} else { | ||
return record; | ||
} | ||
} else { | ||
return record; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve it like my previous comment (just using ifPresent and the first parameter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaskerFactory can be removed by adding @requires annotation on the different maskers
@RequiredArgsConstructor | ||
public class JsonMaskByDefaultMasker implements Masker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using this to create the masker only if we have the json_mask_by_default property will follow what we already have at some other places. And we won't need the Factory anymore :)
@Singleton
@Requires(property = "akhq.security.data-masking.mode", value = "json_mask_by_default")
@RequiredArgsConstructor
public class JsonMaskByDefaultMasker implements Masker {
@Inject
private final DataMasking dataMasking;
And then use dataMasking.getJsonFilters()
, etc.
@jamfor352 thanks for the PR it's really interesting 😄 |
Thanks, they all seem reasonable - will get around to it sometime this week. |
Will update in the coming days - been busy moving house so haven't had much free time! |
Keeping it fresh in minds - I'm very interested in deploying this build, I can obviously pull it down and make the changes myself but would prefer to use the master build |
Hey, definitely. Was planning to do this last week but quite a lot came up, so wasn't able to get around to it. I've set some time aside specifically for this PR this evening (7pm UK time) so should be all good. |
02eeaf2
to
7fb2499
Compare
Hi @AlexisSouquiere I've updated my PR based on recommendations (with one small difference, re: default masking filter, to not break current compatibility). Added documentation, too. I've also made some other improvements that I've spotted - made the filtering more consistent (ie, always using qualified paths for both JSON maskers), and also some slight performance improvements. cc: @emerfan could you try this new build out and confirm everything is working okay for you as well after the changes? Please note the change to |
Hi @jamfor352 this isn't working for me .... debugging I can see that masking is being applied for the record (I'm using
|
Could you provide an example record for me to try out with my end? All the tests are passing, which is odd based on how the masker is activated. In theory the logic hasnt changed either beyond the nested field adjustment. Will take another look. |
Here is a close sample, there's nothing complex about it though... I've tested this value in the unit tests and it also passes.
Have you tried running it and testing with the UI / Kafka? If I replace the updated maskRecord with your original one, it works fine. I can debug to the point I can see the masked record returned to RecordRepository but not as easy to see from there what's happening
|
Can you provide your full application.yml, if possible? |
|
I'm wondering if it's something to do with the "disable if no filters found" (which I have realised probably shouldnt be in the mask by default mode) nd there being a bug with determining this, or perhaps the JSON parsing Thanks for the YAML - I'll check it out later. |
Just to make this slightly weirder .... For easier testing, I created a topic and published a single message to it. When I run the application, the record is returned unmasked. When I run in debug mode and debug through the masker, the record is eventually returned (to the UI) masked as expected ... 😕 |
I'm not entirely sure what the issue here is... but I've changed a few things relating to how masking is validated to work (removed the |
Does this work as expected for you against a cluster? I'm still seeing the debug vs run issue |
I've found the issue ... it's with the new isTombstone and isJson methods. The record is initially null when it hits them. This works:
and
|
Good catch! I put this in the domain object as I thought it would be cleaner. |
@emerfan updated my branch, so it should all be working out of the box for you now. @AlexisSouquiere can I get a re-review please now changes have been applied and bugs fixed? 🙏 |
Looks great! I've run it against a substantial cluster. I like the mask all topics by default, and the updated handling of nested objects. This is a great feature, thanks! |
@jamfor352 I will review it tomorrow ! |
Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few updates needed and then it's good for me
List<RegexFilter> filters = new ArrayList<>(); // regex filters still use `filters` for backwards compatibility | ||
List<JsonMaskingFilter> jsonFilters = new ArrayList<>(); | ||
String jsonMaskReplacement = "xxxx"; | ||
boolean cachingEnabled = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reference to cachingEnabled, should be removed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
import com.google.gson.JsonParser; | ||
import com.google.gson.JsonSyntaxException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
import org.json.JSONArray; | ||
import org.json.JSONException; | ||
import org.json.JSONObject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public void setTopic(Topic topic) { | ||
this.topic = topic; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer adding Lombok @Setter
on the topic field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
public RegexMasker(DataMasking dataMasking ) { | ||
this.filters = dataMasking.getFilters(); | ||
} @Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Hey @AlexisSouquiere applied all the changes requested. Although I also realised I may as well inline the (Note - all the commits below are the same, I just rebased on upstream - the last commit mentioned is the only change since the review) Hope this looks all OK now 👍 |
…sing map rather than rely on side effects
…sking is not correctly applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me !
Thanks for the approval! What is the process in terms of getting this upstream, does it need more reviewers? |
@jamfor352 nothing required, thanks for the contributions 👏 |
Hi!
AKHQ is fantastic, but the data masking using regex doesn't really scale well if you have complex personal data (pii) requirements (for example, if you need to mask all fields by default apart from carefully selected fields in a production environment). Crafting a regex for this can be flaky. Especially if you need to deal with multiple lines and account for integers, arrays, all sorts of data types. Or if you want to be able to expose some fields on some topics, but not others, etc.
I've been trying my fork locally which introduces a new JSON filtering system (while keeping the regex stuff entirely in place) and it works great, and I'd like to contribute it upstream. I think I've added more than sufficient tests as well, and retained backwards compatibility by defaulting to using the regex filtering if not defined (which current users won't, since
mode
is a new property).It has two modes -
json_mask_by_default
(where everything is masked by default and you have to specifically opt in to see specific fields - useful for regulation-heavy environments) andjson_show_by_default
where everything shows by default but you can mask specific bits of data.