Sensitive validator branch #1364

Merged
merged 4 commits into from
Nov 15, 2022
62 changes: 57 additions & 5 deletions docs/source-2.0/guides/model-linters.rst
@@ -120,6 +120,58 @@ Example:
{name: "CamelCase"}
]

.. _MissingSensitiveTrait:

MissingSensitiveTrait
=====================

This validator scans shape and member names and identifies ones that look like they could contain
sensitive information but are not marked with the ``@sensitive`` trait. It does not apply to
shapes where the ``@sensitive`` trait would be invalid. Users may also configure this validator
with a custom list of terms and may choose to ignore the built-in defaults. The default terms include
categories of personal information such as 'birth day', 'billing address', 'zip code', or 'gender'.

Rationale
Sensitive information is often subject to legal requirements governing how it is handled
and logged. Failing to mark sensitive data accordingly carries a large risk, so it is
helpful to have an automated validator catch these omissions rather than rely on best efforts.

Default severity
``WARNING``

Configuration
.. list-table::
:header-rows: 1
:widths: 20 20 60

* - Property
- Type
- Description
* - terms
- [ ``string`` ]
- A list of search terms that match shape or member names
case-insensitively based on word boundaries (for example, the term
"access key id" matches "AccessKeyId", "access_key_id", and
"accesskeyid"). See :ref:`words-boundaries` for details.
* - excludeDefaults
- ``boolean``
- A flag indicating whether or not to disregard the default set
of terms. This property is not required and defaults to false.
If set to true, ``terms`` must be provided.

Example:

.. code-block:: smithy

$version: "2"

metadata validators = [{
name: "MissingSensitiveTrait"
configuration: {
excludeDefaults: false,
terms: ["social security number"]
}
}]
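
As a rough illustration of how ``terms`` and ``excludeDefaults`` interact, the Java sketch below resolves the set of terms the validator would end up searching for. The class name ``SensitiveTermConfigSketch``, the method ``effectiveTerms``, and the four-item default list are assumptions made for this example and are not part of Smithy. Only the rule itself comes from the description above: defaults are added unless excluded, and excluding them without supplying custom terms is rejected.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the validator's term resolution; not Smithy API. */
final class SensitiveTermConfigSketch {
    // Illustrative subset of the built-in defaults named in the documentation.
    static final List<String> DEFAULT_TERMS =
            List.of("birth day", "billing address", "zip code", "gender");

    /** Returns the terms that would be searched for under the given configuration. */
    static Set<String> effectiveTerms(List<String> terms, boolean excludeDefaults) {
        if (excludeDefaults && terms.isEmpty()) {
            // Excluding the defaults without custom terms leaves nothing to search for.
            throw new IllegalArgumentException(
                    "Cannot set 'excludeDefaults' to true and leave 'terms' unspecified.");
        }
        Set<String> result = new LinkedHashSet<>(terms);
        if (!excludeDefaults) {
            result.addAll(DEFAULT_TERMS);
        }
        return result;
    }

    public static void main(String[] args) {
        // Custom term plus the defaults.
        System.out.println(effectiveTerms(List.of("social security number"), false));
        // Custom term only.
        System.out.println(effectiveTerms(List.of("social security number"), true));
    }
}
```

With ``excludeDefaults: false``, as in the Smithy example above, the custom term is searched for in addition to the defaults rather than instead of them.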

.. _NoninclusiveTerms:

@@ -227,7 +279,7 @@ Configuration
- A list of search terms that match shape or member names
case-insensitively based on word boundaries (for example, the term
"access key id" matches "AccessKeyId", "access_key_id", and
- "accesskeyid"). See :ref:`reserved-words-boundaries` for details.
+ "accesskeyid"). See :ref:`words-boundaries` for details.
* - selector
- ``string``
- Specifies a selector of shapes to validate for this configuration.
@@ -345,12 +397,12 @@ be specified.
* - **Codename**
- Match

- .. _reserved-words-boundaries:
+ .. _words-boundaries:

- Reserved words boundary matching
+ Words boundary matching
--------------------------------

- Word boundaries can be used to find reserved words. Word boundary search
+ Word boundaries can be used to find terms of interest. Word boundary search
text consists of one or more alphanumeric words separated by a single
space. When comparing against another string, the contents of the string
are separated into words based on word boundaries. Those words are
@@ -379,7 +431,7 @@ demonstrates how comparison text is parsed into words.
* - access_keyID
- access key id
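
The parsing and matching rules described here can be approximated in a few lines of Java. This is a simplified sketch and not the ``WordBoundaryMatcher`` implementation: the class ``WordBoundarySketch`` and its methods are hypothetical, and the splitting heuristic (break on non-alphanumerics, on a lowercase-to-uppercase transition, and before the last capital of an acronym that is followed by a lowercase letter) is an assumption chosen to reproduce the ``access_keyID`` row above.

```java
import java.util.Locale;

/** Hypothetical, simplified word-boundary matcher; not the Smithy implementation. */
final class WordBoundarySketch {

    /** Splits text into lowercase words separated by single spaces. */
    static String splitWords(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean alnum = Character.isLetterOrDigit(c);
            boolean boundary = !alnum;
            if (alnum && i > 0 && Character.isUpperCase(c)) {
                char prev = text.charAt(i - 1);
                boolean nextIsLower = i + 1 < text.length()
                        && Character.isLowerCase(text.charAt(i + 1));
                // "keyID" breaks before 'I'; "HTTPServer" breaks before 'S'.
                boundary = Character.isLowerCase(prev)
                        || (Character.isUpperCase(prev) && nextIsLower);
            }
            if (boundary && out.length() > 0 && out.charAt(out.length() - 1) != ' ') {
                out.append(' ');
            }
            if (alnum) {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString().trim();
    }

    /** True if the term's words appear consecutively, or concatenated as one word. */
    static boolean matches(String term, String name) {
        String normalized = term.trim().toLowerCase(Locale.ROOT);
        String haystack = " " + splitWords(name) + " ";
        String spaced = " " + normalized + " ";
        String joined = " " + normalized.replace(" ", "") + " ";
        return haystack.contains(spaced) || haystack.contains(joined);
    }

    public static void main(String[] args) {
        System.out.println(splitWords("access_keyID"));
        System.out.println(matches("access key id", "accesskeyid"));
    }
}
```

Padding both sides with spaces keeps the comparison on whole words, so a term cannot match in the middle of a longer word.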

- The following table shows matches for a reserved term of ``secret id``,
+ The following table shows matches for a search term of ``secret id``,
meaning the word "secret" needs to be followed by the word "id". Word
boundary searches also match if the search terms, concatenated with
no spaces, form a single word in the search text (for example,
@@ -0,0 +1,184 @@
/*
* Copyright 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package software.amazon.smithy.linters;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import software.amazon.smithy.model.Model;
import software.amazon.smithy.model.node.NodeMapper;
import software.amazon.smithy.model.shapes.MemberShape;
import software.amazon.smithy.model.shapes.Shape;
import software.amazon.smithy.model.traits.SensitiveTrait;
import software.amazon.smithy.model.validation.AbstractValidator;
import software.amazon.smithy.model.validation.ValidationEvent;
import software.amazon.smithy.model.validation.ValidatorService;
import software.amazon.smithy.utils.ListUtils;
import software.amazon.smithy.utils.SetUtils;

/**
* <p>Validates that shapes and members that possibly contain sensitive data are marked with the sensitive trait.
*/
public final class MissingSensitiveTraitValidator extends AbstractValidator {
static final Set<String> DEFAULT_SENSITIVE_TERMS = SetUtils.of(
"account number",
"bank",
"billing address",
"birth day",
"birth",
"citizen ship",
"credit card",
"driver license",
"drivers license",
"email",
"ethnicity",
"first name",
"gender",
"insurance",
"ip address",
"last name",
"mailing address",
"passport",
"phone",
"religion",
"sexual orientation",
"social security",
"ssn",
"tax payer",
"telephone",
"user name",
"zip code"
);

private final WordBoundaryMatcher wordMatcher;

public static final class Provider extends ValidatorService.Provider {
public Provider() {
super(MissingSensitiveTraitValidator.class, node -> {
NodeMapper mapper = new NodeMapper();
return new MissingSensitiveTraitValidator(
mapper.deserialize(node, MissingSensitiveTraitValidator.Config.class));
});
}
}

/**
* MissingSensitiveTrait configuration.
*/
public static final class Config {
private List<String> terms = ListUtils.of();
private boolean excludeDefaults;

public List<String> getTerms() {
return terms;
}

public void setTerms(List<String> terms) {
this.terms = terms;
}

public boolean getExcludeDefaults() {
return excludeDefaults;
}

public void setExcludeDefaults(boolean excludeDefaults) {
this.excludeDefaults = excludeDefaults;
}
}

private MissingSensitiveTraitValidator(Config config) {
wordMatcher = new WordBoundaryMatcher();
if (config.getExcludeDefaults() && config.getTerms().isEmpty()) {
//This configuration combination makes the validator a no-op.
throw new IllegalArgumentException("Cannot set 'excludeDefaults' to true and leave "
+ "'terms' unspecified.");
}

config.getTerms().forEach(wordMatcher::addSearch);

if (!config.getExcludeDefaults()) {
DEFAULT_SENSITIVE_TERMS.forEach(wordMatcher::addSearch);
}
}

/**
* Finds shapes without the sensitive trait that possibly contain sensitive data,
* based on the shape/member name and the list of key words and phrases.
*
* @param model Model to validate.
* @return list of violation events
*/
@Override
public List<ValidationEvent> validate(Model model) {
List<ValidationEvent> validationEvents = new ArrayList<>();
validationEvents.addAll(scanShapeNames(model));
validationEvents.addAll(scanMemberNames(model));
return validationEvents;
}

private List<ValidationEvent> scanShapeNames(Model model) {
List<ValidationEvent> validationEvents = new ArrayList<>();

for (Shape shape : model.toSet()) {
// Sensitive trait cannot be applied to the 4 types below
if (!shape.isMemberShape()
&& !shape.isOperationShape()
&& !shape.isServiceShape()
&& !shape.isResourceShape()
&& !shape.hasTrait(SensitiveTrait.class)) {
Optional<ValidationEvent> optionalValidationEvent =
detectSensitiveTerms(shape.toShapeId().getName(), shape);
optionalValidationEvent.ifPresent(validationEvents::add);
}
}

return validationEvents;
}

private List<ValidationEvent> scanMemberNames(Model model) {
List<ValidationEvent> validationEvents = new ArrayList<>();

for (MemberShape memberShape : model.getMemberShapes()) {
Shape containingShape = model.expectShape(memberShape.getContainer());
Shape targetShape = model.expectShape(memberShape.getTarget());

if (!containingShape.hasTrait(SensitiveTrait.class) && !targetShape.hasTrait(SensitiveTrait.class)) {
Optional<ValidationEvent> optionalValidationEvent =
detectSensitiveTerms(memberShape.getMemberName(), memberShape);
optionalValidationEvent.ifPresent(validationEvents::add);
}
}

return validationEvents;
}

private Optional<ValidationEvent> detectSensitiveTerms(String name, Shape shape) {
Optional<String> matchedTerm = wordMatcher.getAllMatches(name).stream().findAny();

return matchedTerm.map(s -> emit(shape, s));
}

private ValidationEvent emit(Shape shape, String word) {
String message = shape.isMemberShape()
? String.format("This member possibly contains sensitive data but neither the enclosing nor target"
+ " shape are marked with the sensitive trait (based on the presence of '%s')", word)
: String.format("This shape possibly contains sensitive data but is not marked "
+ "with the sensitive trait (based on the presence of '%s')", word);

return warning(shape, message);
}
}
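
The two scans in this file reduce to a pair of boolean rules. The sketch below restates them without any Smithy dependency so they can be read in isolation. ``SensitiveScanSketch``, the ``Kind`` enum, and both method names are hypothetical, and the ``Predicate`` parameter stands in for the word matcher. The conditions themselves mirror ``scanShapeNames`` and ``scanMemberNames`` above.

```java
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical restatement of the validator's flagging rules; not Smithy API. */
final class SensitiveScanSketch {
    enum Kind { STRUCTURE, STRING, MEMBER, OPERATION, SERVICE, RESOURCE }

    /** Shape names are checked only where the sensitive trait could be applied. */
    static boolean shouldFlagShape(Kind kind, boolean hasSensitiveTrait,
                                   String shapeName, Predicate<String> looksSensitive) {
        boolean traitApplicable = kind != Kind.MEMBER
                && kind != Kind.OPERATION
                && kind != Kind.SERVICE
                && kind != Kind.RESOURCE;
        return traitApplicable && !hasSensitiveTrait && looksSensitive.test(shapeName);
    }

    /** Members are skipped when either the container or the target is already sensitive. */
    static boolean shouldFlagMember(boolean containerSensitive, boolean targetSensitive,
                                    String memberName, Predicate<String> looksSensitive) {
        return !containerSensitive && !targetSensitive && looksSensitive.test(memberName);
    }

    public static void main(String[] args) {
        // Toy matcher used only for this demonstration.
        Predicate<String> toyMatcher = n -> n.toLowerCase(Locale.ROOT).contains("bank");
        System.out.println(shouldFlagMember(false, false, "bank", toyMatcher));
        System.out.println(shouldFlagMember(false, true, "bank", toyMatcher));
        System.out.println(shouldFlagShape(Kind.OPERATION, false, "GetBank", toyMatcher));
    }
}
```

A member whose target shape already carries the trait is treated as covered, which avoids a second warning for data that is already protected.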
@@ -20,6 +20,8 @@
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import software.amazon.smithy.utils.StringUtils;

/**
@@ -58,18 +60,28 @@ public void addSearch(String terms) {

@Override
public boolean test(String text) {
- if (text == null || text.isEmpty() || words.isEmpty()) {
- return false;
- }
+ return matchedTermsAsStream(text)
+ .findAny()
+ .isPresent();
+ }

- String searchString = searchCache.computeIfAbsent(text, WordBoundaryMatcher::splitWords);
- for (String needle : words) {
- if (testWordMatch(needle, searchString)) {
- return true;
- }
+ /**
+ * Returns all the terms that the input text matched.
+ * @param text the String within which to search for matches
+ * @return set of all matches
+ */
+ public Set<String> getAllMatches(String text) {
+ return matchedTermsAsStream(text).collect(Collectors.toSet());
+ }
+
+ private Stream<String> matchedTermsAsStream(String text) {
+ if (text == null || text.isEmpty() || words.isEmpty()) {
+ return Stream.empty();
}

- return false;
+ String haystack = searchCache.computeIfAbsent(text, WordBoundaryMatcher::splitWords);
+ return words.stream()
+ .filter(needle -> testWordMatch(needle, haystack));
}

private boolean testWordMatch(String needle, String haystack) {
@@ -9,3 +9,4 @@ software.amazon.smithy.linters.ReservedWordsValidator$Provider
software.amazon.smithy.linters.ShouldHaveUsedTimestampValidator$Provider
software.amazon.smithy.linters.StandardOperationVerbValidator$Provider
software.amazon.smithy.linters.StutteredShapeNameValidator$Provider
software.amazon.smithy.linters.MissingSensitiveTraitValidator$Provider
@@ -0,0 +1,6 @@
[WARNING] smithy.example#FooOperationRequest: This shape possibly contains sensitive data but is not marked with the sensitive trait (based on the presence of 'foo') | DefaultMissingSensitiveTrait
[WARNING] smithy.example#FooOperationRequest$secondMember: This member possibly contains sensitive data but neither the enclosing nor target shape are marked with the sensitive trait (based on the presence of 'second member') | DefaultMissingSensitiveTrait
[WARNING] smithy.example#FooOperationResponse: This shape possibly contains sensitive data but is not marked with the sensitive trait (based on the presence of 'foo') | DefaultMissingSensitiveTrait
[WARNING] smithy.example#MyString: This shape possibly contains sensitive data but is not marked with the sensitive trait (based on the presence of 'string') | DefaultMissingSensitiveTrait
[WARNING] smithy.example#BillingInfo$bank: This member possibly contains sensitive data but neither the enclosing nor target shape are marked with the sensitive trait (based on the presence of 'bank') | DefaultMissingSensitiveTrait
[WARNING] smithy.example#BillingInfo$data: This member possibly contains sensitive data but neither the enclosing nor target shape are marked with the sensitive trait (based on the presence of 'data') | DefaultMissingSensitiveTrait