Add ability to lint based on word boundaries
This commit introduces a new syntax for matching words with the
ReservedWords linter and is intended to be used with the upcoming
sensitive words linter defined in #1364.

In addition to supporting wildcard searches ("*" prefix, suffix,
and contains), we now support matching based on word boundaries.

This commit introduces the "terms" keyword for word boundary
searches and adds dedicated abstractions for word boundary and
wildcard matching.

For example, "access key id" will match "AccessKeyId",
"access_key_id", "accessKeyID", "access_key_id100", and "AccessKeyIDValue".
It will also match when all the words are concatenated together:
"accesskeyid". However, it will not match "accesskey_id" because that
name parses into only two words ("accesskey" and "id").
mtdowling committed Oct 24, 2022
1 parent c45d5db commit b25767a
Showing 6 changed files with 538 additions and 50 deletions.
89 changes: 82 additions & 7 deletions docs/source-2.0/guides/model-linters.rst
@@ -196,10 +196,6 @@ ReservedWords
Validates that shape names and member names do not match a configured set of
reserved words.

Reserved words are compared in a case-insensitive manner via substring match
and support a leading and trailing wildcard character, "*". See
:ref:`wildcard evaluation <reserved-words-wildcards>` for more detail.

Rationale
Tools that generate code from Smithy models SHOULD automatically convert
reserved words into symbols that are safe to use in the targeted
@@ -223,9 +219,15 @@ Configuration
- Description
* - words
- [ ``string`` ]
- **Required**. A list of words that shape or member names MUST not
case-insensitively match. Supports only the leading and trailing
wildcard character of "*".
- A list of words that shape or member names MUST NOT case-insensitively
match. Supports a leading and trailing wildcard character of "*".
See :ref:`reserved-words-wildcards` for details.
* - terms
- [ ``string`` ]
- A list of search terms that match shape or member names
case-insensitively based on word boundaries (for example, the term
"access key id" matches "AccessKeyId", "access_key_id", and
"accesskeyid"). See :ref:`reserved-words-boundaries` for details.
* - selector
- ``string``
- Specifies a selector of shapes to validate for this configuration.
@@ -343,6 +345,79 @@ be specified.
* - **Codename**
- Match

.. _reserved-words-boundaries:

Reserved words boundary matching
--------------------------------

Word boundaries can be used to find reserved words. Word boundary search
text consists of one or more alphanumeric words separated by a single
space. When comparing against another string, the contents of the string
are separated into words based on word boundaries. Those words are
case-insensitively compared against the words in the search text for a match.

A word boundary is detected where the casing changes between two adjacent
characters, or where the type of character changes (for example, from a
letter to a digit or an underscore). The following table demonstrates how
comparison text is parsed into words.

.. list-table::
    :header-rows: 1
    :widths: 50 50

    * - Comparison text
      - Parsed words
    * - accessKey
      - access key
    * - accessKeyID
      - access key id
    * - accessKeyIDValue
      - access key id value
    * - accesskeyId
      - accesskey id
    * - accessKey1
      - access key 1
    * - access_keyID
      - access key id
The following table shows matches for a reserved term of ``secret id``,
meaning the word "secret" needs to be followed by the word "id". A word
boundary search also matches when the search words, concatenated with no
spaces, appear as a single word in the comparison text (for example,
``secret id`` matches the word ``secretid``).

.. list-table::
    :header-rows: 1
    :widths: 75 25

    * - Comparison text
      - Result
    * - Some\ **SecretId**
      - Match
    * - Some\ **SecretID**\ Value
      - Match
    * - Some\ **Secret__ID**\ __value
      - Match
    * - **secret_id**
      - Match
    * - **secret_id**\ 100
      - Match
    * - **secretid**
      - Match
    * - **secretid**\ _value
      - Match
    * - secretidvalue
      - No match
    * - SecretThingId
      - No match
    * - SomeSecretid
      - No match
.. admonition:: Syntax restrictions

    * Empty search terms are not valid.
    * Only a single space can appear between words in word boundary patterns.
    * Leading and trailing spaces are not permitted in word boundary patterns.
    * Word boundary patterns can only contain alphanumeric characters.
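
The syntax restrictions above can be expressed as a single regular expression. The following is a minimal sketch, not the validation code from this commit: ``TermSyntaxSketch`` and its methods are hypothetical names, and "alphanumeric" is assumed to mean ASCII letters and digits.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the syntax restrictions; not the commit's code.
final class TermSyntaxSketch {

    // One or more alphanumeric words, each separated by exactly one space.
    // Assumption: "alphanumeric" means ASCII letters and digits only.
    private static final Pattern VALID_TERM =
            Pattern.compile("[A-Za-z0-9]+( [A-Za-z0-9]+)*");

    // Returns true if the term satisfies every restriction in the list.
    static boolean isValidTerm(String term) {
        return term != null && VALID_TERM.matcher(term).matches();
    }

    // Returns the term unchanged, or throws if it breaks a restriction.
    static String requireValidTerm(String term) {
        if (!isValidTerm(term)) {
            throw new IllegalArgumentException("Invalid word boundary term: '" + term + "'");
        }
        return term;
    }

    public static void main(String[] args) {
        String[] samples = {"access key id", "", "access  key", " access", "access ", "access_key"};
        for (String sample : samples) {
            System.out.println("'" + sample + "' valid: " + isValidTerm(sample));
        }
    }
}
```

Because the pattern requires at least one character per word and allows a space only between two words, it covers empty terms, doubled spaces, and leading or trailing spaces without separate checks.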


.. _StandardOperationVerb:
@@ -20,11 +20,7 @@
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import software.amazon.smithy.model.Model;
import software.amazon.smithy.model.node.NodeMapper;
import software.amazon.smithy.model.selector.Selector;
@@ -34,7 +30,6 @@
import software.amazon.smithy.model.validation.Severity;
import software.amazon.smithy.model.validation.ValidationEvent;
import software.amazon.smithy.model.validation.ValidatorService;
import software.amazon.smithy.utils.OptionalUtils;

/**
* Emits validation events for a configuration of reserved words.
@@ -48,6 +43,7 @@
* <li>words: ([string]) A list of words that are
* case-insensitively reserved. Leading and trailing wildcards
* ("*") are supported.
* <li>terms: ([string]) A list of word boundary terms to test.</li>
* <li>selector: (string) Specifies a selector for this
* configuration. Defaults to validating all shapes, including
* member names.
@@ -83,9 +79,10 @@ public void setReserved(List<ReservedWords> reserved) {
* A single reserved words configuration.
*/
public static final class ReservedWords {
private List<String> words = Collections.emptyList();
private Selector selector = Selector.IDENTITY;
private String reason = "";
private final WildcardMatcher wildcardMatcher = new WildcardMatcher();
private final WordBoundaryMatcher wordMatcher = new WordBoundaryMatcher();

/**
* Sets the list of reserved word definitions.
@@ -96,16 +93,16 @@ public static final class ReservedWords {
* @param words Words to set.
*/
public void setWords(List<String> words) {
this.words = new ArrayList<>(words.size());
for (String word : words) {
if (word.equals("*")) {
throw new IllegalArgumentException("Reservations cannot be made against '*'");
}
if (CONTAINS_INNER_WILDCARD.matcher(word).find()) {
throw new IllegalArgumentException("Only preceding and trailing wildcards ('*') are supported.");
}
this.words.add(word.toLowerCase(Locale.ENGLISH));
}
words.forEach(wildcardMatcher::addSearch);
}

/**
* Sets the list of reserved word terms to match based on word boundaries.
*
* @param terms Terms to set.
*/
public void setTerms(List<String> terms) {
terms.forEach(wordMatcher::addSearch);
}

/**
@@ -126,8 +123,10 @@ public void setReason(String reason) {
this.reason = reason;
}

private Stream<ValidationEvent> validate(Model model) {
return selector.select(model).stream().flatMap(shape -> OptionalUtils.stream(validateShape(shape)));
private void validate(Model model, List<ValidationEvent> events) {
for (Shape shape : selector.select(model)) {
validateShape(shape).ifPresent(events::add);
}
}

private Optional<ValidationEvent> validateShape(Shape shape) {
@@ -139,28 +138,13 @@ private Optional<ValidationEvent> validateShape(Shape shape) {
}

/**
* Checks a passed word against the list of reserved words in this
* configuration. Validates these in a case-insensitive manner, and
* supports starting and ending wildcards '*'.
* Checks a passed word against the reserved words in this configuration.
*
* @param word A value that may be reserved.
* @return Returns true if the word is reserved by this configuration
*/
private boolean isReservedWord(String word) {
String compare = word.toLowerCase(Locale.US);
return words.stream().anyMatch(reservation -> {
// Comparisons against '*' have been rejected at configuration load.
if (reservation.startsWith("*")) {
if (reservation.endsWith("*")) {
return compare.contains(reservation.substring(1, reservation.lastIndexOf("*")));
}
return compare.endsWith(reservation.substring(1));
}
if (reservation.endsWith("*")) {
return compare.startsWith(reservation.substring(0, reservation.lastIndexOf("*")));
}
return compare.equals(reservation);
});
return wildcardMatcher.test(word) || wordMatcher.test(word);
}

private ValidationEvent emit(Shape shape, String word, String reason) {
@@ -182,21 +166,18 @@ public Provider() {
}
}

private static final Pattern CONTAINS_INNER_WILDCARD = Pattern.compile("^.+\\*.+$");

private final Config config;

private ReservedWordsValidator(Config config) {
this.config = config;

if (config.getReserved().isEmpty()) {
throw new IllegalArgumentException("Missing `reserved` words");
}
}

@Override
public List<ValidationEvent> validate(Model model) {
return config.getReserved().stream().flatMap(reservation -> reservation.validate(model))
.collect(Collectors.toList());
List<ValidationEvent> events = new ArrayList<>();
for (ReservedWords reserved : config.getReserved()) {
reserved.validate(model, events);
}
return events;
}
}
@@ -0,0 +1,86 @@
/*
* Copyright 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package software.amazon.smithy.linters;

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import software.amazon.smithy.utils.StringUtils;

final class WildcardMatcher implements Predicate<String> {

    private final List<Predicate<String>> predicates = new ArrayList<>();

    @Override
    public boolean test(String text) {
        if (StringUtils.isEmpty(text)) {
            return false;
        }

        text = text.toLowerCase(Locale.ENGLISH);
        for (Predicate<String> predicate : predicates) {
            if (predicate.test(text)) {
                return true;
            }
        }

        return false;
    }

    void addSearch(String pattern) {
        if (StringUtils.isEmpty(pattern)) {
            throw new IllegalArgumentException("Invalid empty pattern");
        } else if (pattern.equals("*")) {
            throw new IllegalArgumentException("Invalid wildcard pattern: *");
        } else {
            predicates.add(parseWildcardPattern(pattern));
        }
    }

    private static Predicate<String> parseWildcardPattern(String pattern) {
        boolean suffix = false;
        boolean prefix = false;

        // Find any leading or ending star, ensure that no inner stars are used.
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*') {
                if (i == 0) {
                    // A leading star means the needle must be a suffix of the text.
                    suffix = true;
                } else if (i == pattern.length() - 1) {
                    // A trailing star means the needle must be a prefix of the text.
                    prefix = true;
                } else {
                    throw new IllegalArgumentException("Invalid inner '*' in wildcard pattern: " + pattern);
                }
            } else {
                result.append(Character.toLowerCase(c));
            }
        }

        // The text passed to these predicates is already lowercased by test().
        String needle = result.toString();
        if (suffix && prefix) {
            return text -> text.contains(needle);
        } else if (suffix) {
            return text -> text.endsWith(needle);
        } else if (prefix) {
            return text -> text.startsWith(needle);
        } else {
            return text -> text.equals(needle);
        }
    }
}
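
The four pattern forms handled above (exact, leading ``*``, trailing ``*``, and both) can be exercised with a small stand-alone sketch. ``WildcardMatcher`` is package-private, so the class below (``WildcardSketch``, a hypothetical name) re-implements the same rules in condensed form instead of calling it. One deliberate difference, which is an assumption of this sketch rather than behavior of the commit: it also rejects patterns that leave an empty needle once the stars are removed.

```java
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical condensed re-implementation for illustration only.
final class WildcardSketch {

    // Compiles a wildcard pattern into a case-insensitive predicate.
    static Predicate<String> compile(String pattern) {
        if (pattern == null || pattern.isEmpty() || pattern.equals("*")) {
            throw new IllegalArgumentException("Invalid wildcard pattern: " + pattern);
        }
        boolean leading = pattern.startsWith("*");
        boolean trailing = pattern.endsWith("*");
        String needle = pattern
                .substring(leading ? 1 : 0, pattern.length() - (trailing ? 1 : 0))
                .toLowerCase(Locale.ENGLISH);
        // Sketch-only choice: an empty needle (for example "**") is rejected too.
        if (needle.isEmpty() || needle.indexOf('*') >= 0) {
            throw new IllegalArgumentException("Invalid wildcard pattern: " + pattern);
        }
        if (leading && trailing) {
            return text -> lower(text).contains(needle);
        } else if (leading) {
            return text -> lower(text).endsWith(needle);
        } else if (trailing) {
            return text -> lower(text).startsWith(needle);
        } else {
            return text -> lower(text).equals(needle);
        }
    }

    private static String lower(String text) {
        return text.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String[] patterns = {"codename", "*codename", "codename*", "*codename*"};
        String[] names = {"Codename", "MyCodename", "CodenameShape", "MyCodenameShape"};
        for (String pattern : patterns) {
            Predicate<String> matcher = compile(pattern);
            for (String name : names) {
                System.out.println(pattern + " vs " + name + ": " + matcher.test(name));
            }
        }
    }
}
```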