133 string find wdl 1.2 #605

Merged
merged 6 commits into from
Mar 27, 2024
Changes from 5 commits
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -42,6 +42,11 @@ version development
consist of multiple files.
[PR 241](https://github.com/openwdl/wdl/pull/241) by @cjllanwarne.

version 1.2.0
---------------------------

+ Added `find` and `matches` functions.

version 1.1.1
---------------------------

119 changes: 117 additions & 2 deletions SPEC.md
@@ -124,6 +124,8 @@ Revisions to this specification are made periodically in order to correct errors
- [`min`](#min)
- [`max`](#max)
- [String Functions](#string-functions)
- [✨ `find`](#-find)
- [✨ `matches`](#-matches)
- [`sub`](#sub)
- [File Functions](#file-functions)
- [`basename`](#basename)
@@ -6806,14 +6808,127 @@ These functions operate on `String` arguments.

**Restrictions**: None

### ✨ `find`

Given two `String` parameters `input` and `pattern`, searches `input` for the first occurrence of `pattern` and returns the matched substring, or `None` if there is no match. `pattern` is a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) and is evaluated as a [POSIX Extended Regular Expression (ERE)](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended).

Note that regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped. For example:

```wdl
String? first_match = find("hello\tBob", "\\t")
```

**Parameters**

1. `String`: the input string to search.
2. `String`: the pattern to search for.

**Returns**: The contents of the first match, or `None` if `pattern` does not match `input`.

<details>
<summary>
Example: test_find_task.wdl

```wdl
version 1.2

workflow test_find {
  input {
    String text = "hello world"
    String pattern1 = "e..o"
    String pattern2 = "goodbye"
  }

  output {
    String? match1 = find(text, pattern1)  # "ello"
    String? match2 = find(text, pattern2)  # None
  }
}
```
</summary>
<p>
Example input:

```json
{}
```

Example output:

```json
{
  "test_find.match1": "ello",
  "test_find.match2": null
}
```
</p>
</details>
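
Because `find` returns only the first match, a pattern that occurs several times in `input` yields just its leftmost occurrence. The sketch below illustrates this under the POSIX ERE semantics stated above; the workflow name `test_find_first` and the input value are hypothetical, and `defined` is used to turn the optional result into a `Boolean`.

```wdl
version 1.2

# Hypothetical example workflow: `find` returns only the leftmost match.
workflow test_find_first {
  input {
    String sample_id = "run7_lane2_sample15"
  }

  output {
    # "[0-9]+" occurs three times ("7", "2", "15"); only the first is returned.
    String? first_number = find(sample_id, "[0-9]+")  # "7"
    # `defined` reports whether any match was found.
    Boolean has_number = defined(find(sample_id, "[0-9]+"))  # true
  }
}
```

To collect every occurrence rather than the first, a different approach (for example, a task that runs a command-line tool) would be needed.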

### ✨ `matches`

Given two `String` parameters `input` and `pattern`, tests whether `pattern` matches `input` at least once. `pattern` is a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) and is evaluated as a [POSIX Extended Regular Expression (ERE)](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended).

To test whether `pattern` matches the entire `input`, anchor the pattern with `^` at the beginning and `$` at the end. For example:

```wdl
Boolean full_match = matches("abc123", "^a.+3$")
```
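
For contrast, here is the same substring tested with and without anchors. This is a minimal sketch; the workflow name `test_matches_anchors` is hypothetical.

```wdl
version 1.2

# Hypothetical example workflow contrasting unanchored and anchored patterns.
workflow test_matches_anchors {
  output {
    # Unanchored: "c1" occurs inside "abc123", so this is true.
    Boolean partial = matches("abc123", "c1")
    # Anchored: "c1" is not the entire input, so this is false.
    Boolean whole = matches("abc123", "^c1$")
  }
}
```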

Note that regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped. For example:

```wdl
Boolean has_tab = matches("hello\tBob", "\\t")
```

**Parameters**

1. `String`: the input string to search.
2. `String`: the pattern to search for.

**Returns**: `true` if `pattern` matches `input` at least once, otherwise `false`.
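
Because `find` returns `None` exactly when `pattern` does not match `input`, `matches(input, pattern)` and `defined(find(input, pattern))` should agree for any arguments. The following is a minimal sketch of that relationship, derived from the two definitions; the workflow name `test_matches_vs_find` and its inputs are hypothetical.

```wdl
version 1.2

# Hypothetical example workflow: `matches` agrees with testing whether `find` found a match.
workflow test_matches_vs_find {
  input {
    String name = "sample1234_R1.fastq"
    String pattern = "_R[12]"
  }

  output {
    Boolean via_matches = matches(name, pattern)     # true
    Boolean via_find = defined(find(name, pattern))  # true
  }
}
```

Use `matches` when only a yes/no answer is needed, and `find` when the matched text itself is needed.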

<details>
<summary>
Example: test_matches_task.wdl

```wdl
version 1.2

workflow test_matches {
  input {
    File fastq
  }

  output {
    Boolean is_compressed = matches(basename(fastq), "\\.(gz|zip|zstd)")
    Boolean is_read1 = matches(basename(fastq), "_R1")
  }
}
```
</summary>
<p>
Example input:

```json
{
  "fastq": "sample1234_R1.fastq"
}
```

Example output:

```json
{
  "test_matches.is_compressed": false,
  "test_matches.is_read1": true
}
```
</p>
</details>

### `sub`

```
String sub(String, String, String)
```

Given three `String` parameters `input`, `pattern`, and `replace`, this function replaces all non-overlapping occurrences of `pattern` in `input` with `replace`. `pattern` is a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) and is evaluated as a [POSIX Extended Regular Expression (ERE)](https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended).
Regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped (e.g., `"\\t"`).
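
The following sketch illustrates the non-overlapping replacement and the double-escaping rule. The workflow name `test_sub_sketch` and the values are hypothetical, and the `"\\t"` pattern follows the same convention as the examples above.

```wdl
version 1.2

# Hypothetical example workflow: `sub` replaces every non-overlapping match.
workflow test_sub_sketch {
  output {
    # "an" occurs twice without overlapping (b[an][an]a), so both are replaced.
    String upper = sub("banana", "an", "AN")  # "bANANa"
    # The pattern is written as "\\t" so the regex engine receives \t.
    String spaced = sub("a\tb", "\\t", " ")   # "a b"
  }
}
```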

🗑 The option for execution engines to allow other regular expression grammars besides POSIX ERE is deprecated.