Feature/streamline inputs #34

Merged
merged 86 commits into from
Jan 8, 2020
86 commits
8b9ea0c
use small genomes to generate examples and stramline input definitions
rsuchecki Dec 10, 2019
fdda035
corrected urls
rsuchecki Dec 10, 2019
240644f
relaxed allowed target name regex
rsuchecki Dec 11, 2019
e3a01b0
stingency settings not ensembl specific, moved to main config
rsuchecki Dec 11, 2019
774d2ae
refactoring gtf/gff3 fileds def
rsuchecki Dec 11, 2019
3415c30
major re-work of input staging and multitude of related changes
rsuchecki Dec 11, 2019
5880eb5
updated repr pep filtering
rsuchecki Dec 11, 2019
3a1577b
relaxed req to include supercontigs not just chromosomes
rsuchecki Dec 12, 2019
24a05f0
added sequencesToPlace spec to test config
rsuchecki Dec 12, 2019
68f9f84
restored core functionality after re-structure
rsuchecki Dec 12, 2019
f85b198
cleanup, comments
rsuchecki Dec 12, 2019
34ea935
added samtools container def
rsuchecki Dec 12, 2019
889df68
optional faidx process if idx no provided
rsuchecki Dec 12, 2019
04a21ae
added data set from non Esembl source
rsuchecki Dec 12, 2019
03732b8
generalised gff3-based pep conversion to Ensembl style, also allows p…
rsuchecki Dec 12, 2019
3162231
allowing user-specified chromosome id pattern for block and feature J…
rsuchecki Dec 12, 2019
c321a58
updated and documented test data sets
rsuchecki Dec 13, 2019
d7ca759
travis stub
rsuchecki Dec 13, 2019
87ef4e3
opted for smaller samtools container
rsuchecki Dec 13, 2019
2a33bc9
hack to handle gz (not bgz) files fro chr lengths
rsuchecki Dec 13, 2019
29f789c
minor
rsuchecki Dec 13, 2019
aa134b6
Update README.md
rsuchecki Dec 13, 2019
82d2c3d
Update .travis.yaml
rsuchecki Dec 13, 2019
26811a5
Update .travis.yaml
rsuchecki Dec 13, 2019
0517b45
Update .travis.yaml
rsuchecki Dec 13, 2019
a25b91a
Update .travis.yaml
rsuchecki Dec 13, 2019
8d7dacf
test profile with local data
rsuchecki Dec 17, 2019
68881a3
travis data download and untar
rsuchecki Dec 17, 2019
da8e1c9
Merge branch 'feature/streamline-inputs' into feature/streamline-inputs
rsuchecki Dec 17, 2019
4baa3f6
travis fixes
rsuchecki Dec 17, 2019
8fd3ce1
merge conflict res
rsuchecki Dec 17, 2019
df2b7a6
ubu version for travis
rsuchecki Dec 18, 2019
3051998
updated dep
rsuchecki Dec 18, 2019
e2baa64
for GH actions
rsuchecki Dec 18, 2019
692a680
docker user change for GH actions
rsuchecki Dec 18, 2019
1b938e0
docker groovy test for GHA
rsuchecki Dec 18, 2019
021ba86
docker user
rsuchecki Dec 18, 2019
8ac6fb0
docker grp exists
rsuchecki Dec 18, 2019
46f0716
added go for singularity
rsuchecki Dec 19, 2019
cc3df9c
added groovy image with ps
rsuchecki Dec 19, 2019
d99e324
reconf
rsuchecki Dec 19, 2019
4eb5b1c
test profile updates
rsuchecki Jan 3, 2020
17426e0
added Singularity install to GH actions
rsuchecki Jan 3, 2020
f118dd1
Singularity dependencies @ GH actions
rsuchecki Jan 3, 2020
6313210
working around https://github.com/sylabs/singularity/issues/3634
rsuchecki Jan 3, 2020
dd47eba
test singularity pull form docker
rsuchecki Jan 3, 2020
1c5d857
explicit use of gawk - may matter on alpine
rsuchecki Jan 3, 2020
5db7bac
workaround for
rsuchecki Jan 3, 2020
c9715a1
leaner fastx container
rsuchecki Jan 3, 2020
633b3d0
fastx and reconf
rsuchecki Jan 3, 2020
290081b
fix path to script, renamed tasks
rsuchecki Jan 3, 2020
78bc738
test wspace path
rsuchecki Jan 5, 2020
c669c9c
added missing script, fixed GH actions cmd
rsuchecki Jan 5, 2020
7d4b37c
ansi-lo on and try docker again
rsuchecki Jan 5, 2020
dc819de
docker workflow test
rsuchecki Jan 5, 2020
c03202f
fix typo
rsuchecki Jan 5, 2020
77294ee
fix typo
rsuchecki Jan 5, 2020
f0fbeee
fix for permission denied GH actions (?)
rsuchecki Jan 6, 2020
70872c6
fix for groovy grapes in docker
rsuchecki Jan 6, 2020
cec2c30
test
rsuchecki Jan 6, 2020
e5fd34c
test
rsuchecki Jan 6, 2020
2376c77
test
rsuchecki Jan 6, 2020
f883ae7
test
rsuchecki Jan 6, 2020
7701b02
another docker test
rsuchecki Jan 6, 2020
e8b218b
GH A job.needs experiemnt
rsuchecki Jan 6, 2020
7c4dfdc
GH A tidy
rsuchecki Jan 6, 2020
a25b32c
GH A fix indent
rsuchecki Jan 6, 2020
881190a
GH A fix job
rsuchecki Jan 6, 2020
bb7752a
added GH actions CI badge
rsuchecki Jan 6, 2020
880909c
re-implemented: duplicate emissions if multiple annotations per refer…
rsuchecki Jan 6, 2020
a95b25c
updated datastes in line with feature dev
rsuchecki Jan 6, 2020
92c9dcc
another badge ver
rsuchecki Jan 6, 2020
58551a6
fix
rsuchecki Jan 6, 2020
625adec
added EP datasets
rsuchecki Jan 7, 2020
989e15b
ensure non-empty process out
rsuchecki Jan 7, 2020
a4b60d5
generalised for different gff3 interpretations
rsuchecki Jan 7, 2020
412dc1c
Delete .travis.yaml
rsuchecki Jan 7, 2020
4984117
Update README.md
rsuchecki Jan 7, 2020
1abf86e
Update README.md
rsuchecki Jan 7, 2020
299649c
At & Bd ref fasta not needed
rsuchecki Jan 8, 2020
f8ced04
speeding things up: gawk in jq container and up resources
rsuchecki Jan 8, 2020
15244ac
do not report markers placed outside pseudochromosomes (e.g. on scaff…
rsuchecki Jan 8, 2020
a643a08
id pattern match extended to seq placement
rsuchecki Jan 8, 2020
1f287b5
Merge branch 'feature/streamline-inputs' of github.com:plantinformati…
rsuchecki Jan 8, 2020
8d80513
redundant-ish
rsuchecki Jan 8, 2020
7e04de5
added TOC
rsuchecki Jan 8, 2020
68 changes: 68 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,68 @@
name: CI

on: [push]

jobs:
docker:
runs-on: ubuntu-18.04
steps:
- name: Install GraphViz
run: |
sudo apt-get update && sudo apt-get install -y graphviz
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Check out code
uses: actions/checkout@v1
- name: Test workflow (docker)
run: |
NXF_VER=19.10.0 nextflow run ${GITHUB_WORKSPACE} -profile CI,docker --max_cpus 2 --max_memory 4.GB -ansi-log false

singularity:
runs-on: ubuntu-18.04
# runs-on: ubuntu-18.04
steps:
- name: Check out code
uses: actions/checkout@v1
- name: Set up Go
uses: actions/setup-go@v1
with:
go-version: 1.13
id: go
- name: Install Dependencies for Singularity
run: |
sudo apt-get update && sudo apt-get install -y \
build-essential \
libssl-dev \
uuid-dev \
libgpgme11-dev \
squashfs-tools \
libseccomp-dev \
pkg-config
- name: Install Singularity
env:
SINGULARITY_VERSION: 3.5.2
run: |
export GOPATH=/tmp/go
mkdir -p $GOPATH
sudo mkdir -p /usr/local/var/singularity/mnt && \
mkdir -p $GOPATH/src/github.com/sylabs && \
cd $GOPATH/src/github.com/sylabs && \
wget -qO- https://github.com/sylabs/singularity/releases/download/v${SINGULARITY_VERSION}/singularity-${SINGULARITY_VERSION}.tar.gz | \
tar xzv && \
cd singularity && \
./mconfig -p /usr/local && \
make -C builddir && \
sudo make -C builddir install
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Pull containers
run: |
echo "GITHUB_WORKSPACE: ${GITHUB_WORKSPACE}"
nextflow run ${GITHUB_WORKSPACE}/pull_containers.nf -ansi-log false -profile singularity
- name: Test workflow (singularity)
run: |
NXF_VER=19.10.0 nextflow run ${GITHUB_WORKSPACE} -profile CI,singularity --max_cpus 2 --max_memory 4.GB -ansi-log false
88 changes: 33 additions & 55 deletions README.md
@@ -2,31 +2,29 @@

[![GitHub commits since latest release](https://img.shields.io/github/commits-since/plantinformatics/pretzel-input-generator/latest.svg?style=for-the-badge&logo=github)](https://github.com/plantinformatics/pretzel-input-generator/releases)

Note that this description lacks detail which will be added in future releases.

# Pipeline overview

`pretzel-input-generator` is a [nextflow](https://www.nextflow.io) pipeline for generating input for [pretzel](https://github.com/plantinformatics/pretzel) from annotated and (mostly) contiguous genome assemblies. The pipeline requires approximately 1 cpu-day, but as many processes can run independently, the real run-time is much shorter if suitable compute resources are available.
![GitHub Workflow Status](https://img.shields.io/github/workflow/status/plantinformatics/pretzel-input-generator/CI?label=CI%20TESTS&logo=github&style=for-the-badge)


<!-- TOC -->

**Note that this README is partly out-of-date **
- [Pipeline overview](#pipeline-overview)
- [Default pipeline](#default-pipeline)
- [Quick start example using yeast data](#quick-start-example-using-yeast-data)
- [Quick start example using microsporidia data](#quick-start-example-using-microsporidia-data)
- [Input specification (triticeae and other relevant data sets)](#input-specification-triticeae-and-other-relevant-data-sets)
- [Data sources](#data-sources)
- [Remote](#remote)
- [Local](#local)
- [Other considerations](#other-considerations)
- [Disparate triticeae datasets](#disparate-triticeae-datasets)
- [Dependencies](#dependencies)
- [Execution](#execution)
- [Output](#output)
- [BUSCO-based pipeline](#busco-based-pipeline)
- [Quick-ish start](#quick-ish-start)
- [Output](#output-1)

<!-- /TOC -->

# Pipeline overview

`pretzel-input-generator` is a [nextflow](https://www.nextflow.io) pipeline for generating input for [pretzel](https://github.com/plantinformatics/pretzel) from annotated and (mostly) contiguous genome assemblies. The pipeline requires approximately ??? cpu-???, but as many processes can run independently, the real run-time is much shorter if suitable compute resources are available.


<!-- TODO: re-generate TOC -->

# Default pipeline

@@ -35,54 +33,34 @@ Designed for EnsemblPlants and similarly formatted data.
![doc/dag.png](doc/dag.png)


## Quick start example using yeast data
## Quick start example using microsporidia data


Requires [nextflow](https://www.nextflow.io) and [Singularity](http://singularity.lbl.gov)
Requires [nextflow](https://www.nextflow.io) and either [Singularity](http://singularity.lbl.gov)

```
nextflow run plantinformatics/pretzel-input-generator \
-profile YEAST,singularity --max_cpus 2 --max_memory 2.GB
-profile MICROSPORIDIA,singularity --max_cpus 2 --max_memory 2.GB
```

This will pull and process data sets from [Ensembl](https://ensembl.org) specified in [`conf/ensembl-yeast.config`](conf/ensembl-yeast.config)
This will pull and process data sets specified in [`conf/microsporidia.config`](conf/microsporidia.config)

## Input specification (triticeae and other relevant data sets)

Input files are specified in [conf/triticeae.config](conf/triticeae.config). This can be supplemented/replaced by JSON/YAML formatted input spec.

### Data sources

Currently all input data comes from the following sources:

* [Ensembl plants](https://plants.ensembl.org) - multiple datasets as specified in [`conf/triticeae.config`](conf/triticeae.config) and
* [International Wheat Genome Sequencing Consortium](https://www.wheatgenome.org/)
* [Triticum aestivum (Chinese Spring) IWGSC RefSeq v1.0 assembly](https://wheat-urgi.versailles.inra.fr/Seq-Repository/Assemblies)
* [The wild emmer wheat sequencing consortium (WEWseq)](http://wewseq.wixsite.com/consortium)
* Zavitan assembly downloaded from [GrainGenes](https://wheat.pw.usda.gov/GG3/wildemmer)
* [European Nucleotide Archive](https://www.ebi.ac.uk/ena)
* [Assembly of chromosome 2D of *Triticum aestivum* line CH Campala *Lr22a*](https://www.ebi.ac.uk/ena/data/view/LS480641)
* [Assembly of *Triticum urartu* ](https://www.ebi.ac.uk/ena/data/view/GCA_003073215)
* Annotation downloaded from [MBKBase](http://www.mbkbase.org/Tu/)
* [Assembly of *Aegilops tauschii* ](https://www.ebi.ac.uk/ena/data/view/GCA_002575655.1)
* Annotation downloaded from [http://aegilops.wheat.ucdavis.edu/ATGSP/annotation/](http://aegilops.wheat.ucdavis.edu/ATGSP/annotation/)
* ...and more...
*
#### Remote
## Input specification (triticeae and other relevant data sets)

The pipeline pulls data from Ensembl, included species and assembly versions are specified in configuration file(s) e.g. [conf/ensembl-plants-data.config](conf/ensembl-plants-data.config).
For each of the data sets the pipeline downloads:
A mix of local and remote files can be specified - see [`conf/microsporidia.config`](conf/microsporidia.config) and the corresponding [`conf/test-data.config`](conf/test-data.config)

There are several paths through the pipeline which are executed depending on input specification and availability of various input file types, e.g.

* genome assembly index file (required)
* genome assembly index file
* protein sequences (required if pipeline is to generate aliases)
* genome assembly fasta (only required if pipeline is to place markers on assemblies)
* marker sequences
* genome assembly fasta (required if pipeline is to place marker sequences on assemblies)

#### Local
Different paths through the pipeline rely on partly different inputs

Different branches of the pipeline rely on partly different inputs
1. Generation of genome blocks requires a genome assembly index file - all we really need are lengths of pseudo-chromosomes so a two-column `.tsv` file with chromosome names and their lengths will suffice. Also, if genome assembly fasta file is specified, the index will be generated automatically.

1. Generation of genome blocks requires a genome assembly index file - all we really need are lengths of pseudo-chromosomes so a two-column `.tsv` file with chromosome names and their lengths will suffice
2. Placement of gene features on the generated genome blocks and generation of aliases between features requires

* gene annotations (either GTF or GFF3)
@@ -102,9 +80,10 @@ This follows how protein sequences are annotated on Ensembl plants, but we do no

4. Marker placement requires full reference FASTA file.

### Other considerations

Wherever possible the local assembly files are used as input for the pipeline in their original form - as downloaded from their respective sources. This is however not always possible due to inconsistencies in formatting and varying levels of adherence to standards and conventions. We try to capture additional steps needed to prepare these input data sets for the inclusion in this pipeline in [doc/format_local.md](doc/format_local.md).
### Disparate triticeae datasets

Wherever possible the assembly files are used as input for the pipeline in their original form - as downloaded from their respective sources. This is however not always possible due to inconsistencies in formatting and varying levels of adherence to standards and conventions. We try to capture additional steps needed to prepare these input data sets for the inclusion in this pipeline in [doc/format_local.md](doc/format_local.md).

## Dependencies

@@ -114,14 +93,13 @@ Wherever possible the local assembly files are used as input for the pipeline in
* [Docker](http://singularity.lbl.gov)
* Required software installed. In addition to standard linux tools, these include:
* [FASTX-Toolkit](http://hannonlab.cshl.edu/fastx_toolkit/)
* [MMSeqs2](https://github.com/soedinglab/mmseqs2)
* [MMSeqs2](https://github.com/soedinglab/mmseqs2) - if generating aliases
* [Minimap2](https://github.com/lh3/minimap2) - if placing markers
* `jq`
* `groovy` interpreter


When using Singularity or Docker, the required containers are specified in [`conf/containers.conf`](conf/containers.config)

and pulled by Nextflow as required.

## Execution

@@ -131,21 +109,21 @@ Run locally with docker

```
nextflow run plantinformatics/pretzel-input-generator \
-profile YEAST,docker
-profile MICROSPORIDIA,docker
```

Run locally with singularity

```
nextflow run plantinformatics/pretzel-input-generator \
-profile YEAST,singularity
-profile MICROSPORIDIA,singularity
```

Dispatch on a SLURM cluster with singularity

```
nextflow run plantinformatics/pretzel-input-generator \
-profile YEAST,slurm,singularity
-profile MICROSPORIDIA,slurm,singularity
```

## Output
@@ -162,13 +140,13 @@ All generated JSON files generated by the pipeline are output to `results/JSON`.

The output files (hopefully) conform to the requirements of [pretzel data structure](https://github.com/plantinformatics/pretzel-data).


The `results/flowinfo` directory contains summaries of pipeline execution and `results/downloads` includes the files downloaded from Ensembl plants.

```
results
├── downloads
├── flowinfo
├── summary
└── JSON
```

@@ -193,4 +171,4 @@ This will pull and process data sets from [DNA Zoo](https://www.dnazoo.org/) spe

## Output

In comparison with the main pipeline the output lacks `*_aliases.json.gz` as features on different genomes are implicitly connected by BUSCOs identifiers.
In comparison with the main pipeline the output lacks `*_aliases.json.gz` as features on different genomes are implicitly connected by BUSCOs identifiers.
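The README text above notes that genome-block generation only needs chromosome names and lengths, so a two-column `.tsv` suffices, and commit 2a33bc9 mentions handling gz (not bgz) files for chromosome lengths. The following is a minimal Python sketch of deriving such a table from a FASTA file that may or may not be gzip-compressed; it is not the pipeline's own implementation, and the function names (`open_maybe_gzip`, `chromosome_lengths`, `write_lengths_tsv`) are hypothetical.

```python
import gzip
import io


def open_maybe_gzip(path):
    """Open a FASTA file as text, decompressing it if it starts with the gzip magic bytes.

    Both plain gzip and BGZF (bgzip) output begin with these bytes, and Python's
    gzip module reads either one sequentially.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def chromosome_lengths(handle):
    """Yield (sequence name, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Like a .fai index, use the first whitespace-delimited token as the name.
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line.strip())
    if name is not None:
        yield name, length


def write_lengths_tsv(records, out):
    """Write one 'name<TAB>length' row per sequence."""
    for name, length in records:
        out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    demo = io.StringIO(">chr1 some description\nACGT\nAC\n>chr2\nGGG\n")
    buf = io.StringIO()
    write_lengths_tsv(chromosome_lengths(demo), buf)
    print(buf.getvalue(), end="")
```

For a file on disk, `write_lengths_tsv(chromosome_lengths(open_maybe_gzip("assembly.fa.gz")), sys.stdout)` would stream the table; the path is illustrative only.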
4 changes: 2 additions & 2 deletions bin/excludeSameChromosome.awk
@@ -1,9 +1,9 @@
#!/usr/bin/awk -f
#!/usr/bin/gawk -f

BEGIN {
OFS="\t";
}
NR==FNR && $3 ~/^chromosome/ {
NR==FNR && $3 ~/^(chromosome|supercontig)/ {
#gsub("^>","",$1)
split($3,location,":");
idmap[$1]=location[3];
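The `excludeSameChromosome.awk` hunk widens the first-file (`NR==FNR`) filter from `chromosome` to `chromosome|supercontig` and records the third colon-separated part of the Ensembl-style location field for each identifier. The following is a minimal Python sketch of only that visible map-building step, assuming headers shaped like `>ID pep chromosome:ASSEMBLY:REGION:START:END:STRAND ...`; the rest of the awk script is not shown in the diff and is not modelled, and `build_id_map` is a hypothetical name.

```python
import re

# Location field prefix accepted after this change (was only "chromosome").
LOCATION_PREFIX = re.compile(r"^(chromosome|supercontig)")


def build_id_map(header_lines):
    """Map each identifier (first field, '>' kept) to its sequence region name.

    Fields are whitespace-separated, as with awk's default FS. awk arrays are
    1-based, so awk's location[3] corresponds to location[2] here.
    """
    idmap = {}
    for line in header_lines:
        fields = line.split()
        if len(fields) < 3 or not LOCATION_PREFIX.match(fields[2]):
            continue
        location = fields[2].split(":")
        # awk yields an empty string for a missing array element; mirror that.
        idmap[fields[0]] = location[2] if len(location) > 2 else ""
    return idmap
```

Sequence lines and headers on other coordinate systems (for example `scaffold:`) are simply skipped.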
6 changes: 3 additions & 3 deletions bin/filterForRepresentative.awk
@@ -1,4 +1,4 @@
#!/usr/bin/awk -f
#!/usr/bin/gawk -f

BEGIN {
FS = "\t";
@@ -9,8 +9,8 @@ BEGIN {
split($1,arr," "); #GET GENE FIELD
split(arr[4],gene,":"); #GET GENE FIELD
ID=gene[2]; #GET GENE ID
sub(/^>[^ ]+/, ">"ID); #USE GENE ID AS FASTA IDENTIFIER (NOT TRANSCRIPT ID)
if(!(ID in storedIDs) || length($2) > length(StoredSeqLines[ID])) { #FIRST OCCURANCE OT LONGER THAN STORED
sub(/^>[^ ]+/, ">"ID); #USE GENE ID AS FASTA IDENTIFIER (RATER THAN THE TRANSCRIPT ID)
if(!(ID in storedIDs) || length($2) > length(StoredSeqLines[ID])) { #FIRST OCCURANCE OR LONGER THAN STORED
storedIdLInes[ID] = $1;
# print "storing "$1
storedSeqLines[ID] = $2;
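`filterForRepresentative.awk` keeps one peptide per gene: it takes the gene ID from the `gene:` token (the fourth space-separated token of the header), swaps it in for the transcript ID, and keeps the first or longest sequence seen. The following is a minimal Python sketch of that idea, assuming tab-separated `header<TAB>sequence` records as implied by `FS = "\t"`; `filter_for_representative` is a hypothetical name. In the visible lines the awk tests `storedIDs` and `StoredSeqLines` but assigns `storedIdLInes` and `storedSeqLines`; awk array names are case-sensitive, so unless the former are also set in the collapsed part of the file, those checks read different arrays. The sketch uses a single dictionary for both the membership and the length check.

```python
def filter_for_representative(records):
    """Keep the longest sequence per gene from (header, sequence) pairs.

    Assumes Ensembl-style headers such as
    '>TRANSCRIPT pep chromosome:... gene:GENE transcript:...'.
    Output order follows the first occurrence of each gene.
    """
    best = {}  # gene ID -> (rewritten header, sequence)
    for header, seq in records:
        tokens = header.split(" ")
        gene_id = tokens[3].split(":")[1]  # 'gene:GENE' -> 'GENE'
        # Replace the leading '>TRANSCRIPT' token with '>GENE', keep the rest.
        new_header = ">" + gene_id + header[len(tokens[0]):]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (new_header, seq)
    return list(best.values())
```

Ties keep the first sequence seen, matching the strict `>` comparison in the awk.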
29 changes: 23 additions & 6 deletions bin/gff3AndRepr2ensembl_pep.awk
@@ -1,4 +1,4 @@
#!/usr/bin/awk -f
#!/usr/bin/gawk -f

BEGIN {
FS="\t";
@@ -15,24 +15,41 @@ NR==FNR {
}

NR!=FNR {
if($3 =="mRNA") {
if($3 =="CDS") {
gsub("\"","");
split($9,arr,";| ");
for(i in arr) {
split(arr[i], pair, "=");
if(pair[1]=="ID") {
transcript=pair[2];
} else if(pair[1]=="Parent") {
gene=pair[2];
parent=pair[2];
gene=parent
gsub(/\.[0-9]+$/,"",gene);
} else if(pair[1] ~ /^protein(_source)?_id$/) {
source=pair[2];
}
}
if(transcript in repr && !(gene in printed)) {
# print "p="parent,"g="gene,"t="transcript,"s="source
# if(transcript in repr && !(gene in printed)) {
if(!(parent in printed)) {
#IGNORECASE=1;
gsub(/chr_?/,"",$1);
#IGNORECASE=0;
printed[gene]=1;
print ">"transcript" pep chromosome:"version":"$1":"$4":"$5" gene:"gene"\n"repr[transcript];

if(source in repr) {
id = source
} else if(parent in repr) { #dealing with GFF files being inconsistent...
id = parent
} else {
id = ""
}
# if(source in repr && !(gene in printed)) {
# print ">"transcript" pep chromosome:"version":"$1":"$4":"$5" gene:"gene"\n"repr[transcript];
if(id) {
print ">"id" pep chromosome:"version":"$1":"$4":"$5" gene:"gene"\n"repr[id];
printed[parent]=1;
}
}
}
}
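After this change `gff3AndRepr2ensembl_pep.awk` reads `CDS` rows instead of `mRNA` rows, takes the peptide ID from `protein_id`/`protein_source_id` and falls back to `Parent` when the GFF3 is "inconsistent", and prints one Ensembl-style header per parent. The following is a minimal Python sketch of that second-file pass, assuming `repr_seqs` is the already-loaded mapping of representative IDs to sequences (the awk builds it in the collapsed `NR==FNR` block) and `version` is the assembly label; `parse_attributes` and `cds_to_ensembl_pep` are hypothetical names. One deliberate difference: awk does not reset `transcript`, `parent` or `source` between lines, so a missing attribute can inherit the previous row's value, whereas this sketch starts fresh on every row.

```python
import re


def parse_attributes(column9):
    """Split a GFF3 column 9 on ';' or ' ' into key=value pairs, like the awk split."""
    attrs = {}
    for item in re.split(r"[; ]", column9):
        key, sep, value = item.partition("=")
        if sep:
            attrs[key] = value
    return attrs


def cds_to_ensembl_pep(gff_lines, repr_seqs, version):
    """Return Ensembl-style 'header\\nsequence' strings, one per CDS parent."""
    printed = set()
    out = []
    for line in gff_lines:
        cols = line.rstrip("\n").replace('"', "").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # also skips '#' directive and comment lines
        attrs = parse_attributes(cols[8])
        parent = attrs.get("Parent", "")
        gene = re.sub(r"\.[0-9]+$", "", parent)  # drop a trailing '.N' suffix
        source = attrs.get("protein_id", attrs.get("protein_source_id", ""))
        if parent in printed:
            continue  # only the first CDS row per parent is reported
        seqid = re.sub(r"chr_?", "", cols[0])
        # Prefer the protein ID, fall back to the parent ID.
        if source in repr_seqs:
            pid = source
        elif parent in repr_seqs:
            pid = parent
        else:
            continue
        header = f">{pid} pep chromosome:{version}:{seqid}:{cols[3]}:{cols[4]} gene:{gene}"
        out.append(header + "\n" + repr_seqs[pid])
        printed.add(parent)
    return out
```

As in the awk, the reported coordinates are those of the first `CDS` row encountered for each parent, not the span of the whole transcript.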
2 changes: 1 addition & 1 deletion bin/gtfAndRepr2ensembl_pep.awk
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#!/usr/bin/awk -f
#!/usr/bin/gawk -f

BEGIN {
FS="\t";
14 changes: 12 additions & 2 deletions bin/paf2pretzel.groovy
@@ -5,7 +5,9 @@ import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream


@Grab('info.picocli:picocli:4.0.0-alpha-3') //command line interface
//@Grab('info.picocli:picocli-groovy:4.1.2') //command line interface
groovy.grape.Grape.grab(group:'info.picocli', module:'picocli-groovy', version:'4.1.2')

@Command(header = [
//Font Name: Calvin S
$/@|bold,blue ╔═╗╔═╗╔═╗ ┌┬┐┌─┐ ╔═╗┬─┐┌─┐┌┬┐┌─┐┌─┐┬ |@/$,
@@ -52,6 +54,9 @@ import static picocli.CommandLine.*
@Option(names = ["--align-params"], description = ["Params used to generate input PAF alignments"])
@Field private String alignParams

@Option(names = ["--allowed-target-id-pattern"], description = ["Provide target identifier patter if other than common chromosome naming"])
@Field private String allowedTargetIdPattern

@Option(names = ["-O", "--output"], description = ["JSON output file name"])
@Field private String output = '/dev/stdout'

@@ -112,8 +117,12 @@ pafContent.eachLine { line ->
// println "${query_identity} >= ${minIdentity} ?"
if(query_identity >= minIdentity) {
def kosher = true;
if(!(TNAME.toLowerCase() ==~ /^(chr(omosome)?)?(_)?([0-9]+|x|y|i|v).*/)) {
// println "check if TNAME kosher"
// if(!(TNAME.toLowerCase() ==~ /^(ch(romosome)?)?(_)?([0-9]+|x|y|i|v).*/)) {
// if(!(TNAME.toLowerCase() ==~ /^(ch(romosome)?)?(_)?([0-9]+|x|y|i|v|[0-9a-z_\-]).*/)) {
if(!((TNAME.toLowerCase() =~ /^(ch|[0-9]{1,2}|x|y|i|v)/) || (TNAME =~ allowedTargetIdPattern) )) {
kosher = false //don't report placement on plasmid or other non-pseudomolecule parts of assembly
// println "${allowedTargetIdPattern} not matching $TNAME"
} else if(markerMode && query_identity < 1) { //Not a 100% match, so for markers we check if no MM in last 3 bases - if notMarkerMode the required tag may not be present
TAGS.each { tag ->
if(tag.startsWith('cs:Z')) {
@@ -129,6 +138,7 @@ pafContent.eachLine { line ->
}

if(kosher) {
// println TNAME
def key = TNAME.replaceFirst("^(C|c)(H|h)(R|r)[_]?","")
if(!scope.containsKey(key)) {
scope << [(key) : []]
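The `paf2pretzel.groovy` hunks relax which PAF target names count as pseudomolecules: a name passes if its lower-cased form starts with `ch`, one or two digits, `x`, `y`, `i` or `v`, or if it matches the new `--allowed-target-id-pattern`. Accepted names are then keyed with any leading `chr`/`chr_` removed. The following is a minimal Python sketch of those two rules; `is_kosher_target` and `scope_key` are hypothetical names. Two assumptions: Groovy's `=~` in a boolean context behaves as a find (match anywhere), so the user pattern is applied with `re.search`; and an unset pattern is treated as allowing nothing extra. Java and Python regex syntax overlap but are not identical, so a pattern written for the Groovy script may need adjusting.

```python
import re

# Default prefixes from the hunk, checked on the lower-cased name.
DEFAULT_PREFIX = re.compile(r"(ch|[0-9]{1,2}|x|y|i|v)")


def is_kosher_target(tname, allowed_target_id_pattern=None):
    """Return True if a PAF target name should be reported.

    The Groovy default check is anchored with '^', which re.match reproduces;
    the optional user pattern may match anywhere in the original-case name.
    """
    if DEFAULT_PREFIX.match(tname.lower()):
        return True
    if not allowed_target_id_pattern:
        return False
    return re.search(allowed_target_id_pattern, tname) is not None


def scope_key(tname):
    """Strip one leading 'chr' or 'chr_' in any letter case, like replaceFirst."""
    return re.sub(r"^[Cc][Hh][Rr]_?", "", tname, count=1)
```

Because the default check only looks at the first characters, names such as `Pt`, `Un` or `scaffold_12` are dropped unless the user pattern admits them.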