Enhancement request #19

svsuresh · 2017-08-10T19:49:27Z

Thank you for a wonderful parsing tool. After using it extensively, I would like to request the following features.
Please add:

Extract sequence by number (for FASTA files). Use case: I would like to extract the nth file.
Tail function. The head function is already implemented, but the tail function is missing.
Inverse selection for bases in a sequence. The current version allows the user to select bases at the start, in the middle, and at the end. It is difficult for the user to select the first few bases and the last few bases together. This would help the user remove bases at the start and end of a read or a long sequence.
Allow the user to search by stop codon *. Stop codon * is present in a few predicted sequences. Currently the user cannot search for * in the sequences.

shenwei356 · 2017-08-10T23:54:31Z

> extract sequence by number (for fasta file). Use case: I would like to extract the nth file

Not clear.

> Tail function. Head function is already implemented. Tail function is missing

```
  seqkit fx2tab | tail | seqkit tab2fx
```
> inverse selection for bases in a sequence. Current version allows user to select bases at the start, in the middle and at the end. It would be difficult for user to choose first few bases and last few bases. This would help user in removing sequences in the start and end of a read or a long sequence

I understand this, but it is not so useful in practice.

> Allow user to search by stop codon *. Stop codon * is present in few of predicted sequences. Currently user cannot search by * in the sequences.

```
  seqkit grep -r -p '\*' # it should work
```
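
The `fx2tab | tail | tab2fx` pipeline suggested for the tail request works because the tabular form puts each record on a single line, so line-oriented tools such as `tail` operate on whole records. Below is a minimal sketch of that idea in plain shell and awk, for cases where seqkit is not at hand. `fa2tab`, `tab2fa` and `fasta_tail` are hypothetical helpers written for this illustration, not seqkit commands, and they keep only the header and the sequence.

```shell
# Hypothetical stand-ins for the tabular round trip, written in plain awk.

# fa2tab: print one record per line as "header<TAB>sequence".
# Multi-line sequences are joined into a single string.
fa2tab() {
  awk '
    /^>/ { if (n) print name "\t" seq; n = 1; name = substr($0, 2); seq = ""; next }
         { seq = seq $0 }
    END  { if (n) print name "\t" seq }
  ' "$@"
}

# tab2fa: turn "header<TAB>sequence" lines back into FASTA.
tab2fa() {
  awk -F '\t' '{ print ">" $1; print $2 }'
}

# fasta_tail N FILE: print the last N records of FILE.
fasta_tail() {
  fa2tab "$2" | tail -n "$1" | tab2fa
}
```

With these definitions, `fasta_tail 2 test.fa` would print the last two records of `test.fa`, each with its sequence on one line.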

svsuresh · 2017-08-11T07:33:04Z

Use case: I would like to extract every 4th sequence from a FASTA file, or every 4th and 6th sequence, counting from either the top or the bottom of the file. I saw a use case for this on Biostars and will post it once I find it.
There is a head function like this: seqkit head -n 1 hairpin.fa.gz. I would like a similar tail function.
Use case: https://www.biostars.org/p/263861/
I tried it on the following FASTA file and it didn't work. seqkit grep -r -p '*' test.fa gives me blank lines. The output should give me 3 sequences, and the inverse grep should give me 2 sequences.

```
$ cat test.fa
>s1
 ACDL
>s2
AGCTYLAKQ*
>s3
GTCTY*ATC
>s4
*GAP
>s5
AGATE
```
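
The expected result can be checked independently of seqkit. The sketch below, in plain shell and awk, lists the headers of records whose sequence contains a literal asterisk. `ids_by_star` is a hypothetical helper written for this illustration, not part of seqkit. Two details matter: in a regular expression a bare `*` is a quantifier, so it is escaped as `\*`, and the pattern is tested against the sequence rather than the header line.

```shell
# ids_by_star FILE [INVERT]
#   Print the header of every record whose sequence contains a literal "*".
#   Pass 1 as the second argument to print the records that do NOT contain one.
ids_by_star() {
  awk -v invert="${2:-0}" '
    # Decide whether the record collected so far should be reported.
    function emit() {
      if (!have) return
      hit = (seq ~ /\*/) ? 1 : 0      # \* matches a literal asterisk
      if (hit != invert + 0) print id
    }
    /^>/ { emit(); have = 1; id = substr($0, 2); seq = ""; next }
         { seq = seq $0 }             # join multi-line sequences
    END  { emit() }
  ' "$1"
}
```

On the `test.fa` shown above, `ids_by_star test.fa` should list s2, s3 and s4, and `ids_by_star test.fa 1` should list s1 and s5, which matches the 3 and 2 sequences expected.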

shenwei356 · 2017-08-11T07:52:12Z

seqkit fx2tab | sed / awk | seqkit tab2fx
seqkit fx2tab | tail | seqkit tab2fx
Easy for regions without overlap, but it would get out of control for complicated cases.
seqkit grep -s -r -p '\*'
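
For the first item (every Nth sequence), here is a minimal awk sketch that works on the FASTA file directly instead of going through the tabular round trip. `every_nth` is a hypothetical helper name used only for this illustration, and it assumes records are counted from the top of the file, starting at 1, with N at least 1.

```shell
# every_nth N FILE: print records N, 2N, 3N, ... of a FASTA file (1-based, N >= 1).
every_nth() {
  awk -v n="$1" '
    /^>/ { rec++ }                 # a header line starts a new record
    rec > 0 && rec % n == 0        # print every line of a selected record
  ' "$2"
}
```

For example, `every_nth 4 input.fa` would keep the 4th, 8th, 12th, ... records. Because the filter prints lines as they are, multi-line sequences keep their original wrapping.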

shenwei356 · 2017-08-12T15:58:51Z

@ssvbio check new version: v0.7.0

shenwei356 added a commit that referenced this issue Aug 12, 2017

add new command range. #19

1acc61f

shenwei356 closed this as completed Aug 12, 2017
