Enhancement request #19

Closed
svsuresh opened this issue Aug 10, 2017 · 4 comments

svsuresh commented Aug 10, 2017

Thank you for a wonderful parsing tool. After using it extensively, I would like to request the following features.
Please add

  • extract sequence by number (for fasta file). Use case: I would like to extract n th file
  • Tail function. Head function is already implemented. Tail function is missing
  • inverse selection for bases in a sequence. Current version allows user to select bases at the start, in the middle and at the end. It would be difficult for user to choose first few bases and last few bases. This would help user in removing sequences in the start and end of a read or a long sequence
  • Allow user to search by stop codon *. Stop codon * is present in few of predicted sequences. Currently user cannot search by * in the sequences.
@shenwei356
Owner

  • extract sequence by number (for fasta file). Use case: I would like to extract n th file

    not clear

  • Tail function. Head function is already implemented. Tail function is missing

      seqkit fx2tab | tail | seqkit tab2fx
    
  • inverse selection for bases in a sequence. Current version allows user to select bases at the start, in the middle and at the end. It would be difficult for user to choose first few bases and last few bases. This would help user in removing sequences in the start and end of a read or a long sequence

    I understand this, but it is not so useful in practice.

  • Allow user to search by stop codon *. Stop codon * is present in few of predicted sequences. Currently user cannot search by * in the sequences.

      seqkit grep -r -p '\*' # it should work
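
One reading of the inverse-selection request is "keep the first few and the last few bases and drop the middle". The sketch below illustrates that reading only, using plain awk rather than any seqkit feature. The helper name `keep_ends` is hypothetical, and it assumes each sequence sits on a single line.

```shell
# Hypothetical helper (not a seqkit command): keep the first n and the last m
# characters of each sequence line and drop the middle.
# Header lines (starting with '>') pass through unchanged, and sequences
# no longer than n + m are printed as they are.
keep_ends() {
    awk -v n="$1" -v m="$2" '
        /^>/ { print; next }
        length($0) <= n + m { print; next }
        { print substr($0, 1, n) substr($0, length($0) - m + 1) }'
}

# Keep the first 3 and the last 2 bases of a 12-base read.
printf '>r1\nAACCGGTTAACC\n' | keep_ends 3 2
```

Multi-line FASTA records would need to be joined first, for example by converting to one record per line.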
    

Author

svsuresh commented Aug 11, 2017

  1. Use case: I would like to extract every 4th sequence from a FASTA file, or every 4th and 6th sequence, counting from either the top or the bottom of the file. I saw a use case for this on Biostars and will post it once I find it.

  2. There is a head function like this: seqkit head -n 1 hairpin.fa.gz. I would like similar tail function.

  3. use case: https://www.biostars.org/p/263861/

  4. I tried it on the following FASTA file and it didn't work. seqkit grep -r -p '*' test.fa gives me blank lines. The output should give me 3 sequences and the inverse grep should give me 2 sequences.

$ cat test.fa 
>s1
 ACDL
>s2
AGCTYLAKQ*
>s3
GTCTY*ATC
>s4
*GAP
>s5
AGATE
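
The expected counts (3 sequences with a stop codon, 2 without) can be checked independently of seqkit. The sketch below recreates test.fa in a temporary file and counts with plain awk. The helper name `count_star` is hypothetical, the leading space before ACDL is left out, and one sequence line per record is assumed. It uses plain substring matching, so `*` needs no escaping here. In a regular expression a bare `*` is a quantifier, not a literal character.

```shell
# Recreate test.fa from the comment above in a temporary file.
fa=$(mktemp)
cat > "$fa" <<'EOF'
>s1
ACDL
>s2
AGCTYLAKQ*
>s3
GTCTY*ATC
>s4
*GAP
>s5
AGATE
EOF

# Hypothetical helper: count sequence lines with and without a literal '*'.
# index() does plain substring matching, so no regex escaping is involved.
count_star() {
    awk '!/^>/ { if (index($0, "*") > 0) hit++; else miss++ }
         END { printf "%d %d\n", hit, miss }' "$1"
}

result=$(count_star "$fa")   # "<with stop codon> <without stop codon>"
echo "$result"
rm -f "$fa"
```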

Owner

shenwei356 commented Aug 11, 2017

  1. seqkit fx2tab | sed / awk | seqkit tab2fx
  2. seqkit fx2tab | tail | seqkit tab2fx
  3. Easy for regions without overlap, but it gets out of control for complicated cases.
  4. seqkit grep -s -r -p '\*'
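
Suggestions 1 and 2 work because the tabular form has one record per line, so ordinary line tools can pick records. The sketch below shows only that line-selection step. The `records` function is hand-written stand-in data rather than real `seqkit fx2tab` output, and `every_nth` is a hypothetical helper name.

```shell
# Stand-in for tabular records: one record per line, ID and sequence
# separated by a tab. printf reuses the format for each pair of arguments.
records() {
    printf '%s\t%s\n' s1 ACDL s2 AGCT s3 GTCT s4 GGAP s5 AGAT s6 TTTT s7 CCCC s8 GGGG
}

# Hypothetical helper: keep every n-th line (record); n is the first argument.
every_nth() {
    awk -v n="$1" 'NR % n == 0'
}

records | every_nth 4   # every 4th record
records | tail -n 2     # last two records
```

In a real pipeline the same filters would sit between `seqkit fx2tab` and `seqkit tab2fx`, as in the reply above.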

shenwei356 added a commit that referenced this issue Aug 12, 2017
@shenwei356
Owner

@ssvbio check new version: v0.7.0

  1. seqkit range -r 4:4 for retrieving the 4th record
  2. seqkit range -r -10:-1 for retrieving the last 10 records (tail)
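
The two examples suggest a 1-based, inclusive start:end range in which negative numbers count from the end (-1 is the last record). The sketch below illustrates that index convention on plain lines with awk. It is not how seqkit implements `range`, and the helper name `line_range` is hypothetical.

```shell
# Hypothetical helper: print lines start..end (1-based, inclusive).
# Negative values count from the end, so -1 is the last line.
line_range() {
    awk -v s="$1" -v e="$2" '
        { line[NR] = $0 }               # buffer input so the end is known
        END {
            if (s < 0) s = NR + s + 1   # e.g. -1 becomes NR
            if (e < 0) e = NR + e + 1
            if (s < 1) s = 1            # clamp to the available lines
            if (e > NR) e = NR
            for (i = s; i <= e; i++) print line[i]
        }'
}

printf '%s\n' a b c d e | line_range 4 4     # the 4th line
printf '%s\n' a b c d e | line_range -2 -1   # the last two lines
```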
