I lied to PATRIC & it liked it #1962

Closed
aswarren opened this issue Apr 19, 2018 · 11 comments

@aswarren
Contributor

Minhash supports fastq files. I told PATRIC that my fastq file was a contigs file so I could submit it to similar genome finder. It returned the correct result.

We should add type "reads" to allowable inputs in the similar genome finder.

@mshukla1

...or just call you a liar next time! ;)

@olsonanl

This is a problem across the board. We have had a number of problems where uploaded data does not match the declared type.

I have working fastq and fasta validators that we can bolt into place. One issue, however, is that "reads" does not mean fastq; it can mean bam as well:

SPAdes takes as input paired-end reads, mate-pairs and single (unpaired) reads in FASTA and FASTQ. For IonTorrent data SPAdes also supports unpaired reads in unmapped BAM format (like the one produced by Torrent Server). However, in order to run read error correction, reads should be in FASTQ or BAM format. Sanger, Oxford Nanopore and PacBio CLR reads can be provided in both formats since SPAdes does not run error correction for these types of data.

There is a distinction between the file format and type of file as we are using it.
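
To illustrate the format-versus-type distinction, here is a minimal sketch of content-based format detection. It is not the validator mentioned above: the function name `sniff_sequence_format` is hypothetical, and it only inspects the first bytes, so it would not catch a malformed record further into the file.

```python
import gzip


def sniff_sequence_format(path):
    """Guess a file's format from its first bytes (hypothetical helper).

    Returns 'bam', 'fastq', 'fasta', or 'unknown'. This is a sniff,
    not a full validation of every record.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip and BGZF (the BAM container) both start with the gzip magic bytes.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        head = fh.read(4)
    if head == b"BAM\x01":   # magic at the start of decompressed BAM data
        return "bam"
    if head[:1] == b"@":     # FASTQ records start with '@'
        return "fastq"
    if head[:1] == b">":     # FASTA records start with '>'
        return "fasta"
    return "unknown"
```

With a check like this, a fastq file declared as "contigs" would be reported as `fastq` regardless of the type the user selected.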

@mshukla1

Not sure whether there is a short-term action item here, or whether we bite the bullet, fix the file type / file format checking, and make it consistently available across all services.

@olsonanl

There is no good quick hack for this.

@aswarren
Contributor Author

Regardless of what we do to clean up the filetype space, we will still have filetypes; apps will still be authorized to work with certain filetypes; this one should be authorized to work with fastq since it supports it. If you want to clean up filetypes, that is a separate issue/ticket.

@aswarren
Contributor Author

aswarren commented Jul 12, 2018

Hmm I think I need to walk that back. I forgot BAM, that is my bad. I will try to think about this.

@aswarren
Contributor Author

After looking at this further it looks like BAM is its own type right now.
https://github.com/PATRIC3/Workspace/blob/master/typeslist.txt

So that isn't exactly an issue for this service.
More broadly, for my money, exploding types like "reads" and "contigs" into their constituent formats and using those formats as our input types makes sense.
I think if we did that then we could support the old "types" (reads, contigs) as we transition to no longer creating them.
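
A rough sketch of what that transition could look like, assuming a lookup table from legacy types to formats. The table contents and the names `LEGACY_TYPE_FORMATS`, `expand_types`, and `service_accepts` are illustrative assumptions, not the actual Workspace type list or any existing PATRIC code.

```python
# Hypothetical mapping from legacy workspace types to concrete file formats.
# The entries are illustrative only, not taken from typeslist.txt.
LEGACY_TYPE_FORMATS = {
    "reads": {"fastq", "fasta"},
    "contigs": {"fasta"},
}


def expand_types(accepted):
    """Expand legacy types into format names; format names pass through."""
    formats = set()
    for name in accepted:
        formats |= LEGACY_TYPE_FORMATS.get(name, {name})
    return formats


def service_accepts(accepted, file_format):
    """True if a service's declared inputs cover the given file format."""
    return file_format in expand_types(accepted)
```

Under this scheme a service could declare `["contigs", "fastq"]` and keep working with files still labeled with the old types while new uploads carry format names.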

I will try to create a proposed solution for this in a couple of slides so we can talk about it next week.

@aswarren
Contributor Author

Since BAM is not currently a "reads" type, this is enabled for Similar genome finder here:
PATRIC3/p3_web#773

@aswarren
Contributor Author

It looks like fastq only works when the file is small, and compressed fastq files don't seem to work even though MASH supports them. The invocation of MASH may need to be reworked for fastq:
marbl/Mash#32

@aswarren aswarren reopened this Jul 26, 2018
@olsonanl

How does it fail?

@aswarren
Contributor Author

@olsonanl Scratch that. It succeeded. We just were not setting the MASH distance threshold to be permissive enough. This seems to be common enough (along with the need to search all public genomes) that I'm wondering if we should be hiding the "Advanced" parameters here by default. They don't seem very "Advanced" and most people would want to know that they are searching only "Reference & Representative" without having to dig.
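
For illustration, a sketch of how the distance threshold and read-specific options might be passed when building a `mash dist` command line. This is not the service's actual invocation: the function name and default values are assumptions, and the flags (`-d` for maximum distance, `-r` for read input, `-m` for minimum k-mer copies) are worth confirming against `mash dist -h` for the installed version.

```python
def build_mash_dist_command(reference_sketch, query, max_distance=0.5,
                            query_is_reads=False, min_kmer_copies=2):
    """Build an argument list for `mash dist` (hypothetical helper).

    Default values here are placeholders, not the service's settings.
    """
    # -d: maximum distance to report; a larger value is more permissive.
    cmd = ["mash", "dist", "-d", str(max_distance)]
    if query_is_reads:
        # -r: treat the input as a read set; -m: minimum copies of a k-mer
        # needed to pass the noise filter for reads.
        cmd += ["-r", "-m", str(min_kmer_copies)]
    cmd += [reference_sketch, query]
    return cmd
```

The resulting list could be handed to `subprocess.run`; the reference sketch and query paths are supplied by the caller.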
