Custom LCA with custom Lineages #2734

Closed
mr-eyes opened this issue Aug 27, 2023 · 4 comments

Comments

@mr-eyes
Member

mr-eyes commented Aug 27, 2023

I have a custom, sourmash-LCA-friendly lineage spreadsheet as described here. I created custom lineages for my tree that do not follow the identifiers,superkingdom,phylum,class,order,family,genus,species,strain naming convention. Is there a way to make sourmash work with them without it throwing this error?

Building LCA database with ksize=51 scaled=10000 moltype=DNA.
examining spreadsheet headers...
** assuming column 'Leaf_ID' is identifiers in spreadsheet
** assuming column 'level_0' is superkingdom in spreadsheet
** assuming column 'level_1' is phylum in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain

Leaf_ID,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,level_10,level_11,level_12,level_13,level_14,level_15,level_16,level_17,level_18,level_19,level_20,level_21,level_22,level_23,level_24,level_25,level_26,level_27,level_28,level_29,level_30,level_31,level_32,level_33,level_34,level_35,level_36,level_37,level_38,level_39,level_40,level_41,level_42,level_43,level_44,level_45,level_46,level_47,level_48,level_49,level_50,level_51,level_52
SRR13739007,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,node_15,node_16,node_17,node_18,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13738983,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,node_15,node_16,node_17,node_18,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739027,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,node_15,node_16,node_17,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13738996,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,node_15,node_16,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739025,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,node_15,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739019,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,node_14,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739024,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,node_13,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739021,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,node_12,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739016,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,node_11,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739017,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,node_10,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739018,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,node_9,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739026,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,node_8,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739078,node_0,node_1,node_2,node_3,node_4,node_5,node_6,node_7,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
SRR13739020,node_0,node_1,node_2,node_3,node_4,node_5,node_6,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
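To see how far these lineages go beyond the eight standard ranks, it helps to count the populated levels in each row. The following is a minimal sketch. It assumes that column 0 holds identifiers, that every other column is one lineage level, and that "na" marks an unused level. `lineage_depths` and `too_deep` are hypothetical names chosen for illustration, and the sample row is rebuilt to match the shape of SRR13739007 above rather than read from the real file.

```python
import csv
import io

# Ranks sourmash expects after the identifiers column, copied from the
# "expecting ..." line of the error message.
STANDARD_RANKS = ["superkingdom", "phylum", "class", "order",
                  "family", "genus", "species", "strain"]


def lineage_depths(csv_text, missing="na"):
    """Map each identifier to its number of populated lineage levels.

    Assumes column 0 holds identifiers, every other column is one level,
    and `missing` marks an unused level.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {row[0]: sum(1 for value in row[1:] if value != missing)
            for row in reader if row}


# Rebuild a row shaped like SRR13739007 above: node_0..node_18, then "na".
n_levels = 53  # level_0 .. level_52
header = ["Leaf_ID"] + [f"level_{i}" for i in range(n_levels)]
nodes = [f"node_{i}" for i in range(19)]
row = ["SRR13739007"] + nodes + ["na"] * (n_levels - len(nodes))
sample = ",".join(header) + "\n" + ",".join(row) + "\n"

depths = lineage_depths(sample)
# Rows with more populated levels than there are standard ranks.
too_deep = {ident: d for ident, d in depths.items() if d > len(STANDARD_RANKS)}
print(depths)    # {'SRR13739007': 19}
print(too_deep)  # {'SRR13739007': 19}
```

A row with 19 populated levels cannot fit into eight rank columns without dropping information, whichever columns end up being used.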

@mr-eyes
Member Author

mr-eyes commented Aug 27, 2023

It seems to work after adding -f (--force):

! sourmash lca index -k 51 --scaled 10000 -f {output_tsv} {output_newick}.lca.json --from-file {lca_sigs}

@mr-eyes
Member Author

mr-eyes commented Aug 27, 2023

No, I was wrong. With -f the command no longer aborts, but the resulting index is wrong:

Building LCA database with ksize=51 scaled=10000 moltype=DNA.
examining spreadsheet headers...
** assuming column 'Leaf_ID' is identifiers in spreadsheet
** assuming column 'level_0' is superkingdom in spreadsheet
** assuming column 'level_1' is phylum in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_2' is class in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_3' is order in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_4' is family in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_5' is genus in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_6' is species in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
** assuming column 'level_7' is strain in spreadsheet
whoa, too many assumptions. are the headers right?
expecting identifiers,superkingdom,phylum,class,order,family,genus,species,strain
...continue, because --force was specified.
737 distinct identities in spreadsheet out of 737 rows.
56 distinct lineages in spreadsheet out of 737 rows.
... loaded 737 signatures.
loaded 12386552 hashes at ksize=51 scaled=10000
56 assigned lineages out of 56 distinct lineages in spreadsheet.
737 identifiers used out of 737 distinct identifiers in spreadsheet.

And when I query the database:

finding query signatures...
outputting classifications to -
ID,status,superkingdom,phylum,class,order,family,genus,species,strain
SRR8614047,found,node_0,node_1,node_2,node_177,node_178,node_179,node_257,node_346
classified 1 signatures total

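So basically, it ignored my custom taxonomy levels: level_0 to level_7 were forced onto the eight standard ranks, and none of the deeper levels appear in the classification.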
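The --force log above pairs the expected column names with the spreadsheet columns one at a time, from left to right. The sketch below reproduces that pairing so the consequence is explicit. It mirrors the assignments printed in the log and is not taken from sourmash's source code. `positional_assignment` is a hypothetical name.

```python
# Column names sourmash expects, from the "expecting ..." log line.
EXPECTED = ["identifiers", "superkingdom", "phylum", "class", "order",
            "family", "genus", "species", "strain"]


def positional_assignment(header):
    """Pair expected names with spreadsheet columns purely by position.

    Returns (assigned, unused): assigned maps each expected name to the
    column it lands on; unused lists the columns past the last expected name.
    """
    assigned = dict(zip(EXPECTED, header))
    unused = header[len(EXPECTED):]
    return assigned, unused


header = ["Leaf_ID"] + [f"level_{i}" for i in range(53)]
assigned, unused = positional_assignment(header)
for name, column in assigned.items():
    print(f"** column {column!r} -> {name}")
print(f"{len(unused)} columns unused, starting at {unused[0]!r}")
```

With 53 level columns, only level_0 through level_7 receive a rank. The remaining 45 (level_8 onward) are never assigned one. This is consistent with the eight-rank classification output shown above.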

@ctb
Contributor

ctb commented Aug 27, 2023

non-standard lineage names are not (yet) accessible from the command line; see #2469 for what I think is the latest word on them.

your best bet for now is to just put non-standard values in the standard lineage naming scheme :)
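Following that suggestion means deciding which eight custom levels to keep and relabeling them with the standard names. A minimal sketch follows. It assumes the first column of the spreadsheet holds identifiers. `to_standard_ranks` and `keep_levels` are hypothetical names, and the level selection in the example is arbitrary. Choosing which levels matter is a judgment call that the script cannot make. Any level not listed is dropped, so the conversion is lossy for lineages deeper than eight levels. Once the header matches the expected names, the "too many assumptions" check should no longer need --force, but this has not been verified here.

```python
import csv
import io

STANDARD_HEADER = ["identifiers", "superkingdom", "phylum", "class", "order",
                   "family", "genus", "species", "strain"]


def to_standard_ranks(csv_text, keep_levels):
    """Rewrite a custom lineage CSV using sourmash's standard column names.

    keep_levels: exactly eight custom level column names, in order, to be
    placed under superkingdom..strain. All other levels are discarded.
    """
    if len(keep_levels) != len(STANDARD_HEADER) - 1:
        raise ValueError("need exactly one custom level per standard rank")
    reader = csv.DictReader(io.StringIO(csv_text))
    id_col = reader.fieldnames[0]  # assumed identifier column
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(STANDARD_HEADER)
    for row in reader:
        writer.writerow([row[id_col]] + [row[level] for level in keep_levels])
    return out.getvalue()


# Small made-up example with ten custom levels; keep levels 0-5, 8 and 9.
sample = ("Leaf_ID," + ",".join(f"level_{i}" for i in range(10)) + "\n"
          + "leaf_a," + ",".join(f"node_{i}" for i in range(10)) + "\n")
keep = [f"level_{i}" for i in (0, 1, 2, 3, 4, 5, 8, 9)]
print(to_standard_ranks(sample, keep), end="")
# identifiers,superkingdom,phylum,class,order,family,genus,species,strain
# leaf_a,node_0,node_1,node_2,node_3,node_4,node_5,node_8,node_9
```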

@mr-eyes
Member Author

mr-eyes commented Aug 27, 2023

> non-standard lineage names are not (yet) accessible from the command line; see #2469 for what I think is the latest word on them.

Thank you!

mr-eyes closed this as completed Aug 27, 2023