Charcoal tests fail with sourmash version 4.4.0 with error ValueError: empty or improperly formatted pickfile #2094

Closed
taylorreiter opened this issue Jun 21, 2022 · 4 comments

Comments

@taylorreiter
Contributor

I was trying to update the default database for charcoal over in dib-lab/charcoal#215, and the tests were failing in a way that wasn't related to the changes I made in the PR. Pinning sourmash to 4.2.3 (down from 4.4.0) resolved the issue. I chose 4.2.3 because charcoal is currently running on farm in an environment with that version of sourmash installed.

The error I was getting was:

[Sat Jun 18 04:17:04 2022]
rule make_contigs_search_taxonomy_wc:
    input: demo/genomes/TOBG_NAT-167.fna.gz, /tmp/charcoal_testp9sd8lp1/stage1/TOBG_NAT-167.fna.gz.sig, /tmp/charcoal_testp9sd8lp1/stage1/TOBG_NAT-167.fna.gz.matches.csv, demo/demo-lineages.csv, demo/LoombaR_2017__SID1050_bax__bin.11.fa.gz.gather-matches.sig.gz, demo/TARA_ANE_MAG_00014.fa.gather-matches.sig.gz, demo/TARA_PON_MAG_00084.fa.gather-matches.sig.gz, demo/GCA_001593925.sig.gz
    output: /tmp/charcoal_testp9sd8lp1/stage1/TOBG_NAT-167.fna.gz.contigs-tax.json
    jobid: 14
    wildcards: g=TOBG_NAT-167.fna.gz
    resources: tmpdir=/tmp

examining spreadsheet headers...
** assuming column 'accession' is identifiers in spreadsheet
examining spreadsheet headers...
** assuming column 'accession' is identifiers in spreadsheet
Traceback (most recent call last):
  File "/home/tereiter/miniconda3/envs/charcoal/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/tereiter/miniconda3/envs/charcoal/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/tereiter/github/charcoal/charcoal/contigs_search_taxonomy.py", line 151, in <module>
    returncode = cmdline(sys.argv[1:])
  File "/home/tereiter/github/charcoal/charcoal/contigs_search_taxonomy.py", line 146, in cmdline
    return main(args)
  File "/home/tereiter/github/charcoal/charcoal/contigs_search_taxonomy.py", line 36, in main
    picklist.load(args.matches_csv, picklist.column_name)
  File "/home/tereiter/miniconda3/envs/charcoal/lib/python3.9/site-packages/sourmash/picklist.py", line 163, in load
    raise ValueError(f"empty or improperly formatted pickfile '{pickfile}'")
ValueError: empty or improperly formatted pickfile '/tmp/charcoal_testp9sd8lp1/stage1/GCF_000005845-subset.fa.gz.matches.csv'

I got this error for the tests test_make_contigs_json and test_make_clean_dna.
It can be reproduced by setting sourmash=4.4.0 in charcoal's environment.yml file and running the tests.

@ctb
Contributor

ctb commented Jun 22, 2022

99% sure that I (perhaps inadvertently...) changed the behavior of sourmash to no longer output headers to the prefetch file if there are no matches. Will update as I track things down further.

@ctb
Contributor

ctb commented Jun 23, 2022

In #1924, I changed sourmash so that it complains when the pickfile is empty. Specifically, this error is triggered by a pickfile that has no column headers.

I think the solution is to fix charcoal so that it catches this error. will try.
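
For illustration, a minimal sketch of what catching this error on the calling side might look like. This is not the actual fix from dib-lab/charcoal#215. The helper names `pickfile_has_header` and `load_picklist_or_skip` are hypothetical, and the only thing assumed about the picklist object is that it has a `load(pickfile, column_name)` method that raises `ValueError` on an empty file, as the traceback above shows.

```python
import csv
import os


def pickfile_has_header(path, column_name):
    """Return True if the CSV at `path` has a header row containing `column_name`."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, newline="") as fp:
        reader = csv.DictReader(fp)
        # fieldnames is None when the file has no header row at all
        return reader.fieldnames is not None and column_name in reader.fieldnames


def load_picklist_or_skip(picklist, path, column_name):
    """Call picklist.load(), treating an empty or headerless pickfile as 'no matches'.

    Returns True if the picklist was loaded, False if the file was empty.
    `picklist` is any object with a load(pickfile, column_name) method
    (hypothetical wrapper; not part of sourmash or charcoal).
    """
    try:
        picklist.load(path, column_name)
    except ValueError:
        # Only swallow the error when the file really has no usable header;
        # any other ValueError is re-raised unchanged.
        if pickfile_has_header(path, column_name):
            raise
        return False
    return True
```

A caller could then skip the downstream search when `load_picklist_or_skip(...)` returns False, instead of letting the job fail on a genome with no matches.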

@ctb
Contributor

ctb commented Jun 23, 2022

added to 242e943 in dib-lab/charcoal#215

@ctb
Contributor

ctb commented Jun 23, 2022

merged! closing.

@ctb closed this as completed Jun 23, 2022