[BUG] When two input genomes have same basename, the first genome is mistakenly used for the second run. #273

Closed
pvstodghill opened this issue Oct 18, 2023 · 4 comments

Comments

@pvstodghill

Describe the bug
When two input genomes have the same basename, the genome from the first run is mistakenly used for the second run. The cause is that on the first run the first input genome (e.g., genome.fasta) is copied to the PGAP working directory, but on the second run the second input genome is not copied, so the first input genome is reused.
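
To make the suspected failure mode concrete, here is a toy model of it. This is not taken from pgap.py; `stage_if_missing` is a hypothetical helper, and the assumption that staging is keyed on the basename and skipped when the file already exists is inferred only from the symptoms above.

```python
import shutil
import tempfile
from pathlib import Path


def stage_if_missing(src: Path, workdir: Path) -> Path:
    """Hypothetical staging step: copy src into workdir only if no file
    with the same basename is already there (the assumed bug)."""
    dest = workdir / src.name
    if not dest.exists():
        shutil.copy(src, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two different genomes that share the basename genome.fasta.
    for name, header in (("a_in", ">a\n"), ("b_in", ">b\n")):
        (root / name).mkdir()
        (root / name / "genome.fasta").write_text(header)
    workdir = root / "work"
    workdir.mkdir()

    first = stage_if_missing(root / "a_in" / "genome.fasta", workdir).read_text()
    second = stage_if_missing(root / "b_in" / "genome.fasta", workdir).read_text()
    # The second "run" still sees the first genome's header.
    print(first.strip(), second.strip())
```

In this model the stale copy in the working directory shadows the second input, which matches the session under "To Reproduce".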

A related bug (untested): the use of fixed file names (input.yaml, submol.yaml) and user-supplied basenames for intermediate files in the PGAP working directory precludes concurrent execution of PGAP.

Another related bug (also untested, and now I'm just being silly): What happens if the path for my genome is "path/pgap.py"? :-)

To Reproduce

$ head -n1 a_in/genome.fasta b_in/genome.fasta
==> a_in/genome.fasta <==
>a
==> b_in/genome.fasta <==
>b
$ ./pgap.py -o a_out -g a_in/genome.fasta -s 'Genus species'
$ head -n1 genome.fasta a_out/annot.fna
==> genome.fasta <==
>a
==> a_out/annot.fna <==
>lcl|a
$ ./pgap.py -o b_out -g b_in/genome.fasta -s 'Genus species'
$ head -n1 genome.fasta b_out/annot.fna
==> genome.fasta <==
>a
==> b_out/annot.fna <==
>lcl|a
$ rm -f genome.fasta input.yaml submol.yaml
$ ./pgap.py -o b_out -g b_in/genome.fasta -s 'Genus species'
$ head -n1 genome.fasta b_out/annot.fna
==> genome.fasta <==
>b
==> b_out/annot.fna <==
>lcl|b

Expected behavior

The expected behavior is that the second input genome is used as input to the second PGAP run. This might be achieved by deleting the working files (genome.fasta, input.yaml, submol.yaml) from the PGAP working directory before each run. Alternatively, it might be achieved by mktemp'ing a new directory within the PGAP working directory to contain the working files.
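
A minimal sketch of the second option, assuming the working files can simply live in a per-run subdirectory. `stage_in_fresh_dir` and the `run_` prefix are hypothetical names for illustration, not part of pgap.py.

```python
import shutil
import tempfile
from pathlib import Path


def stage_in_fresh_dir(src: Path, workdir: Path) -> Path:
    """Copy src into a newly created, uniquely named subdirectory of workdir."""
    run_dir = Path(tempfile.mkdtemp(prefix="run_", dir=workdir))
    return Path(shutil.copy(src, run_dir / src.name))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two different genomes that share the basename genome.fasta.
    for name, header in (("a_in", ">a\n"), ("b_in", ">b\n")):
        (root / name).mkdir()
        (root / name / "genome.fasta").write_text(header)
    workdir = root / "work"
    workdir.mkdir()

    staged_a = stage_in_fresh_dir(root / "a_in" / "genome.fasta", workdir)
    staged_b = stage_in_fresh_dir(root / "b_in" / "genome.fasta", workdir)
    contents = (staged_a.read_text(), staged_b.read_text())
    same_dir = staged_a.parent == staged_b.parent
    print(contents, same_dir)
```

Because each run gets its own directory, equal basenames no longer collide, and the same layout would also keep fixed names such as input.yaml apart for concurrent runs.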

Software versions (please complete the following information):

  • OS: Debian 12
  • pgap.py --version: 2023-10-03.build7061
  • docker --version: 20.10.24+dfsg1, build 297e128
@azat-badretdin
Contributor

Thank you for your report, Paul! That seems like a bug to me. We will look at it promptly.

@azat-badretdin
Contributor

> What happens if the path for my genome is "path/pgap.py"? :-)

I love how your mind works, Paul! :-) We definitely need this attitude in our testing.

@george-coulouris
Contributor

george-coulouris commented Oct 24, 2023

Hey Paul! We have an open internal ticket to address the concurrency implications of pgap's use of fixed filenames. In the meantime, you can work around this by creating a temp dir and cd'ing to it before invoking pgap.
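
The workaround can be sketched as a small wrapper. `run_in_temp_cwd` is a hypothetical helper, and the stand-in command below only reports its working directory so the sketch runs without pgap installed.

```python
import subprocess
import sys
import tempfile
from typing import Sequence


def run_in_temp_cwd(cmd: Sequence[str]) -> str:
    """Run cmd with a fresh temporary directory as its working directory
    and return its stdout. The directory is removed afterwards, so any
    paths in cmd should be absolute and point outside it."""
    with tempfile.TemporaryDirectory(prefix="pgap_cwd_") as cwd:
        done = subprocess.run(cmd, cwd=cwd, check=True,
                              capture_output=True, text=True)
    return done.stdout


# Stand-in command that just reports its working directory.
report_cwd = [sys.executable, "-c", "import os; print(os.getcwd())"]
cwd_first = run_in_temp_cwd(report_cwd).strip()
cwd_second = run_in_temp_cwd(report_cwd).strip()
print(cwd_first != cwd_second)
```

For a real run, the command would presumably be the pgap.py invocation from the report (with its -o, -g and -s options), with the script, input and output paths made absolute first, since relative paths would otherwise resolve inside the temporary directory.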

Say hi to Dave L. for me.

@azat-badretdin
Contributor

This has been fixed in our code and the fix will be available in the next release.
