Hello, may I ask if you can provide a chromosome dataset for training purposes #21

Open
a-piece-of-teemo opened this issue Dec 4, 2023 · 4 comments

Comments

@a-piece-of-teemo

Hello, could you provide a chromosome dataset for training purposes?

@dpellow
Collaborator

dpellow commented Dec 4, 2023

This repo has the scripts I used to download and generate datasets: https://github.com/dpellow/data-download.

(I haven't used them in a while so they may not be fully up to date - open an issue there if you find anything that doesn't work any more and I can update)

@dpellow
Collaborator

dpellow commented Dec 6, 2023

@a-piece-of-teemo did it work?

@a-piece-of-teemo
Author

I ran scripts/download_script.sh, and the following error messages appeared during execution:
2023-12-06 16:39:00 (3.71 MB/s) - ‘archaea_summary.txt’ saved [8532546]
/var/spool/slurmd/job18464/slurm_script: line 48: /usr/bin/gunzip: Argument list too long
cat: 'bacteria/*.fna': No such file or directory
I'm not sure which part went wrong, or whether the dataset download actually completed.

@dpellow
Collaborator

dpellow commented Dec 6, 2023

Please open issues about those scripts in that repo.
Can you say which directories were created and which files were downloaded?
From the output you posted, it looks like the shell's expansion of * in the gunzip command produces an argument list that is too long, because there are too many files.
You should be able to replace the line gunzip bacteria/*.gz with find ./bacteria -name "*.gz" -exec gunzip {} +
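
For anyone hitting the same error, here is a minimal sketch of the suggested fix run against stand-in data. The scratch directory, the genome_N.fna.gz file names, and the bacteria_all.fna output name are all hypothetical and are not taken from download_script.sh. The second find call, which concatenates the decompressed files, is an assumption about how the script's later cat bacteria/*.fna step could be handled the same way; the cat error in the log most likely appears only because gunzip never ran and so no .fna files existed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for the real download location.
workdir="$(mktemp -d)"
mkdir -p "$workdir/bacteria"

# Create a few small gzipped FASTA files as stand-ins for the downloaded genomes.
for i in 1 2 3; do
    printf '>seq%s\nACGT\n' "$i" | gzip > "$workdir/bacteria/genome_$i.fna.gz"
done

# Let find hand the file names to gunzip in batches that fit the kernel's
# argument-length limit, instead of having the shell expand bacteria/*.gz
# into a single oversized command line.
find "$workdir/bacteria" -name "*.gz" -exec gunzip {} +

# Assumption: the same pattern can replace a later "cat bacteria/*.fna" step.
find "$workdir/bacteria" -name "*.fna" -exec cat {} + > "$workdir/bacteria_all.fna"

# Count the FASTA records in the combined file as a quick sanity check.
record_count="$(grep -c '^>' "$workdir/bacteria_all.fna")"
echo "records: $record_count"
```

Because find may split the files across several gunzip or cat invocations, the order of records in the combined file is not guaranteed; pipe the file list through sort first if ordering matters.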


2 participants