Speed up data loading process #376
parser.add_argument('--alignment_dir', type=str, help='path to alignment dir')
args = parser.parse_args()
alignment_dir = args.alignment_dir
stockholm_files = [i for i in os.listdir(alignment_dir) if (i.endswith('.sto') and ("hmm_output" not in i))]
Here, can you add an exclusion for "uniprot_hits" as well? I changed this recently; it is only used for MSA pairing.
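A minimal sketch of the file filter with the requested exclusion, assuming the layout implied by the diff: Stockholm files end in `.sto`, and any file whose name contains `hmm_output` or `uniprot_hits` should be skipped. The names `list_stockholm_files` and `EXCLUDED_SUBSTRINGS` are hypothetical and do not come from the PR.

```python
import os

# Hypothetical constant: substrings of .sto file names that this step should skip.
# "uniprot_hits" is only used for MSA pairing, per the review.
EXCLUDED_SUBSTRINGS = ("hmm_output", "uniprot_hits")


def list_stockholm_files(alignment_dir):
    """Return the .sto files in alignment_dir, skipping excluded outputs.

    Hypothetical helper; sorting makes the processing order deterministic.
    """
    return sorted(
        name
        for name in os.listdir(alignment_dir)
        if name.endswith(".sto")
        and not any(s in name for s in EXCLUDED_SUBSTRINGS)
    )
```

Keeping the exclusions in one tuple means a future output type only needs a one-line change instead of another clause in the list comprehension.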
continue

msa_data[f] = msa
# Now split the following steps across multiple processes
If we have already generated the pkl file, we should check that it exists before re-parsing the MSAs. Or does it get removed somewhere?
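A minimal sketch of the check the reviewer describes: load the pickle if it already exists, otherwise parse and write it. `load_or_parse`, `pkl_path`, and `parse_fn` are hypothetical names; `parse_fn` stands in for whatever MSA parsing the PR actually does.

```python
import os
import pickle


def load_or_parse(pkl_path, parse_fn):
    """Return cached MSA data from pkl_path if present; otherwise parse and cache it.

    parse_fn is a zero-argument callable standing in for the MSA parsing
    step (hypothetical; not the PR's actual code).
    """
    if os.path.exists(pkl_path):
        # Cache hit: skip re-parsing the Stockholm files entirely.
        with open(pkl_path, "rb") as f:
            return pickle.load(f)
    data = parse_fn()
    # Cache miss: write the result so the next run can reuse it.
    with open(pkl_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

One trade-off: an existence check alone will not notice if the underlying alignments change after the pickle was written, so a stale pkl would need to be deleted by hand.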
Oh, also: is there a reason we couldn't just call a function to do this instead of running the script with subprocess?
MSA files are now parsed in parallel instead of serially.
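A minimal sketch of parallel parsing that calls a Python function in worker processes directly rather than launching a script through `subprocess`, assuming each Stockholm file can be parsed independently. `parse_stockholm` is a toy reader standing in for the project's real MSA parser, and `parse_all` and its `max_workers` parameter are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def parse_stockholm(path):
    """Toy Stockholm reader mapping sequence name -> aligned sequence.

    Stand-in for the real MSA parser (assumption): skips '#' annotation
    lines and blank lines, concatenates wrapped blocks, stops at '//'.
    """
    seqs = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line == "//":
                break
            parts = line.split()
            if len(parts) != 2:
                continue  # ignore lines that are not "name sequence"
            name, seq = parts
            seqs[name] = seqs.get(name, "") + seq
    return seqs


def parse_all(alignment_dir, filenames, max_workers=None):
    """Parse each file in its own worker process; returns {filename: msa}."""
    paths = [os.path.join(alignment_dir, f) for f in filenames]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with filenames.
        return dict(zip(filenames, pool.map(parse_stockholm, paths)))
```

Compared with `subprocess`, this avoids starting a new interpreter and re-parsing command-line arguments for each job, and results come back as Python objects instead of through files. Each result is still pickled back to the parent, so very large MSAs carry some transfer cost. Scripts that create the pool at top level should do so under `if __name__ == "__main__":`, because the `spawn` and `forkserver` start methods re-import the main module in workers.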