General problem: Featurization takes too long! #179
Comments
Do you know what is taking the most time?
@utf Structure featurization.
Is it SiteStatsFeaturizer (with the CrystalNN preset) or something else?
Featurizing 83k structures takes roughly 12+ hours on beefy Lawrencium Xeon compute nodes.
It is many of them. SiteStatsFeaturizer is among the worst offenders.
Try featurizing the elastic tensor dataset with the AutoFeaturizer "best" preset and see for yourself.
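One way to answer the "what is taking the most time?" question is to time each featurizer separately on a small sample before running the full dataset. The following is a minimal sketch, not the project's tooling: `time_featurizers`, `cheap_featurizer`, and `expensive_featurizer` are hypothetical names, and plain lists of numbers stand in for structures. In practice each callable would wrap a real featurizer's per-structure call.

```python
import time


def time_featurizers(featurizers, samples):
    """Return total wall-clock seconds spent in each featurizer over all samples."""
    totals = {}
    for name, func in featurizers.items():
        start = time.perf_counter()
        for sample in samples:
            func(sample)
        totals[name] = time.perf_counter() - start
    return totals


# Hypothetical stand-ins for real featurizers (assumption, for illustration only).
def cheap_featurizer(structure):
    # Constant-time feature: number of sites.
    return [len(structure)]


def expensive_featurizer(structure):
    # Quadratic pair loop, mimicking a neighbor search on a large structure.
    return [sum(abs(a - b) for a in structure for b in structure)]


samples = [list(range(200)) for _ in range(20)]
totals = time_featurizers(
    {"cheap": cheap_featurizer, "expensive": expensive_featurizer}, samples
)

# Report the slowest featurizers first, as seconds per sample.
for name, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds / len(samples):.6f} s/sample")
```

Multiplying the per-sample figure by the dataset size gives a rough serial runtime estimate before committing to a full run.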
It could be that we keep recalculating the bonding, which takes a long time for big structures. We could check by seeing whether there is a big speedup when using MultipleFeaturizer with the additional caching. This is not a long-term solution, though, as we would then lose much of the fidelity of per-featurizer timing. If calculating the bonding takes the most time, we could alter the featurizers to also accept a BondedStructure object, in which case the bonding would not be recalculated, and then add a StructureToBondedStructure conversion featurizer.
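The idea of computing the bonding once and handing it to several featurizers can be sketched as below. This is an illustration under stated assumptions, not the matminer implementation: `BondingCache`, `toy_bonding`, `mean_coordination`, and `max_coordination` are hypothetical names, and a 1-D list of coordinates stands in for a structure.

```python
class BondingCache:
    """Compute the bonding once per structure and reuse it (hypothetical helper)."""

    def __init__(self, compute_bonding):
        self._compute = compute_bonding
        self._cache = {}
        self.n_computed = 0  # how many times the expensive step actually ran

    def get(self, key, structure):
        if key not in self._cache:
            self._cache[key] = self._compute(structure)
            self.n_computed += 1
        return self._cache[key]


def toy_bonding(coords, cutoff=1.5):
    """Stand-in for an expensive neighbor search: 1-D sites within `cutoff`."""
    return {
        i: [j for j, y in enumerate(coords) if j != i and abs(x - y) <= cutoff]
        for i, x in enumerate(coords)
    }


# Two hypothetical featurizers that consume the precomputed bonding.
def mean_coordination(bonds):
    return sum(len(neighbors) for neighbors in bonds.values()) / len(bonds)


def max_coordination(bonds):
    return max(len(neighbors) for neighbors in bonds.values())


structures = {"s1": [0.0, 1.0, 2.0, 5.0], "s2": [0.0, 3.0]}
cache = BondingCache(toy_bonding)

features = {}
for key, coords in structures.items():
    # Both featurizers share one bonding calculation per structure.
    features[key] = [
        featurize(cache.get(key, coords))
        for featurize in (mean_coordination, max_coordination)
    ]

print(features)
print("bonding computed", cache.n_computed, "times for", len(structures), "structures")
```

Because the featurizers stay separate callables, they can still be timed individually, which addresses the concern about losing per-featurizer timing.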
So that is one problem. Another problem is that even a single SiteStatsFingerprint is too slow for many large structures. For example, running it on MP, I get times of ~1 s/sample towards the larger structures (running on an LR4 node on Lawrencium). As far as I can tell, it is running fully in parallel (n_jobs = cpu_count() = 24)...
After looking into this, I am not sure caching actually improves performance (or maybe I am just using it wrong?).
Done via 7f4f99e.
Featurizing all of MP takes at least one day. This is way, way too long.
Edit:
Easiest way to fix this is by: