Downloading a model from within Python without interrupting your script #10608
-
I totally understand the frustration and confusion around this, but I don't think it's going to work in general unless you install the model separately and restart Python afterwards. In practice, however, it will probably work for all or nearly all of the current … The main problem is that this doesn't work for packages whose dependencies add entry points. So if you have a base spaCy install and load a … There are potential workarounds for reloading the entry points on the fly, but they are likely to be brittle and to differ across Python versions. We've looked into this in the past and decided it wasn't worth pursuing, though that could change as we drop support for older Python versions. At any rate, there are many advantages to packaging pipelines as full-fledged pip packages rather than treating them as plain data.
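To illustrate why reloading entry points on the fly tends to be brittle across Python versions: even the standard-library API for querying entry points changed between Python 3.9 and 3.10. The helper below is a hypothetical sketch (the name `entry_points_for_group` is illustrative) that lists the entry points registered under a group in the current environment. The group name `spacy_factories` in the usage block is an assumption about how spaCy registers plugin factories.

```python
import sys
from importlib.metadata import entry_points


def entry_points_for_group(group: str) -> list:
    """Return the entry points registered under `group` in this environment."""
    if sys.version_info >= (3, 10):
        # Python 3.10+: entry_points() supports selecting by keyword.
        return list(entry_points(group=group))
    # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
    return list(entry_points().get(group, []))


if __name__ == "__main__":
    # Assumed example group; spaCy plugins are discovered through entry points.
    for ep in entry_points_for_group("spacy_factories"):
        print(ep.name, "->", ep.value)
```

Any code that reads these registrations once and caches them would need its own refresh logic on top of this, which is where version-specific workarounds start to pile up.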
-
I did not intend to discredit how spaCy handles package/model installations; I definitely appreciate the upsides it brings. But, as you mention, one drawback is that every model has to be installed manually. You could argue that this is just one additional step and not too bad. However, I also want to make every model available on the fly whenever a user requests it. Take, for example, this simple demo, where I do not want to list every possible spaCy model wheel in my pip file but instead download models when they are needed. I do understand that spaCy does not have this built in. From what I have tested, this seems to work for the sm, md, and lg models as well as the … Note: I work in research, not in production environments, so for me it isn't as much of a problem if …
-
I just posted this on Twitter and I figured I'd share it here as well.
I've often run into this situation: I use spaCy under the hood and want my code base to be accessible and easy to use, but users first have to pre-install models (`python -m spacy download ...`). For some users, this is a technical, frightening, or simply "extra" step. They just want to follow example scripts and run commands with whichever model/language they want. This may seem silly to many of you, but people have come knocking on my door about this a couple of times.

So recently, I've started adding a little snippet to my code bases that automatically downloads and loads (technically `import`s) a model when it is not installed. It therefore mimics the behavior of other libraries like `stanza` or `transformers`, where models are downloaded on the fly. I am aware that spaCy models are a different beast altogether, with their own dependencies, configs, and all kinds of magic. But for the end user, the behavior is the same.

I hope it helps some of you! And if there are any issues with this snippet (security, portability, or usability wise), I'd be very happy to hear about them, and I'll keep track of them in this first post!
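A minimal sketch of this load-or-download pattern, assuming spaCy v3, where `spacy.load` raises `OSError` for a pipeline package that is not installed and `spacy.cli.download` installs the package via `pip`. This is not necessarily the snippet from the post: the function name `load_or_download` and its `loader`/`downloader` parameters are hypothetical, and the two parameters exist only so the logic can be exercised without spaCy installed.

```python
import importlib
from typing import Any, Callable, Optional


def load_or_download(
    name: str,
    loader: Optional[Callable[..., Any]] = None,
    downloader: Optional[Callable[[str], Any]] = None,
    **load_kwargs: Any,
) -> Any:
    """Load a spaCy pipeline package, downloading it first if it is missing.

    `loader` and `downloader` default to spacy.load and spacy.cli.download;
    they are injectable (hypothetical parameters) to allow testing without spaCy.
    """
    if loader is None:
        import spacy
        loader = spacy.load
    if downloader is None:
        from spacy.cli import download as downloader
    try:
        return loader(name, **load_kwargs)
    except OSError:
        # spacy.load raises OSError when it cannot find the package.
        downloader(name)
        # Ask the import system to rescan directories for the new package.
        importlib.invalidate_caches()
        return loader(name, **load_kwargs)
```

Usage would look like `nlp = load_or_download("en_core_web_sm", disable=["ner"])`, with extra keyword arguments forwarded to `spacy.load`. As noted earlier in the thread, a model whose dependencies register new entry points may still require restarting Python after the first download.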
EDIT: I packaged it for easy use, so you can just pip-install it. If it turns out that there are too many or too intricate underlying issues with this, I am happy to hear about them and readjust.
Potential issues
Because `download` invokes `pip` to install the module/models, it is technically possible that the models depend on different versions of installed packages, leading to the on-the-fly uninstallation/reinstallation of packages, potentially even of spaCy itself. That is not wanted! But as long as you install official spaCy packages, such problems should not occur, I think. However, because of this, I would not recommend doing this for arbitrary spaCy models on the Hugging Face Hub, whose dependencies you have no control over.