spaCy multiprocessing ('n_process') makes Docker image exit #10087
Replies: 2 comments · 12 replies
-
How many workers are you launching? Is it possible to share the …
-
@dave-espinosa That all makes sense to me, thanks for summarizing. We do have a note in the docs about this behavior, though it's not explicitly called out, and we do include the link I referenced throughout to explain what's going on.
The issue here is that … In addition, it's easy to run out of memory in these scenarios, so you might want to look into that. The Docker defaults for memory are pretty low, and when RAM is the issue you'll typically see exit behavior like you have.
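As a hedged illustration of raising the container's memory ceiling (the service and image names below are placeholders, not anything from this thread), a Compose fragment might look like:

```yaml
# Hypothetical docker-compose.yml fragment; "ner-worker" and "my-ner-image"
# are placeholder names. mem_limit caps the container's RAM; if the spaCy
# workers exceed it, the kernel OOM killer terminates the process, which
# commonly shows up as exit code 137 (SIGKILL).
services:
  ner-worker:
    image: my-ner-image
    mem_limit: 4g
```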
-
Hello everyone,
Long story short: I need to apply some NER to several thousand documents. I have decided to go with the `en_core_web_lg` model. I want to increase the NER processing speed as much as possible, so the speed FAQ has already been considered. Two of the main suggestions there require the developer to use `nlp.pipe` and fine-tune its `n_process` argument, which I have already done. The results look fine so far, at least in the tests run on my computer.
To speed things up even more, and since my company works with GCP, I was planning to create a Docker image with the previous code in it, and then deploy it to several endpoints. The goal is, of course, to split `text_list` into `n` chunks, where `n` = number of endpoints.
The problem I have right now is that when `n_process` = 1, the Docker image processes all the texts normally, batch after batch (in the toy example, 2 batches of 100 texts were used). BUT if `n_process` > 1, the image processes only the first batch and then exits.
After some research on Google, even though "one process per Docker image" is usually recommended, there are ways to achieve multiprocessing. However, quite honestly, I don't know how to adapt those suggestions for spaCy's `nlp.pipe`.
Have you had any experience implementing something similar to this?
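For concreteness, here is a minimal sketch of the kind of setup described above. The names `text_list`, `split_into_chunks`, and `run_ner` are illustrative, not the poster's actual code, and `spacy.blank("en")` stands in for `en_core_web_lg` so the snippet runs without a model download (a blank pipeline has no NER component, so `doc.ents` is empty here):

```python
import spacy

def split_into_chunks(items, n):
    """Split items into n contiguous, roughly equal chunks (one per endpoint)."""
    size, rem = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def run_ner(texts, n_process=1, batch_size=100):
    # Stand-in pipeline; the real setup would call spacy.load("en_core_web_lg").
    nlp = spacy.blank("en")
    # nlp.pipe streams docs in batches; n_process > 1 uses worker processes.
    return [
        [(ent.text, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(texts, n_process=n_process, batch_size=batch_size)
    ]

if __name__ == "__main__":
    # The __main__ guard is required when n_process > 1 on platforms that
    # start processes with "spawn".
    text_list = ["Some example text."] * 200
    for chunk in split_into_chunks(text_list, 2):  # e.g. one chunk per endpoint
        results = run_ner(chunk, n_process=2, batch_size=100)
        print(len(results))
```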
PS: I discarded this recommendation because my company's GCP architecture is better suited to working with Docker images.
Thank you.