Almost all functions of the nlp module use spaCy under the hood.
In general, spaCy is quite fast because it is implemented in Cython.
The core code looks like this:
new_data = []
# s is a pandas Series of texts; nlp.pipe yields one spaCy Doc per text
for row in nlp.pipe(s.values, batch_size=32):
    new_data.append( ... row ...)
spaCy's pipe was initially chosen because it supports multi-threading. An alternative would be pandas apply, which is probably slower.
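The pipe function accepts, among others, the n_threads and batch_size arguments, and tuning these values might have a large impact on performance. (Note that since spaCy v2.1, n_threads is deprecated and has no effect; since v2.2.2, parallel processing is available through the n_process argument instead.)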
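To make the role of batch_size concrete: spaCy's pipeline components consume the stream of texts in chunks of batch_size items (spaCy ships a spacy.util.minibatch helper for this grouping). The stdlib sketch below reimplements only that grouping step for illustration. It is not spaCy's actual code, and the name minibatch is borrowed purely for readability.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`.

    Illustrative sketch of how a stream of texts is split into batches;
    the last batch may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while True:
        # Pull up to `size` items from the shared iterator
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For example, list(minibatch(["a", "b", "c", "d", "e"], 2)) groups five texts into three batches. Larger batches typically mean fewer per-batch calls but more memory held at once, which is why the best batch_size has to be measured rather than guessed.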
This task consists of:
1. Understand how spaCy's pipe works.
2. Test different combinations of n_threads and batch_size values on a large dataset.
3. If it makes sense, compare these results with the pandas apply approach.
4. Pick the best solution and implement it in all NLP functions that use spaCy under the hood.
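A minimal benchmarking harness for testing parameter combinations and comparing against apply could look like the sketch below. It assumes only that nlp is a loaded spaCy Language object (anything with a pipe method and a per-text call works); benchmark_pipe and time_call are hypothetical helper names. The per-text baseline [nlp(t) for t in texts] is roughly what s.apply(nlp) does on a pandas Series.

```python
import itertools
import time
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple


def time_call(fn: Callable[[], Any], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_pipe(
    nlp,
    texts: Sequence[str],
    param_grid: Dict[str, Iterable[Any]],
    repeats: int = 3,
) -> List[Tuple[Dict[str, Any], float]]:
    """Time nlp.pipe for every combination of keyword arguments in param_grid.

    Also times a per-text baseline ([nlp(t) for t in texts], roughly what
    Series.apply(nlp) does), labelled {"apply": True}.
    Results are (parameters, seconds) pairs sorted from fastest to slowest.
    """
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        kwargs = dict(zip(names, values))
        # list() forces the lazy generator returned by pipe to run to completion
        elapsed = time_call(lambda: list(nlp.pipe(texts, **kwargs)), repeats)
        results.append((kwargs, elapsed))
    baseline = time_call(lambda: [nlp(t) for t in texts], repeats)
    results.append(({"apply": True}, baseline))
    return sorted(results, key=lambda r: r[1])
```

For example, with nlp = spacy.blank("en") and a large list of texts, benchmark_pipe(nlp, texts, {"batch_size": [32, 128, 1000]}) would return the candidates from fastest to slowest; n_threads or n_process can be added to the grid depending on the installed spaCy version. Taking the minimum over several repeats reduces noise from other processes on the machine.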
We might find that the optimal values of n_threads and batch_size are not always the same; in that case, we will need to add them as arguments to the NLP functions and update the docstrings.
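If the optimal values do turn out to differ between functions, exposing them could look like the sketch below. spacy_tokens is a hypothetical function name used only for illustration, the default of 32 mirrors the batch size in the current core code, and the function returns a plain list (a real nlp module function would presumably wrap the result in a pandas Series with the input's index).

```python
from typing import Iterable, List


def spacy_tokens(s: Iterable[str], nlp, batch_size: int = 32) -> List[List[str]]:
    """Tokenize every text in `s` with spaCy (hypothetical example function).

    Parameters
    ----------
    s : iterable of str
        The texts to process, e.g. a pandas Series.
    nlp : spaCy Language
        A loaded pipeline, e.g. spacy.blank("en").
    batch_size : int, default 32
        Number of texts buffered and processed together by nlp.pipe.

    Returns
    -------
    list of list of str
        The tokens of each text, in input order.
    """
    new_data = []
    # Forward the tunable argument to nlp.pipe instead of hard-coding it
    for doc in nlp.pipe(s, batch_size=batch_size):
        new_data.append([token.text for token in doc])
    return new_data
```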
I guess this task might take quite a long time. What if we prioritize completely finishing part 2 of the "API next checklist" and then move on to part 4?
Useful resources: Turbo-charge your spaCy NLP pipeline