[WIP] phrases multicore using joblib threading #1433

prakhar2b · 2017-06-20T10:21:14Z

No description provided.

piskvorky · 2017-06-20T13:05:40Z

@prakhar2b did you talk to @menshikh-iv and @jayantj?

Unfortunately this is not what we want.

menshikh-iv · 2017-06-23T13:42:23Z

@piskvorky It's part of the GSoC proposal, label 1.4

piskvorky · 2017-06-23T15:37:37Z

Yes, we want multicore, but joblib is not the right tool.

Joblib uses multiprocessing, and as I explained earlier, that is a bad choice of granularity when the operation to be done is as simple as incrementing a counter. The queueing/pickling/inter-process communication overhead will be enormous.

jayantj · 2017-06-23T16:06:16Z

I completely agree that multiprocessing is not a good solution due to the overheads/copying involved. We discussed trying out a multi-threading approach instead (joblib seems to allow this, although the GIL will have to be dealt with). One idea was to use libcuckoo, since it seems to allow for concurrent reads/writes.

gojomo · 2017-06-23T19:00:45Z

I suspect multiprocessing might be a competitive approach in the particular case where each process can open its own reader into a disjoint range of the corpus, and thus the only IPC is tiny summary counts, not bulk ranges of text.

So it might only be a viable strategy where the corpus is large and the user is sophisticated enough to have already structured their corpus as an uncompressed file or a set of many smaller files.

piskvorky · 2017-06-24T12:20:44Z

Yes, that's the case where we create several counters independently and merge them at the end. I think that's the correct level of granularity for something as simple as incrementing a counter (but requires the user to have multiple input streams, rather than one, to parallelize well).
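A minimal sketch of that "independent counters, merged at the end" idea, assuming the corpus is already split into several plain-text files with one whitespace-tokenized sentence per line. The names `count_file`, `merge_counts` and `count_corpus` are hypothetical, and this is not the gensim `Phrases` implementation; it only illustrates the granularity: each process reads its own input, and only the finished counter crosses the process boundary.

```python
from collections import Counter
from multiprocessing import Pool


def count_file(path):
    """Count unigrams and adjacent-word bigrams in one shard of the corpus.

    Hypothetical helper: assumes one whitespace-tokenized sentence per line.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as reader:
        for line in reader:
            tokens = line.split()
            counts.update(tokens)                   # unigrams
            counts.update(zip(tokens, tokens[1:]))  # bigrams as (w1, w2) tuples
    return counts


def merge_counts(partial_counts):
    """Sum the per-shard counters into a single counter."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def count_corpus(paths, processes=2):
    """Count each shard in its own process, then merge.

    Only the file paths go in and the summary counters come back,
    so no bulk text is pickled between processes.
    """
    with Pool(processes) as pool:
        return merge_counts(pool.map(count_file, paths))
```

On platforms that spawn rather than fork worker processes, `count_corpus` would need to be called from under an `if __name__ == "__main__":` guard. Any speedup depends on having several reasonably large shards; with a single input stream there is nothing to split.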

prakhar2b · 2017-06-29T11:18:07Z

Closing this PR: parallelizing with joblib threading doesn't improve the performance of pure Python code, and the Phrases module has little to cythonize beyond static typing, which doesn't give a worthwhile performance improvement.

Also, ref: this comment, this comment above

For a fast counter, there is another PR, #1446, in gensim; hopefully parallelizing will be better suited there.
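For reference, a rough stand-in for the lock-guarded threaded counting tried here, using the standard library's `threading` instead of joblib (so this is not the PR's code, and `count_threaded` is a hypothetical name). It gives correct counts, but in CPython the GIL lets only one thread run Python bytecode at a time, so pure-Python counting like this would not be expected to run faster than a single loop.

```python
import threading
from collections import Counter


def count_threaded(shards):
    """Count tokens from several shards using one thread per shard.

    `shards` is a list of shards; each shard is a list of tokenized sentences.
    All threads update one shared Counter, guarded by a lock.
    """
    counts = Counter()
    lock = threading.Lock()

    def worker(sentences):
        for sentence in sentences:
            # The lock avoids lost updates when two threads
            # increment the same key at the same time.
            with lock:
                counts.update(sentence)

    threads = [threading.Thread(target=worker, args=(shard,)) for shard in shards]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts
```

The lock is taken once per sentence, so the threads also spend part of their time waiting on each other in addition to waiting on the GIL.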

initialized joblib and created func count_vocab

39eb2db

Prakhar Pratyush added 3 commits June 26, 2017 23:06

[WIP] joblib threading

2b26682

debug comments removed

40aa8fb

debug comments removed

e3ed67d

prakhar2b changed the title ~~[WIP] phrases multiprocessing using joblib~~ [WIP] phrases multicore using joblib threading Jun 26, 2017

using lock for race condition

9bb5a10

prakhar2b closed this Jun 29, 2017
