Hadoop state non-functional #83

kadwanev · 2016-03-01T21:23:26Z

It looks like the HadoopEngine isn't currently in an executable state. There are references to settings that have been removed fairly recently. Is the implementation here something that could be made to work?

kadwanev · 2016-03-01T21:24:32Z

@gregory-marton question for you :)

gregory-marton · 2016-03-01T21:26:16Z

Sounds like you've done some investigation already. Can you share the details? In what way did it not work, which settings did you find, etc.?

kadwanev · 2016-03-01T21:34:52Z

Oh, I thought it was intentional.

crosscat/src/HadoopEngine.py

Line 27 in d7765df

from crosscat.settings import Hadoop as hs

There appears to be no crosscat.settings module since v0.1.25.
The change that broke this seems sensible and straightforward, but I would ask whether the implementation, if fixed, would still be functional in the current version.

gregory-marton · 2016-03-01T21:42:10Z

I'm actually not sure of the history. I came onboard around v0.1.40, and this was not on my radar.
@riastradh-probcomp if you have time, can you give more context?

@kadwanev, pull requests are always welcome!

riastradh-probcomp · 2016-03-01T21:58:18Z

Nobody has touched the Hadoop code in years, and it apparently requires various moving parts that were customized for one developer's setup years ago, with some private network layout and Amazon S3 account and local Hadoop installation &c.

I expect it would be easier to start from scratch than to try to revive what's there.

gregory-marton · 2016-03-01T22:00:10Z

Given that context, I expect the appropriate "fix" would be to remove HadoopEngine. @kadwanev, if you want to take this on instead, we would absolutely welcome it. If interested, let me know a time frame to check back with you?

kadwanev · 2016-03-01T22:04:57Z

Understood. Thanks for the responses.

Just want to ask:
Is the distribution technique still sound?
Did it ever work?

I ask this because the only reducer reference I see is /bin/cat, which leads me to question if it collected results back into a single response.

I want to know if the current implementation is a good starting point or not.

riastradh-probcomp · 2016-03-01T22:43:56Z

There is no 'reduce' step because Crosscat's job is just to apply a transition operator to each of a number of independent states -- it's all 'map', and it is embarrassingly parallelizable, so any parallelism you throw at it should stick, no matter how trivial.

The MultiprocessingEngine is just LocalEngine with Python map replaced by multiprocessing.Pool().map to transition the states in separate processes. Doing the same on different computers will certainly work just fine.
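
As a rough illustration of the map-only pattern described above, here is a minimal sketch. It is not CrossCat's actual engine code: `transition`, `analyze_serial`, and `analyze_parallel` are hypothetical names, and the "state" is just a float nudged by seeded noise, standing in for a real transition operator. The only point is that swapping the built-in `map` for `multiprocessing.Pool().map` leaves the results unchanged, because each state is processed independently and nothing needs to be reduced.

```python
import multiprocessing
import random


def transition(work_item):
    """Hypothetical stand-in for a transition operator on one state.

    Each work item carries its own seed, so the result depends only on
    that item and not on which process handles it.
    """
    state, seed, n_steps = work_item
    rng = random.Random(seed)
    for _ in range(n_steps):
        state = state + rng.gauss(0.0, 1.0)
    return state


def analyze_serial(states, seeds, n_steps):
    """Apply the transition to every state with the built-in map."""
    work = [(s, seed, n_steps) for s, seed in zip(states, seeds)]
    return list(map(transition, work))


def analyze_parallel(states, seeds, n_steps, processes=2):
    """Same computation, with map replaced by Pool.map. No reduce step."""
    work = [(s, seed, n_steps) for s, seed in zip(states, seeds)]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(transition, work)


if __name__ == "__main__":
    states = [0.0, 10.0, 20.0, 30.0]
    seeds = [1, 2, 3, 4]
    serial = analyze_serial(states, seeds, n_steps=5)
    parallel = analyze_parallel(states, seeds, n_steps=5)
    # Independent states with per-state seeds give identical results.
    print(serial == parallel)
```

Giving each state its own seed is an assumption of this sketch; it is what makes the serial and parallel runs comparable. Distributing across machines would follow the same shape, with the pool replaced by whatever job dispatcher is available.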

kadwanev · 2016-03-01T23:11:16Z

Thanks. That definitely answers my question. I'll be looking to contribute some code as soon as possible.
