Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel::Deadworker error on a specific set of graph's nodes #154

Open
huskebasi opened this issue Jan 22, 2016 · 4 comments
Open

Parallel::Deadworker error on a specific set of graph's nodes #154

huskebasi opened this issue Jan 22, 2016 · 4 comments

Comments

@huskebasi
Copy link

Hi Everyone! I loved parallel since I discovered it and it allowed me to speedup my simulations in my academic research. I'm now using it to perform some optimization on a graph representing purchase journeys for customers buying a product. I have to perform a scoring process (based on some criteria e.g.entropy, frequency, etc) on all the subset of the nodes of the graph (the enumeration of its powerset) and then choose the best set based on this score.
I've implemented with success a call to parallel.map method and 99% of the times the code flows with an awesome speedup but on a specific data set (read from a DB) it gives me this error:

I, [2016-01-22 10:46:06 UTC+0000#4308]  INFO -- : enumerator.rb ---------- : -----===== PARALLEL enumeration started! =====-----
/home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:59:in `rescue in work': Parallel::DeadWorker (Parallel::DeadWorker)| Time: 00:00:57
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:56:in `work'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:319:in `block (4 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:419:in `with_instrumentation'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:318:in `block (3 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:312:in `loop'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:312:in `block (2 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:183:in `block (2 levels) in in_threads'

I've looked a lot online to find solutions or description of this error but I've only found other examples of it in very different contexts.
What exactly does it mean? How can I prevent this from happening?

Here's the core of the code that raise the error, if it could help:

  pr_nds = Parallel.map(cs_powerset, :progress => " Executing PARALLEL enumeration") do |c|
    parallel_enumeration(c, ttrg, bpm, dst_mtx, nrm, orig_entropy, entropy_norm_factor, fct_entrpy, fct_imp, fct_frq, fct_nnd, sum_jrnys)
  end
def parallel_enumeration(c, ttrg, bpm, dst_mtx, nrm, orig_entropy, entropy_norm_factor, fct_entrpy, fct_imp, fct_frq, fct_nnd, sum_jrnys)
# c is an element of the powerset of the nodes of the graph
    min_e = Float::INFINITY
    max_e = 0
    res_ary_3d = Array.new
    bs = get_bs(c, ttrg, bpm, dst_mtx)
    if bs and bs.length > 0
      bs.powerset.each do |b|
      # b is an element of the powerset of the milestones (of the purchase journey) of the set c
        trg, rtp, prc = enum_milestones(c, b, dst_mtx)
        ary = Array.new
        trg.each do |t|
          rtp.each do |r|
            prc.each do |p|
              nds = Array.new
            # I compute the score of the element calling other methods
              nds_ary = get_node_arry(nds, bpm)
              entropy = bpm_entropy2(nds_ary, attrs, nrm)
              ent_diff = 1.0 - ((orig_entropy - entropy).abs * entropy_norm_factor)
              min_e = [min_e, ent_diff].min
              max_e = [max_e, ent_diff].max
              ent_diff *= fct_entrpy
              frq = get_node_frequencys(nds_ary, sum_jrnys)
              frq *= fct_frq
              imp = get_importance(nds_ary, @importance)
              imp *= fct_imp
              nns = get_number_of_nodes_score(nds_ary, bpm)
              nns *= fct_nnd
            # The total score of the set is stored in tmp_res
              tmp_res = frq + imp + nns + ent_diff
            # This array contains the score and the nodes associated to it
              ary.push(tmp_res)
              ary.push(nds)
            end # prc
          end # rtp
        end # trg
      # The final result is a 3D array where each element is relative of a subset c
        res_ary_3d.push(ary)
      end # bs
      return res_ary_3d
    end # if
  end

Thanks infintely.

@grosser
Copy link
Owner

grosser commented Jan 24, 2016

Parallel uses workers to work on each piece of input,
for example you have 10 chunks of work and want to run them in 2 processes, then each of these processes gets 5 pieces of work if they are the same size.
For unknown reasons one of these workers seems to die (process is killed / kills itself ...)
and then when the main process is trying to read what the worker has calculated it raises this exception.

To debug this you could:

  • print inside the worker to see when the worker dies (maybe a specific set of data raises an exception)
  • make the workers always return nil / a static value (maybe the return value is undumpable and that causes a error)
  • narrow down 1 specific example / dataset and post that here so we can dig in further or even make it a fail test case

@grosser
Copy link
Owner

grosser commented Jan 24, 2016

sorry for the late reply, but this issue is kind of intimidating and I needed a few calm minutes to wrap my had around what's going on :)

@huskebasi
Copy link
Author

The only thing I can say is that your work is amazing!
Thank you so much for the effort.
I'll use the serial version of the code for the moment.

Thanks again.
Regards
Riccardo

Il giorno dom 24 gen 2016 alle ore 02:45 Michael Grosser <
notifications@github.com> ha scritto:

sorry for the late reply, but this issue is kind of intimidating and I
needed a few calm minutes to wrap my had around what's going on :)


Reply to this email directly or view it on GitHub
#154 (comment).

@soobrosa
Copy link

soobrosa commented Feb 1, 2016

preview
Just ordered the T-shirt, ping if you need one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants