
Improve parallel clocks #31

Closed · 4 tasks
pbayer opened this issue Sep 14, 2020 · 9 comments

pbayer commented Sep 14, 2020

Facilitate the use of parallel clocks:

  • Actor clockwork single-threaded,
  • Actor clockwork multi-threaded,
  • Single clocks on parallel threads,
  • Master and worker clocks on threads 2..nthreads(), relieve thread 1

This will greatly improve the speed of single- and multi-threaded simulations.
It also eliminates the need to use onthread to speed up task-based single-threaded simulations.
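
For context, a minimal sketch of the onthread workaround mentioned above (an assumption, not a benchmark: it uses the onthread(f, tid) form that a comment below invokes as DiscreteEvents.onthread(2), and the body is illustrative):

using DiscreteEvents

# run the whole simulation on thread 2 so its tasks do not
# compete with other work on thread 1
onthread(2) do
    clk = Clock()
    # ... register events and processes on clk ...
    run!(clk, 100.0)
end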

pbayer commented Apr 12, 2021

Unfortunately I cannot see the following comment by @SamRodkey here on GitHub; its link should be #31 (comment).

Side note to above: I tried avoiding using parallel clocks at all and converting my existing sim to an activity-based model per the discussion in the benchmarks section, but that somehow made my code go even slower than with a process-based approach. I was surprised by this result. I also tried using DiscreteEvents.onthread(2) to de-conflict with the Julia scheduler, but that also didn't really give me much of a performance benefit.

It seems like using processes has very poor performance beyond 10 or so processes all putting/taking from Channels. Not sure if there are ways to target improving the scaling of this? The process-based modeling approach made the most intuitive sense to me and also made my code come out super readable for others.

pbayer commented Apr 12, 2021

@SamRodkey posted another comment not visible here:

Not sure if this warrants a separate issue, but I'm a bit stumped as to how I keep getting task failures using parallel clocks.

I first ran into this problem when digging into why a simulation I was developing, which has a bunch of parallel decoupled processes, was so slow. I was running 20+ parallel processes that do not depend on each other and only communicate through channels, and run time was 100x slower than with 2-4 parallel processes. I decided to turn on parallelism to see if that helped improve run time.

It did, but many of my processes would randomly fail on calls to rand() even when I wasn't using random numbers.

I've been able to reproduce this sort of outcome with a far simpler example, which I'll paste in here. When I run the following on my machine:

using DiscreteEvents, Printf

# here is the process
function TaskDoer(clock::Clock, input_queue::Channel{Float64})
    task_duration = take!(input_queue)  # process will block on take
    delay!(clock, task_duration)
end

"Helper function to display parallel clock processes/state"
function display_pclock(clock)

    @printf("\n")
    
    for thread in 1:Threads.nthreads()
        if thread==1
            clk=clock
        else
            clk = pclock(clock, thread).clock
        end
        
        @printf("Clock %i\n", thread)
        
        for (n,p) in clk.processes
            @printf("\tProcess %i : %s : %s\n", n, p.f, p.task)
        end
    end

end

function run_sim()

    clock = PClock();

    pseed!(100)

    # create non-random durations for each task
    durations = 0.3 .+ 0.1 * sin.(range(0., stop=2*pi, length=8000));

    # create input queue
    input_queue = Channel{Float64}(10000)

    # spawn a bunch of processes
    for i=1:32
        process!(clock, Prc(TaskDoer, input_queue), cid=(mod(i, Threads.nthreads()) + 1))
    end

    # put durations into the input channel
    for d in durations
        put!(input_queue, d)
    end

    @time run!(clock, 1.);  # run for a second to get compile/JIT warmup
    @time run!(clock, 5000.);  # run for full desired amount

    display_pclock(clock);

    return clock

end

clock = run_sim();

I get the following output, or something very similar with a few of the failed tasks occurring in different places.

  0.510163 seconds (450.81 k allocations: 26.262 MiB, 40.30% compilation time)
 11.688433 seconds (35.38 M allocations: 771.571 MiB, 2.15% gc time)

Clock 1
        Process 4 : TaskDoer : Task (runnable) @0x000000001ab35dc0
        Process 2 : TaskDoer : Task (runnable) @0x000000001a773210
        Process 3 : TaskDoer : Task (runnable) @0x000000001a8b7080
        Process 1 : TaskDoer : Task (runnable) @0x000000001a10f3a0
Clock 2
        Process 4 : TaskDoer : Task (failed) @0x00000000820652d0
        Process 2 : TaskDoer : Task (runnable) @0x0000000082064e20
        Process 3 : TaskDoer : Task (runnable) @0x0000000082065140
        Process 1 : TaskDoer : Task (runnable) @0x000000008126e720
Clock 3
        Process 4 : TaskDoer : Task (runnable) @0x000000001a033210
        Process 2 : TaskDoer : Task (runnable) @0x000000001a032ef0
        Process 3 : TaskDoer : Task (runnable) @0x000000001a033080
        Process 1 : TaskDoer : Task (runnable) @0x00000000805eb6c0
Clock 4
        Process 4 : TaskDoer : Task (runnable) @0x0000000080c847e0
        Process 2 : TaskDoer : Task (runnable) @0x0000000080c844c0
        Process 3 : TaskDoer : Task (failed) @0x0000000080c84650
        Process 1 : TaskDoer : Task (runnable) @0x0000000080c87080
Clock 5
        Process 4 : TaskDoer : Task (runnable) @0x000000001a0347e0
        Process 2 : TaskDoer : Task (runnable) @0x000000001a0344c0
        Process 3 : TaskDoer : Task (runnable) @0x000000001a034650
        Process 1 : TaskDoer : Task (runnable) @0x000000001a034330
Clock 6
        Process 4 : TaskDoer : Task (runnable) @0x000000001a0387e0
        Process 2 : TaskDoer : Task (runnable) @0x000000001a0384c0
        Process 3 : TaskDoer : Task (runnable) @0x000000001a038650
        Process 1 : TaskDoer : Task (runnable) @0x000000001a038330
Clock 7
        Process 4 : TaskDoer : Task (runnable) @0x000000001a03c7e0
        Process 2 : TaskDoer : Task (runnable) @0x000000001a03c4c0
        Process 3 : TaskDoer : Task (runnable) @0x000000001a03c650
        Process 1 : TaskDoer : Task (runnable) @0x000000001a03c330
Clock 8
        Process 4 : TaskDoer : Task (failed) @0x000000001a0407e0
        Process 2 : TaskDoer : Task (runnable) @0x000000001a0404c0
        Process 3 : TaskDoer : Task (runnable) @0x000000001a040650
        Process 1 : TaskDoer : Task (runnable) @0x000000001a040330
Clock 1 (+7): state=:idle, t=5001.0, Δt=0.01, prc:4
  scheduled ev:0, cev:0, sampl:0

When I go into the REPL and look at these failed tasks, I get call stacks that point to calls to rand inside of delay!. Is this expected? Why do we call rand at all? I looked into this a little bit and was very confused.

Here is a representative call stack:

julia> pclock(clock, 8).clock.processes[4].task
Task (failed) @0x000000001a0407e0
AssertionError: length(ints) == 501
Stacktrace:
  [1] mt_setfull!(r::Random.MersenneTwister, #unused#::Type{UInt64})
    @ Random C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\RNGs.jl:260
  [2] reserve1
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\RNGs.jl:291 [inlined]
  [3] mt_pop!
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\RNGs.jl:296 [inlined]
  [4] rand
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\RNGs.jl:464 [inlined]
  [5] rand
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\Random.jl:256 [inlined]
  [6] rand(rng::Random.MersenneTwister, sp::Random.SamplerRangeNDL{UInt64, Int64})
    @ Random C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\generation.jl:332
  [7] rand(rng::Random.MersenneTwister, X::UnitRange{Int64})
    @ Random C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Random\src\Random.jl:253
  [8] _spawnid(c::Clock{Vector{DiscreteEvents.ClockChannel}})
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:212
  [9] _spawnid(ac::DiscreteEvents.ActiveClock{DiscreteEvents.ClockEvent})
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:219
 [10] _spawnid(c::Clock{Ref{DiscreteEvents.ActiveClock{DiscreteEvents.ClockEvent}}})
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:216
 [11] _cid(c::Clock{Ref{DiscreteEvents.ActiveClock{DiscreteEvents.ClockEvent}}}, cid::Int64, spawn::Bool)
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:224
 [12] event!(clk::Clock{Ref{DiscreteEvents.ActiveClock{DiscreteEvents.ClockEvent}}}, ex::DiscreteEvents.var"#64#65"{Condition}, t::Float64; cid::Int64, spawn::Bool)
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:72
 [13] event!
    @ ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:70 [inlined]
 [14] #event!#16
    @ ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:110 [inlined]
 [15] event!
    @ ~\.julia\packages\DiscreteEvents\DM41T\src\schedule.jl:108 [inlined]
 [16] delay!
    @ ~\.julia\packages\DiscreteEvents\DM41T\src\process.jl:116 [inlined]
 [17] TaskDoer(clock::Clock{Ref{DiscreteEvents.ActiveClock{DiscreteEvents.ClockEvent}}}, input_queue::Channel{Float64})
    @ Main ~\basic_parallel_test.jl:11
 [18] _loop(p::Prc, cycles::Float64)
    @ DiscreteEvents ~\.julia\packages\DiscreteEvents\DM41T\src\process.jl:36
 [19] (::DiscreteEvents.var"#58#60"{Prc, Float64})()
    @ DiscreteEvents .\task.jl:118

I think I'm misunderstanding how the sim works under the hood.

Perhaps this is user error and the documentation could describe how to better initialize the random number generator when running with PClock? I tried the use of pseed but that didn't seem to do the trick.

Any advice @pbayer?

I am posting it here from my emails in order to work on it.

pbayer commented Apr 12, 2021

@SamRodkey thank you for this report.

First Diagnosis

I can reproduce those failures. I am not sure about the further failure stack into random number generation, but the call is doing the wrong thing: after a call to delay!, it tries to create an event! on a random clock (hence the rand() call in _spawnid).

Instead, a process running under a parallel clock must schedule an event on the same clock, not on another, random one. To do so it must call event! with spawn = false. This is missing in the delay! function around line 116 in process.jl, as reported in the above stack trace.
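
A minimal sketch of what such a fix could look like (an assumption pieced together from the diagnosis above and frames [12]-[16] of the stack trace, not the actual source of process.jl):

function delay!(clk::Clock, Δt::Number)
    c = Condition()
    # schedule the wakeup on this clock; spawn=false keeps _spawnid
    # from moving the event to a random parallel clock (assumed signature)
    event!(clk, () -> notify(c), after, Δt, spawn=false)
    wait(c)
end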

pbayer commented Apr 12, 2021

@SamRodkey : In your first comment you reported:

It seems like using processes has very poor performance beyond 10 or so processes all putting/taking from Channels.

I know of "very poor performance" with lots and lots of processes (in the thousands or so). But I have to investigate performance degradation beyond "10 or so processes". That should not be the case.

The process-based modeling approach made the most intuitive sense to me and also made my code come out super readable for others.

I agree that it is often the more appropriate approach, and therefore there should be no performance degradation for numbers of processes in a reasonable range.

One first impression: in your TaskDoer function you do a take! from a channel. This actually slows a simulation by doing IO and switching control to the Julia scheduler. But I do not know whether that may be the cause of the performance degradation you observed.
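
To illustrate that point outside of DiscreteEvents (plain Julia; the values are arbitrary): a blocking take! suspends the current task and hands control to the Julia scheduler until a value arrives, which is exactly the kind of task switch that adds overhead in a tight simulation loop.

ch = Channel{Int}(1)

t = @task println("got ", take!(ch))  # this task blocks on take! ...
schedule(t)

put!(ch, 42)  # ... until another task supplies a value
wait(t)       # prints "got 42"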

pbayer added a commit that referenced this issue on Apr 12, 2021: "reported in issue #31"
pbayer commented Apr 12, 2021

OK, regarding the above task failures: the problem was the use of ifelse instead of the ternary operator _ ? _ : _ in _cid. Because ifelse is an ordinary function, _spawnid got evaluated even when it was not needed (i.e. with spawn=false). This will be fixed with v0.3.4, see below.
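
A minimal illustration of the difference (plain Julia; f and g are made-up helpers): ifelse evaluates both of its argument expressions before the call, whereas the ternary operator evaluates only the branch that is taken.

f() = (println("f called"); 1)
g() = (println("g called"); 2)

ifelse(true, f(), g())   # prints "f called" and "g called": both evaluated
true ? f() : g()         # prints only "f called": g() is never evaluated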

To do

There remains the problem that a call to rand() from a parallel thread can cause a task failure, even though it will no longer occur in the above context. I must investigate that further.

pbayer commented Apr 12, 2021

Register v0.3.4

@JuliaRegistrator register

Release notes:

This is a bugfix release:

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/34111

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.3.4 -m "<description of version>" fba6278277cdfa8698544ff1f8f36c7176d02eb1
git push origin v0.3.4

pbayer self-assigned this on Apr 12, 2021
SamRodkey commented

Sorry for the super delayed reply @pbayer.

I originally ended up deleting my comment because, after working on it for a little while, I was not sure whether it was user error, i.e. me doing something dumb that isn't how the code is intended to be used. Your comments above indicate that at least it wasn't expected behavior.

One first impression: in your TaskDoer function you do a take! from a channel. This actually slows a simulation by doing IO and switching control to the Julia scheduler. But I do not know whether that may be the cause of the performance degradation you observed.

I am pretty aggressively using Channels to communicate certain data between parallel processes. I figured that was the simplest approach when I have multiple agents all taking from the same queue, but maybe this isn't optimal for performance? I definitely do see my sim grinding a lot and running slower than I expected when I have 16-32 agents all taking from a common task queue. I will investigate further this week.
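
For reference, a minimal sketch of the pattern described above (plain Julia without DiscreteEvents; the worker count and queue size are arbitrary): many worker tasks all taking from one shared Channel.

queue = Channel{Int}(100)

# 16 workers take from the same queue; iterating a Channel
# take!s items until the channel is closed
workers = [Threads.@spawn begin
    for item in queue
        # ... process item ...
    end
end for _ in 1:16]

foreach(i -> put!(queue, i), 1:1000)
close(queue)
foreach(wait, workers)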

pbayer commented Feb 23, 2023

I am closing this issue since, in my view, parallel clocks have to be rewritten to make them stable; see issue #47.

So, sorry, back to the drawing board 😥

pbayer closed this as completed on Feb 23, 2023