Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when using from multiple threads #835

Closed
jw3126 opened this issue May 28, 2021 · 12 comments
Closed

Crash when using from multiple threads #835

jw3126 opened this issue May 28, 2021 · 12 comments

Comments

@jw3126
Copy link

jw3126 commented May 28, 2021

The following code crashes julia:

using HDF5
dir = tempdir()
@sync for i in 1:10
    data = randn(10)
    Threads.@spawn begin
        path = joinpath(dir, "$i.h5")
        h5open(path, "w") do file
            file["data"] = data
        end
    end
end
...
Bus error (core dumped)

Is there something that can be done to prevent the crash? Like putting a lock in HDF5.jl?

@simonbyrne
Copy link
Collaborator

simonbyrne commented Jun 7, 2021

By default, HDF5 is not thread-safe: there is an option to build it thread-safe, but this only serializes access anyway, and is incompatible with Parallel HDF5:
https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access

Adding locking to every function call could impose significant overhead, so probably should not be the default. You would typically want to lock sections where multiple function calls occur. With Julia, it isn't too hard to handle the locking yourself:

using HDF5
dir = tempdir()
lk = ReentrantLock()
@sync for i in 1:10
    data = randn(10)
    Threads.@spawn begin
        path = joinpath(dir, "$i.h5")
        lock(lk) do
            h5open(path, "w") do file
                file["data"] = data
            end
        end
    end
end

@jw3126
Copy link
Author

jw3126 commented Jun 7, 2021

Thanks! The problem in my case is, that there are two different libraries, that don't know about each other and both want to use HDF5. Currently, I impose the lock around functions like:

function some_library_function()
    while (...)
        perfectly_thread_safe_stuff(...)
        h5open(...)
    end
end

but this is wasteful.

@gszep
Copy link
Contributor

gszep commented Oct 26, 2022

should HDF5_jll build with --enable-threadsafe by default? I think PyTables/PyTables#776 do this

@mkitti
Copy link
Member

mkitti commented Oct 26, 2022

It would be good for us to consider adding thread safety features, but I would prefer to implement on the Julia side.

That said there should be a documented way to plugin the HDF5 C library with thread safety enabled there.

@simonbyrne
Copy link
Collaborator

What if we were to have a default lock which we use for finalizers (which would help with #990), and that could be acquired in other cases?

@mkitti
Copy link
Member

mkitti commented Oct 26, 2022

For the lock to be effective, the user would also need to use the lock when using HDF5. That said, implementing a lock for finalizers is a good and perhaps necessary first step.

@gszep
Copy link
Contributor

gszep commented Oct 27, 2022

does concurrent read access work?

h5open(path, "r") do file
    Threads.@threads for id  ids
        dataset = file["$id/dataset"][:]
        ############### concurrent lazy access and computations
        #
        #
    end
end

@mkitti
Copy link
Member

mkitti commented Oct 27, 2022

does concurrent read access work?

No, not through the HDF5 library via threads. You might be able to do it with memory mapping.

You can do concurrent reads with SWMR via multiple processes.

@gszep
Copy link
Contributor

gszep commented Oct 27, 2022

this MWE seems to work

function f()
    h5open("sample.h5", "r") do file
        for k  keys(file)
            log10.(file["$k/counts"][:])
        end
    end
end

function g()
    h5open("sample.h5", "r") do file
        Threads.@threads for k  keys(file)
            log10.(file["$k/counts"][:])
        end
    end
end

@time f() # 0.126000 seconds (2.75 k allocations: 112.420 MiB)
@time g() # 0.043185 seconds (2.84 k allocations: 112.429 MiB)

sample.zip

@mkitti
Copy link
Member

mkitti commented Oct 27, 2022

It will work sometimes...

@simonbyrne
Copy link
Collaborator

@gszep What does your benchmark give if you use the sb/lock branch on #1021?

@simonbyrne
Copy link
Collaborator

Closed by #1021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants