Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Rust Support #109

Closed
SchrodingerZhu opened this issue Jan 5, 2020 · 22 comments
Closed

Adding Rust Support #109

SchrodingerZhu opened this issue Jan 5, 2020 · 22 comments

Comments

@SchrodingerZhu
Copy link
Collaborator

SchrodingerZhu commented Jan 5, 2020

I am trying to bind snmalloc to rust to have a comprehensive malloc-bench (WIP).

Basically, I have already created several bindings: snmalloc,snmalloc-sys, snmallocator.

Yet, there is a issue to provide efficient realloc with alignment. I added one following the code in malloc.h, but I do not know whether it is sound and safe. So I guess I need some guidance here.

@mjp41
Copy link
Member

mjp41 commented Jan 5, 2020

Hi @SchrodingerZhu

This is awesome!

Firstly is there a specification of realloc aligned for Rust that I can read. There are a bunch of things that are done here to meet various C expectations that might not be required. The most efficient version will do no more than necessary.

In particular, the interaction between alignment work and realloc work has many possibilities: What should size = 0, alignment = 16 do?

  • error
  • free the object
  • return a pointer that is aligned to 16.
  • undefined behaviour (i.e. what ever you want, as client promises not to do it)

The order of checks is important to get each of these behaviours.

Second minor point, if you reformat a file it makes code review very hard, if we are to take any changes they must be formated with our Clangformat config, our CI enforces this.

@mjp41
Copy link
Member

mjp41 commented Jan 5, 2020

Is it supposed to satisfy the following:

https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html#safety-4

How defensive do you want the code to be for undefined behaviour? The current code checks a lot of properties and returns error codes, but as the property failures are undefined behaviour we could be more efficient here? So it depends on the performance/usability trade-off for how this is implemented. I would suggest we turn a lot of stuff into asserts, so the Debug version checks everything and release is fast.

@SchrodingerZhu
Copy link
Collaborator Author

I am so sorry that I am being terribly busy recently. I will back to this work at this weekend. :P

@mjp41
Copy link
Member

mjp41 commented Jan 15, 2020

@SchrodingerZhu no need to apologise, just let me know if there is anything I can do to help in advance.

The project: mimalloc_rust looks like it could be good inspiration. It has fewer repos, then your current proposal, so probably easier to keep consistent.

I think it makes sense to do the correctness checking for arguments in the Rust code as this will allow more inlining and better error reporting.

I would propose you probably add a new file in

 snmalloc\src\override\rust.cc

That provides the API without any complex checking or assumptions that the C standard requires.

  • No realloc to size zero to mean free
  • No freeing or reallocing nullptrs

I would propose you call ThreadAlloc::get_noncachable() directly in your file. The (size,align) -> size calculation that I have factored out should be usable, but perhaps we need to move it to somewhere you can call it more easily. If you either copy or include malloc.cc for now, and I will refactor the function to somewhere else.

You can call

ThreadAlloc::get_noncachable().dealloc(ptr, size)

this may be faster than looking up in the page map the size of the allocation, but the calculation to align the value maybe slower.

If there is anyway, we can cache or inline the (size,align) -> size on the Rust side would probably be a win. Or we can perform a simpler lookup on the free case as we only need to know
(size,align) -> {SMALL, MEDIUM, LARGE}

Ultimately, knowing which bits cost is a case of getting data, so if we have profiling data we can see what needs improving. The aligned allocation path has not been heavily benchmark, so interested to see what you find.

@SchrodingerZhu
Copy link
Collaborator Author

SchrodingerZhu commented Jan 20, 2020

according to the document and the implementation of rust Layout, we can left many checking to the client. Hence, perhaps something like this is enough.

BTW, will snmalloc zero-initialize the memory block, though I guess it would not affect the performance since the derived alloc_zeroed in rust will invoke write_bytes (which is a memset intrinsic, and should use SIMD instructions if possible) and it should be fast enough.

If the following code is okay, I will start merging current repos into one repo asap.

#include "../mem/slowalloc.h"
#include "../snmalloc.h"
#include <cstring>
using namespace snmalloc;

inline size_t aligned_size(size_t alignment, size_t size)
{
    // Client responsible for checking alignment is not zero
    assert(alignment != 0);
    // Client responsible for checking alignment is not above SUPERSLAB_SIZE
    assert(alignment <= SUPERSLAB_SIZE);
    // Client responsible for checking alignment is a power of two
    assert(bits::next_pow2(alignment) == alignment);

    size = bits::max(size, alignment);
    snmalloc::sizeclass_t sc = size_to_sizeclass(size);
    if (sc >= NUM_SIZECLASSES)
    {
        // large allocs are 16M aligned, which is maximum we guarantee
        return size;
    }
    for (; sc < NUM_SIZECLASSES; sc++)
    {
        size = sizeclass_to_size(sc);
        if ((size & (~size + 1)) >= alignment)
        {
            return size;
        }
    }
    // Give max alignment.
    return SUPERSLAB_SIZE;
}

extern "C" void* rust_alloc(size_t alignment, size_t size) {
    return ThreadAlloc::get_noncachable()->alloc(aligned_size(alignment, size));
}

extern "C" void rust_dealloc(void* ptr, size_t alignment, size_t size) {
    ThreadAlloc::get_noncachable()->dealloc(ptr, aligned_size(alignment, size));
}

extern "C" void* rust_realloc(void* ptr, size_t alignment, size_t size, size_t new_size) {
    size_t old_size = aligned_size(alignment, size);
    new_size = aligned_size(alignment, new_size);
    if (new_size == old_size) return ptr;
    void* p = ThreadAlloc::get_noncachable()->alloc(new_size);
    if (p) {
        std::memcpy(p, ptr, old_size < new_size ? old_size : new_size);
        ThreadAlloc::get_noncachable()->dealloc(ptr, old_size);
    }
    return p;
}

Another thing to be considered is that (as you have mentioned), will the re-calculation of aligned memory (in dealloc and realloc) leads to a performance issue?

@mjp41
Copy link
Member

mjp41 commented Jan 20, 2020

So I think your implementation would work. I have one question that isn't clear from the standard, but I think you are doing the right thing:

Can the alignment change between the original and the new allocation?

I think as it says the outer Rust call, must be the same Layout as originally called with, thus I think it cannot change.

However, there appears to be a gap in the specification as it doesn't specify what to consider the Layout for an allocation returned by realloc is. I have submitted a PR to Rust to amend the documentation, so hopefully we are right from this point on.

The other thing is, I think you are memcpy too much data, surely you only have to copy the data spaces requested by the user:

extern "C" void* rust_realloc(void* ptr, size_t alignment, size_t old_size, size_t new_size) {
    size_t align_old_size = aligned_size(alignment, old_size);
    align_new_size = aligned_size(alignment, new_size);
    if (align_new_size == align_old_size) return ptr;
    void* p = ThreadAlloc::get_noncachable()->alloc(align_new_size);
    if (p) {
        std::memcpy(p, ptr, size < new_size ? old_size : new_size);
        ThreadAlloc::get_noncachable()->dealloc(ptr, align_old_size);
    }
    return p;
}

Given the current API there is no way to know how snmalloc has rounded up the allocation. If you expose a malloc_usable_size and that says it is safe to access the larger space, then that would change.

@SchrodingerZhu
Copy link
Collaborator Author

A published version of snmalloc-rs is at crates.io.

However, the documentation failed to build (because of Ninja):

$ rustc --version
rustc 1.42.0-nightly (c0e02ad72 2020-01-19)
$ cratesfyi --version
docsrs 0.6.0 (fc011ef 2020-01-20)
$ cratesfyi ...
[INFO] running `"docker" "create" "-v" "/home/cratesfyi/workspace/builds/snmalloc-rs-0.1.1/target:/opt/rustwide/target:rw,Z" "-v" "/home/cratesfyi/workspace/builds/snmalloc-rs-0.1.1/source:/opt/rustwide/workdir:ro,Z" "-v" "/home/cratesfyi/workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/home/cratesfyi/workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "MAP_USER_ID=1001" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "RUSTFLAGS=" "-e" "RUSTDOCFLAGS=-Z unstable-options --resource-suffix -20200119-1.42.0-nightly-c0e02ad72 --static-root-path / --extern-html-root-url snmalloc_sys=https://docs.rs/snmalloc-sys/0.1.1" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "3221225472" "--network" "none" "rustops/crates-build-env" "/opt/rustwide/cargo-home/bin/cargo" "+nightly" "doc" "--lib" "--no-deps"`
[INFO] [stderr] WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[INFO] [stdout] 088f803240be559a83ea043ea7d562200d7deead892de4c26002fb9d2e171e0e
[INFO] running `"docker" "start" "-a" "088f803240be559a83ea043ea7d562200d7deead892de4c26002fb9d2e171e0e"`
[INFO] [stderr]    Compiling cc v1.0.50
[INFO] [stderr]    Compiling libc v0.2.66
[INFO] [stderr]    Compiling cmake v0.1.42
[INFO] [stderr]    Compiling snmalloc-sys v0.1.1
[INFO] [stderr] error: failed to run custom build command for `snmalloc-sys v0.1.1`
[INFO] [stderr] 
[INFO] [stderr] Caused by:
[INFO] [stderr]   process didn't exit successfully: `/opt/rustwide/target/debug/build/snmalloc-sys-3579fee7bd6a2cc9/build-script-build` (exit code: 101)
[INFO] [stderr] --- stdout
[INFO] [stderr] running: "cmake" "/opt/rustwide/cargo-home/registry/src/git.luolix.top-1ecc6299db9ec823/snmalloc-sys-0.1.1/snmalloc" "-G" "Ninja" "-DSNMALLOC_RUST_SUPPORT=ON" "-DCMAKE_BUILD_TYPE=Release" "-DCMAKE_INSTALL_PREFIX=/opt/rustwide/target/debug/build/snmalloc-sys-c8988e1b8a5dcd50/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/usr/bin/cc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/usr/bin/c++"
[INFO] [stderr] -- Configuring incomplete, errors occurred!
[INFO] [stderr] See also "/opt/rustwide/target/debug/build/snmalloc-sys-c8988e1b8a5dcd50/out/build/CMakeFiles/CMakeOutput.log".
[INFO] [stderr] 
[INFO] [stderr] --- stderr
[INFO] [stderr] CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
[INFO] [stderr] thread 'main' panicked at '
[INFO] [stderr] command did not execute successfully, got: exit code: 1
[INFO] [stderr] 
[INFO] [stderr] build script failed, must exit now', /opt/rustwide/cargo-home/registry/src/git.luolix.top-1ecc6299db9ec823/cmake-0.1.42/src/lib.rs:861:5
[INFO] [stderr] note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[INFO] [stderr] 
[INFO] running `"docker" "inspect" "088f803240be559a83ea043ea7d562200d7deead892de4c26002fb9d2e171e0e"`
[INFO] running `"docker" "rm" "-f" "088f803240be559a83ea043ea7d562200d7deead892de4c26002fb9d2e171e0e"`
[INFO] [stdout] 088f803240be559a83ea043ea7d562200d7deead892de4c26002fb9d2e171e0e

Any suggestion?

@plietar
Copy link
Contributor

plietar commented Jan 20, 2020

Could you "fake" snmalloc when building for docs.rs? Don't build or link to snmalloc, and cfg out the calls to it.
rust-lang/docs.rs#147 and rust-lang/docs.rs#191 have (somewhat) hacky solutions.

Or change the build.rs to use make instead of ninja.

@SchrodingerZhu
Copy link
Collaborator Author

I think it is more reasonable not to set Ninja as the default generator in publish. Let me just left the option for default generator on unix platform.

@mjp41
Copy link
Member

mjp41 commented Jan 22, 2020

So I think this #113 is a really good start and #115 makes free pretty fast too. I am a little concerned about the codegen for realloc. When you start benchmarking it would be good to know the speed of realloc . This currently doesn't have a nice fast path like alloc and free.

@mjp41
Copy link
Member

mjp41 commented Jan 23, 2020

If you want some publicity about this, let me know, and I can tweet about it. There is currently some buzz about Verona, which is why we wrote the allocator in the first place. So can probably ride that wave. Let's just be sure it is working well before we draw attention to it. :-)

If you want me to tweet about it, and you have a twitter handle, can you let me know it.

@SchrodingerZhu
Copy link
Collaborator Author

The packages are ready at crates.

There are two more things, and I will try to check them in the following days:

  • Check the compilation on Windows with MSVC toolchain.
  • Check the documentation.

@snf
Copy link

snf commented Jan 23, 2020

I couldn't make it work in Windows with MSVC and will try spending more time on this, but meanwhile some feedback:

  • Visual Studio 2015 is hardcoded. CMake should be able to find a VS version installed
  • snmallocshim-rust.lib is created in out/build/Debug while cmake-rs is expecting it in out/ so the linking stage fails. I'm not sure if this requires the install directive in snmalloc's CMakeFile or if it requires additional configuration in snmalloc-sys/build.rs
  • Is it possible to get rid of the nightly dependency by removing #![feature(static_nobundle)]?

Versions:

  • CMake version: 3.16.3
  • Visual Studio 16 2019
  • Rust1.42.0-nightly (d1e594f40 2020-01-22)

@mjp41
Copy link
Member

mjp41 commented Jan 23, 2020

Thanks @snf for taking a look.

I have pushed a branch, install, that adds an install target for cmake. This should put the results in CMAKE_INSTALL_PREFIX. If this is useful, then I can PR it in.

For CMake, if you specify, '-Ax64', then it will build the 64bit version for which ever version of Visual Studio you have (I think). But you can't set this for Makefiles or Ninja.

@SchrodingerZhu
Copy link
Collaborator Author

SchrodingerZhu commented Jan 24, 2020

My observation is that -Ax64 is naturally added by cmake-rs.

Something like the attached build script should fix the linking problem. However, I sadly run into a new problem:

The kernel32.lib is added during the linking process, but it still fails.

  = note: snmallocshim-rust.lib(rust.obj) : error LNK2019: unresolved external symbol __imp_VirtualAlloc2FromApp referenced in function "public: bool __cdecl snmalloc::LargeAlloc<class snmalloc::Me
moryProviderStateMixin<class snmalloc::PALWindows> >::reserve_memory<1>(unsigned __int64,unsigned __int64)" (??$reserve_memory@$00@?$LargeAlloc@V?$MemoryProviderStateMixin@VPALWindows@snmalloc@@@sn
malloc@@@snmalloc@@QEAA_N_K0@Z)
          C:\msys64\tmp\snmalloc-rs\target\debug\deps\snmalloc_sys-66fbd1983265cd21.exe : fatal error LNK1120: 1 unresolved externals
use cmake::Config;

fn main() {
    let mut cfg = &mut Config::new("snmalloc");

    let build_type = if cfg!(feature = "debug") {
        "Debug"
    } else {
        "Release"
    };

    cfg = cfg.define("SNMALLOC_RUST_SUPPORT", "ON")
        .profile(build_type);

    let target = if cfg!(feature = "1mib") {
        "snmallocshim-1mib-rust"
    } else {
        "snmallocshim-rust"
    };

    let mut dst = if cfg!(feature = "cache-friendly") {
        cfg.define("CACHE_FRIENDLY_OFFSET", "64").build_target(target).build()
    } else {
        cfg.build_target(target).build()
    };

    dst.push("./build");

    println!("cargo:rustc-link-lib={}", target);
    if cfg!(unix) {
        println!("cargo:rustc-link-search=native={}", dst.display());
        println!("cargo:rustc-link-lib=dylib=stdc++");
        println!("cargo:rustc-link-lib=dylib=atomic");
    } else {
        println!("cargo:rustc-link-search=native={}/{}", dst.display(), build_type);
    }
}

@mjp41
Copy link
Member

mjp41 commented Jan 24, 2020

You need mincore.lib for VirtualAlloc2. The following bit in the CMake should add it in::

    # VirtualAlloc2 is exposed by mincore.lib, not Kernel32.lib (as the
    # documentation says)
    target_link_libraries(snmalloc_lib INTERFACE mincore)

Not sure why it isn't.

@SchrodingerZhu
Copy link
Collaborator Author

@mjp41 The error occurs during rust's linking, hence I will add mincore in the rust build script.

@snf This commit supposes to fix the building problem.
.

@SchrodingerZhu
Copy link
Collaborator Author

A CI is added for this:

https://www.travis-ci.org/SchrodingerZhu/snmalloc-rs

@SchrodingerZhu
Copy link
Collaborator Author

@mjp41 I do not have a twitter account but it will be great if you can let more people test on these rust crates. Since my benchmark project still has a long way to go it will be useful if others who are interested do more experiments using their existed projects. I actually did very little work -- just some bindings and adjustments on compilation. The crates cannot exist without your great research.

I just copied some sentences in this repo to create the current document of the rust library. If you have checked the document and no issue has been found, I think #109 is ready to be closed.

@mjp41
Copy link
Member

mjp41 commented Jan 26, 2020

@snf and @plietar could you confirm this works for some Rust projects? Could we build a Rust project with a rustc using snmalloc?

@SchrodingerZhu I'll submit a PR to your repo with some minor documentation fixes.

Repo: https://github.com/SchrodingerZhu/snmalloc-rs
Crate: https://crates.io/crates/snmalloc-rs

@snf
Copy link

snf commented Jan 27, 2020

I checked one of my private projects with it and it worked so I guess there is nothing super wrong with it :). I left running some benchmarks in LocustDB and will report back tomorrow.

@mjp41
Copy link
Member

mjp41 commented Feb 6, 2020

All looks good, so closing this issue. Thanks @snf and @SchrodingerZhu for the hard work. Perf numbers look pretty good too.

@mjp41 mjp41 closed this as completed Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants