Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large structs constructed on stack #817

Closed
mark-i-m opened this issue Jun 6, 2019 · 25 comments
Closed

Large structs constructed on stack #817

mark-i-m opened this issue Jun 6, 2019 · 25 comments

Comments

@mark-i-m
Copy link

mark-i-m commented Jun 6, 2019

I am trying to use StdRng in a no_std context where the stack is rather small. I found that the stack is overflowed when the RNG is constructed.

Basically, rand_core::SeedableRng::seed_from_u64 calls <rand::rngs::std::StdRng as rand_core::SeedableRng>::from_seed, which calls <rand_hc::hc128::Hc128Rng as rand_core::SeedableRng>::from_seed, which calls BlockRng::<Hc128Core>::from_seed ... etc

And each of these allocates a huge amount of space on the stack for the return value, which leads to a stack overflow. I imagine it is also quite slow on a normal platform too, since a lot of values will be copied on the stack.

TBH, I'm not really sure what the solution is. It seems like allocating the RNG on the heap is the right way to avoid all of this copying on the stack. On the other, perhaps marking some of the constructors #[inline(always)] would solve the problem?

@vks
Copy link
Collaborator

vks commented Jun 6, 2019

This will hopefully get better once we release Rand 0.7, which uses ChaCha20 for StdRng, requiring much less memory than HC128 (see the Rand book). For now, I have the following suggestions:

  1. Try using rand_chacha::ChaChaRng.
  2. Try using Box<StdRng>. However, I'm not sure whether this avoids the stack allocation.
  3. Consider using a RNG that is not cryptographically secure but has a tiny state, see rand::SmallRng, rand_xoshiro and rand_pcg.

@burdges
Copy link
Contributor

burdges commented Jun 6, 2019

We should add some #[inline(always)] lines or similar, probably in BlockRng, but..

Reusing stack space tightly sounds vital CPU cache performance. Yet, Rust code often returns large structures on the stack. We've RFCs for returning DSTs and VLAs that encourage this. We'll gain const generics this year, which should make methods like fn foo<const u: usize>(...) -> [T; n] common place.

I'd therefore consider this an ABI bug or code generation bug in rustc. I'd expect the caller to pass the called method a pointer to memory where it may construct the returned type, but even if it simply returned the value on the stack, then you'd double, or quadruple in this case, you cache efficiency by doing an overlaping copy instead of the non-overlaping copy they obviously do.

Can you create an issue on https://github.com/rust-lang/rust/ ?

@dhardy
Copy link
Member

dhardy commented Jun 6, 2019

We should add some #[inline(always)] lines or similar, probably in BlockRng, but..

Agreed; I'll add to #815.

Without testing the full application, I can only guess where these will be most effective however.

burdges added a commit to w3f/schnorrkel that referenced this issue Jun 6, 2019
We declare methods `#[inline(always)]` if they return a 200 byte
Transcript because rustc does not handle large returns as efficently
as one might like.  See rust-random/rand#817
dhardy added a commit to dhardy/rand that referenced this issue Jun 6, 2019
@burdges
Copy link
Contributor

burdges commented Jun 6, 2019

OT: Is there any "hot spot" analyzer for rust code? Or LLVM? You'd put tests into it and it identifies better inlining or whatever?

@dhardy
Copy link
Member

dhardy commented Jun 6, 2019

Don't know. @burdges would you take a quick look at the last commit in #815? After I get a review I'll make the release (yes, it's largely guess-work at this point).

@burdges
Copy link
Contributor

burdges commented Jun 6, 2019

There are no BlockRngCore for .. appears in that diff, so maybe they were already marked #[inline(...)]? It's those that really need inlining I'd think, unless they're big maybe. And some inherent methods on BlockRng*. Is their already inlined then rustc ignore the directives. I've heard of that happening.

I've no idea about inlining methods from impl<R: BlockRngCore<Item=u32>> RngCore for BlockRng<R> myself.

@dhardy
Copy link
Member

dhardy commented Jun 6, 2019

The BlockRngCore implementations tend to be the big ones; I inlined everything else (some with always and some without). Since all of these are pure or nearly-pure wrappers I think this is okay.

At any rate, benchmarks are either a little better or similar, so I think this is going in the right direction.

@mark-i-m
Copy link
Author

mark-i-m commented Jun 6, 2019

Thanks all! That was a swift response!

I agree that the best solution would be some sort of language support (e.g. move elision/optimization or emplacement construction), but that doesn't seem to be a near-term thing, and I think it is already a recognized problem (e.g. there have been emplacement RFCs, and this, and others).

@vks Thanks for the suggestions

Try using rand_chacha::ChaChaRng

I will give that a try. I will also try after the release and see...

Try using Box. However, I'm not sure whether this avoids the stack allocation.

This doesn't solve the problem because the struct is created on the stack first, and then copied to the heap.

Consider using a RNG that is not cryptographically secure but has a tiny state, see rand::SmallRng, rand_xoshiro and rand_pcg

Unfortunately, I need a crypto RNG :/

@vks
Copy link
Collaborator

vks commented Jun 6, 2019

Unfortunately, I need a crypto RNG :/

ChaCha only needs 136 bytes, so it hopefully works for you. We could probably use the Gimli permutation as a RNG, which only has 48 bytes of state, but this is not yet implemented.

@burdges
Copy link
Contributor

burdges commented Jun 6, 2019

It's an interesting point that box Something::new() with Something large enough should be optimized to allocate before invoking new, but then box Something::new()? gets tricky, so maybe new(..) -> Something should be optimized to new<F>(.., f: F) -> Result<Something,..> where F: unsafe FnOnce(..) -> *mut Something.. tricky. And those DST discussion you link would basically turn fns into Generators.. very tricky.

@dhardy
Copy link
Member

dhardy commented Jun 6, 2019

Good point @burdges but somewhat off-topic. #[inline] on constructors should help work around this.

I'll make those releases...

@eddyb
Copy link
Contributor

eddyb commented Jun 10, 2019

I'd therefore consider this an ABI bug or code generation bug in rustc. I'd expect the caller to pass the called method a pointer to memory where it may construct the returned type,

This is exactly what it happens, but for soundness reasons, a copy is performed in the body of the function.
You need an optimization (which we call NRVO) to remove the copy in a sound manner.

(oops, I forgot to hit send 4 days ago...)

@mark-i-m
Copy link
Author

@eddyb are there any plans to implement NRVO in rustc? would that be a breaking change? or alternately, do you think an explicit syntactic method of invoking NRVO would be better (e.g. some attribute)?

@nagisa
Copy link
Contributor

nagisa commented Jun 10, 2019

Implementation/enablement of optimisations is, by the definition of an optimisation, not a breaking change.

@eddyb
Copy link
Contributor

eddyb commented Jun 10, 2019

@mark-i-m Yes, see rust-lang/rust#47954 (which I first tried to get into Rust last year), it's mostly a matter of adding better debuginfo support (rust-lang/rust#56231, which I was working earlier this year before a few other things came up) before we can start doing that sort of optimizations (and IMO the need for debuginfo support would also apply if it were more manual).

And no, it wouldn't be a breaking change, as @nagisa pointed out. LLVM already can technically do it sometimes, it's just not as aggressive as we'd need it to be for Rust.

@dhardy
Copy link
Member

dhardy commented Jun 12, 2019

@mark-i-m please test the new 0.7.0-pre.1 release. Hopefully the in-lining works around the issue sufficiently well.

I think then we can close this?

@mark-i-m
Copy link
Author

@dhardy Thanks! Unfortunately, I'm not able to build 😕

It looks like the most recent build doesn't pass a no_std feature for ChaCha? Or am I missing a feature? I am getting the following:

$ make runtext
make -C kernel
make[1]: Entering directory '/nobackup/os2/kernel'
gcc -m64 -O0 -g -c -o start.o start.S
gcc -m64 -O0 -g -c -o mbr.o mbr.S
cargo xbuild  -Z unstable-options --out-dir `pwd`  --target x86_64-unknown-elf.json
    Updating crates.io index
   Compiling core v0.0.0 (/nobackup/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore)
   Compiling compiler_builtins v0.1.16
   Compiling rustc-std-workspace-core v1.0.0 (/nobackup/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/tools/rustc-std-workspace-core)
   Compiling alloc v0.0.0 (/tmp/xargo.0gpCipQ7XgVt)
    Finished release [optimized] target(s) in 21.57s
   Compiling semver-parser v0.7.0
   Compiling cc v1.0.35
   Compiling autocfg v0.1.2
   Compiling ppv-lite86 v0.2.5
   Compiling lazy_static v1.3.0
error[E0463]: can't find crate for `std`
 --> /nobackup/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/lazy_static-1.3.0/src/inline_lazy.rs:9:1
  |
9 | extern crate std;
  | ^^^^^^^^^^^^^^^^^ can't find crate
  |
  = note: the `x86_64-unknown-elf-1121863173107299811` target may not be installed

error: aborting due to previous error

For more information about this error, try `rustc --explain E0463`.
error: Could not compile `lazy_static`.
warning: build failed, waiting for other jobs to finish...
error: build failed
Makefile:44: recipe for target 'libkernel.a' failed
make[1]: *** [libkernel.a] Error 101
make[1]: Leaving directory '/nobackup/os2/kernel'
Makefile:8: recipe for target 'kernel' failed
make: *** [kernel] Error 2

and my Cargo.toml:

[package]
name = "kernel"
version = "0.1.0"
authors = ["mark"]

[lib]
name = "kernel"
path = "lib.rs"
crate-type = ["staticlib"]

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

[dependencies]
rlibc = "1.0.0"
spin = "0.4.8"
smallheap = { git = "https://github.com/mark-i-m/smallheap", features = ["no_std"] }
buddy = { git = "https://github.com/mark-i-m/buddy" }
x86_64 = "0.5.2"
os_bootinfo = "0.2.1"
rand = { version = "0.7.0-pre.1", default-features = false, features = ["alloc"] }

@dhardy
Copy link
Member

dhardy commented Jun 16, 2019

You're right, we should depend on rand_chacha with default-features = false, and opt-in to the std when used.

@kazcw how should we handle the simd feature? We do have a simd_support feature in Rand, but this is nightly-only. We could add a new simd feature for stable SIMD support, or is it safe just to enable simd for c2-chacha in all cases?

@kazcw
Copy link
Contributor

kazcw commented Jun 16, 2019

@dhardy The option to disable simd is sort of a historical accident. It's part of the interface now and I use it for testing, but I don't expect users to want to toggle it.

@vks Chacha can be as light as 138 bytes, but a simd implementation can do several rounds at a time before making full use of available ILP even on a sse2 machine. In the mode rand_chacha uses, c2-chacha fills a 4-round buffer so it has around 312 bytes of state.

@dhardy
Copy link
Member

dhardy commented Jun 16, 2019

Unfortunately I think forwarding the std feature to rand_chacha will interfere with the fallback on Emscripten.

Do we want to use the new rand_chacha code without std? Would it be better to keep rand_hc in this case?

Also, should we drop the simd flag from rand_chacha and just always enable simd on c2-chacha @kazcw?

dhardy added a commit to dhardy/rand that referenced this issue Jul 27, 2019
@dhardy
Copy link
Member

dhardy commented Sep 13, 2019

@mark-i-m would you like to re-test with rand 0.7.1? I believe this should have solved the no_std issues.

@mark-i-m
Copy link
Author

mark-i-m commented Sep 14, 2019

Hmm... I am getting the following with 0.7.1

$ make
cargo xbuild  -Z unstable-options --out-dir `pwd`  --target x86_64-os2.json
    Updating crates.io index
   Compiling ppv-lite86 v0.2.5
   Compiling rand_core v0.5.1
   Compiling c2-chacha v0.2.2
   Compiling rand_chacha v0.2.1
   Compiling rand v0.7.1
   Compiling kernel v0.1.0 (/home/mark/nobackup/os2/kernel)
warning: method is never used: `free`
   --> memory/paging/mod.rs:115:9
    |
115 |         pub fn free(&mut self, val: usize, n: usize) {
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

LLVM ERROR: Do not know how to split the result of this operator!

I've never seen this error before. Some googling suggests that this is related to the target. My target file is:

$ cat x86_64-os2.json 
{
  "arch": "x86_64",
  "cpu": "x86-64",
  "data-layout": "e-m:e-i64:64-f80:128-n8:16:32:64-S128",
  "dynamic-linking": false,
  "target-env": "gnu",
  "linker-flavor": "ld",
  "linker-is-gnu": true,
  "llvm-target": "x86_64-unknown-none",
  "os": "none",
  "position-independent-executables": true,
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-c-int-width": "64",
  "no-compiler-rt": true,
  "archive-format": "gnu",
  "pre-link-args": {
      "gcc": [
          "-Wl,--nmagic",
          "-nostartfiles"
      ]
  },
  "disable-redzone": true,
  "eliminate-frame-pointer": false,
  "features": "-mmx,-sse,+soft-float"
}

EDIT: to clarify, I'm referring to the LLVM ERROR part. The unused function is on me.

@dhardy
Copy link
Member

dhardy commented Sep 27, 2019

Is this issue still about large structs on the stack or is it about debugging your build? I tried reproducing but got the following, similar to your previous error.

error[E0463]: can't find crate for `std`
 --> /home/dhardy/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/lazy_static-1.4.0/src/inline_lazy.rs:9:1
  |
9 | extern crate std;
  | ^^^^^^^^^^^^^^^^^ can't find crate
  |
  = note: the `x86_64-os2-13976306759915653580` target may not be installed

However, I don't believe this has anything to do with the rand project so will now close this. I guess we can re-open if you can reproduce your original issue using the latest rand version.

@dhardy dhardy closed this as completed Sep 27, 2019
@mark-i-m
Copy link
Author

@dhardy yeah sorry about that. I suspect that it is probably fixed but i don't have a good way to check. Thanks for your help!

@burdges
Copy link
Contributor

burdges commented Sep 27, 2019

It's an open rustc issue since 2014, not fixed yet. According to rust-lang/rust#32966 you should understand and try to improve upon rust-lang/rust#47954 if you really want this sooner rather than later. Yes, this issue should be closed in favor of those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants