Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an owned variant of WeightedChoice. #152

Closed
wants to merge 10,000 commits into from

Conversation

mbrubeck
Copy link
Contributor

Fixes #90, fixes #142.

alexcrichton and others added 30 commits January 7, 2015 17:26
Conflicts:
	src/libcollections/vec.rs
	src/libcore/fmt/mod.rs
	src/librustc/lint/builtin.rs
	src/librustc/session/config.rs
	src/librustc_trans/trans/base.rs
	src/librustc_trans/trans/context.rs
	src/librustc_trans/trans/type_.rs
	src/librustc_typeck/check/_match.rs
	src/librustdoc/html/format.rs
	src/libsyntax/std_inject.rs
	src/libsyntax/util/interner.rs
	src/test/compile-fail/mut-pattern-mismatched.rs
This gets rid of the 'experimental' level, removes the non-staged_api
case (i.e. stability levels for out-of-tree crates), and lets the
staged_api attributes use 'unstable' and 'deprecated' lints.

This makes the transition period to the full feature staging design
a bit nicer.
This gets rid of the 'experimental' level, removes the non-staged_api
case (i.e. stability levels for out-of-tree crates), and lets the
staged_api attributes use 'unstable' and 'deprecated' lints.

This makes the transition period to the full feature staging design
a bit nicer.
This adds the int_uint feature to *every* library, whether or not it
needs it.
Conflicts:
	src/test/compile-fail/borrowck-move-out-of-overloaded-auto-deref.rs
	src/test/compile-fail/issue-2590.rs
	src/test/compile-fail/lint-stability.rs
	src/test/compile-fail/slice-mut-2.rs
	src/test/compile-fail/std-uncopyable-atomics.rs
Lets them build with the -dev, -nightly, or snapshot compiler
* `core` - for the core crate
* `hash` - hashing
* `io` - io
* `path` - path
* `alloc` - alloc crate
* `rand` - rand crate
* `collections` - collections crate
* `std_misc` - other parts of std
* `test` - test crate
* `rustc_private` - everything else
* `core` - for the core crate
* `hash` - hashing
* `io` - io
* `path` - path
* `alloc` - alloc crate
* `rand` - rand crate
* `collections` - collections crate
* `std_misc` - other parts of std
* `test` - test crate
* `rustc_private` - everything else
Conflicts:
	mk/tests.mk
	src/liballoc/arc.rs
	src/liballoc/boxed.rs
	src/liballoc/rc.rs
	src/libcollections/bit.rs
	src/libcollections/btree/map.rs
	src/libcollections/btree/set.rs
	src/libcollections/dlist.rs
	src/libcollections/ring_buf.rs
	src/libcollections/slice.rs
	src/libcollections/str.rs
	src/libcollections/string.rs
	src/libcollections/vec.rs
	src/libcollections/vec_map.rs
	src/libcore/any.rs
	src/libcore/array.rs
	src/libcore/borrow.rs
	src/libcore/error.rs
	src/libcore/fmt/mod.rs
	src/libcore/iter.rs
	src/libcore/marker.rs
	src/libcore/ops.rs
	src/libcore/result.rs
	src/libcore/slice.rs
	src/libcore/str/mod.rs
	src/libregex/lib.rs
	src/libregex/re.rs
	src/librustc/lint/builtin.rs
	src/libstd/collections/hash/map.rs
	src/libstd/collections/hash/set.rs
	src/libstd/sync/mpsc/mod.rs
	src/libstd/sync/mutex.rs
	src/libstd/sync/poison.rs
	src/libstd/sync/rwlock.rs
	src/libsyntax/feature_gate.rs
	src/libsyntax/test.rs
Conflicts:
	mk/tests.mk
	src/liballoc/arc.rs
	src/liballoc/boxed.rs
	src/liballoc/rc.rs
	src/libcollections/bit.rs
	src/libcollections/btree/map.rs
	src/libcollections/btree/set.rs
	src/libcollections/dlist.rs
	src/libcollections/ring_buf.rs
	src/libcollections/slice.rs
	src/libcollections/str.rs
	src/libcollections/string.rs
	src/libcollections/vec.rs
	src/libcollections/vec_map.rs
	src/libcore/any.rs
	src/libcore/array.rs
	src/libcore/borrow.rs
	src/libcore/error.rs
	src/libcore/fmt/mod.rs
	src/libcore/iter.rs
	src/libcore/marker.rs
	src/libcore/ops.rs
	src/libcore/result.rs
	src/libcore/slice.rs
	src/libcore/str/mod.rs
	src/libregex/lib.rs
	src/libregex/re.rs
	src/librustc/lint/builtin.rs
	src/libstd/collections/hash/map.rs
	src/libstd/collections/hash/set.rs
	src/libstd/sync/mpsc/mod.rs
	src/libstd/sync/mutex.rs
	src/libstd/sync/poison.rs
	src/libstd/sync/rwlock.rs
	src/libsyntax/feature_gate.rs
	src/libsyntax/test.rs
@mbrubeck
Copy link
Contributor Author

There is an alternative fix in #151.

@alexcrichton alexcrichton added the F-new-int Functionality: new, within Rand label Jun 14, 2017
dhardy added a commit to dhardy/rand that referenced this pull request Jul 29, 2017
The change to WeightedChoice is debatable, but since the argument must normally
be constructed specially anyway, the reference model makes little sense.
See issues rust-random#90, rust-random#142 and PR rust-random#152.
@dhardy
Copy link
Member

dhardy commented Feb 1, 2018

There are a couple of other alternatives: make the current WeightedChoice owning, or remove it altogether. Any thoughts on this?

I've written code to do similar jobs a few times before, using floating-point numbers and iterative search or std::map::lower_bound. The binary search algorithm here is neat. I note though that even the owning variant still clones the result.

Recommendation: store the items and weights in separate arrays — or even leave storage of the items to users and only store weights. This means the Weighted type can be dropped and cloning won't be required (at the cost of an extra array bounds check, but that is trivial compared to the algorithm cost).

See also this issue: dhardy#82

@pitdicker
Copy link
Contributor

Sorry this PR got so suddenly closed. The cause was that we cleaned up the git history (#350) and forced-pushed it to master. Now there was a difference of about 30.000, and I suppose GitHub just gave up...

This PR, together with two others, stayed open for so long with little attention because it touches on an area of Rand that needs some design first. That is now slowly getting explored in dhardy#82.

@usamec
Copy link

usamec commented Apr 13, 2018

Can we have a slightly higher priority on this?
Current version of WeightedChoice cannot be used as a member of struct alongside the weights (which makes it sort of useless) and also calling it twice on same weights makes it to sample from different distribution (which is quite a nasty footgun).
I understand that recreating it from scratch is matter of minutes, but it is quite annoying to find it and then after hours of googling finding out, that it is useless.

@pitdicker
Copy link
Contributor

it is quite annoying to find it and then after hours of googling finding out, that it is useless.

@usamec I can believe it took quite some Googling to find this PR 😄.

As you maybe know development in this repro only started up again a couple of months ago, ane we have been hard at work getting the foundation right. I hoop very soon we will see the 0.5 release with the changes.

WeightedChoice, but actually anything that has to do with sampling from a collection, can use some more love. I have started working on the area, and dhardy#82 sort of tracks the progress. But I don't expected things to be done wihtin two months...

Two comments about weighted sampling: dhardy#82 (comment) and dhardy#82 (comment). Do you need weighted sampling with replacement, or without?

For weighted sampling with replacement I still need to organise my notes and make a post. The current implementation of WeightedChoice is a mostly useless melon. The owned variant in this PR makes it somewhat usable for some applications. What would still not be possible example (or very hard), it to add/remove elements, or modify weights. So there are several other algorithms and possible APIs to consider.

For now I'm afraid it is best write/copy an implementation from scratch (or help us out?).
But I'd really like to know how you wanted to use it. How does your data structure look? Do you need sampling with replacement or without? Would you need to add/remove elements? Would you need to modify weights?

@usamec
Copy link

usamec commented Apr 18, 2018

@pitdicker Thanks for reply.
I just need a simple sampling with replacement. Or in other ability to sample just one element from distribution many times.
The blocker is that WeightedChoice needs to be part of another structure, which brings up self-referential struct ownership issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-new-int Functionality: new, within Rand P-low Priority: Low
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WeightedChoice is hard to use with borrow checker Consider adding an owned version of WeightedChoice