Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate Rng::sample and Rng::sample_iter #552

Closed
wants to merge 1 commit into from

Conversation

sicking
Copy link
Contributor

@sicking sicking commented Jul 13, 2018

In #483 we deprecated Rng::choose and Rng::choose_mut in favor of SliceRandom::choose and SliceRandom::choose_mut. The argument (presented here) was that we can think of the the rng object as a source of data. So functions (like gen_range) which generate data should live on the Rng trait. Whereas when we're operating on other data (like choose), we should stick functions on those data objects and take the Rng as a parameter.

I.e. vec.choose(rng) rather than rng.choose(vec).

I think that argument also argues for doing distribution.sample(rng) rather than rng.sample(distribution). The name sample reinforces that the distribution is the source of the data that we're sampling from.

I also think it's generally a bad idea to have two functions that do the same thing. Feels like a source of confusion where it's easy to wonder what the difference is or why one should choose one over the other.

Same goes for sample_iter.

This PR deprecates Rng::sample and Rng::sample_iter and changes all internal callers (mostly tests) to use Distribution::sample instead (there were no internal callers of Rng::sample_iter).

With this change, all functions on Rng generate new values, making the trait feel more coherent. The values are either returned (most functions) or written to a passed in buffer (fill, try_fill).

@sicking sicking force-pushed the nosample branch 2 times, most recently from f7ebf1e to 8fd4d0d Compare July 13, 2018 02:54
@sicking sicking mentioned this pull request Jul 13, 2018
28 tasks
@sicking
Copy link
Contributor Author

sicking commented Jul 13, 2018

Any ideas what the test failure is? Looks like a tool problem.

thread 'main' panicked at 'unsupported custom section: '.debug_info'', src/wasm_context.rs:646:25

@dhardy
Copy link
Member

dhardy commented Jul 13, 2018

There is one difference IIRC: rng.sample(d) takes d by value, not by reference.

I agree that these are redundant, though I'm still unsure about removing them.

@sicking
Copy link
Contributor Author

sicking commented Jul 13, 2018

Yeah, I've always found that pretty weird. Especially since you usually want to sample from a Distribution multiple times.

Fortunately, because of this, you can still pass a reference to a Distribution to Rng::sample. Though I'm not really sure why we have that setup.

@dhardy
Copy link
Member

dhardy commented Jul 13, 2018

Yeah, it's weird. It's because some distributions have no parameters, thus you don't really need an instance: rng.sample(Uniform). But it's probably unnecessary.

@sicking
Copy link
Contributor Author

sicking commented Jul 13, 2018

Ah, I see. So would we still need that if this PR lands?

@dhardy
Copy link
Member

dhardy commented Jul 14, 2018

Well, this works:

let x: f32 = Standard.sample(&mut rng);

But there's no easy way to specify the type within the function call (since neither Standard nor sample are templated over type T), so there's some advantage to Rng::sample still:

rng.sample::<f32, _>(Standard)

@sicking
Copy link
Contributor Author

sicking commented Jul 15, 2018

You can always do

Distribution::<f32>::sample(Standard, rng)

But if we think that's too verbose, we could add

pub trait DistributionAs {
    fn sample_as<T, R: Rng + ?Sized>(&self, rng: &mut R) -> T
        where Self: Distribution<T>;
}
impl<D> DistributionAs for D {
    fn sample_as<T, R: Rng + ?Sized>(&self, rng: &mut R) -> T
        where Self: Distribution<T> {
        self.sample(rng)
    }
}

which makes it possible to write

Standard.sample_as::<f32, _>(rng)

@dhardy
Copy link
Member

dhardy commented Jul 15, 2018

I don't know, now you're just adding another complication to replace the first 😄

Removing these doesn't seem like a usability win or to have much advantage to be honest.

Besides, your original argument is that Rng functions generate new values — well, rng.sample(Uniform::new(5, 70)) is still generating new values.

Any other rationales? Otherwise I'm going to close this.

@sicking
Copy link
Contributor Author

sicking commented Jul 15, 2018

I don't really see why rng.sample(Uniform::new(0, 10)) is meaningfully different from rng.choose(slice[0..10]) or rng.choose_iter(0..10).

As things stand right now, Rng is neither "do stuff related to randomness", nor "generate random values". On top of that sample and sample_iter are duplicated functions.

I agree that the disambiguation rules for the sample on Rng is different from the one on Distribution in ways that can be useful. However if someone has code like:

let x: f32 = Standard.sample(rng);
do_stuff(x);

and get compilation errors when changing to

do_stuff(Standard.sample(rng));

I don't think it's going to be obvious to that person to switch to

do_stuff(rng.sample::<f32, _>(Standard);

I.e. I don't think Rng::sample is a very good solution to the disambiguation problem in Distribution::sample.

I definitely think that the simplest solution to the problem of disambiguation is to simply rely on the standard rust features of Distribution::<f32>::sample(Standard, rng). It's a few more characters to type, but it's very discoverable and idiomatic.

The other thing to remember is that this is mainly a problem for Open01 and OpenClosed01. Standard is generally used through Rng::gen, and all other distributions only implement Distribution<T> for one T. So most developers are likely not going to run into this problem.

@vks
Copy link
Collaborator

vks commented Jul 16, 2018

I don't understand why you want to deprecate Rng::sample but keep Rng::gen_*. The latter are special cases of Rng::sample. (FWIW, I would be fine with deprecating all of them.)

@dhardy
Copy link
Member

dhardy commented Jul 16, 2018

Rng is about convenience. The fact that the only original functionality on Rng is fill and try_fill (and the only originality here is applying fill_bytes / try_fill_bytes to different types) should tell you that.

I don't think we should remove rng.gen_bool(p) because the alternative, Bernoulli::new(p).sample(&mut rng), is not very concise.

Maybe I was wrong in removing choose and shuffle; I did so because these implementations were in the way of building a nice API around extension traits, but they could be re-introduced as convenience wrappers around SliceRandom::choose etc.

Technically, yes, we could remove Rng entirely. I see Rng as a place to conveniently expose common functionality to make life easier for users, and especially to make Rand easier to learn.

With that in mind, I suppose we could un-deprecate choose/choose_mut/shuffle and keep the current wrapper implementations forever.

@vks
Copy link
Collaborator

vks commented Jul 16, 2018

Rng is about convenience. The fact that the only original functionality on Rng is fill and try_fill [...] should tell you that.

This is not really reflected by the documentation. It does not clarify that only fill and try_fill are original.

With that in mind, I suppose we could un-deprecate choose/choose_mut/shuffle and keep the current wrapper implementations forever.

Alternatively, we could move fill and friends to SliceRandom as well and declare Rng completely as a convenience wrapper. This is a bit orthogonal to what SliceRandom currently does, but the name is generic enough to allow it.

@dhardy
Copy link
Member

dhardy commented Jul 16, 2018

fill is only a thin wrapper around fill_bytes already.

@vks
Copy link
Collaborator

vks commented Jul 16, 2018

We also have to consider that Rng is the first exposure for new Rust programmers to the crates.io ecosystem in TRPL:

let secret_number = rand::thread_rng().gen_range(1, 101);

If we want to reduce the API surface, we should probably discuss (maybe even with the docs team) whether something like

let mut rng = rand::thread_rng();
let secret_number = rand::Uniform::new(1, 101).sample(&mut rng);

is newcomer-friendly enough.

@dhardy
Copy link
Member

dhardy commented Jul 16, 2018

If we want to reduce the API surface

I don't want to advocate a large and complex API, but in this case I don't think a small amount of redundancy (in the form of convenience functions wrapping other implementations) adds much complexity, and I think it does add a significant amount of convenience (look at the example just posted; even ignoring repeated symbols and local variables, it goes from 3 identifiers to 5: rand, thread_rng, Uniform, new, sample).

There's a case for teaching users the building blocks of a toolkit and a case for teaching how to accomplish basic tasks; I think good design has to consider trade-offs between the two.

@vks
Copy link
Collaborator

vks commented Jul 16, 2018

it goes from 3 identifiers to 5

Arguably gen_range just hardcodes an identifier into another, but I guess it helps with type inference.

There's a case for teaching users the building blocks of a toolkit and a case for teaching how to accomplish basic tasks; I think good design has to consider trade-offs between the two.

Fair enough, I'm only slightly worried that this is not made clear by the current documentation.

@pitdicker
Copy link
Contributor

In #483 we deprecated Rng::choose and Rng::choose_mut in favor of SliceRandom::choose and SliceRandom::choose_mut.

Oops, missed those changes. Deprecating choose, choose_mut and shuffle seems reasonable to me. They are very much algorithms on data, that need an RNG for their operations. I do think we should consider a bit more careful how much user code this may break before the release.

Like @dhardy I very much see Rng as a convenience trait. For a random number generator trait any method that 'generates' values seems appropriate to me.

  • gen,
  • fill and try_fill,
  • gen_range (and I liked the idea of gen_below, but that is another discussion),
  • sample and sample_iter,
  • gen_bool and gen_ratio (or some other name).

I think that argument also argues for doing distribution.sample(rng) rather than rng.sample(distribution). The name sample reinforces that the distribution is the source of the data that we're sampling from.

In this case I don't see the distribution as a source of data, but as a transformation. Not really sure how to put it though.

@sicking
Copy link
Contributor Author

sicking commented Jul 17, 2018

Ok, seems I'm the only one that feel these are out of place, and are sufficiently bothered by the redundancy to think that we should remove them.

@sicking sicking closed this Jul 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants