Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributions: Uniform, Uniform01, Open01, etc. #81

Closed
dhardy opened this issue Jan 5, 2018 · 10 comments
Closed

Distributions: Uniform, Uniform01, Open01, etc. #81

dhardy opened this issue Jan 5, 2018 · 10 comments
Labels

Comments

@dhardy
Copy link
Owner

dhardy commented Jan 5, 2018

Update: we will likely use these distributions:

  • Uniform for the default distribution, replacing Rand (including ints, floats with range [0, 1), etc.)
  • extra FP distributions per this strategy

Original post follows.


I'm looking at bringing over some of the changes to distributions code to upstream, in particular replacing Sample and IndependentSample with Distribution, but the change-set touches on a few other things (adjustments to things like Exp1 are needed to support dynamic dispatch (Rng+?Sized) in Distribution::sample, since Rand doesn't support this and changing that would break all Rand implementations everywhere).

So, question: is everyone happy with the distribution names we converged on in this branch, that is

  • Uniform for integers,
  • Uniform01 and Open01 etc. for floats,
  • and Default for "a sensible distribution for any type"?
    This is of course redundant; we could drop Uniform and Uniform01 in favour of using Default directly. Possibly there is a better name than Default (but please not Rand)!

Wacky ideas:

  • UniformFull or FullRange
  • Uniform01HalfOpen01

Of course we don't need any changes. Just asking since it didn't always appear people were happy with these names.

@dhardy dhardy added the question label Jan 5, 2018
@pitdicker
Copy link

I'm looking at bringing over some of the changes to distributions code to upstream

🎉. Also, thanks for opening this issue for a fresh look and discussion. Anyone we should cc?

In rust-lang/rfcs#2106 (comment) I suggested that Basic may be a good alternative to Default, so it does not clash with std::default::Default.

Dropping Uniform in favor of Default (or Basic) seems like a nice simplification for integers. FullRange is a nice find as alternative, it fits with the Range distribution. But I don't think code using this is thinking in terms of ranges, and it may feel less natural there.

For the uniform distributions of floats we now have:

  • Uniform01: [0, 1)
  • Open01: (0, 1)
  • Closed01: [0, 1]

I don't think the names are perfect. Uniform01 and the alternative HalfOpen01 both have disadvantages. How do you know which side is open? [0, 1) vs (0, 1].

For the Open01 distribution I can see real uses. In some situations you never want to generate 0.0, for example when the value is to be used for division or log. It uses simple rejection sampling on top of Uniform01.

Personally I am not sure if we need both the Uniform01 and Closed01 distribution. We can guarantee that every value has an uniform chance of occurring. But the chance of generating 1.0 is one in 2^53 for f64 (and 1 in 2^22 for f32). Does that really merit two different distributions?

There is an other not-all-that-meaningful difference between Uniform01 and Closed01: all numbers that straddle an exponent boundary have a tiny chance of occurring to often. 0.5 will occur every 1 in 2^23 results too often, 0.25 every 1 in 2^24, 0.125 every 2^25 (for f32, and I wrote this quickly, exponents could be off by 1). The problem is that there is just no better distribution possible for a closed-open range. Closed01 has a distribution as perfect as technically possible, but it also is slightly slower.

I think renaming Uniform01 to Default (or Basic) is a good solution, leaving only Open01 and Closed01 as slower but sometimes useful alternatives for floats.

@pitdicker
Copy link

(adjustments to things like Exp1 are needed to support dynamic dispatch (Rng+?Sized)

Removing the Sized constraint is a desired change, but you probably want to wait with it so all the changes that effect RNG's are bundled in one release?

I don't really understand the consequences around dynamic dispatch. I should just try it and see what brakes, but would you be willing to explain it?

@pitdicker
Copy link

pitdicker commented Jan 5, 2018

In #72 (comment) I made a list of changes / extra tests we may want to add to the distributions, besides something like an external test suite. There is not really anything in there that should happen before the merge into upstream though.

Something I would really like to see is to have the floating point range implementation be exact, and not sometimes crossing the boundaries. Now it is a bit strange that we can say: the precision of the results of Range has improved much, but it is still imprecise around the boundaries. I still believe it must be simple, but haven't found the trick yet.

@dhardy
Copy link
Owner Author

dhardy commented Jan 5, 2018

Dynamic dispatch: lots of OO languages use a vtable at the beginning of an object, for example an Java object of theHashMap class is in fact a fat object with a pointer to the HashMap virtual table as well as the internal map fields. That way if you cast the object to an AbstractMap then call size(), the run-time finds HashMap::size. Rust works a bit differently. There isn't a Map trait because it requires Higher Kinded Types, but for example String implements Write. This String object does not have a vtable, nor does a &mut String reference to it, but if you want to cast that to &mut Write, the CPU needs to know how to access the write_str function implementation for String. The way that works is that a &mut String is a fat pointer, containing one pointer to the vtable and another to the String object. So what's ?Sized got to do with it? &mut Write is by default Sized, implying the compiler knows the size of the pointed object, which is only possible if the compiler knows (at compile time) the type of the underlying object. If it doesn't, you need &mut Write + ?Sized.

@dhardy
Copy link
Owner Author

dhardy commented Jan 5, 2018

Removing the Sized constraint is a desired change, but you probably want to wait with it so all the changes that effect RNG's are bundled in one release?

The last few commits re-implement some types as distributions to let me switch them. This still needs Uniform01 or equivalent however.

Why do it now? Changing Distribution::sample from sample<R: Rng> to sample<R: Rng + ?Sized> is a breaking change: all implementations must be updated. So I'd rather get it right first time.

@dhardy
Copy link
Owner Author

dhardy commented Jan 5, 2018

In #72 (comment) I made a list of changes ...

I'm not looking at changing the algorithms now, just the infrastructure. I might let you make the algorithms PRs ;-)

Hmm, rand::distributions::Default or rand::distributions::Basic... I still prefer the former, but not certain. Another option might be Standard, but that name is probably even more confusing than Default; obviously Normal is inappropriate (thanks Gauss).

Removing Uniform and Uniform01 may make sense, though they don't cost us much (aside from confusing users).

@dhardy
Copy link
Owner Author

dhardy commented Jan 5, 2018

@pitdicker maybe I can annoy you even more by putting implementations in src/distributions/float.rs and src/distributions/integer.rs. Then I think we also do not need src/utils.rs. Okay?

At least, the compiler doesn't complain, though my text editor highlights both Default and float incorrectly. Not important; the module names will probably never be exposed to users anyway.

@pitdicker
Copy link

Thanks for the explanation around ?Sized. I mostly get the idea behind how it works, but have some trouble yet so see all the consequences for the API... Give me some time and otherwise I will trust you on it.

I might let you make the algorithms PRs ;-)

Ok, it is high time I start working on upstream 😄.

maybe I can annoy you even more by putting implementations in src/distributions/float.rs and src/distributions/integer.rs. Then I think we also do not need src/utils.rs. Okay?

Just two weeks ago I was thinking something like that would be nice, but more with the range code in mind. Could the RangeImp parts also be part of the new float.rs and integer.rs files? And what is left be moved to src/distributions/mod.rs?

@dhardy
Copy link
Owner Author

dhardy commented Jan 6, 2018

I hadn't thought about moving the range code, but I guess that could work, if there's not too much to it.

The distributions top-level module currently has Distribution, some Rand bits, some "weighted" distribution code I don't know what to do with, and the ziggurat function. Maybe we should try moving the last two things out if we're adding Default/Basic and Range.

@dhardy
Copy link
Owner Author

dhardy commented Aug 23, 2018

I believe we settled this, though we now use Standard as the "default" distribution.

@dhardy dhardy closed this as completed Aug 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants