Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

privacy: Rewrite VisiblePrivateTypesVisitor #29973

Merged
merged 12 commits into from
Dec 18, 2015
Merged

Conversation

petrochenkov
Copy link
Contributor

Some notes:
This patch enforces the rules from RFC 136 and makes "private in public" a module-level concept and not crate-level. Only pub annotations are used by the new algorithm, crate-level exported node set produced by EmbargoVisitor is not used. The error messages are tweaked accordingly and don't use the word "exported" to avoid confusing people (#29668).

The old algorithm tried to be extra smart with impls, but it mostly led to unpredictable behavior and bugs like #28325.
The new algorithm tries to be as simple as possible - an impl is considered public iff its type is public and its trait is public (if presents).
A type or trait is considered public if all its components are public, complications with private types leaking to other crates/modules through trait impls and type inference are deliberately ignored so far.

The new algorithm is not recursive and uses the nice new facility Crate::visit_all_items!

Obsolete pre-1.0 feature visible_private_types is removed.

This is a [breaking-change].
The two main vectors of breakage are type aliases (#28450) and impls (#28325).
I need some statistics from a crater run (cc @alexcrichton) to decide on the breakage mitigation strategy.
UPDATE: All the new errors are reported as warnings controlled by a lint private_in_public and lint group future_incompatible, but the intent is to make them hard errors eventually.

Closes #28325
Closes #28450
Closes #29524
Closes #29627
Closes #29668
Closes #30055

r? @nikomatsakis

@petrochenkov
Copy link
Contributor Author

Oh, and one interesting corner case is not covered - public reexport of a private enum's variant:

pub use E::V;

enum E {
    V
}

but it should be done in resolve, not in privacy.

Edit: Fixed in 7b0f1d6

@petrochenkov
Copy link
Contributor Author

There are two ways to reduce the breakage:

  1. Turn some errors into warnings.
  2. "Normalize" types before privacy checking by substituting all type aliases (both free and associated if possible). So
pub struct Pub;
type Priv = Pub;
pub fn f(arg: Priv) {}

will be permitted. I like this variant, it makes type very similar to use, but I don't know how to do the normalization for more complex cases like

struct Priv;
type Alias<T = Priv> = T;
pub fn f(arg: Alias) {}

Edit: as a minimum, type aliases should be substituted when determining publicity of Tr and T in impl Tr for T {...}

@bors
Copy link
Contributor

bors commented Nov 23, 2015

☔ The latest upstream changes (presumably #29530) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov
Copy link
Contributor Author

Rebased.

@nikomatsakis
Copy link
Contributor

Exciting! I was on PTO since Thu, I'll try to read this today or tomorrow.

@nikomatsakis
Copy link
Contributor

(I am scheduling a crater run now.)

@nikomatsakis
Copy link
Contributor

Preliminary crater report: https://gist.github.com/nikomatsakis/e7feaa7c4bc0ebcc6559

I've not dug into this at all.

@nikomatsakis
Copy link
Contributor

Note that the root regression division may mean that this change is underestimating the amount of breakage.

@petrochenkov
Copy link
Contributor Author

TL;DR: People don't write pub unless they have to.

quickcheck-0.2.24 - Private trait in where clause (out of date)
nix-0.4.1 - Private type alias in foreign function, solvable by simple normalization1
xml-rs-0.2.2 - Private enum in inherent impl method, Private type alias in inherent impl method, solvable by simple normalization
encoding-0.2.32 - Private trait in param bounds in trait and inherent impls
cgmath-0.5.0 - Private trait in trait bound
protobuf-1.0.4 - Private struct in method of public trait impl
twox-hash-0.1.1 - Private struct in trait method
dbus-0.2.1 - Private trait in param bound of inherent impl
libgit2-sys-0.3.8 - Private type alias in type alias, solvable by simple normalization
rust-gmp-0.2.9 - Private struct in type alias
simple_parallel-0.3.0 - Private struct in param bound in inherent impl method
civet-0.8.1 - Private struct in inherent impl method
sxd-document-0.1.2 - Private struct in trait method
mach-0.0.4 - Private struct in type alias
fann-sys-0.1.6 - Private type alias in type alias, solvable by simple normalization
fd-0.2.0 - Private type alias in foreign function, solvable by simple normalization
peano-1.0.1 - Private trait in param bound and associated type
locale-0.1.8 - Reexported private variant
geom-0.2.1 - Private trait in param bound in inherent impl
handlebars-markdown-helper-0.1.3 - not a root regression (aho-corasick-0.3.4 is to blame)
hashmap2-0.2.0 - Private struct in inherent impl method
hipack-0.1.0 - Private enum in type alias
conduit-conditional-get-0.7.2 - Private type alias in trait impl method, solvable by simple normalization
libxdo-0.4.0 - Private struct in type alias
marid-0.0.2 - Private struct in associated type
maxminddb-0.6.0 - Private type alias in trait impl method, solvable by not so simple normalization
mcpat-0.8.3 - Private struct in type alias
mpd-0.0.7 - Private struct in inherent impl method
nemo-0.2.1 - Private trait in param bound in trait impl
plumbum-0.0.8 - Private enum in type alias (out of date)
postgis-0.1.6 - Private trait in param bound in inherent impl
rand-distributions-0.1.2 - Private trait in param bound in trait impl
rodio-0.3.1 - Private struct in variant field
rules-0.0.2 - Private enum in type alias
ecdh-0.0.12 - Private type alias in foreign function, solvable by simple normalization
endianrw-0.2.1 - Private trait in associated type bound
shuffled-iter-0.1.0 - Private struct in associated type
figtree-0.2.2 - Private type alias in inherent impl method, solvable by simple normalization
twiddle-0.1.0 - Private trait in param bound in trait impl
flexmesh-0.1.1 - Private type in type alias
zdd-0.1.0 - Private trait in param bound in trait

Top non-root regressions are due to aho-corasick-0.3.4 (it has private type alias in trait methods, solvable by simple normalization) or due to xml-rs-0.2.2

1 By "simple normalization" I mean substitution of type alias Priv in

pub struct Pub;
type Priv = Pub; // Simple type alias without type parameters
pub fn f(arg: Priv) {} // OK, no error

before checking.

@petrochenkov
Copy link
Contributor Author

Proposed way forward:

  1. Keep the old VisiblePrivateTypesVisitor in frozen state.
  2. Build an "old error set" with it.
  3. Then run the new visitor, report only errors presenting in the old error set, otherwise emit a warning.
  4. Implement simple type alias normalization, because it's very reasonable thing to do and it also reduces the number of warnings noticeably. Implement full normalization in the uncertain future.
  5. Remove the old VisiblePrivateTypesVisitor somewhere in the next summer or so (or even in 2.0)

@petrochenkov
Copy link
Contributor Author

Another more extreme variants:

  • Make it a warn by default lint, not a hard error. The restriction enforced by this pass is pretty artificial anyway. Nothing will break if this pass is removed completely and private things will stay private due to earlier privacy checking pass.
  • Implement the first edition of RFC 136, it gives more useful and less artificial guarantees - everything exported by a crate is nameable. I would also expect less breakage from it.

@nikomatsakis
Copy link
Contributor

@petrochenkov

Proposed way forward:

I was going to propose mostly the same thing. But some quibbles:

  1. Then run the new visitor, report only errors presenting in the old error set, otherwise emit a warning.

It would be better to emit a lint, I think. I've been wanting to propose that we add a "future-compatibility" lint for this purpose. The reasons to prefer a lint are:

  • it works well with cargo's cap-lints feature;
  • it works well with #[deny(warnings)], to help ensure that people do not ignore this error on the crate they are currently building (presuming we add that lint into the warning set);
  • even if we do not add the lint into the warning set, people CAN set #[deny(future_compat)]

However, I wanted to try and advertise the idea of this lint a bit more broadly for some reason, but maybe we can just add it here and then tweak the design. (We have a bit of time before this change hits stable.)

  1. Implement simple type alias normalization, because it's very reasonable thing to do and it also reduces the number of warnings noticeably. Implement full normalization in the uncertain future.

Sorry, what is "simple type alias normalization"?

@petrochenkov
Copy link
Contributor Author

@nikomatsakis

Sorry, what is "simple type alias normalization"?

Footnote 1 in the previous message #29973 (comment)
There must be some way in type checker to do it properly, but I'm not familiar enough with type checker. The simple variant covers all cases encountered in the crater report.

It would be better to emit a lint, I think.

Yeah, I should have added an unresolved question "what kind of warning exactly?" to my proposed way forward. #[deny(warnings)] creates as many problems as it solves, maybe it shouldn't turn this specific warning into an error? It would break too many crates (but at least not their dependencies).

@nikomatsakis
Copy link
Contributor

@petrochenkov

#[deny(warnings)] creates as many problems as it solves, maybe it shouldn't turn this specific warning into an error? It would break too many crates (but at least not their dependencies).

I'm not sure, but I think you can view this as a feature or a bug. That is, we have had regressions in rustc where RFC1214 warnings were not noticed because they occurred in stage1 and were not subject to deny(warnings). I tend to think you should give people what they ask for. As you say, it won't break dependencies, and you can always re-enable warnings. It also lets us get a more accurate crater measurement, since the issue of "root regressions" is moot.

@nikomatsakis
Copy link
Contributor

@petrochenkov

By "simple normalization" I mean substitution of type alias Priv...

...ok, that makes sense. But what is "full normalization" then?

In some sense, the type checker already does this sort of normalization (for better and worse). That is, it expands type aliases in its internal representation and keeps no record of them. But I don't think this is particularly useful for your check because, when it comes to privacy, you care about the path that people used to reach something, and not just the item they reached in the end (the type checker doesn't track the path). Anyway, privacy is (usually) not something the type checker should have to be overly concerned with I think.

@petrochenkov
Copy link
Contributor Author

@nikomatsakis

But what is "full normalization" then?

Substitution of type aliases beyond the simplest case - associated type aliases (when possible), type aliases with default type parameters.

Anyway, privacy is (usually) not something the type checker should have to be overly concerned with I think.

I don't suggest to do it in type checker, but I presume that type checker has tools for doing it, which can be reused in rustc_privacy somehow. Anyway, the simplest substitution can be done trivially in privacy without involving anything from type checker.

Without this "normalization" some regressions from the crater report look pretty silly, e.g. from libgit2-sys:

#[cfg(target_env = "msvc")] type __enum_ty = i32;
#[cfg(not(target_env = "msvc"))] type __enum_ty = u32;
// then in some macro
pub type $name = __enum_ty;

In place of the library authors I'd start to resent - "Hey, It's i/u32 I return here, not something private! __enum_ty is just a convenient implementation detail, don't make me write use i32 as __enum_ty; or introduce a private inner module to do the same thing". (As an aside, rustdoc should probably substitute these too.)

@nikomatsakis
Copy link
Contributor

@petrochenkov

Substitution of type aliases beyond the simplest case - associated type aliases (when possible), type aliases with default type parameters.

Ah. I'm not sure this kind of normalization makes sense or is necessary. After all, there is no "path" to follow when you do associated type normalization, for example, and there are already privacy rules that guarantee that the types that result from associated type normalization will be publicly accessible. (Basically, things appearing in impls, must be declared pub.) Similar rules I think apply to defaults.

Without this "normalization" some regressions from the crater report look pretty silly

Yes, I certainly see your point, and I've been annoyed by this same thing in the past. Basically wanting some private aliases and shorthands that you don't necessarily want to expose. It's tricky though because sometimes the type aliases are important things that rustdoc probably wants to preserve (e.g., io::Result<T> == Result<T,io::Error>) because we expect end users to use them, and sometimes they are not. Perhaps this can be keyed off of whether the type alias itself is public. This feels like it might potentially be good fodder for an amendment RFC though, not sure.

cc @rust-lang/lang thoughts on this? (Normalization of type aliases)

@petrochenkov
Copy link
Contributor Author

Perhaps this can be keyed off of whether the type alias itself is public.

That was my idea as well. After #29822 rustdoc has all the necessary information to discern between these two kinds of type aliases.

@petrochenkov
Copy link
Contributor Author

Updated with the old visitor restored, warnings used instead of some errors, and lint used instead of a plain warning.

@petrochenkov
Copy link
Contributor Author

Updated with (non-associated) type alias substitution.

@alexcrichton
Copy link
Member

I'll schedule another crater run.

@alexcrichton
Copy link
Member

This time around crater reports one failure.

@petrochenkov
Copy link
Contributor Author

@alexcrichton

This time around crater reports one failure.

This failure is expected, it's a result of the fix for resolve, I'm not sure it deserves a warning period, but if we want to be extra careful maybe it makes sense to put a warning there as well.

@alexcrichton
Copy link
Member

Cool, good to know! I'll leave the review to @nikomatsakis, but I'm pretty excited to see the nuts and bolts tightened up here!

@nikomatsakis
Copy link
Contributor

OK, I did a first pass and by and large I think this is pretty nice. It's hard for me to be sure if there are corner cases not covered but it certainly seems like a big improvement. I'm not quite ready to r+ yet though, largely because there are a few policy issues I'd like to resolve:

  • This PR ignores the privacy of constants that appear in types. For example, const SIZE: usize = 4; pub fn foo(arg: [i32; SIZE]) is legal (iiuc). I don't think I ever considered this case before, but it's not obvious to me that it should be legal (though it seems quite analogous to type aliases to me).
  • I'd like @rust-lang/lang to make a definitive decision about type aliases (probably during our meeting tomorrow). This is definitely a modification to the RFC text as I understood it, but I think a relatively minor one, and I think it is well-motivated. Still I want to be sure it gets enough visibility and is not something we just slip in during the PR.
  • I am not sure whether it makes sense to have a distinct lint for this case or to have one 'catch-all' lint like future_incompatible. Some arguments on both sides I came up with:
    • we plan to eventually make this a hard error, at which point the lint would be "deprecated". If we just have one future_incompatible lint, this is a non-issue; the lint can always exist, even if it is checking different things at different points in time.
    • OTOH it seems better if people can have more fine-grained control over just what they expect to be happening. Using distinct lints would allow you to forbid a particular change (i.e., once you've solved it) while having others remain as warnings.
    • I think the last point in particular convinces me that your approach is correct, but I'd still like to bring it up.

One other point: I'd like to ensure that there is a comprehensive and systematic set of tests for all the edge cases we can come up with. This is hard for me to tell, since there are older tests and of course the newer tests you've added, and the tests don't seem to share any distinct naming scheme. One nit I have with your naming is that you added most of the new tests as issue-12345.rs, which is good practice I think for a regression test, but sometimes also undesirable if there is no similar test. My usual rule of thumb is to give tests names like privacy-something-descriptive.rs unless the test kind of duplicates other tests and is really just a narrow regression test for a particular case, in which case I'd put it as issue-12345.rs Anyway all this is sort of neither here nor there, but maybe we should consider adding some systematic tests?

Some things I came up with that I'd like to know whether there are tests for:

  • values of associated types in the impl of a public trait
  • values of associated types in the impl of a private trait
  • bounds on an associated type in a trait declaration
  • etc

Anyway, I have encountered this same problem (how to manage and track what tests exist and which tests we may want to exist) in other contexts and still find it frustrating! I'd like some kind of side files in the test directory that show these matrices and which tests cover what. Obviously out of scope for this PR.

@nikomatsakis
Copy link
Contributor

@petrochenkov

[Regarding checking after inference]: This is what I think the proper solution should be.

I agree with this now.

@petrochenkov
Copy link
Contributor Author

I'd like to leave the constant checking to a separate PR, because it needs discussion. It's also a new kind of restriction that can be unexpected for users. I'll create an issue for this.

@nikomatsakis
Copy link
Contributor

@petrochenkov

This const checking will lead us to a strange territory when constant expressions become more permissive.

Heh, a good question. If we REALLY want to guarantee that one can freely change a constant, we are indeed forced to a pretty extreme position -- too extreme, in my view. For example,

const PRIVATE_KEY: [u8; 256] = ...;
pub const PUBLIC_KEY: [u8; 256] = ...;
pub const SIGNED_GREETING = sign("Hello, world!", &PRIVATE_KEY, &PUBLIC_KEY);
pub const fn sign(text: &str, private_key: &[u8], public_key: &[u8]) {
    // some kind of plausible signing algorithm
}

Given the improvements to const fn evaluation that @eddyb and I have been talking about, it is certainly plausible that we could actually implement the above fn at some point in the not-that-distinct future. In that case, we could regard this as illegal, because it "exposes" PRIVATE_KEY. But that had better not be true or else the security of the internet is in peril. But of course if we PERMIT references to PRIVATE_KEY, then we risk exposing PRIVATE_KEY altogether. And if we try to be smarter than that then we wind up in a nice little research field about information flow, which is most certainly an area I do NOT want to be in!

Of course you don't need to go to such esoteric examples. It's enough to have something like:

const LEN: u32 = ...;
pub const APPROXIMATE_LEN: u32 = LEN / 4;

I may want to keep the precise LEN hidden, but expose it to within +-3. Now I cannot do that.

A compromise might be to say that we do not check for privacy in the RHS of a constant definition nor in a const fn body. This implies that I could write any of the above code, but I could only use APPROXIMATE_LEN or PUBLIC_KEY or SIGNED_GREETING in a public signature: I cannot use LEN or PRIVATE_KEY. This is good. It does mean that one "leak" a "private" constant if one wants to:

const FOO: u32 = 22;
pub const BAR: u32 = FOO;

but if you are going to do that, you have to at least do it explicitly. Put another way, we guarantee for you (I think?) that the only constants the caller can observe are those that are declared public, but we leave it to you to vet that this only exposes what you intended. I am not sure if I find that persuasive. It doesn't feel very consistent with the rest of the "private things appearing in public APIs" logic. Another compromise might be to say that we do not check the body of const fn, but it also isn't giving you any particular guarantees.

@rust-lang/lang, thoughts on these thorny questions about public constants?

@petrochenkov, do you think any similar concerns arise for type aliases? I don't think so, particularly given your "normalization" rule, which effectively means that the body of the type alias would have to be "deeply" public anyhow (iow, type alias normalization is kind of implementing the stricter rule).

We've certainly got to land some of this excellent code soon!

@nikomatsakis
Copy link
Contributor

@petrochenkov

I'd like to leave the constant checking to a separate PR, because it needs discussion. It's also a new kind of restriction that can be unexpected for users. I'll create an issue for this.

I agree with this inclination. The only question in my mind is what to do about type alias normalization. Certainly if we revert it, we can always add it back in later (perhaps with an RFC). This sort of makes sense, since it is weakening existing, stable rules.

@petrochenkov
Copy link
Contributor Author

One more comment about type alias normalization - it still has to be performed for impls:

pub struct Pub;
type PrivAliasOfPub = Pub;
impl PubTr for PrivAliasOfPub {...} // This impl is public even if `PrivAliasOfPub` is private 

so most of the code for it will stay in place.

@nikomatsakis
Copy link
Contributor

@petrochenkov

One more comment about type alias normalization - it still has to be performed for impls

Ah, true.

Type aliases are still substituted when determining impl publicity
@petrochenkov
Copy link
Contributor Author

Updated with type aliases not substituted.

@nikomatsakis
Copy link
Contributor

@bors r+ p=2

@bors
Copy link
Contributor

bors commented Dec 18, 2015

📌 Commit 785cbe0 has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Dec 18, 2015

⌛ Testing commit 785cbe0 with merge ef91cdb...

bors added a commit that referenced this pull request Dec 18, 2015
Some notes:
This patch enforces the rules from [RFC 136](https://github.com/rust-lang/rfcs/blob/master/text/0136-no-privates-in-public.md) and makes "private in public" a module-level concept and not crate-level. Only `pub` annotations are used by the new algorithm, crate-level exported node set produced by `EmbargoVisitor` is not used. The error messages are tweaked accordingly and don't use the word "exported" to avoid confusing people (#29668).

The old algorithm tried to be extra smart with impls, but it mostly led to unpredictable behavior and bugs like #28325.
The new algorithm tries to be as simple as possible - an impl is considered public iff its type is public and its trait is public (if presents).
A type or trait is considered public if all its components are public, [complications](https://internals.rust-lang.org/t/limits-of-type-inference-smartness/2919) with private types leaking to other crates/modules through trait impls and type inference are deliberately ignored so far.

The new algorithm is not recursive and uses the nice new facility `Crate::visit_all_items`!

Obsolete pre-1.0 feature `visible_private_types` is removed.

This is a [breaking-change].
The two main vectors of breakage are type aliases (#28450) and impls (#28325).
I need some statistics from a crater run (cc @alexcrichton) to decide on the breakage mitigation strategy.
UPDATE: All the new errors are reported as warnings controlled by a lint `private_in_public` and lint group `future_incompatible`, but the intent is to make them hard errors eventually.

Closes #28325
Closes #28450
Closes #29524
Closes #29627
Closes #29668
Closes #30055

r? @nikomatsakis
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Dec 18, 2015
@nikomatsakis
Copy link
Contributor

@petrochenkov are you aware of an open issue concerning the interactions between inference and privacy?

@petrochenkov
Copy link
Contributor Author

@nikomatsakis

are you aware of an open issue concerning the interactions between inference and privacy?

I opened one yesterday - #30476

@sgrif
Copy link
Contributor

sgrif commented Dec 20, 2015

I'm investigating some breakage in Diesel, and I believe this PR is the source. Just to make sure I'm clear, the intention was to disallow using private type aliases that appear anywhere? I've been using them quite a bit to shorten some more gnarly where clauses. I can make the type aliases public and #[doc(hidden)], but it seems like a confusing change to me.

@petrochenkov
Copy link
Contributor Author

@sgrif
Do you mean something like

type PrivAlias = SomethingLongButPublic;
pub fn public_function(arg: PrivAlias) {}

?

If yes, then my intention was to allow this, that's why type aliases were originally substituted, but it was reverted later to follow the letter of the RFC. It can be fixed at any moment (by removing three lines of code), but it needs a decision from the lang team.

@pnkfelix
Copy link
Member

cc #16463

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants