Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove #[crate_id] #109

Merged
merged 7 commits into from
Jun 25, 2014
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
203 changes: 203 additions & 0 deletions active/0000-remove-crate-id.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

Remove the `crate_id` attribute, knowledge of versions from rustc, and the
`crate_type` attribute. Allow keywords in attributes, and add a `#[crate]`
attribute similar to the old `#[link]` attribute. Filenames will no longer have
versions, nor will symbols, and the compiler will grow a new flag, `--extern`,
to override searching for external crates.

# Motivation

The intent of CrateId and its support has become unclear over time as the
initial impetus, `rustpkg`, has faded over time. With `cargo` on the horizon,
doubts have been cast on the compiler's support for dealing with crate
versions and friends. The goal of this RFC is to simplify the compiler's
knowledge about the identity of a crate to allow cargo to do all the necessary
heavy lifting.

This new crate identification is designed to not compromise on the usability of
the compiler independent of cargo. Additionally, all use cases support today
with a CrateId should still be supported.

# Detailed design

A new `crate` attribute will be accepted by the compiler. For example:

```rust
#![crate(name = "json", type = "dylib", type = "rlib", version = "1.0.2-pre")]
```

Breaking down this attribute:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks!


* `crate` - This is a new top-level attribute which has an inner list of meta
items which are intended to describe the current crate.
* `name` - This meta item is recognized by the compiler to provide a name for
the crate being compiled. This will override the compiler's inference
of the crate name based on the file name being compiled. This is also
later used to match with when upstream crates link to this one.
* `type` - This will supersede the `#[crate_type]` attribute. The `crate`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, since this has a restricted and fixed set of possible values, it feels like using idents might be better, e.g.

#![crate(type(dylib, rlib))]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would tend to lean towards consistency among the inner attributes, but this is an interesting idea!

attribute can contain multiple `type` meta items, describing the type
of output of this crate. This is used to describe whether the output
is an rlib, dylib, executable, or staticlib.
* `version` - This, and all other members of the `crate` attribute, are not
specially recognized by the compiler. These attributes are used as
controllers for the hashes used in the rest of the compiler.

In addition to allowing metadata being specified through the `crate` attribute,
the compiler will also grow a new flag, `-C metadata=foo` to be able to specify
metadata via the command line that should be part of the crate's metadata
attributes. The usage of this metadata is described below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you specify arbitrary metadata keys, or just the 3 keys above (except type isn't really metadata, it directly controls the output of the compiler, so it's probably not appropriate to use with -C).

And what happens if you specify a key that already exists? type allows multiple instances of that attribute, but the other attributes don't seem to make sense to allow multiple. And what about arbitrary attributes? I can imagine arbitrary keys you might want to specify that want the multiple-values approach, and arbitrary keys that don't.

Since name and type are special (as they control the compiler's output), but version is not, perhaps version should actually be moved into a new key metadata that holds key/value pairs.

#![crate(name = "json", type = "dylib", type = "rlib", metadata(version = "1.0.2-pre"))]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this approach, I think it's ok to treat the -C flag as overriding a previous value and not worrying about needing a way to allow a list of values for a metadata attribute. I don't think we can really do that with the attribute syntax right now anyway, because a MetaItem can't be a bare string literal (meaning you can't say authors("Bob", "Fred", "Joe")). And even if we adjust the AST to allow for that, we can punt on allowing that flexibility from -C unless someone has a compelling reason to require it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted in the section explaining version, all other members of the crate attribute are considered metadata. The name is very specifically used as metadata for the hash of symbols/library names.

You're right that the type doesn't necessarily need to be considered metadata, and because it can also be specified from the command line I believe that we'd just ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said the other attributes are metadata, but you didn't say that attributes not listed here are legal to write. It's a bit confusing, I thought you might have been saying e.g. name is also metadata (just metadata that the compiler uses).

I think it's more sensible to nest the metadata into a metadata list, otherwise I would expect e.g. -C name=foo or -C type=rlib to work (and I don't think the former should work, and the latter definitely shouldn't).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I follow. This proposal does not mention -C name=foo, nor -C type=rlib. The only proposed flag was -C metadata=foo which is purely used to seed hashes, it has nothing to do with the crate attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using the crate attribute to specify name, type, and metadata, but you're putting the metadata into the same namespace as the name and type keys. If -C adds extra metadata values, that's equivalent to adding those metadata values to the #![crate] attribute in the source. But either we have to explicitly state that "name and type are not valid names for metadata attributes", or -C can actually specify metadata that you cannot represent with #![crate].

By using #![crate(name = ..., type = ..., metadata(version = ...))], that removes this ambiguity. Now it's clear that -C cannot influence the crate name or type, and it also allows name and type to be reused as metadata keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using metadata() like that also enables the ability to warn about unknown crate attributes, which I think is a useful property.

We also need a new name for talking about this, "crate attributes" already refers to #![foo] constructs, so using it to also refer to the name and type of #![crate(...)] is a bit confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like I may need to phrase this better, I've pushed an updated version to clarify that all of the "metadata" is just a pile of bytes to hash. I would lean away from a specialized metadata(...) section because it's not the only data being hashed (it leaves out the name), and it's not really metadata, it's just stuff to hash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your updated version actually doesn't address any of my concerns here. And it is metadata, but the only thing the metadata is used for is as input to the hash. But presumably it's also stored in the crate and can be extracted when reading the crate data, right? So it can be used by tools that operate on crates to provide information (e.g. cargo could theoretically use it to record the source the crate was installed from, and then read it later to validate that the source matches the crate).

Also if my suggestion to allow for specifying metadata attributes on the extern crate line is implemented, then this metadata would obviously be used for that as well.

I don't feel like a metadata(...) section implies that's the full extent of the data that leads to the hash; after all, the entire section is optional, and crates with no metadata(...) section won't have the same hash.


## Keywords in attributes

The compiler currently disallows keywords in attributes. This rule would be
amended to allow any identifier an attribute names and attribute keys. This is
primarily done to allow the `crate` and `type` attributes to exist.

## Naming library filenames

Currently, rustc crates filenames for library following this pattern:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 'creates' and not 'crates'? Too much crate 😉


```
lib<name>-<version>-<hash>.rlib
```

The current scheme defines `<hash>` to be the hash of the CrateId value. This
naming scheme achieves a number of goals:

* Libraries of the same name can exist next to one another if they have
different versions.
* Libraries of the same name and version, but from different sources, can exist
next to one another due to having different hashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would they have different hashes? You haven't defined any mechanism by which the source of a library can be specified.

I would suggest adding an optional crate metadata attribute called source that is meant specifically for this purpose. For a GitHub project it would be suggested that the source be github.com/username/projectname, like we're currently doing with crate IDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by your question, this section is referencing today's behavior, and I stated above that the hash is of the CrateId.

Down below, I stated that the hash is the contents of the metadata (crate attribute + command line flags). If you wanted to disambiguate your library oh github then you could add that to the crate attribute and it'd get hashed in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said all three goals are still satisfied by the new scheme, but you didn't define any way to specify a library source. Your suggestion of adding it to the metadata is covered by another one of my comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above, one of the major goals of this scheme is to have arbitrary metadata (as noted in the explanation of version).

* Rust libraries can have very privileged names such as `core` and `std` without
worrying about polluting the global namespace of other system libraries.

Under this new scheme, the new filenames would be:

```
lib<name>-<hash>.rlib
```

Where the new hash is defined to be the hash of the contents of the metadata.
The metadata is a merging of the command line metadata as well as the contents
of the `crate` attribute. All three of the original goals are still satisfied by
this new naming scheme. The version would no longer be explicitly visible in the
filename, but the hash would include the version and be distinct across two
versions of the same library.

## Loading crates

The goal of the crate loading phase of the compiler is to map a set of `extern
crate` statements to (dylib,rlib) pairs that are present on the filesystem. To
do this, the current system matches dependencies via the CrateId syntax:

```rust
extern crate json = "super-fast-json#0.1.0";
```

In today's compiler, this directive indicates that the a filename of the form
`libsuper-fast-json-0.1.0-<hash>.rlib` must be found to be a candidate. Further
checking happens once a candidate is found to ensure that it is indeed a rust
library.

Concerns have been raised that this key point of dependency management is where
the compiler is doing work that is not necessarily its prerogative. In a
cargo-driven world, versions are primarily managed in an external manifest, in
addition to doing other various actions such as renaming packages at compile
time.

One solution would be to add more version management to the compiler, but this
is seen as the compiler delving too far outside what it was initially tasked to
do. With this in mind, this is the new proposal for the `extern crate` syntax:

```rust
extern crate json = "super-fast-json";
```

Notably, the CrateId is removed entirely, along with the version and path
associated with it. The string value of the `extern crate` directive is still
optional (defaulting to the identifier), and the string can only contain
alphanumeric characters, `-`, and `_`. Note that the string is not a valid rust
identifier due to its usage of the `-` character.

The compiler's searching and file matching logic would be altered to only match
crates based on name. If two versions of a crate are found, the compiler will
unconditionally emit an error. It will be up to the user to move the two
libraries on the filesystem and control the `-L` flags to the compiler to enable
disambiguation.

This scheme is strictly less powerful than the previous, but it moves a good
deal of logic from the compiler to cargo.

### Manually specifying dependencies

Cargo is often seen as "expert mode" in its usage of the compiler. Cargo will
always have prior knowledge about what exact versions of a library will be used
for any particular dependency, as well as where the outputs are located.

If the compiler provided no support for loading crates beyond matching
filenames, it would limit many of cargo's use cases. For example, cargo could
not compile a crate with two different versions of an upstream crate.
Additionally, cargo could not substitute `libfast-json` for `libslow-json` at
compile time (assuming they have the same API).

To accomodate an "expert mode" in rustc, the compiler will grow a new command
line flag of the form:

```
--extern json=path/to/libjson
```

This directive will indicate that the library `json` can be found at
`path/to/libjson`. The file extension is not specified, and it is assume that
the rlib/dylib pair are located next to one another at this location (`libjson`
is the file stem).

This will enable cargo to drive how the compiler loads crates by manually
specifying where files are located and exactly what corresponds to what.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When multiple libraries are specified, is this multiple --extern flags or is there a syntax for including multiple directives under one flag (e.g. --extern json=path/to/libjson,yaml=path/to/libyaml? I'm inclined to say it should be the former (multiple flags), but this should be made clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the format specified, --extern foo=bar will be accepted (as specc'd currently)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton Command-line flags can't necessarily be specified multiple times. I take it from your answer you're expecting that --extern can be specified multiple times, and that doing so combines both values instead of e.g. overriding the earlier one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There is precendent for this with rustc (fsvo "combines"), e.g. --cfg, -C, -Z, -W.)


## Symbol mangling

Today, mangled symbols contain the version number at the end of the symbol
itself. This was originally intended to tie into Linux's ability to version
symbols, but in retrospect this is generally viewed as over-ambitious as the
support is not currently there, nor does it work on windows or OSX.

Symbols would no longer contain the version number anywhere within them. The
hash at the end of each symbol would still include the symbol via the metadata
in the `crate` attribute or from the command line, however.

# Drawbacks

* The compiler is able to operate fairly well independently of cargo today, and
this scheme would hamstring the compiler by limiting the number of "it just
works" use cases. If cargo is not being used, build systems will likely have
to start using `--extern` to specify dependencies if name conflicts or version
conflicts arise between crates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could add purely optional syntax to the extern crate statement to list metadata attributes that must match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the old extern crate directives were, and we explicitly removed them in favor of just matching a crate id. It'd be an interesting idea to bring them back, however.


* This scheme still has redundancy in the list of dependencies with the external
cargo manifest. The source code would no longer list versions, but the cargo
manifest will contain the same identifier for each dependency that the source
code will contain.

# Alternatives

* The compiler could go in the opposite direction of this proposal, enhancing
`extern crate` instead of simplifying it. The compiler could learn about
things like version ranges and friends, while still maintaining flags to fine
tune its behavior. It is unclear whether this increase in complexity will be
paired with a large enough gain in usability of the compiler independent of
cargo.

# Unresolved questions

* An implementation for the more advanced features of cargo does not currently
exist, to it is unknown whether `--extern` will be powerful enough for cargo
to satisfy all its use cases with.

* Does allowing keywords in attributes set an unusual precedent for other
portions of the language?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me says yes, but part of me says that attributes are metadata and shouldn't care about keywords. At this point I'm leaning towards the latter, since there's no reason why keywords will ever be special in attributes.

If we do decide that this is an issue, we could work around it with two steps:

  1. Change type to kind. That's not a keyword, so it should be ok.
  2. Turn crate into Rust's first contextual keyword. We've talked about having contextual keywords in the past (e.g. with in, and when crate was first introduced), but there hasn't been a compelling reason to add the first one. But crate is only significant when it immediately follows the keyword extern, and is an error to use anywhere else, so it's a good candidate to be a contextual keyword.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it does not. In fact, attributes are the outliers here! Other syntax extensions can and do accept keywords as regular identifiers. A great example is local_data_key!(pub ...), but others use mod, struct, etc. The language that syntax extensions accept is limited only to "balanced delimiters", and I don't think restricting it further is a good idea.