Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove #[crate_id] #109

Merged
merged 7 commits into from
Jun 25, 2014
Merged

Conversation

alexcrichton
Copy link
Member

No description provided.

@alexcrichton
Copy link
Member Author

#![crate(name = "json", type = "dylib", type = "rlib", version = "1.0.2-pre)]
```

Breaking down this attribute:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks!

to satisfy all its use cases with.

* Does allowing keywords in attributes set an unusual precedent for other
portions of the language?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me says yes, but part of me says that attributes are metadata and shouldn't care about keywords. At this point I'm leaning towards the latter, since there's no reason why keywords will ever be special in attributes.

If we do decide that this is an issue, we could work around it with two steps:

  1. Change type to kind. That's not a keyword, so it should be ok.
  2. Turn crate into Rust's first contextual keyword. We've talked about having contextual keywords in the past (e.g. with in, and when crate was first introduced), but there hasn't been a compelling reason to add the first one. But crate is only significant when it immediately follows the keyword extern, and is an error to use anywhere else, so it's a good candidate to be a contextual keyword.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it does not. In fact, attributes are the outliers here! Other syntax extensions can and do accept keywords as regular identifiers. A great example is local_data_key!(pub ...), but others use mod, struct, etc. The language that syntax extensions accept is limited only to "balanced delimiters", and I don't think restricting it further is a good idea.

@metajack
Copy link

metajack commented Jun 7, 2014

A few problems with this scheme that were addressed in the previous crate_id scheme:

  1. The hash can no longer by calculated easily by external tools. You must invoke the rust compiler to get it. The new crate attribute, being unordered, is not something you can grep for, run SHA256 on, etc.
    2). I suppose you could get around this by canonicalizing the attribute order when you calculate the hash. This is even worse because the hash now depends on command line invocation parameters. So two compiles of the same library might generate different hashes.

  2. It's much harder to have a path in the hash, so people are unlikely to do that. crate_id made it very easy to include the path in the name so that libraries with the same name would not conflict.

  3. If you aren't using cargo, you must implement your own library finding. Since you can't know the hash you're supposed to have (because you don't know the full crate id), you may have multiple versions of the same library installed. It's possible there will be multiple versions with the same name and version number, but different hashes. This seems like a regression from the current behavior where we know exactly what the full file name is of the extern.

the crate being compiled. This will override the compiler's inference
of the crate name based on the file name being compiled. This is also
later used to match with when upstream crates link to this one.
* `type` - This will supersede the `#[crate_type]` attribute. The `crate`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, since this has a restricted and fixed set of possible values, it feels like using idents might be better, e.g.

#![crate(type(dylib, rlib))]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would tend to lean towards consistency among the inner attributes, but this is an interesting idea!

@alexcrichton
Copy link
Member Author

  1. The hash can no longer by calculated easily by external tools. You must invoke the rust compiler to get it. The new crate attribute, being unordered, is not something you can grep for, run SHA256 on, etc.

A constant compiler invocation will always print the same hash, and the compiler can always print the hash out very quickly. What is the use case for relying on something other than the compiler to emit the hash?

I would think that you can always shell out to the compiler itself. The compiler only needs to parse the crate attributes (almost always successful) to print the hash.

2). I suppose you could get around this by canonicalizing the attribute order when you calculate the hash. This is even worse because the hash now depends on command line invocation parameters. So two compiles of the same library might generate different hashes.

That's intentional, however. Cargo needs a way to pass arbitrary metadata in the manifest that's not present in the crate file.

  1. It's much harder to have a path in the hash, so people are unlikely to do that. crate_id made it very easy to include the path in the name so that libraries with the same name would not conflict.

I'm not sure I follow, #![crate(path = "foo/bar/baz")] puts a path in the hash?

  1. If you aren't using cargo, you must implement your own library finding.

Can you elaborate on this? Without cargo, you have exactly what we have today, except that crates are only matched by name, not by name/version.

Since you can't know the hash you're supposed to have (because you don't know the full crate id), you may have multiple versions of the same library installed. It's possible there will be multiple versions with the same name and version number, but different hashes. This seems like a regression from the current behavior where we know exactly what the full file name is of the extern.

Yes, this is an explicit regression that I pointed out in the RFC. This seems like a rare enough use case that the benefits gained by what cargo can do outweigh it, however.

@metajack
Copy link

metajack commented Jun 7, 2014

What is the use case for relying on something other than the compiler to emit the hash?

I don't personally rely on it, but build tools are weird. What are we gaining by making this harder?

Cargo needs a way to pass arbitrary metadata in the manifest that's not present in the crate file.

What metadata does it need to pass exactly that isn't already in a crate_id?

I'm not sure I follow, #![crate(path = "foo/bar/baz")] puts a path in the hash?

Your examples don't have path. You seem to treat it as some ancillary, unimportant information, when in fact it is fairly critical.

Without cargo, you have exactly what we have today, except that crates are only matched by name, not by name/version.

Today we know the hash exactly since I can write extern crate foo = "github.com/mozilla/servo#1.0. If the hash is arbitrary based on command line arguments, I have no idea which of two libraries I need that differ only by hash. The compiler will fail since it can't resolve to a single one. I am stuck.

This seems like a rare enough use case that the benefits gained by what cargo can do outweigh it, however.

It doesn't seem that rare to me. Every distro has glib, glib2, ruby1.9.1 and ruby, etc. For a long time git was git-core. Look at the Clojars repos to see how many forks of popular packages exist simultaneously. What is ths other metadata that we need that we don't already have?

I was under the impression the metadata Cargo needed was exactly the same, but that we didn't want to duplicate it in the Cargo manifest and in the Rust source. Why do we need to extend the crate hash to arbitrary data in addition to avoiding duplication?

@alexcrichton
Copy link
Member Author

Ok, I'm going to try to take a step back and explain why I believe that crate ids just aren't sufficient today, and why cargo doesn't want to use them.

In a cargo-driven world, applications and libraries will have a manifest file. In this manifest the name of the crate will be listed, the version, and other metadata like the author. Additionally, external dependencies, as well as their version requirements, will be listed in this file. Note that it is very important that this be a separate file because cargo needs to be able to quickly download the manifests for remote packages in order to solve the dependency graph for a particular build.

So, in that world, you'll see a few similarities to what we have today. The #[crate_id] specifies the name/version of a libraries, and some other attributes work with the author. The extern crate statements serve as declaration of dependencies, but there is no concept of version ranges. Primarily, you'll note that this scheme does not solve the "one manifest to download" problem. Note that cargo also plans on a laundry list of more advanced features that are simply not possible with a crate id:

  • Easily switching a crate to be from a remote location to a local path. This is not possible with our extern crate syntax.
  • Specifying that a "pseudo-crate" which can be substituted for a number of concrete crates exposing the same interface. This is not possible with the interface to the compiler today.
  • Specifying a range of versions that are suitable for a package. This is not possible with the extern crate syntax.
  • Substituting one package for another in an upstream dependency (such as the pseudo crates above). This is not possible with how the compiler is architected today.

There are more examples that I'm sure @wycats and @carllerche could weigh in with.

So, with this information in mind, here are some thoughts about the crate id system that we have today:

  • Primarily, you can extract a manifest from the rust source (theoretically), but you cannot easily download a single manifest file for a crate.
  • Version ranges, a crucial part of cargo, do not work with crate ids. The extern crate syntax could be altered, but this is going down the route of making rustc itself a version dependency manager, which I believe is unacceptable.
  • Flavorful configuration is impossible with crate ids. You're locked in to exactly what we're specifying, with very little flexibility in terms of resolving upstream crates.

The primary goal of this RFC is to enable these very legitimate use cases of cargo while retaining the usefulness of rustc itself. Note that it's tough for me to debate about nitpickity points about the crate id syntax because, for the reasons outlined above, it is simply insufficient for cargo.

Sorry if this wasn't clear in the original RFC, I can amend some wording here and there. But with all this in mind, I'll try to address your questions now


What is the use case for relying on something other than the compiler to emit the hash?

I don't personally rely on it, but build tools are weird. What are we gaining by making this harder?

Can you explain how you think we're making this harder? Today it's rustc --crate-file-name, and tomorrow it will be the same utility. I envision that all manual usage of rustc (such as non-cargo build systems) will never use the -C metadata=foo flag. This is purely meant for cargo to transmit version and perhaps other information.

What metadata does it need to pass exactly that isn't already in a crate_id?

Ah, another primary reason for removing crate ids is to remove duplication. The manifest file for cargo must know the name/version of a library, and mandating that it be written down in two places is something that should be avoided. Yes, all the information is in the crate id. No, I do not want to duplicate it across the manifest and the file name.

Your examples don't have path. You seem to treat it as some ancillary, unimportant information, when in fact it is fairly critical.

As I noted above, arbitrary values are allowed in the #[crate] attribute, all of which are very important. Each piece is used to generate the hash. Did you mean "ancillary and unimportant" via some other means?

Today we know the hash exactly since I can write extern crate foo = "github.com/servo/servo#1.0. If the hash is arbitrary based on command line arguments, I have no idea which of two libraries I need that differ only by hash. The compiler will fail since it can't resolve to a single one. I am stuck.

Yes, this is what I explicitly called out as a drawback of this design. This is a consequence of designing for minimal duplication between cargo and the compiler. I do not believe that this is an important enough use case (as most projects will simply just use cargo). Manually driving rustc has very little need for urls or versions, it only ever needs names.

It doesn't seem that rare to me. Every distro has glib, glib2, ruby1.9.1 and ruby, etc.

I would expect this case to be rare because cargo will already have to handle this case, and it will be the predominate use case for having all these dependencies all over the place.

I was under the impression the metadata Cargo needed was exactly the same, but that we didn't want to duplicate it in the Cargo manifest and in the Rust source

Yes, that is a primary concern of this design.

Why do we need to extend the crate hash to arbitrary data in addition to avoiding duplication?

We want to allow for all the same use cases today to continue to work tomorrow. Allow arbitrary data is a tool to reach that end. I'll say this again, but I expect cargo users to rarely use the #[crate] attribute, and non-cargo users will rely on the #[crate] attribute for this sort of metadata. I expect it to be very rare to mix the two.

@steveklabnik
Copy link
Member

Note that it is very important that this be a separate file because cargo needs to be able to quickly download the manifests for remote packages in order to solve the dependency graph for a particular build.

I think this point is really important.


## Naming library filenames

Currently, rustc crates filenames for library following this pattern:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 'creates' and not 'crates'? Too much crate 😉

bors added a commit to rust-lang/rust that referenced this pull request Jun 13, 2014
…ichton

With this change, rustc creates a unique type identifier for types in debuginfo. These type identifiers are used by LLVM to correctly handle link-time-optimization scenarios but also help rustc with dealing with inlining from other crates. For more information, see the documentation block at the top of librustc/middle/trans/debuginfo.rs and also [my blog post about the topic](http://michaelwoerister.github.io/2014/06/05/rust-debuginfo-and-unique-type-identifiers.html). This should fix the LTO issues that have been popping up lately. 

The changes to the debuginfo module's inner workings are also improved by this. Metadata uniquing of pointer types is not handled explicitly instead of relying on LLVM doing the right thing behind the scenes, and region parameters on types should not lead to metadata duplication anymore.

There are two things that I'd like to get some feedback on:
1. IDs for named items consist of two parts: The [Strict Version Hash](https://github.com/mozilla/rust/blob/0.10/src/librustc/back/svh.rs#L11) of their defining crate and the AST node id of their definition within that crate. My question is: Is the SVH a good choice for identifying the crate? Is it even going to stay? The [crate-id RFC](rust-lang/rfcs#109) got me confused.
2. Unique Type Identifiers can be arbitrary strings and right now the format is rather verbose. For debugging this is nice, because one can infer a lot about a type from the type id alone (it's more or less a signature). For deeply nested generics, id strings could get rather long though. One option to limit the id size would be to use some hashcode instead of the full id (anything that avoids collision as much as possible). Another option would be to use a more compact representation, like ty_encode. This reduces size but also readability.
Since these ID's only show up in LLVM IR, I'm inclined to just leave in the verbose format for now, and only act if sizes of rlibs become a problem.
@ghost
Copy link

ghost commented Jun 14, 2014

Do crates need to know what kind of library they're compiled into? Can we please remove type=dylib and friends in favor of just lib?

  • other languages do it with compiler flags and build system/package manager settings
  • it makes rustc harder to script for
  • it can be overridden anyway
  • it will be overridden anyway by packagers who want to conform to system-wide preferences, whether packaging policy or user settings. Some people will want -Z lto, others think static libraries are a waste of space with glibc, etc.

@alexcrichton
Copy link
Member Author

I have scaled this back to remove all extra metadata from filenames, making them entirely predictable. Additionally I removed the #[crate] attribute entirely for just a #[crate_name] attribute. The #[crate_type] attribute is unchanged.

@metajack, @wycats, does this look ok to you guys?

@wycats
Copy link
Contributor

wycats commented Jun 19, 2014

@alexcrichton Cargo needs the ability to provide additional metadata that can be used in symbol mangling.

Unless I'm misreading, this proposal would result in any two packages with the same name (regardless of version or source) producing collidable symbols.

We need the ability to allow both versions and a namespace to drive the mangled symbols.

@alexcrichton
Copy link
Member Author

Unless I'm misreading, this proposal would result in any two packages with the same name (regardless of version or source) producing collidable symbols.

Oops, sorry! I accidentally deleted this as part of the revisions. I've updated to explain that -C metadata=foo still exists.

@wycats
Copy link
Contributor

wycats commented Jun 19, 2014

@alexcrichton sweet

and the `<hash>` was removed to make the output filename predictable.

The three original goals can still be satisfied with this simplified naming
scheme. As explained in th enext section, the compiler's "glob pattern" when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "th enext" -> "the next"


```rust
#![crate_name = "foo"]
#![crate_type = "lib"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps these could be unified, as

#![crate(name = "foo", type = "lib")]

Similar to how you had it before, but only explicitly-defined keys are allowed, instead of arbitrary keys, and the two allowed keys are "name" and "type".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess doing this still requires allowing keywords as attributes, but I think that's fine anyway. There's no reason for Rust's normal keyword set to matter inside of an attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to take a more conservative approach, and multiple type keys seemed odd enough to avoid this for now. I don't think we have many other attributes that accept the same key more than once (and it has semantic meaning).

@lilyball
Copy link
Contributor

Using a glob of lib${name}*.rlib will cause foo and foobar to be treated as conflicting library installations despite having different identifiers.

I think it needs to search for lib${name}{,-*}.rlib instead (which probably requires doing two separate globs because AFAIK glob() doesn't support this sort of thing). This way libfoo.rlib and libfoobar.rlib would not be considered to conflict, but libfoo.rlib and libfoo-bar.rlib would (as expected).

Although given that now multiple matching library names are considered to conflict, and any suffix is allowed, that suggests we need to explicitly disallow the - character in crate identifiers. Otherwise #![crate_name = "foo-bar"] would cause the resulting library to conflict with libfoo.rlib. Perhaps instead we should enforce that the crate name needs to follow the same identifier rules as Rust identifiers, and then keep around extern crate foo = "bar" purely to allow for renaming crates the same way we allow use foo = bar. Although since the crate identifier must be an identifier, this means we can drop the string literal entirely and use extern crate foo = bar.

Requiring identifier rules like that also suggests that perhaps we should drop the string literal in the attribute too, so it's #![crate_name = foo], although I believe that will require changing the parser to allow that. If this is done, we should probably also do the same thing to crate_type, as in #![crate_type = rlib], because using a string literal suggests that arbitrary strings should be allowed. Of course, with my suggested change above, this would instead be #![crate(name = foo, type = rlib)].

@lilyball
Copy link
Contributor

Another concern I have with dropping the hash is that dylib Rust crates may now conflict with third-party non-Rust libraries. For example, my rust-lua project currently has a library name of liblua-23a36a8d-0.1.rlib. If I were to compile a dylib, that would be liblua-23a36a8d-0.1.dylib. With this RFC, that would then turn into liblua.dylib, which conflicts with the already-existing liblua.dylib from my Lua 5.1 installation.

The ability to have an un-suffixed library name seems useful for Rust libraries that vend extern "C" interfaces, but it's problematic when writing a Rust library that wraps a C library. It's worth considering whether a library should have the ability to explicitly request a suffix, so as to avoid this conflict. This is something that the build system can handle, but I would be happier if I knew my library would work properly when compiled by hand using rustc lib.rs (among other reasons, this helps if someone using a different build system wants to build my library as a dependency, because they don't have to try and replicate my build system's logic for the suffix, they can just compile it directly). This may be considered to be more confusing than it's worth.

@lilyball
Copy link
Contributor

Also, on the topic of wrapping C libraries, I'm concerned that considering multiple name matches to be inherently a conflict will also be problematic here. If I build and install my rust-lua as liblua-23a36a8d-0.1.dylib, and someone else says extern crate lua;, rustc will find my liblua-23a36a8d-0.1.dylib as well as the C library liblua.dylib, and will presumably consider that an error.

However, since extern crate declarations must be crates, rather than arbitrary libraries, it's reasonable to expect that the C library liblua.dylib should not be considered a conflict, by virtue of not being a crate.

To that end, I would suggest that all matching dylibs be tested to see if they are Rust crates, even though the crate metadata would not actually be compared to anything to see if it matches (just merely checked to see if it exists). rlibs are by definition Rust crates so they don't need to be tested in this manner.

@alexcrichton
Copy link
Member Author

Using a glob of lib${name}*.rlib will cause foo and foobar to be treated as conflicting library installations despite having different identifiers.

This is not true, the metadata of the library would contain the exact name. The glob is just to filter out names so we don't have to read everything in the world.

@lilyball
Copy link
Contributor

This is not true, the metadata of the library would contain the exact name. The glob is just to filter out names so we don't have to read everything in the world.

My interpretation of the current RFC is that the "library name" is the name of the library on-disk, not some name embedded in the metadata. Specifically in the following line:

The compiler's searching and file matching logic would be altered to only match crates based on name. If two versions of a crate are found, the compiler will unconditionally emit an error.

If the "name" here in fact refers to the crate_name as specified in the library metadata, rather than the name on disk, then that resolves my complaint.

@brson brson merged commit 4878811 into rust-lang:master Jun 25, 2014
@lilyball
Copy link
Contributor

I am a bit concerned that this was merged without addressing the issue I raised about unsuffixed library names compiled as dylibs.

In fact, thinking about it some more, it's much more serious than at first glance, because the build system cannot actually rename a .dylib after it's built, without breaking the install path. So without a way to instruct rustc to append a suffix directly (instead of relying on the build system to do it), this makes it very impractical to build dylibs that have name collision issues (e.g. my liblua.dylib). It also means that a build system cannot re-add the version (which dylib names normally contain) even if there is no name conflict otherwise.

This isn't just an issue for e.g. rust-lua, it should be an issue for rust as well. Rust currently installs its libraries as dylibs in $prefix. I believe this needs to be changed to not install into $prefix regardless (see rust-lang/rust#13733), but until that happens, having rust install something called libregex.dylib or libterm.dylib into $prefix runs a very high risk of naming collisions. I believe the intention was that Rust would re-suffix its own libraries, but as I've already pointed out, renaming dylibs is rather problematic (at least, without using platform-specific mechanisms to fix up the install name, which may or may not require linker flags to provide the extra space for the renaming).

@alexcrichton alexcrichton deleted the crate-versioning branch June 25, 2014 00:56
@alexcrichton
Copy link
Member Author

In fact, thinking about it some more, it's much more serious than at first glance, because the build system cannot actually rename a .dylib after it's built, without breaking the install path

You have a window of time after a library has been built before it's been linked to anything else to rename it, which is precisely what the build system will do. Yes, once you have linked against the dynamic library you cannot rename it.

@lilyball
Copy link
Contributor

@alexcrichton I don't know how it works on Linux, but on OS X, a dylib contains its own name as part of its install name, which is the LC_ID_DYLIB load command that's embedded in the Mach-O file. Every library or executable that links against it uses that embedded path as the path to record in its own LC_LOAD_DYLIB load command, to instruct dyld where to find the dylib.

Looking at librustc, it already explicitly passes the -install_name flag to ld to tell it to use @rpath/libname as the install name. This flag is how you set the install name to something other than the default based on the output filename.

If my build system is going to rename the dylib after it's produced, then I need to be able to pass custom linker flags. Specifically, I either need to provide my own -install_name, which would presumably conflict with the one librustc already provides, or I would need to pass -headerpad_max_install_names or -headerpad size in order to provide the padding for me to later run install_name_tool on the dylib.

Either way, this is awkward, platform-specific, and it's not reasonable to expect people to know how to do this, especially if they're not even Mac users.

This is why rustc needs a built-in way to ask it to suffix the library filename. It's not appropriate to rely on the library author to figure out how to set up their build system to make this work, especially if they don't even have a Mac to test the results on. And that's assuming there aren't platform-specific issues on other platforms too.

@alexcrichton
Copy link
Member Author

I imagine that we could make a special exception for -o when the crate types are just libraries. In this case we could just ignore the extension and tack on the relevant dylib extension.

@o11c
Copy link
Contributor

o11c commented Jul 6, 2014

Ugh, this breaks linking to C libraries of the same name.

  • C library is called libtermkey.so
  • Rust tries to compile the "termkey" crate"
  • Linker starts writing libtermkey.so
  • Linker sees that it needs to link in libtermkey.so
  • Linker finds the empty file it just created, dies with "file format not recognized".

@alexcrichton
Copy link
Member Author

That is the purpose of the -C extra-filename flag, it's used to disambiguate among a rust library and an external library. Otherwise I would recommend changing the name of the crate or ensuring that the files are not in the same directory.

@o11c
Copy link
Contributor

o11c commented Jul 6, 2014

@alexcrichton It would make life much easier for everyone to just output librust-foo.so or something automatically.

Just because cargo exists now isn't an excuse to make common workflows complicated.

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
@Centril Centril added A-attributes Proposals relating to attributes A-flags Proposals relating to rustc flags or flags for other tools. labels Nov 23, 2018
yejunjin added a commit to yejunjin/reference that referenced this pull request Nov 30, 2022
The `crateid` term or attribute is replaced with `crate_name` in [chapter crates and source files](https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute). It was original proposed in [RFC0109] (https://rust-lang.github.io/rfcs/0109-remove-crate-id.html) and merged in [PR109](rust-lang/rfcs#109)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-attributes Proposals relating to attributes A-flags Proposals relating to rustc flags or flags for other tools.
Projects
None yet
Development

Successfully merging this pull request may close these issues.