RFC: Remove #[crate_id] #109
```rust
#![crate(name = "json", type = "dylib", type = "rlib", version = "1.0.2-pre")]
```

Breaking down this attribute:
What about `name`?
Oops, thanks!
to satisfy all its use cases with.

* Does allowing keywords in attributes set an unusual precedent for other
  portions of the language?
Part of me says yes, but part of me says that attributes are metadata and shouldn't care about keywords. At this point I'm leaning towards the latter, since there's no reason why keywords will ever be special in attributes.
If we do decide that this is an issue, we could work around it with two steps:
- Change `type` to `kind`. That's not a keyword, so it should be ok.
- Turn `crate` into Rust's first contextual keyword. We've talked about having contextual keywords in the past (e.g. with `in`, and when `crate` was first introduced), but there hasn't been a compelling reason to add the first one. But `crate` is only significant when it immediately follows the keyword `extern`, and is an error to use anywhere else, so it's a good candidate to be a contextual keyword.
No, it does not. In fact, attributes are the outliers here! Other syntax extensions can and do accept keywords as regular identifiers. A great example is `local_data_key!(pub ...)`, but others use `mod`, `struct`, etc. The language that syntax extensions accept is limited only to "balanced delimiters", and I don't think restricting it further is a good idea.
A few problems with this scheme that were addressed in the previous crate_id scheme:
the crate being compiled. This will override the compiler's inference
of the crate name based on the file name being compiled. This is also
later used to match with when upstream crates link to this one.
* `type` - This will supersede the `#[crate_type]` attribute. The `crate`
FWIW, since this has a restricted and fixed set of possible values, it feels like using idents might be better, e.g.
#![crate(type(dylib, rlib))]
I would tend to lean towards consistency among the inner attributes, but this is an interesting idea!
A constant compiler invocation will always print the same hash, and the compiler can always print the hash out very quickly. What is the use case for relying on something other than the compiler to emit the hash? I would think that you can always shell out to the compiler itself. The compiler only needs to parse the crate attributes (almost always successful) to print the hash.
That's intentional, however. Cargo needs a way to pass arbitrary metadata in the manifest that's not present in the crate file.
I'm not sure I follow,
Can you elaborate on this? Without cargo, you have exactly what we have today, except that crates are only matched by name, not by name/version.
Yes, this is an explicit regression that I pointed out in the RFC. This seems like a rare enough use case that the benefits gained by what cargo can do outweigh it, however.
I don't personally rely on it, but build tools are weird. What are we gaining by making this harder?
What metadata does it need to pass exactly that isn't already in a crate_id?
Your examples don't have
Today we know the hash exactly since I can write
It doesn't seem that rare to me. Every distro has glib, glib2, ruby1.9.1 and ruby, etc. For a long time git was git-core. Look at the Clojars repos to see how many forks of popular packages exist simultaneously. What is this other metadata that we need that we don't already have? I was under the impression the metadata Cargo needed was exactly the same, but that we didn't want to duplicate it in the Cargo manifest and in the Rust source. Why do we need to extend the crate hash to arbitrary data in addition to avoiding duplication?
Ok, I'm going to try to take a step back and explain why I believe that crate ids just aren't sufficient today, and why cargo doesn't want to use them. In a cargo-driven world, applications and libraries will have a manifest file. In this manifest the name of the crate will be listed, the version, and other metadata like the author. Additionally, external dependencies, as well as their version requirements, will be listed in this file. Note that it is very important that this be a separate file because cargo needs to be able to quickly download the manifests for remote packages in order to solve the dependency graph for a particular build. So, in that world, you'll see a few similarities to what we have today. The
There are more examples that I'm sure @wycats and @carllerche could weigh in with. So, with this information in mind, here are some thoughts about the crate id system that we have today:
The primary goal of this RFC is to enable these very legitimate use cases of cargo while retaining the usefulness of rustc itself. Note that it's tough for me to debate about nitpickity points about the crate id syntax because, for the reasons outlined above, it is simply insufficient for cargo. Sorry if this wasn't clear in the original RFC, I can amend some wording here and there. But with all this in mind, I'll try to address your questions now
Can you explain how you think we're making this harder? Today it's
Ah, another primary reason for removing crate ids is to remove duplication. The manifest file for cargo must know the name/version of a library, and mandating that it be written down in two places is something that should be avoided. Yes, all the information is in the crate id. No, I do not want to duplicate it across the manifest and the file name.
As I noted above, arbitrary values are allowed in the
Yes, this is what I explicitly called out as a drawback of this design. This is a consequence of designing for minimal duplication between cargo and the compiler. I do not believe that this is an important enough use case (as most projects will simply just use cargo). Manually driving
I would expect this case to be rare because cargo will already have to handle this case, and it will be the predominant use case for having all these dependencies all over the place.
Yes, that is a primary concern of this design.
We want to allow for all the same use cases today to continue to work tomorrow. Allowing arbitrary data is a tool to reach that end. I'll say this again, but I expect cargo users to rarely use the
I think this point is really important.

## Naming library filenames

Currently, rustc crates filenames for library following this pattern:
Shouldn't this be 'creates' and not 'crates'? Too much `crate` 😉
…ichton With this change, rustc creates a unique type identifier for types in debuginfo. These type identifiers are used by LLVM to correctly handle link-time-optimization scenarios but also help rustc with dealing with inlining from other crates. For more information, see the documentation block at the top of librustc/middle/trans/debuginfo.rs and also [my blog post about the topic](http://michaelwoerister.github.io/2014/06/05/rust-debuginfo-and-unique-type-identifiers.html). This should fix the LTO issues that have been popping up lately.

The debuginfo module's inner workings are also improved by this change. Metadata uniquing of pointer types is now handled explicitly instead of relying on LLVM doing the right thing behind the scenes, and region parameters on types should not lead to metadata duplication anymore.

There are two things that I'd like to get some feedback on:

1. IDs for named items consist of two parts: The [Strict Version Hash](https://github.com/mozilla/rust/blob/0.10/src/librustc/back/svh.rs#L11) of their defining crate and the AST node id of their definition within that crate. My question is: Is the SVH a good choice for identifying the crate? Is it even going to stay? The [crate-id RFC](rust-lang/rfcs#109) got me confused.
2. Unique Type Identifiers can be arbitrary strings and right now the format is rather verbose. For debugging this is nice, because one can infer a lot about a type from the type id alone (it's more or less a signature). For deeply nested generics, id strings could get rather long though. One option to limit the id size would be to use some hashcode instead of the full id (anything that avoids collision as much as possible). Another option would be to use a more compact representation, like ty_encode. This reduces size but also readability. Since these IDs only show up in LLVM IR, I'm inclined to just leave in the verbose format for now, and only act if sizes of rlibs become a problem.
Do crates need to know what kind of library they're compiled into? Can we please remove
@alexcrichton Cargo needs the ability to provide additional metadata that can be used in symbol mangling. Unless I'm misreading, this proposal would result in any two packages with the same name (regardless of version or source) producing collidable symbols. We need the ability to allow both versions and a namespace to drive the mangled symbols.
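
To make the collision concern concrete, here is a minimal sketch of how extra build-tool metadata could feed into a symbol hash. It is not rustc's mangling scheme: `symbol_hash`, `mangle`, and the output format are hypothetical names chosen for illustration, and the hasher is simply the standard library's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash the crate name together with any extra
/// strings (for example a version or a namespace) supplied by a build tool.
fn symbol_hash(crate_name: &str, extra_metadata: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    crate_name.hash(&mut hasher);
    for item in extra_metadata {
        item.hash(&mut hasher);
    }
    hasher.finish()
}

/// Hypothetical mangled form: `<crate>::<item>::h<hash>`.
/// The real compiler uses a different, more involved encoding.
fn mangle(crate_name: &str, extra_metadata: &[&str], item: &str) -> String {
    format!(
        "{}::{}::h{:016x}",
        crate_name,
        item,
        symbol_hash(crate_name, extra_metadata)
    )
}
```

With no extra metadata, two crates named `json` would mangle `parse` to the same string under this sketch. Passing a different version string for each, such as `mangle("json", &["1.0.2"], "parse")` and `mangle("json", &["2.0.0"], "parse")`, keeps their symbols apart.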
Oops, sorry! I accidentally deleted this as part of the revisions. I've updated to explain that
@alexcrichton sweet
and the `<hash>` was removed to make the output filename predictable.

The three original goals can still be satisfied with this simplified naming
scheme. As explained in th enext section, the compiler's "glob pattern" when
Typo: "th enext" -> "the next"
```rust
#![crate_name = "foo"]
#![crate_type = "lib"]
```
Perhaps these could be unified, as
#![crate(name = "foo", type = "lib")]
Similar to how you had it before, but only explicitly-defined keys are allowed, instead of arbitrary keys, and the two allowed keys are "name" and "type".
I guess doing this still requires allowing keywords as attributes, but I think that's fine anyway. There's no reason for Rust's normal keyword set to matter inside of an attribute.
I wanted to take a more conservative approach, and multiple `type` keys seemed odd enough to avoid this for now. I don't think we have many other attributes that accept the same key more than once (and it has semantic meaning).
Using a glob of I think it needs to search for Although given that now multiple matching library names are considered to conflict, and any suffix is allowed, that suggests we need to explicitly disallow the Requiring identifier rules like that also suggests that perhaps we should drop the string literal in the attribute too, so it's
Another concern I have with dropping the hash is that dylib Rust crates may now conflict with third-party non-Rust libraries. For example, my rust-lua project currently has a library name of The ability to have an un-suffixed library name seems useful for Rust libraries that vend
Also, on the topic of wrapping C libraries, I'm concerned that considering multiple name matches to be inherently a conflict will also be problematic here. If I build and install my rust-lua as However, since To that end, I would suggest that all matching dylibs be tested to see if they are Rust crates, even though the crate metadata would not actually be compared to anything to see if it matches (just merely checked to see if it exists). rlibs are by definition Rust crates so they don't need to be tested in this manner.
This is not true, the metadata of the library would contain the exact name. The glob is just to filter out names so we don't have to read everything in the world.
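
The two-stage lookup described in that reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `CrateMetadata`, `filename_may_match`, and `find_crate` are hypothetical names, the filename filter is a simple prefix and extension check standing in for the compiler's glob, and the metadata is passed in directly instead of being read from library files.

```rust
/// Hypothetical stand-in for the crate metadata stored inside a library file.
#[derive(Debug, Clone, PartialEq)]
struct CrateMetadata {
    name: String,
}

/// Stage 1: a cheap filename filter (standing in for the glob).
/// It only narrows down which files are worth opening at all.
fn filename_may_match(file_name: &str, crate_name: &str) -> bool {
    let prefix = format!("lib{}", crate_name);
    let known_extension = file_name.ends_with(".rlib")
        || file_name.ends_with(".so")
        || file_name.ends_with(".dylib");
    file_name.starts_with(&prefix) && known_extension
}

/// Stage 2: the exact comparison against the name recorded in the metadata.
/// Files that pass the filter but carry a different name are rejected here.
fn find_crate<'a>(
    candidates: &'a [(&'a str, CrateMetadata)],
    crate_name: &str,
) -> Vec<&'a str> {
    candidates
        .iter()
        .filter(|(file, _)| filename_may_match(file, crate_name))
        .filter(|(_, meta)| meta.name == crate_name)
        .map(|(file, _)| *file)
        .collect()
}
```

In this sketch a file such as `libfoobar.rlib` passes the filename filter when looking for `foo`, but it is discarded in the second stage because its metadata records the name `foobar`.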
My interpretation of the current RFC is that the "library name" is the name of the library on-disk, not some name embedded in the metadata. Specifically in the following line:
If the "name" here in fact refers to the
I am a bit concerned that this was merged without addressing the issue I raised about unsuffixed library names compiled as dylibs. In fact, thinking about it some more, it's much more serious than at first glance, because the build system cannot actually rename a .dylib after it's built, without breaking the install path. So without a way to instruct rustc to append a suffix directly (instead of relying on the build system to do it), this makes it very impractical to build dylibs that have name collision issues (e.g. my This isn't just an issue for e.g. rust-lua, it should be an issue for rust as well. Rust currently installs its libraries as dylibs in
You have a window of time after a library has been built before it's been linked to anything else to rename it, which is precisely what the build system will do. Yes, once you have linked against the dynamic library you cannot rename it.
@alexcrichton I don't know how it works on Linux, but on OS X, a dylib contains its own name as part of its install name, which is the Looking at If my build system is going to rename the dylib after it's produced, then I need to be able to pass custom linker flags. Specifically, I either need to provide my own Either way, this is awkward, platform-specific, and it's not reasonable to expect people to know how to do this, especially if they're not even Mac users. This is why rustc needs a built-in way to ask it to suffix the library filename. It's not appropriate to rely on the library author to figure out how to set up their build system to make this work, especially if they don't even have a Mac to test the results on. And that's assuming there aren't platform-specific issues on other platforms too.
I imagine that we could make a special exception for
Ugh, this breaks linking to C libraries of the same name.
That is the purpose of the
@alexcrichton It would make life much easier for everyone to just output Just because cargo exists now isn't an excuse to make common workflows complicated.
The `crateid` term or attribute is replaced with `crate_name` in the chapter [Crates and source files](https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute). It was originally proposed in [RFC0109](https://rust-lang.github.io/rfcs/0109-remove-crate-id.html) and merged in [PR109](rust-lang/rfcs#109).