-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting code generators with source maps and multiple source directories #1573
Conversation
One thing to note, I am quite torn between source maps and a |
Source maps might also benefit from existing tooling, though I don't have any ready examples of that. =) |
} | ||
``` | ||
|
||
It also requires an unforntunate amount of macros to link in the generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What an unforntunate typo. Thanks!
Now that stable procedural macros are on the horizon, do we really care about these |
A fair question. One of the reasons that we've been contemplating improving support for |
Sure, I was a bit unsure reading exactly how easy it would be to implement and maintain (say the generated code contains procedural macros! Which spans go in the token tree?), but if so I have no problems. |
@Ericson2314: That's a little of what I touch upon in this comment. It wouldn't be that hard for a code generator to expand: ...
let x = (1, two!(), 3);
... Into: let x = (1,
#line "macro.rs" 5 10
2
#line "original_file.rs" 1 26
, 3); Which the Rust parser would interpret and switch what it thinks is the current span. It would take a lot more effort to create a source map, but we might be able to help out with libraries to do this. I also expect we'll always have some people doing code generation, and not want to get into the game where they want to implement procedural macros. Whether or not this RFC is helpful for those people has yet to be seen :) |
cc @jimmycuadra |
This would be a great quality of life improvement for folks using code generation on stable in place of compiler plugins. I am concerned about making these changes in light of the libmacro RFC. That is, if the changes here don't have value beyond temporary pain relief for code generation without stable compiler plugins, I don't think it's the right choice. I'd rather deal with the current situation until libmacro lands and is stabilized, even if that takes a long time. It's also possible that only one of the two of Erick's proposed changes could be done. Support for multiple paths for discovering modules on the file system is a bigger pain point for me personally than source mapping, and would probably be much easier to implement and maintain. |
One thing I just realized is that this may not support code generation in the crate root. For example if a There's a few ways I can think of around this perhaps:
I guess none of those are that great :( |
check first in the current directory, then it will iterate through each search | ||
path until the file is found. | ||
|
||
Cargo would then be updated to add the `$OUT_DIR` first in the search path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we'll probably want to namespace this by default, so perhaps cargo can pass -I $OUT_DIR/src
by default? That way we won't pollute the output directory if there are multiple trees of output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, on second thought I think Cargo's gonna need to be a bit smarter here or we need to expand this section a bit more. So let's say there's two project with source trees like:
// a
Cargo.toml
lib.rs
foo.rs
// b
Cargo.toml
src/lib.rs
src/foo.rs
The "magic build script" for serde will likely mirror the output structure to match the input structure. This means that the build script will generate (assuming no intermediate directory like I mentioned in my comment above this one)
// a
$OUT_DIR/foo.rs
// b
$OUT_DIR/src/foo.rs
I'd hope that both cases would work, but it's not clear to me how we'd implement this to make them both work. Can we expand on exactly what arguments Cargo is passing in these two situations, along with exactly what the compiler is doing in terms of lookup paths?
Specifically, this section is a little vague to me:
When Rust needs to look for some file, it will check first in the current directory, then it will iterate through each search path until the file is found.
This to me seems like it can be interpreted as:
- I passed
src/lib.rs
to the compiler - The compiler saw
mod foo
- The compiler deduced it needs to look for
src/foo.rs
- All lookup paths are queried for
src/foo.rs
, where the final lookup path is the current directory
This, however, is incompatible with passing absolute paths to the compiler (which Cargo does frequently). as lookup paths will be queried for an absolute path which fails.
I am a bit worried about the impact stale generated files may have.
A possible fix would be to have cargo empty |
@plietar that's a good point! Cargo currently tries to not remove much from One possible route to solve this would be to have build scripts print out include paths to pass to the compiler (e.g. Cargo doesn't pass one by default). That way on the second run in your example nothing would get printed so you wouldn't pick up the stale copy. |
I wonder if the answer to both stale build files and Alex's comment isn't some kind of more precise listing, where we tell rustc "here is an input file and a module path". i.e., |
The other thing to consider is that I imagine different plugins may want to be chained -- in which case, they're probably want to operate over the contents of the build directory that were produced by the previous plugin? |
Yes, going through multiple passes of code generation is something to account for. We do two passes in Rusoto, first our own code generation, then Serde, and right now we have to come up with a file naming scheme to manually disambiguate the input and output files at each pass: https://github.com/rusoto/rusoto/blob/ca980558d85e95beeacf933818a1a6690d973cde/codegen/src/lib.rs#L39 |
@nikomatsakis hm yeah that's not a bad thought. Perhaps not the most ergonomic, but in terms of code generation may work well. We may have to worry about the ease of composing build script libraries together, however, to ensure that only one key is printed per module and it's only printed at the end (without undue input on the user's part). I definitely agree that we always need to support multiple passes for code generation, etc. I don't think it's a concern of this RFC, however, as all of that logic is housed in build scripts. For example you can register multiple plugins with syntex or you can just read files from the output directory if they were previously written there. That is, nothing about the output directory is implicit, and if you're calling a library in a build script it just needs to be appropriately configured to handle these cases. Now we do need to worry about the ergonomics of this, however, to ensure that we can easily call two code generators in a build script without requiring a huge amount of configuration or requiring them to know about each other perhaps. |
Why can't the build script itself detect and report errors in the input file by writing to stderr? That RFC looks very specific and seems tailored for serde. In my opinion the fact that a build script can generate invalid Rust code is per se a bug in the build script and shouldn't get support by the compiler. |
@tomaka while it is easy for a build script to check syntax, it may not want/be able to perform type/borrow checking. This applies to serde, but also for example to a templating system which allows arbitrary expression to be used. Imagine I type On a side note, the two features of this RFC are orthogonal. Wouldn't it make sense to split it in two ? I imagine source maps will be longer to specify and settle on a format, while multiple directories could be agreed on and implemented sooner. |
If Indicating a location within a template is helpful only when the code generated by the build script is very similar to the original source code of the template, which I don't think is often the case except for serde or libraries that derive traits a-la serde. |
On Fri, Apr 15, 2016 at 11:58:31AM -0700, Alex Crichton wrote:
I agree and disagree. I think we need to establish conventions that plugins can use, that's all. |
On Sat, Apr 16, 2016 at 12:06:07AM -0700, tomaka wrote:
Because the errors are not known until we compile the project.
Why do you say that? It seems equally applicable to LALRPOP, for |
the ability to physically output tokens in the same line and column so that | ||
each token wouldn't need to be annotated with the correct positioning. | ||
|
||
Another option would be to just adopt the [DWARF](...) debuginfo format, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DWARF link is broken, or at least for me reading the RFC online on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, just pushed up a fix for this, thanks.
I think that this may be missing out on part of what serde is doing, however. Essentially serde is expanding #[derive(Deserialize, Serialize)]
pub struct Foo { a: u32 }
fn foo_with_error(a: u32) -> i32 {
a
} The error here has nothing to do with serde, but if serde generates a file which includes |
@alexcrichton Yes, my point is that in my opinion no crate other than serde would benefit from this feature. In addition to this I think everybody will agree to say that the fact that serde uses build scripts is a hack, and that ideally serde should use a syntax extension/procedural macro. Therefore I don't think it's worth bloating the compiler for one precise use case that is going to be obsolete in a year when procedural macros are here. |
There exist crates other than serde that use syntex, @tomaka. |
I agree with tomaka here. As I said in my earlier comment, while the attempt to ease the pain of code generation without stable compiler plugins is a good goal, I don't think it should be done in a way that wouldn't be useful on its own even when we have stable procedural macros. |
I don't think that plugins will ever fully supplant simple code generation, which has a lower barrier to entry and is often a better match when starting from non-Rust sources. The features proposed in this RFC aim to make the code generation experience in Rust more first class. They are in no way tied to serde or even syntex -- e.g., source maps are an extremely common technique found in most compilers to improve code generation ergonomics. It is often the case that the code generator cannot by itself detect potential errors, because doing so can essentially require full type checking. In such cases, source maps are invaluable. But even if you disagree with all of that, the fact is that the plugin system revamp is still in its early stages, and it will take some time to reach even de facto stability, let alone becoming officially stable. In the meantime, we can take these straightforward, commonplace steps to improve the code generation experience in Rust in the short term -- and we desperately need to do so, given the nontrivial portion of the ecosystem that ends up tapping into highly unstable parts of the nightly compiler because of the smoother experience it provides. If at some later point these features fall out of use (in favor of plugins), we can consider deprecation and eventual removal. |
My feeling is roughly this. If we had a stable plugin API now, we could probably forego the stuff in this RFC and instead use some nice libraries that make it easy to write a plugin that wraps a code-generator. But we don't have a stable plugin API now, and what we do have is a high demand to write certain kinds of plugins:
People who want these either use I think what this really comes down to are two questions:
I am not sure what the answer to the first question is, but it seems to me that there is still a fair amount of work to go. For example, there are currently three pending RFCs (#1560, #1561, and #1562) describing various aspects of the system. Each of these is non-trivial. Once they are accepted, I expect a reasonably large amount of time iterating over the precise interface and set of support libraries that are needed as well (admittedly, some of that may take place in nursery libraries working over top a stable core, or at least that has always been a goal). The thinking with this RFC then was that we could spec out something that is easy to implement which basically improves existing techniques. Given that it should be easy to implement now, it also will (presumably) be relatively straight-forward to support into the future. I think this is all fairly true, although the final end-product is looking a bit more complex than we originally thought. But the core idea of "add ability to direct compiler to other files outside the main tree" and "add ability to override spans" are all pretty minor bits of the compiler to update. Now it may also be that we wind up with something nice enough that it provides an appealing approach into the future, at least if you are doing code generation anyhow. If nothing else, I imagine it provides a good place to spec out and experiment with "libmacro" designs as well. |
A few days ago I was adding optional rustc-serialize and serde support to a codegen via stable codegen, and the experience was pretty awful unfortunately. I want to just write down what I encountered here to ensure that we can at least plan for or be aware of what happens. Ideally, I wanted to write this in my source: #[cfg_attr(feature = "rustc-serialize", derive(RustcEncodable, RustcDecodable))]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
struct Foo {
a: i32,
}
To me this just highlights how codegen can be a suitable replacement for plugins, but the support libraries are likely going to get pretty complicated and/or hairy. I think that's fine because there only needs to be "one support library" to do things like this, but it also indicates to me that the codegen story is pretty "not fleshed out" today and will take some time to stabilize and make more robust. |
Just like we trust the Rust team to keep the implementation std's interface stable even as its unstable implementation constantly changes, perhaps there should be a way for stable users to trust/white-list wildly popular libraries to use unstable features under the hood? |
@Ericson2314 The problem here isn't whether plugins are officially marked stable. The problem is that they are unstable in practice, meaning that there is frequent (though small) breakage. Consequently, crates wind up needing to agree on a version of the nightly compiler to work with (often manifested as which |
@aturon Ah, I was unaware of that. Thanks for the clarification. |
A case where plugins are not at all a solution is code generators that support multiple target languages, one of which happens to be Rust, and are generally not written in Rust. Examples include Ragel and the Protocol Buffers compiler (both written in C++). |
A syntax extension can exec a process. |
@erickt I believe there've been a number of offline discussions about this between various parties, would you be willing to summarize some of the conclusions reached here? |
I am currently toying around with a tool which spits out generated rust code as part of build.rs. It however does not use syntex or something similar. As a result it's quite tricky for that tool to expose sourcemaps but it would be trivial to emit something like I feel like the runtime debug information would be independent of this and actually something Rust could already do. Rust already generates DWARF files so those would automatically point back to the source location I assume since the compiler would override the location it knows about with the information from the |
@alexcrichton @aturon @erickt summary of discussion from London (as best I can remember, and apologies for the delay, I missed the ping). First of all, it is worth pointing to #1681, which I think everyone involved agrees is a better solution for a short-term fix for procedural macros. This RFC has a major advantage over #1681 (or even more long term proposals for procedural macros) is that it supports pure code generation, which is sometimes useful (in particular as a more user-friendly solution to build.rs Cargo build scripts). However, I don't think that code generation is seen as anywhere near as high priority a problem as getting users of Serde and other high use/low complexity procedural macros on to stable rather than nightly. A minor advantage of this RFC over #1681 is that it would support better span information for the generated code. I suspect this could be mitigated to some extent, but as currently stated, the #1681 essentially has no span info in the output. OTOH, that comes at the expense of implementation complexity in this RFC. #1681 has the advantage that it is closer in mechanism and interface to both existing and proposed future procedural macros, has lower implementation cost, and a better story for evolving users to the 'glorious future'. Both proposals are essentially string-based, and so have zero hygiene support. The limitation of #1681 to derive means that has a smaller negative impact than here. We had previously discussed a fast-track version of token-based procedural macros, based on the future macro plans, but relying more heavily on external libraries. @erickt prototyped a version of this approach that used the current syntax extension interfaces. To me at least, #1681 is a natural evolution of that idea (and superior). |
I certainly agree with what I took to be the thrust of your comments (that #1681 is a generally better way forward); but I'm not sure that this RFC really has any advantages over the approach in #1681, or at least they are quite small:
I'm not sure this is true -- I mean, if you have #1681 (or the longer-term plans), it seems straight-forward to wrap a traditional "pure" code generator and spit the output back into rustc. This has led me (and, I think, @alexcrichton) to the general opinion that, if time were no object, we'd rather just have procedural macros than to have more mature support for systems that generate files on disk. Moreover, if you want to have the pre-processing done seamlessly, this approach basically forces serde (or other systems) to examine all the inputs so as to scrour for
It's certainly true that #1681 doesn't include support for spans. But it seems pretty easy to add if we wanted. Moreover, because in #1681 things are only pre-processed when they need to be, this means the spans of all the other content is left unperturbed -- it's only the derive impls that have suboptimal spans, right? (In contrast, in this proposal, since we are reproducing all the content around the plugin, sourcemap support is absolutely essential.) Basically instead of getting the span of the relevant field you get the |
Well "major advantage" might have been a bit strong. I think one advantage of this RFC is that the input does not need to lex according to the Rust rules and therefore you gain a small amount of flexibility. It would allow literate programming, for example. But I agree with your sentiment, that on balance macros are a better way to support code gen. I think that you are wrong that #1681 as stated preserves any span info, since we convert all tokens to a string then back to tokens we must lose all span info, even if the macro does nothing. The only times that we keep that info is if the macro literally does nothing - including not adding any tokens. We could of course add some facility to do that in the future, but it isn't (and I don't think should be) part of the RFC. That would probably look a little bit like the source maps here, though hopefully a bit simpler. Again, I agree with your conclusion that this is a smaller issue overall. |
Note the following text from the RFC:
I think that's what @nikomatsakis was getting at: if you get an error in the output of a custom derive mode in #1681, the error will at least point back to the use of derive. That's not as good as pointing directly at a field, perhaps, but it seems good enough. |
Note though that a plugin could take a path to a file to use as input, and
|
re spans to be concrete, take this example:
If there is another definition of
Good point. |
We didn't discuss this in the meeting, but given that we decided to move to accept #1681, I'm going to move to close this RFC. @rust-lang/lang or @rust-lang/tools, feel free to object if you disagree. :) |
Hear ye, hear ye! This RFC is now entering final comment period regarding a motion to close in favor of #1681. |
RFC #1681, and macros 2.0 for that matter, are of no use for generating code for some other language (bindgen or protobuf for example). Also, as I've mentioned before, this RFCs mixes two features. Multiple source directories is probably simple to implement, has fairly clear semantics, and has a small interface, while still actually enabling nice usage of code generators. Code maps on the other hand are a lot more complicated, but don't add anything other than nicer error messages (this is obviously important, but less of a priority I think). Would it not make sense to split this RFC in two, and move forward faster on the multiple source directories ? |
Code maps could be implemented outside of the language in macros 2.0, FWIW. |
The @rust-lang/lang discussed this RFC and has decided to close this RFC, as previously proposed. We prefer the approach described in #1681. |
Rendered
This RFC proposes two changes to the Rust compiler and Cargo in order to better support code generators:
rustc
for multiple source directories, and update Cargo to automatically add it's$OUT_DIR
directory to this directory.cc @alexcrichton, @aturon, @nikomatsakis