RFC: A new codegen architecture. #186
This is just an idea I'd like to explore; please let me know what you think.
I think this sounds like a great idea. Looking forward to a more readable code layout.
No other crates are currently depending on any of the cornucopia crates according to crates.io. So I wonder how much of a drawback this really is. I don't see what the use-case would be for depending on it in a library.
I see all the benefits of using this architecture, but we must consider that this is a significant architectural change for the Rust ecosystem. We will be, to my knowledge, the first code-generation tool to do this. I also wonder how we could let the user choose the dependency versions if we generate the `Cargo.toml` ourselves.
I think perseus also does this (or used to, I'm not sure).
Perseus does use a similar pattern, but the context is a bit different admittedly.
Because of macros, external code generation is uncommon in Rust, whether as a single file or as a whole crate. Practically speaking, declaring the generated module at all is already unusual. With that in mind, I think we should choose the most technically advantageous and user-friendly method, instead of basing this decision solely on familiarity. (This is my opinion; I'd like to hear some more thoughts on this point.)
With our current setup, we give the user the impression that they can choose any version of the dependencies, but this is a big compatibility hazard. Our generated code is only guaranteed to work with the dependency versions we use in the implementation/integration tests. Using another version could potentially cause cryptic errors. Since these errors would occur in the generated code, they would also be very hard for our users to debug or even understand. If we control the versions ourselves, we can at least ensure that the generated code is error-free, and the user can look up the manifest if they want to know which dependency versions are compatible with their particular version of cornucopia.

SQLx asked their (huge) user base what the best update strategy regarding dependencies would be, and the overwhelming consensus was to make breaking updates as often as required to offer the latest dependency updates: launchbadge/sqlx#1796. Diesel also seems to follow this "newest dependencies only" pattern for their public dependencies, from what I understand. With this in mind, and knowing that major updates can bring critical security fixes (in addition to performance and features), I don't think there is a lot of demand for supporting old versions of dependencies anyway.
I didn't think it was so common for ORM libraries!
From my point of view, it is not obvious that a crate is easier to use than a simple file. But our dependency tree is so complex that I can see how it could be more ergonomic. If we are going to go in that direction, we should go all the way, as you suggested, and generate the client crates as well, in order to maximise the benefits. I have a project where a minor update to the postgres library changed the minimum required version of rustc, and I had to pin to a specific version. Maybe we could add an advanced flag to use workspace dependencies for some of these. We should certainly ask our users for their views via a poll. Do you know if GitHub supports this?
I am starting to think about implementation details. We could use the CLI version as the generated crate version; this would be useful for debugging user issues. Currently, codegen returns a string, and this will not work for a crate. We can return a map of file names to content strings, or a temp directory. Using a real directory would reduce memory usage and ensure that we never leave the generated crate in an inconsistent state if codegen fails. To detect code changes in the integration tests, we would hash the contents of the directory for comparison, which should be more efficient.
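The directory-hashing idea above can be sketched with nothing but the standard library. This is only a sketch: the function names are made up, and `DefaultHasher` stands in for whatever real hash the project would pick.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Hash every file (relative path + contents) under `dir` in a
/// deterministic order, so two generated crates can be compared
/// without diffing file-by-file.
fn hash_generated_crate(dir: &Path) -> io::Result<u64> {
    // BTreeMap gives a stable iteration order regardless of readdir order.
    let mut files = BTreeMap::new();
    collect(dir, dir, &mut files)?;
    let mut hasher = DefaultHasher::new();
    for (rel_path, contents) in files {
        rel_path.hash(&mut hasher);
        contents.hash(&mut hasher);
    }
    Ok(hasher.finish())
}

/// Recursively gather all files under `dir`, keyed by path relative to `root`.
fn collect(root: &Path, dir: &Path, out: &mut BTreeMap<String, Vec<u8>>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect(root, &path, out)?;
        } else {
            let rel = path.strip_prefix(root).unwrap().to_string_lossy().into_owned();
            out.insert(rel, fs::read(&path)?);
        }
    }
    Ok(())
}
```

Comparing two hashes then replaces a full directory diff in the integration tests; any change to a file name or its contents changes the hash.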
I discussed compatibility hazards a couple of times with the Rust Discord community, and the answer is more or less always the same. Here's a relevant quote from the latest such discussion: "Basically I'd be looking for ways to get this decision out of the hands/mind of the consumer of this code generator, can't imagine why they'd want to even deal with it."
MSRV changes are considered breaking. Looking at the git history, I see that
That wouldn't work for projects that want to include the generated code as a path dependency, not as a workspace member.

MSRV-related SemVer breakages are rare enough that I honestly don't think this is a problem for the majority of our users. As a case in point, we changed our MSRV many times under the hood, only updated it officially for v0.9, and nobody has reported breakages or asked us to downgrade our MSRV. Before the crate reaches 1.0, I'd like to keep things moving fast and offer our early users an up-to-date experience. This means having a relatively high MSRV and up-to-date dependency versions. Once we're preparing for our first stable release (we're still a long way off!), we should put a lot of effort into improving our support for older dependencies / MSRVs. But at this current stage, I think we should focus on experimenting and improving our actual UX and features.
Yeah, we can do polls over in the discussion section, but honestly I don't think we'd get enough visibility for it to be meaningful 😅. This thread is a decent place to have this discussion IMO. I'll try to ping our regular contributors/users to get some more opinions on this.

Edit: I also asked for some feedback on our Discord. Let's hope we get some more thoughts on this; the more we get, the better informed a decision we can make. @skreborn What do you think of this? (info in the original post)
Agreed on all points.
For a project I'm working on at the moment, we've already started to use cornucopia in its own crate. Our structure looks something like the following.
The `lib.rs` looks like the following:

```rust
use std::str::FromStr;

pub use deadpool_postgres::{Pool, PoolError, Transaction};
pub use tokio_postgres::Error as TokioPostgresError;
pub use cornucopia_async::Params;

pub use queries::tracking_data::TrackingData;
pub use queries::conjunctions::Conjunction;
pub use queries::orbit_data::OrbitData;
pub use queries::users::User;
pub use queries::organisations::Organisation;
pub use types::public::DataSharingPolicy;

pub fn create_pool(database_url: &str) -> deadpool_postgres::Pool {
    let config = tokio_postgres::Config::from_str(database_url).unwrap();

    let manager = if database_url.contains("sslmode=require") {
        let mut root_store = rustls::RootCertStore::empty();
        root_store.add_server_trust_anchors(webpki_roots::TLS_SERVER_ROOTS.0.iter().map(
            |ta| {
                rustls::OwnedTrustAnchor::from_subject_spki_name_constraints(
                    ta.subject,
                    ta.spki,
                    ta.name_constraints,
                )
            },
        ));
        let tls_config = rustls::ClientConfig::builder()
            .with_safe_defaults()
            .with_root_certificates(root_store)
            .with_no_client_auth();
        let tls = tokio_postgres_rustls::MakeRustlsConnect::new(tls_config);
        deadpool_postgres::Manager::new(config, tls)
    } else {
        deadpool_postgres::Manager::new(config, tokio_postgres::NoTls)
    };

    deadpool_postgres::Pool::builder(manager).build().unwrap()
}

include!(concat!(env!("OUT_DIR"), "/cornucopia.rs"));
```

And the `Cargo.toml`:

```toml
[package]
name = "db"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
path = "lib.rs"

[dependencies]
tokio-postgres = { version = "0.7", features = [
    "with-time-0_3",
    "with-serde_json-1",
] }
cornucopia_async = "0"
deadpool-postgres = { version = "0", features = ["serde"] }
postgres-types = { version = "0", features = ["derive"] }
time = { version = "0", default-features = false, features = ["formatting"] }
webpki-roots = "0"
rustls = "0"
tokio-postgres-rustls = "0"
serde_json = "1"
futures = "0.3"
```

So everything to do with the DB, including migrations, is in one folder called `db`. As you can see in this code, I'm compiling TLS into the exe. This is because we deploy to Docker and we always use a scratch container; I'd like to keep that flexibility. Do I need cornucopia to generate the `Cargo.toml` for me? I'm OK with this structure as I have complete control over the crate. Hope this helps.
Another thing I forgot is that some people are happy to generate the code into their projects and check it in. I don't do this; I follow the https://github.com/hyperium/tonic and https://github.com/kaj/ructe model, where generated code is not checked in because it's considered a build artifact. I think that is the right model.
@ianpurton Thanks for contributing your thoughts 😄. It's really interesting to me, since your setup is not one I had considered previously.
Everything here is compatible with the proposal. You can generate the cornucopia code wherever you want and include it in your `Cargo.toml`:

```toml
[package]
name = "db"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
path = "lib.rs"

[dependencies]
# You can still use whatever you want in your own crate
webpki-roots = "0"
rustls = "0"
tokio-postgres-rustls = "0"

# Cornucopia takes care of its own dependencies.
# This generated crate is tailor-made for your exact queries, no need for features and whatnot.
cornucopia_gen = { path = "..." }
```

This is really powerful: if you add a query that uses UUIDs, then when your queries are re-built, the generated crate will automatically add the `uuid` dependency for you. As a side effect, this also removes the need for us to publish the client crates.
Won't it be faster to compile as well? Or maybe this doesn't apply to path dependencies...?
@LouisGariepy Sounds great.
@LouisGariepy One more thought. If I'm generating code with a build script, the generated crate won't exist yet on a fresh checkout. So I suspect the first build would fail before code generation even gets a chance to run.
@ianpurton You're right. Cargo resolves all dependencies, not just build-dependencies, before running the build script, unfortunately. There have been requests to change this, though: rust-lang/cargo#8709.

**What to do if the codegen crate doesn't yet exist when using a build script.** Depending on the situation (CI or not), there are three simple solutions, any of which could work.
I don't think this is too bad since this is a one-time situation (or can be automated in CI), but I'd like to hear your thoughts on this.
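Since Cargo needs the path dependency to exist before it will even run a build script, one workable shape for that one-time action is a tiny bootstrap step, run manually or from CI before the first `cargo build`, that writes an empty stub crate for codegen to overwrite later. This is only a sketch under assumptions: the `cornucopia_gen` crate name and layout are illustrative, not the project's actual choices.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// One-time bootstrap: create a minimal stub crate at `dir` so that Cargo's
/// dependency resolution succeeds on a fresh checkout. The real generated
/// crate later overwrites these files. Idempotent: does nothing if a
/// manifest is already present.
fn ensure_stub_crate(dir: &Path) -> io::Result<()> {
    if dir.join("Cargo.toml").exists() {
        return Ok(());
    }
    fs::create_dir_all(dir.join("src"))?;
    fs::write(
        dir.join("Cargo.toml"),
        "[package]\nname = \"cornucopia_gen\"\nversion = \"0.0.0\"\nedition = \"2021\"\n",
    )?;
    fs::write(dir.join("src/lib.rs"), "// placeholder until codegen runs\n")?;
    Ok(())
}
```

Because the function is idempotent, it is safe to run unconditionally at the start of every CI job.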
AFAIK, it should help with parallel compiling. A path dependency is treated just like any other dependency (except that it doesn't have to be downloaded from crates.io).
Since there have been a lot of discussions and some of the pros and cons are getting scattered, I'll make a summary.

### The problem

What this issue is trying to address is the generated code's complex dependencies. There's a good breakdown over in #163, but the gist is that the current dependency setup is a lot to ask of the average user.

### The solutions

Many solutions have been proposed, all with their pros and cons. Let's take a look.

#### Status quo

The simplest "fix" is to accept things as they are. The disadvantages have already been discussed above. The main advantage is that "it works" and our current users are used to it.

#### Feature flags + reexporting dependencies

One possible solution is to reexport all the necessary codegen dependencies (both internal and public) from our cornucopia crate. This solves our compatibility hazards (whatever version of cornucopia they use, users get the right dependencies reexported), and it is a familiar pattern that many Rust projects use to make their users' lives easier. This would be similar to what SQLx does.

The drawbacks are mainly that it would require us to create feature flags for all our public dependencies (`serde_json`, `uuid`, etc.) and reexport a bunch of types. This is not too much of a problem, since our public dependencies are pretty stable. We also have to be careful to make our feature flags additive, for example when one crate in a workspace enables a feature that another crate doesn't. Another drawback is that this would "pollute" our main crate's API. @Virgiel might also have some more drawbacks to add to this solution.

#### Dependency-aware CLI

Another proposed solution is to make our CLI able to analyze the user's `Cargo.toml`. First and foremost, this still requires a lot of manual reading/fixing by the user. We could apply the fixes automatically, but I'm really not comfortable with the idea of automatically modifying the user's manifest (see below for why). Analyzing a `Cargo.toml` is not trivial either.

Also, we wouldn't be able to inform the user about unused dependencies. We have to assume that if a dependency is in the manifest, the user could potentially be using it for something other than the generated code.

There's no automatic way to know for sure which manifest should be analyzed, especially in a workspace, so we'd have to add a way for the user to point us at the right one. The following example illustrates this: we are in folder `foo`, our queries are in folder `queries`, the destination folder is `target`, but `crate_baz/Cargo.toml` is the manifest we want to analyze.

It's also not super compatible with some workflows. So, there are significant limitations, and the implementation effort would be higher than for pretty much all the other solutions. Still, it makes it possible to solve some of our problems while keeping the same codegen architecture.

#### Generate a crate instead of a module

AKA the proposal in the original post of this thread. I won't go into too much detail because the link above already explains it, but the gist is that the generated crate would encapsulate its own dependencies, making the dependency setup completely automated for users. Our query analysis will determine exactly what your specific queries require as dependencies, no actions or features required. This also makes our code generation future-proof, and it lowers the risk of compatibility hazards. There are also some miscellaneous advantages like better compile times, more readable generated code, and not needing to publish our client crates anymore, but these are not the main focus here.

The drawbacks are that crate generation is a new kind of workflow which could be unfamiliar to users. It's probably the "most different" out of all the proposed solutions. It also somewhat complicates some internal parts of cornucopia (see this comment). Finally, if you use a build script instead of the CLI, it requires a single manual action the first time you use cornucopia in a new project (what to do if the codegen crate doesn't yet exist when using a build script).
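To make the "dependency-aware CLI" option concrete, here is a deliberately naive sketch: given the dependency names that query analysis says the generated code needs, report which ones the user's manifest doesn't declare. The function name is made up, and a real implementation would parse the manifest with a proper TOML parser rather than matching lines.

```rust
/// Report which of the dependencies required by the generated code are not
/// declared in the user's manifest. Naive line matching for illustration
/// only; this is not cornucopia's actual analysis.
fn missing_deps<'a>(manifest: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|dep| {
            let declared = manifest.lines().any(|line| {
                let line = line.trim_start();
                // Matches `uuid = "1"` as well as `uuid="1"` style declarations.
                line.starts_with(&format!("{dep} ")) || line.starts_with(&format!("{dep}="))
            });
            !declared
        })
        .collect()
}
```

Even this toy version shows the limitation discussed above: it can tell the user what is missing, but it cannot safely say what is unused.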
If you have a preferred solution among these, please state it clearly! That would be much appreciated. There's still time to look at other solutions too, so if you have an idea, please let it be known. Thanks all :)
Still can't come up with any negative other than that publishing crates that depend on cornucopia won't work. Just go with the crate solution already 😁 (Just kidding, it's great that you don't just press the button and implement it!)
Sorry if it felt like I was beating around the bush 😄 This is not a change that can be undone lightly, so I want to make sure we're going in the right direction. Regardless of the technical merits, I also want to make sure that everyone is on board, out of respect for the time you all put into cornucopia. I'll leave this issue open so we can still collect some more information, but I'll start working on the implementation so we can also see how it works out in practice. Cheers!
You are thinking things through, I like that. Good luck with the implementation! Can't wait to see the end result 😀
I'm doing something similar to what Ian mentioned, with protobuf (tonic) generated code for a gRPC server. I have a separate crate in the workspace that only contains the generated stuff, and I know everything in there is scoped to supporting the generated code. It has some small downsides: you can't implement common traits (serde etc.) for the generated types, because both are now in external crates, whereas you could if you generated the code into your "main" crate. I'd be curious to see how your Cargo-managing workflow works :)
I think that crate generation is just fine, and as I said, there are other examples in the Rust ecosystem where entire crates are generated. Besides Perseus, there is also https://github.com/einride/protocrate-rs, and maybe others. IMO it's only appropriate to generate a single file if you don't have dependencies. If your generated code has dependencies, generating a crate is kind of unavoidable. And the thing in common between cornucopia, protocrate, and perseus is that they all want to generate code that depends on outside crates.
To get tonic to add `Serialize`, set up your `build.rs` like so:

```rust
fn main() {
    // Compile our proto
    tonic_build::configure()
        // We want to be able to convert proto to json, this enables that.
        .type_attribute(".", "#[derive(serde::Deserialize, serde::Serialize)]")
        // Prost will automatically include its own prost_types for google Timestamp etc.,
        // but these don't implement Serialize and Deserialize, so we switch that off.
        // Maybe we could use https://github.com/fdeantoni/prost-wkt
        .compile_well_known_types(true)
        .compile(&["api.proto"], &["./"])
        .unwrap();
}
```
I would love something similar to this in cornucopia: for example, deriving specta's `Type` on the generated types.
Hi Louis, this is an awesome project. And I love this idea you have around changing the codegen architecture... it would definitely make the user experience nicer, because we wouldn't have to manually add dependencies based on the types. Also, having everything cornucopia-related in its own crate provides clearer separation of concerns. +1 from me on this! 😀
Hi! I have not used cornucopia yet. I was reading through the issues as part of my evaluation before I try adding it into my stack. Today I'm using deadpool-postgres with my SQL embedded directly in my code 😱 Cornucopia's model scratches several itches and lines up with my current stack in a fantastic way - thank you! The current situation with dependencies gave me pause, but not enough to turn me off. For context, I imagine that for my first run at this I'll be using the CLI to generate code against a live database I maintain separately, and checking in the generated code. This seems like the easiest way to work with my build server. I will be adopting cornucopia slowly instead of converting my whole code base over in one fell swoop. Generating the code as its own crate instead of a module strikes me as the most helpful option to let someone in my position adopt cornucopia into an existing system.
I agree with what has been said so far; the separation of concerns seems like a nice thing, and I like to structure my applications this way anyway. It might make it slightly harder to use in a simple application since it requires a workspace, but I think generating a crate is the right™ way to go about this. At least it is what I would personally prefer.
I definitely want to keep at least the option to have my build script update my crate source rather than create a whole new crate. And if I have to fix the same missing-dependency errors I'd have to deal with anyway if I were writing that DB code on my own, so be it.
@kennethuil Understood! I'd be grateful if you could share some more info on this. Specifically, how is using a separate crate a disruption to your workflow? Thank you 😄
On further thought, having an extra crate would only require a small adjustment in my Dockerfile and no other adjustments in my CI/CD pipeline. And I would expect rust-analyzer to just work. Although it would complicate any attempt to customize the `Cargo.toml` for the cornucopia-generated code.
👋🏻 Hey folks! I've been following your project for a while, and this issue was one of the things that pushed me to pull the trigger on `cargo-px`. If there's a chance we could collaborate and shape the solution in such a way that it fits your needs, I'm more than happy to do so.
@LukeMathWalker I like this. As a downstream user of cornucopia, I like that I can choose to version-control the generated crate, or just generate it as part of my build and add it to .gitignore. I like the idea of using metadata to call up cornucopia:

```toml
[package.metadata.px.generate]
# The generator is a binary in the current workspace.
# It's the only generator type we support at the moment.
generator_type = "cargo_workspace_binary"
# The name of the binary.
generator_name = "cornucopia"
# The arguments to be passed to the binary.
# It can be omitted if there are no arguments.
generator_args = ["...", "...", "..."]
```

Good luck with pavex, I'm watching it to see where it goes.
In the specific case of a generator installed from a registry, it would look something like this:

```toml
[package.metadata.px.generate]
generator_type = "registry_binary"
generator_name = "cornucopia"
generator_version = "0.2.3"
generator_args = ["...", "...", "..."]
```
@LukeMathWalker I'm following your work with great interest. The new codegen architecture is far from ready, but I have a question: how does `cargo-px` decide when the generator needs to run?

That would be fantastic!
The current design is very simple: the generator is always invoked, and it's the generator's job to determine if there is any work to do (e.g. if the hash of a file has changed). We can definitely iterate here.
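The "always invoked, decides for itself" contract could look something like this on the generator side: hash the query inputs, compare against a fingerprint written on the previous run, and exit early when nothing changed. The file names are made up, and `DefaultHasher` is a stand-in for a hash that is stable across toolchain versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Fingerprint the generator's inputs (here, the concatenated query files).
fn input_fingerprint(queries: &str) -> String {
    let mut hasher = DefaultHasher::new();
    queries.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// The generator is always invoked; it decides on its own whether there is
/// work to do by comparing against the fingerprint of the previous run.
fn needs_regen(fingerprint_file: &Path, queries: &str) -> bool {
    match fs::read_to_string(fingerprint_file) {
        Ok(stored) => stored != input_fingerprint(queries),
        Err(_) => true, // no fingerprint yet: first run, always generate
    }
}

/// Record the fingerprint after a successful generation.
fn record_fingerprint(fingerprint_file: &Path, queries: &str) -> std::io::Result<()> {
    fs::write(fingerprint_file, input_fingerprint(queries))
}
```

Keeping the skip logic inside the generator also means it still works when the generator is run on its own, outside of any wrapper tool.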
This is actually very sensible, as it is the only way to make the generator work on its own if desired.
I am working on a similar feature, and this might be of interest to you: rust-lang/cargo#14065. I will start a PR for it.
It was brought to my attention over a discussion on Discord that we could improve our current architecture by generating a crate instead of a single file for our codegen.
The main benefit of this is that it would allow us to automatically generate a `Cargo.toml` file customized to support all the necessary dependencies and features required by the user's queries, without polluting their own manifest. This would hide all the complexity of the current setup while automatically ensuring that compatible versions and features are used. This is all done without tampering with the user's own `Cargo.toml`, so there is less risk of unwanted side effects. This would be a neat solution to #163. We also wouldn't have to read and update the TOML files; we could simply generate the whole manifest, which is much simpler.

Another benefit is that it would allow us more freedom to organize the generated code in a readable/understandable way, instead of having to cram everything into a single file (e.g. db types, sync/async, #176). Taking this further, we could even generate the client crates directly into this generated crate, which would save us from having to publish the 3 client crates.
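The idea that query analysis alone can determine the manifest can be illustrated with a toy mapping from the Postgres types appearing in the user's queries to the crates needed to represent them. The table below is purely illustrative, not cornucopia's real type mapping.

```rust
/// Map a Postgres type used in a query to the crate (if any) needed to
/// represent it in Rust. Illustrative sketch only.
fn dependency_for_pg_type(pg_type: &str) -> Option<&'static str> {
    match pg_type {
        "uuid" => Some("uuid"),
        "json" | "jsonb" => Some("serde_json"),
        "timestamp" | "timestamptz" | "date" => Some("time"),
        "numeric" => Some("rust_decimal"),
        // Plain types (text, int4, bool, ...) need no extra crate.
        _ => None,
    }
}

/// Collect the deduplicated set of crates the generated `Cargo.toml`
/// would have to declare for a given set of query types.
fn required_dependencies(types_in_queries: &[&str]) -> Vec<&'static str> {
    let mut deps: Vec<_> = types_in_queries
        .iter()
        .filter_map(|t| dependency_for_pg_type(t))
        .collect();
    deps.sort();
    deps.dedup();
    deps
}
```

Writing those names (with known-compatible versions) into the generated manifest is what removes the dependency bookkeeping from the user entirely.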
As a side-effect benefit, cargo can parallelize work over crates, so if you have a lot of queries/generated code, this will improve your compile times.
So, in summary: automated dependency management, more readable generated code, and faster builds.

### Drawback

Cargo won't publish crates with path dependencies (yet, and the maintainers seem to think it's too large of a change to ever be merged), so the user can't publish a crate that depends on the generated crate as-is. This is true for any internal crate, so I don't think it's such a big drawback, but some users might find it annoying nonetheless.

We could keep the current workflow under a "manual configuration" option (where the user must manually declare the generated modules and manage the correct dependencies) for cases where this is absolutely unwanted.