Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: A new codegen architecture. #186

Open
LouisGariepy opened this issue Dec 1, 2022 · 41 comments
Open

RFC: A new codegen architecture. #186

LouisGariepy opened this issue Dec 1, 2022 · 41 comments
Labels
breaking Introduces breaking changes. codegen An issue regarding code generation, or the generated code itself. dependencies Dependency versions, features and changes. feature New feature or request

Comments

@LouisGariepy
Copy link
Member

It was brought to my attention over a discussion on Discord that we could improve our current architecture by generating a crate instead of a single file for our codegen.

The main benefit of this is that it would allow us to automatically generate a Cargo.toml file customized to support all the necessary dependencies and features required by the user's queries, without polluting their own manifest. This would hide all the complexity of the current setup while ensuring automatically that compatible versions and features are used. This is all done without tampering with the user's own Cargo.toml so there are less risks of unwanted side effects. This would be a neat solution to #163. We also wouldn't have to read and update the toml files, we could simply generate the whole manifest, which is much simpler.

Another benefit is that it would allow us more freedom to organize the generated code in a readable/understandable way instead of having to cram everything into a single file (e.g. db types, sync/async #176 ). Taking this further, we could even generate the client crates directly into this generated crate which would save us from having to publish the 3 client crates.

As a side-effect benefit, cargo can parallelize work over crates, so if you have a lot of queries/generated code, this will improve your compile times.

So, in summary,

  • Much better user experience
  • Better compile times
  • More readable/scalable codegen.
  • Gives us more flexibility over generated code and its dependencies in general (future-proof).

Drawback

Cargo won't publish crates with path dependencies (yet, and the maintainers seem to think its too large of a change to ever be merged), so the user either has to

  1. Not publish their crate.
  2. Publish the generated crate separately.
    This is true for any internal crate so I don't think its such a big drawback, but some users might find that annoying nonetheless.

We could keep the current workflow under a "manual configuration" option (where the user must manually declare the generated modules and manage the correct dependencies) for cases where this is absolutely unwanted.

@LouisGariepy
Copy link
Member Author

This is just an idea I'd like to explore, please let me know what you think.

@LouisGariepy LouisGariepy added feature New feature or request dependencies Dependency versions, features and changes. breaking Introduces breaking changes. codegen An issue regarding code generation, or the generated code itself. labels Dec 1, 2022
@jacobsvante
Copy link
Contributor

I think this sounds like an great idea. Looking forward to a more readable code layout.

  1. Not publish their crate.

No other crates are currently depending on any of the cornucopia crates according to crates.io. So I wonder how much of a drawback this really is. I don't see what the use-case would be for depending on it in a library.

@Virgiel
Copy link
Member

Virgiel commented Dec 2, 2022

I see all the benefits of using this architecture, but we must consider that this is a significant architectural change for the Rust ecosystem. We will be, to my knowledge, the first code-generation tool to do this.

I also wonder how we could let the user choose the dependencies version if we generate the Cargo.toml file.

@dlight
Copy link

dlight commented Dec 2, 2022

I think perseus also does this (or used to do, not sure)

@LouisGariepy
Copy link
Member Author

LouisGariepy commented Dec 3, 2022

Perseus does use a similar pattern, but the context is a bit different admittedly.

this is a significant architectural change for the Rust ecosystem

Because of macros, external code generation is uncommon in Rust, both as a single file or crate. Practically speaking, declaring the generated module with mod cornucopia, or adding the generated crate to Cargo.toml are pretty similar and should be understandable to most Rust users.

With that in mind, I think we should choose the most technically advantageous and user-friendly method, instead of basing this decision solely on familiarity. (This is my opinion, I'd like to hear some more thoughts on this point).

how we could let the user choose the dependencies version if we generate the Cargo.toml file.

With our current setup, we give the user the impression that they can choose any version of the dependencies, but this is a big compatibility hazard. Our generated code is only guaranteed to work with the dependency versions we use in the implementation/integration tests. Using another version could potentially cause cryptic errors. Since these errors would occur in the generated code, it would also be very hard to debug or even understand for our users.

If we control the versions ourselves, we can at least ensure that the generated code is error-free, and the user can look up the manifest if they want to know which versions of the dependencies are compatible with their particular version of cornucopia.

SQLx asked their (huge) user base what would be the best update strategy regarding dependencies, and the overwhelming consensus was to make breaking updates as often as required to offer the latest dependency updates. launchbadge/sqlx#1796. Diesel also seems to follow this "newest dependencies only" pattern for their public dependencies, from what I understand. With this in mind, and knowing the major updates can bring critical security updates (in addition to performance and features), I don't think there is a lot of demand for support of old versions of dependencies anyway.

@Virgiel
Copy link
Member

Virgiel commented Dec 3, 2022

SQLx asked their (huge) user base what would be the best update strategy regarding dependencies, and the overwhelming consensus was to make breaking updates as often as required to offer the latest dependency updates. launchbadge/sqlx#1796. Diesel also seems to follow this "newest dependencies only" pattern for their public dependencies, from what I understand. With this in mind, and knowing the major updates can bring critical security updates (in addition to performance and features), I don't think there is a lot of demand for support of old versions of dependencies anyway.

I didn't think it was so common for ORM libraries!

With that in mind, I think we should choose the most technically advantageous and user-friendly method, instead of basing this decision solely on familiarity. (This is my opinion, I'd like to hear some more thoughts on this point).

From my point of view, it is not obvious that a crate is easier to use than a simple file. But our dependency tree is so complex that I can see how it could be more ergonomic. If we are going to go in that direction, we should go all the way, as you suggested, and generate the client crates as well, in order to maximise the benefits.

I have a project where a minor update to the postgres library changed the minimum required version of rustc, and I had to tie to a specific version. Maybe we could add an advanced flag to use workspace dependencies for some of these.

We should certainly ask our users for their views via a pool. Do you know if Github supports this?

@Virgiel
Copy link
Member

Virgiel commented Dec 3, 2022

I am starting to think about implementation details:

We could use the CLI version as the generated crate version, this would be useful for debugging user issues.

Currently, codegen returns a string and this will not work for a crate. We can return a filename and content string map or a temp directory. Using a real directory would reduce memory usage and ensure that we never leave the generated crate in an inconsistent state if codegen fails.

To verify detect code changes in the integration test, we would hash the contents of the directory for comparison, which should be more efficient.

@LouisGariepy
Copy link
Member Author

LouisGariepy commented Dec 4, 2022

I discussed compatibility hazards a couple times with the Rust Discord community, and the answer is more or less always the same. Here's a relevant quote from the latest such discussion: "Basically I'd be looking for ways to get this decision out of the hands/mind of the consumer of this code generator, can't imagine why they'd want to even deal with it."

I have a project where a minor update to the postgres library changed the minimum required version of rustc

MSRV changes are considered breaking. Looking at the git history, I see that postgres has indeed broken their SemVer promises by changing their MSRV in a minor update 😞 . This is something that should be brought up in an issue on their repo. Realistically, we can't really do anything about dependencies not respecting their MSRV SemVer promises beyond telling them.

Maybe we could add an advanced flag to use workspace dependencies for some of these.

That wouldn't work for projects that want to include the generated code as a path dependency, not as a workspace member. A solution that would always work could be a --exact-versions option that would generate "=x.y.z" instead of "x.y.z" versionings.

MSRV-related SemVer breakages are rare enough that I honestly don't think this is a problem for the majority of our users. As a case in point, we changed our MSRV many times under the hood, only updated it for v0.9, and nobody has reported breakages, or asked to downgrade our MSRV.

Before the crate reaches 1.0, I'd like to keep things moving fast and offer our early users an up-to-date experience. This means having a relatively high MSRV and up-to-date dependency versions.

Once we're preparing for our first stable release (we're still a long way!), we should put a lot of effort in improving our support for older dependencies / MSRVs. But at this current stage, I think we should focus on experimenting and improving our actual UX and features.

@LouisGariepy
Copy link
Member Author

LouisGariepy commented Dec 4, 2022

We should certainly ask our users for their views via a pool. Do you know if Github supports this?

Yeah we can do polls over in the discussion section, but honestly I don't think we'd get enough visibility for it to be meaningful 😅. This thread is a decent place to have this discussion IMO.

I'll try to ping our regular contributors/users to get some more opinions on this.

Edit: I also asked for some feedback on our Discord. Let's hope we get some more thoughts on this; the more we get, the better informed of a decision we can make.


@skreborn What do you think of this? (info in the original post).

@LouisGariepy
Copy link
Member Author

I am starting to think about implementation details:

We could use the CLI version as the generated crate version, this would be useful for debugging user issues.

Currently, codegen returns a string and this will not work for a crate. We can return a filename and content string map or a temp directory. Using a real directory would reduce memory usage and ensure that we never leave the generated crate in an inconsistent state if codegen fails.

To verify detect code changes in the integration test, we would hash the contents of the directory for comparison, which should be more efficient.

Agreed on all points.

@9876691
Copy link

9876691 commented Dec 4, 2022

For a project I'm working on at the moment we've already started to use cornucopia in it's own crate. Our structure looks something like the following

db/
  migrations/
    20220529105613_initial_tables.sql
    ...
  queries/
    api_keys.sql
    ...
    users.sql
    ...
build.rs
Cargo.toml
lib.rs

The lib.rs looks like the following

use std::str::FromStr;

pub use deadpool_postgres::{Pool, Transaction, PoolError};
pub use tokio_postgres::Error as TokioPostgresError;
pub use cornucopia_async::Params;

pub use queries::tracking_data::TrackingData;
pub use queries::conjunctions::Conjunction;
pub use queries::orbit_data::OrbitData;
pub use queries::users::User;
pub use queries::organisations::Organisation;
pub use types::public::DataSharingPolicy;

pub fn create_pool(database_url: &str) -> deadpool_postgres::Pool {
    let config = tokio_postgres::Config::from_str(database_url).unwrap();

    let manager = if database_url.contains("sslmode=require") {
        let mut root_store = rustls::RootCertStore::empty();
        root_store.add_server_trust_anchors(webpki_roots::TLS_SERVER_ROOTS.0.iter().map(
            |ta| {
                rustls::OwnedTrustAnchor::from_subject_spki_name_constraints(
                    ta.subject,
                    ta.spki,
                    ta.name_constraints,
                )
            },
        ));

        let tls_config = rustls::ClientConfig::builder()
            .with_safe_defaults()
            .with_root_certificates(root_store)
            .with_no_client_auth();
        let tls = tokio_postgres_rustls::MakeRustlsConnect::new(tls_config);
        deadpool_postgres::Manager::new(config, tls)
    } else {
        deadpool_postgres::Manager::new(config, tokio_postgres::NoTls)
    };

    deadpool_postgres::Pool::builder(manager).build().unwrap()
}

include!(concat!(env!("OUT_DIR"), "/cornucopia.rs"));

And Cargo.toml looks like

[package]
name = "db"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
path = "lib.rs"

[dependencies]
tokio-postgres = { version = "0.7", features = [
    "with-time-0_3",
    "with-serde_json-1",
] }
cornucopia_async = "0"
deadpool-postgres = { version = "0", features = ["serde"] }
postgres-types = { version = "0", features = ["derive"] }
time = { version = "0", default-features = false,  features = ["formatting"] }
webpki-roots = "0"
rustls = "0"
tokio-postgres-rustls = "0"

serde_json = "1"
futures = "0.3"

build.rs just calls out to cornucopia for code gen on every change.

So everything to do with the DB including migrations is in one folder called db.

As you can see in this code I'm compiling TLS into the exe. This is because we deploy to Docker and we always use a Scratch container. I'd like to keep that flexibility.

The db folder is really neat. Everything DB is in there, it feels like a nice separation of concerns.

Do I need cornucopia to generate the Cargo.toml ?

I'm OK with this structure as I have complete control over the crate.

Perhaps a cornucopia init command that creates this initial structure would be cool.

Hope this helps.

@9876691
Copy link

9876691 commented Dec 4, 2022

Another thing I forgot is that some people are happy to generate the code into their projects and check it in.

I don't do this and follow the https://github.com/hyperium/tonic and https://github.com/kaj/ructe model where generated code is not checked in as it's considered a build artifact.

I think that is the build.rs way of doing things.

@LouisGariepy
Copy link
Member Author

LouisGariepy commented Dec 5, 2022

@ianpurton Thanks for contributing your thoughts 😄 . Its really interesting to me since your setup is not one I had considered previously.

As you can see in this code I'm compiling TLS into the exe. This is because we deploy to Docker and we always use a Scratch container. I'd like to keep that flexibility.
Do I need cornucopia to generate the Cargo.toml ?
I'm OK with this structure as I have complete control over the crate.

Everything here is compatible with the proposal. You can generate the cornucopia code wherever and include it in your db crate as a path dependency. Everything else stays the same more or less. This has all the benefits you outlined (keeping all the DB stuff isolated together, have complete control over your DB crate), but you also get all the other benefits of this proposal. For example,

[package]
name = "db"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
path = "lib.rs"

[dependencies]
# You can still use whatever you want in your own crate
webpki-roots = "0"
rustls = "0"
tokio-postgres-rustls = "0"

# Cornucopia takes care of its own dependencies.
# This generated crate is tailor-made for your exact queries, no need for features and whatnot.
cornucopia_gen = { path = "..." }

This is really powerful: if you add a query that uses UUIDs, when your queries are re-built, the generated crate will automatically add the uuid crate with the right features to its own internal dependencies. This keeps your db code clean, and creates another layer of separation with the generated code. Basically, the proposal aims to keep the user's own dependencies, and the generated code's dependencies clearly separated.


As a side effect, this also removes the need for the include!(concat!(env!("OUT_DIR"), "/cornucopia.rs")); trick, which has some weird interactions as we discovered earlier in this issue #138

@jacobsvante
Copy link
Contributor

Won't it be faster to compile as well? Or maybe this doesn't apply to path dependencies..?

@9876691
Copy link

9876691 commented Dec 5, 2022

@LouisGariepy Sounds great.

@9876691
Copy link

9876691 commented Dec 5, 2022

@LouisGariepy One more thought.

If I'm generating code with build.rs then I think it won't work. The Cargo.toml will reference a cornucopia_gen = { path = "..." } that doesn't exist yet.

So I suspect the build.rs won't run until it's created.

@LouisGariepy
Copy link
Member Author

@ianpurton

You're right, Cargo resolves all dependencies, not just build-dependencies before running the build script, unfortunately. There have been requests to change this though rust-lang/cargo#8709,

What to do if the codegen crate doesn't yet exist when using a build script.

Depending on the situation (CI or not), there are three simple solutions (either could work):

  • Create an empty crate with cargo new <codegen path>, then the build script will work normally.
    or
  • Use the CLI for the first codegen, then the build script will work normally.
    or
  • Run the build.rs before adding the path dependency, then the build script will work normally.

I don't think this is too bad since this is a one-time situation (or can be automated in CI), but I'd like to hear your thoughts on this.


@jacobsvante

Won't it be faster to compile as well? Or maybe this doesn't apply to path dependencies..?

AFAIK, it should help with parallel compiling. Path dependencies are treated just as any other dependency (except it doesn't have to be downloaded from crates.io)

@LouisGariepy LouisGariepy changed the title A new codegen architecture. RFC: A new codegen architecture. Dec 6, 2022
@LouisGariepy
Copy link
Member Author

LouisGariepy commented Dec 6, 2022

Since there's been a lot of discussions and some of the pros and cons are getting scattered, I'll make a summary.

The problem

What this issue is trying to address is the generated code's complex dependencies. There's a good breakdown over in #163, but the gist is:

  • The generated code's internal dependencies "leak" to the user. We require that the user knows about some seemingly unrelated crate futures and postgres-types. If we decide to change the internal crates we use, we will break the user's codegen, or leave them with silently unused dependencies.
  • The versions generated code's dependencies versions are a compatibility hazard. A user updating cornucopia can break their codegen and they won't get any indication of what exactly is going on.
  • We require that the user knows about some arcane details of the dependencies in question. For example, they have to know that to use Json, they need to import serde_json with the raw_value feature, and to add the with-serde_json-1 feature to both postgres and cornucopia_sync.

This is a lot for the average user.


The solutions

Many solutions have been proposed, all with their pros and cons. Let's take a look

Status Quo

The most simple "fix" is to accept things how they are. The disadvantages have already been discussed above. The main advantage is that "it works" and our current users are used to this.

Feature-flags + reexporting dependencies

One possible solution is to reexport all the necessary codegen dependencies (both internal and public) from our Cornucopia crate. This solves our compatibility hazards (whatever version of cornucopia they use, they get the right dependencies reexported), and is a familiar pattern that many rust projects use to make their user's life easier. This would be similar to what SQLx does.

The drawbacks are mainly that it would require us to create feature flags for all our public dependencies ("serde_json", "uuid", etc.) and reexport a bunch of types. This is not too much of a problem since our public dependencies are pretty stable and we already have a serde_json feature anyway, but still something to consider.

We also have to be careful to make our feature flags additive. For example, what happens if someone puts the deadpool flag without the async flag?

Another drawback is that this would "pollute" our cornucopia crate with a bunch of reexported public items. As a workaround, we could put the reexported items in some other crate. Not too pretty, but doable.

@Virgiel Might also have some more drawbacks to add to this solution.

Dependency-aware CLI

Another proposed solution is to make our CLI able to analyze the user's Cargo.toml and suggest fixes for the dependencies. For example, it would be able to tell that a particular dependency or feature is missing, or give general hints. This is good for guiding the user, but it still has some drawbacks.

First and foremost, this is still requires a lot of manual reading / fixing by the user. We could apply the fixes automatically, but I'm really not comfortable with the idea of automatically modifying the user's manifest (see below for why).

Analyzing a Cargo.toml file is not trivial. We have to deal with crate renames, optional dependencies, etc. It would be difficult to have a truly reliable implementation. It also requires us to add some relatively hefty transitive dependencies. Besides, making useful hints and suggestions would also take some additional effort.

Also, we wouldn't able to inform the user about unused dependencies. We have to assume that if a dependency is in the manifest, the user could potentially be using it for something other than the generated code.

There's no automatic way to know for sure which manifest should be analyzed, especially in a workspace crate. So, we'd have to add a --manifest_path option to the CLI command which makes it less ergonomic.

The following example illustrates this: we are in folder `foo`, our queries are in folder `queries`, the destination folder is `target`, but `crate_baz/Cargo.toml` is the manifest we want to analyze.
foo
 |-- queries
 |-- crate_baz
         |-- Cargo.toml
         |-- ...
 |-- target
 |-- Cargo.toml
 | ...

Its also not super compatible with build.rs users since the build script cannot do general-purpose printing, only warnings (considered a hack) and erors. So no hints.

So, there are significant limitations and the implementation effort would be higher than pretty much all the other solutions. Still, it makes it possible to solve some of our problems while keeping the same codegen architecture.

Generate a crate instead of a module

AKA the proposition in the original post of this thread.

I won't go into too much into detail because the link above already explains it, but the gist is that the generated crate would encapsulate its own dependencies, making the dependency setup completely automated for users. Our query analysis will determine exactly what your specific queries require as dependencies, no actions or features required. This also makes our code generation future-proof, and ensures less risks of compatibility hazards.

There are also some other miscellaneous advantages like better compile times, more readable generated code, and we wouldn't need to publish our client crates anymore. But these are not the main focus here,

The drawbacks are that crate generation is a new kind of workflow which could be unfamiliar to users. Its probably the "most different" out of all the proposed solutions. It also somewhat complexifies some internal parts of Cornucopia see this comment. Finally, if you use a build script instead of the CLI, it requires a single manual action the first time you use Cornucopia in a new project (What to do if the codegen crate doesn't yet exist when using a build script)

@LouisGariepy
Copy link
Member Author

If you have a preferred solution between these, please state it clearly! This would be much appreciated.

There's still time to look at other solutions too, so if you have an idea, please let it be known.

Thanks all :)

@9876691

This comment was marked as resolved.

@LouisGariepy

This comment was marked as resolved.

@jacobsvante
Copy link
Contributor

Still can't come up with any other negative than that publishing crates that depend on cornucopia won't work. Just go with the crate solution already 😁 (Just kidding, it's great that you don't just press the button and implement it!)

@LouisGariepy
Copy link
Member Author

Sorry if it felt like I was beating around the bush 😄 This is not a change that can be undone lightly so I want to make sure we're going in the right direction. Regardless of the technical merits, I also want to make sure that everyone is on board, out of respect for the time you all put into cornucopia.

I'll leave this issue open so we can still collect some more information, but I'll start working on the implementation so we can also see how it works out in practice.

Cheers!

@jacobsvante
Copy link
Contributor

You are thinking things through, I like that. Good luck with the implementation! Can't wait to see the end result 😀

@tbillington
Copy link

I'm doing something similar to what ian mentioned with protobuf (tonic) generated code for a grpc server. I have a separate crate in the workspace that only contains the generated stuff, and I know everything in there is scoped to supporting the generated code.

It has some small downsides like you can't implement common traits (serde etc) for the generated types now because they're both in external crates, whereas you could if you generated the code into your "main" crate.
That's not a deal breaker for me though so the separate crate has been a really nice workflow.

I'd be curious to see how your Cargo managing workflow works :)

@dlight
Copy link

dlight commented Jan 25, 2023

The drawbacks are that crate generation is a new kind of workflow which could be unfamiliar to users.

I think that crate generation is just fine, and as I said, there are other examples in the Rust ecosystem where entire crates are generated. Well besides Perseus there is also https://github.com/einride/protocrate-rs and maybe others.

It's only appropriate to generate a single file if you don't have dependencies IMO. If your generated code has dependencies, generating a crate is kind of unavoidable. And, the thing in common between cornucopia, protocrate and perseus is that they want to generate code that depends on outside crates.

@9876691
Copy link

9876691 commented Jan 26, 2023

@tbillington

It has some small downsides like you can't implement common traits (serde etc) for the generated types now because they're both in external crates, whereas you could if you generated the code into your "main" crate. That's not a deal breaker for me though so the separate crate has been a really nice workflow.

To get tonic to add Serialize setup your build.rs like so.

fn main() {
    // Compile our proto
    tonic_build::configure()
        // We want to be able to convert proto to json, this enables that.
        .type_attribute(".", "#[derive(serde::Deserialize, serde::Serialize)]")
        // Prost will automatically include it's own prosy_types for google Timestamp etc.
        // But this doesn't have Serialize and Deseriealize so we switch it off.
        // Maybe we could use https://github.com/fdeantoni/prost-wkt
        .compile_well_known_types(true)
        .compile(&["api.proto"], &["./"])
        .unwrap();
}

@dlight
Copy link

dlight commented Feb 8, 2023

To get tonic to add Serialize setup your build.rs like so.

I would love something similar to this .type_attribute() thing on Cornucopia because I want to derive arbitrary traits.

For example, derive specta's Type as used by the Prisma Rust client to integrate with rspc which is a RPC library for writing web APIs (basically it generates a typescript type out of your database table, and supports things like generating a type with a subset of columns, etc)

@farazfazli
Copy link

@ianpurton Thanks for contributing your thoughts smile . Its really interesting to me since your setup is not one I had considered previously.

As you can see in this code I'm compiling TLS into the exe. This is because we deploy to Docker and we always use a Scratch container. I'd like to keep that flexibility.
Do I need cornucopia to generate the Cargo.toml ?
I'm OK with this structure as I have complete control over the crate.

Everything here is compatible with the proposal. You can generate the cornucopia code wherever and include it in your db crate as a path dependency. Everything else stays the same more or less. This has all the benefits you outlined (keeping all the DB stuff isolated together, have complete control over your DB crate), but you also get all the other benefits of this proposal. For example,

[package]
name = "db"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
path = "lib.rs"

[dependencies]
# You can still use whatever you want in your own crate
webpki-roots = "0"
rustls = "0"
tokio-postgres-rustls = "0"

# Cornucopia takes care of its own dependencies.
# This generated crate is tailor-made for your exact queries, no need for features and whatnot.
cornucopia_gen = { path = "..." }

This is really powerful: if you add a query that uses UUIDs, when your queries are re-built, the generated crate will automatically add the uuid crate with the right features to its own internal dependencies. This keeps your db code clean, and creates another layer of separation with the generated code. Basically, the proposal aims to keep the user's own dependencies, and the generated code's dependencies clearly separated.

As a side effect, this also removes the need for the include!(concat!(env!("OUT_DIR"), "/cornucopia.rs")); trick, which has some weird interactions as we discovered earlier in this issue #138

Hi Louis, this is an awesome project. And I love this idea you have around changing codegen architecture...it would definitely make the user experience nicer because we wouldn't have to manually add dependencies based on the types. Also - having everything cornucopia-related in its own crate provides clearer separation of concerns. +1 from me on this! 😀

@jwest23
Copy link

jwest23 commented Mar 10, 2023

Hi! I have not used cornucopia yet. I was reading through the issues as part of my evaluation before I try adding it into my stack. Today I'm using deadpool-postgres with my SQL embedded directly in my code 😱

Cornucopia's model scratches several itches and lines up with my current stack in a fantastic way - thank you!

The current situation with dependencies gave me pause but not enough to turn me off.

For context, I imagine that for my first run at this I'll be using the CLI to generate code against a live database I maintain separately and checking in the generated code. This seems like the easiest way to work with my build server. I will be adopting cornucopia slowly instead of converting my whole code base over in one fell swoop.

Generating the code as its own crate instead of a module strikes me as the most helpful option to let someone in my position adopt cornucopia into an existing system.

@jvdwrf
Copy link

jvdwrf commented Mar 17, 2023

I agree with what has been said so far, the separation of concerns seems like a nice thing and I like to structure my applications in this way anyways.

It might make it slightly harder to use in a simple application since it requires a workspace, but I think generating a crate is the right™ way to go about this. At least it is what I would personally prefer.

@kennethuil
Copy link

I definitely want to keep at least the option to have my build script update my crate source rather than create a whole new crate, and if I have to fix the missing dependency errors I'd have to deal with anyway if I was writing that DB code on my own, so be it.

@LouisGariepy
Copy link
Member Author

@kennethuil understood!

I'd be grateful if you could share some more info on this

I definitely want to keep at least the option to have my build script update my crate source rather than create a whole new crate

Specifically, how is using a separate crate a disruption to your workflow?

Thank you 😄

@kennethuil
Copy link

kennethuil commented Mar 26, 2023

On further thought, having an extra crate would only require a small adjustment in my Dockerfile and no other adjustment in my CI/CD pipeline. And I would expect rust-analyzer to just work.

Although it would complicate any attempt to customize the Cargo.toml for the Cornucopia-generated code.

@LukeMathWalker
Copy link

👋🏻 Hey folks!

I've been following your project for a while and this issue was one of the things that pushed me to pull the trigger on cargo-px, a tool designed for "whole crate" generation.
It's a problem I'm personally running into with pavex, but it looks like it's a broader issue in the ecosystem.

If there's a chance we could collaborate and shape the solution in such a way that fits your needs, more than happy to do so.

@9876691
Copy link

9876691 commented May 3, 2023

@LukeMathWalker I like this. As I'm a downstream user of cornucopia I like that I can choose to version control the generated crate or just generate it as part of my build and add it to .gitignore.

I like the idea using meta data to call up cornucopia.

[package.metadata.px.generate]
# The generator is a binary in the current workspace. 
# It's the only generator type we support at the moment.
generator_type = "cargo_workspace_binary"
# The name of the binary.
generator_name = "cornucopia"
# The arguments to be passed to the binary. 
# It can be omitted if there are no arguments.
generator_args = ["...", "...", "..."]

Good look with pavex, I'm watching it to see where it goes.

@LukeMathWalker
Copy link

LukeMathWalker commented May 3, 2023

In the specific case of cornucopia, we'd probably need to add a new generator_type if you want to invoke directly the binary that gets published to the registry. As a sketch:

[package.metadata.px.generate]
generator_type = "registry_binary"
generator_name = "cornucopia"
generator_version = "0.2.3"
generator_args = ["...", "...", "..."]

and cargo-px would take care of installing it in the current workspace (i.e. not globally, to allow different projects to use different versions).

(I should probably have a generator sub-field rather than using it as a prefix and perhaps mimic the dependency fields from Cargo.toml)

@Virgiel
Copy link
Member

Virgiel commented May 6, 2023

@LukeMathWalker I'm following your work on pavex closely and am looking forward to replacing my current hyper based stack with it !

The new codegen architecture is far from ready but cargo-px will be very useful.

How does pavex know when to call the generator? Is this the same system that cargo uses with build scripts?

In the specific case of cornucopia, we'd probably need to add a new generator_type if you want to invoke directly the binary that gets published to the registry.

That would be fantastic !

@LukeMathWalker
Copy link

LukeMathWalker commented May 6, 2023

How does cargo-px know when to call the generator? Is this the same system that cargo uses with build scripts?

The current design is very simple—the generator is always invoked, and it's the generator's job to determine if there is any work to do (e.g. if the hash of a file has changed). We can definitely iterate here.

@Virgiel
Copy link
Member

Virgiel commented May 8, 2023

it's the generator's job to determine if there is any work to do

This is actually very sensible, as it is the only way to make the generator work on its own if desired.

@notdanilo
Copy link

I am working on a similar feature and this might be of your interest: rust-lang/cargo/issues/14065

I will start a PR for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Introduces breaking changes. codegen An issue regarding code generation, or the generated code itself. dependencies Dependency versions, features and changes. feature New feature or request
Projects
None yet
Development

No branches or pull requests