Option to ignore missing workspace members when building #14566

Closed
jephthia opened this issue Sep 19, 2024 · 19 comments
Labels
A-workspaces (Area: workspaces)
C-feature-request (Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted`)
S-needs-info (Status: Needs more info, such as a reproduction or more background for a feature request.)

Comments

@jephthia

Problem

Given the following virtual workspace:

[workspace]
resolver = "2"

members = [
  "apps/bin1",
  "apps/bin2",
  "apps/bin3",
  "libs/libA",
  "libs/libB",
  ...
]

Building bin1 with cargo build -p bin1 should be allowed even if bin2 is missing from the file system.

Use case: When building a binary in Docker, only the relevant files are sent to the build context. For example, consider the following ignore file for building bin1:

# Exclude everything
*

# Include relevant files only
!apps/bin1
!libs
!Cargo.toml
!Cargo.lock

The binary is then built with:
cargo build -p bin1 --locked --release

This fails with the error failed to load manifest for workspace member /code/app/bin2,
because the other binaries that are members of the workspace aren't present, even though they are irrelevant to this build and should not be needed.

Proposed Solution

An option to prevent a build from failing because of a missing member.
This could be done either:

through a flag:
cargo build -p bin1 --ignore-missing-members (with a better flag name)

or directly in the manifest?

[workspace]
ignore-missing-members = true

Notes

No response

@jephthia added the C-feature-request (Category: proposal for a feature) and S-triage (Status: This issue is waiting on initial triage) labels on Sep 19, 2024
@weihanglo
Member

Every member is relevant and contributes to the workspace's Cargo.lock. Removing any of them may affect the dependency resolution.

Similar feature requests have been made:

and #6179 is the major one tracking it.

That being said, this docker use case is slightly different from others. Could you expand a bit about your workflow? Also wonder where the ignore file came from.

@weihanglo added the S-needs-info (Status: Needs more info, such as a reproduction or more background for a feature request) and A-workspaces (Area: workspaces) labels and removed the S-triage (Status: This issue is waiting on initial triage) label on Sep 19, 2024
@epage
Contributor

epage commented Sep 19, 2024

Most of those conditional ones are for dealing with targets, cfgs, etc. This is related to docker caching, from what I understood of the text. Docker fingerprints the inputs to a layer and rebuilds if they change. If you have a layer that is meant for only a subset of your dependency tree, then you only want that subset fingerprinted. This is more similar to #2644 but in the workspace, rather than dealing with dependencies.

https://crates.io/crates/cargo-chef is the primary place for docker layer caching experiments. I'd recommend seeing if it meets your needs or if it could be made to meet your needs.
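
For reference, a typical cargo-chef setup is a multi-stage Dockerfile roughly like the following (a sketch based on cargo-chef's README rather than anything from this issue; the package name bin1 and the base image are borrowed from examples in this thread):

FROM rust:1.80.1-bullseye AS chef
RUN cargo install cargo-chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build only the dependencies; this layer stays cached while recipe.json is unchanged
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build -p bin1 --locked --release

Note that this pattern still copies the entire workspace into the build context; it addresses layer caching rather than building with workspace members missing.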

@jephthia
Author

That being said, this docker use case is slightly different from others. Could you expand a bit about your workflow? Also wonder where the ignore file came from.

Let's say we have a virtual workspace with just two binaries, server1 and server2

members = [
  "apps/server1",
  "apps/server2",
]

and the following file structure:

project/
├── apps/
│   ├── server1/
│   │   ├── src/
│   │   │   └── main.rs
│   │   └── Cargo.toml
│   └── server2/
│       ├── src/
│       │   └── main.rs
│       └── Cargo.toml
├── config/
│   └── docker/
│       ├── dev/
│       └── prod/
│           ├── Server1.Dockerfile
│           ├── Server1.Dockerfile.dockerignore
│           ├── Server2.Dockerfile
│           └── Server2.Dockerfile.dockerignore
├── Cargo.toml
├── Cargo.lock
└── docker-compose.yaml

Docker is used to build and run these binaries; the first workflow is for production and the second is for local development (both suffer from the same issue).

To build for production the following is used:
File: config/docker/prod/Server1.Dockerfile

FROM rust:1.80.1-bullseye AS build

RUN --mount=type=bind,source=apps/server1,target=apps/server1,rw \
    --mount=type=bind,source=Cargo.toml,target=Cargo.toml \
    --mount=type=bind,source=Cargo.lock,target=Cargo.lock \
    --mount=type=cache,target=target/ \
    --mount=type=cache,target=/usr/local/cargo/registry/ \
    <<EOF
set -e
cargo build -p server1 --locked --release <--- fails: "missing server2 member"
EOF

...

File: config/docker/prod/Server1.Dockerfile.dockerignore

# Ignore everything
*

# Include relevant files only
!apps/server1
!Cargo.toml
!Cargo.lock

Attempting to build this with
docker build -t server1 -f config/docker/prod/Server1.Dockerfile .
will fail since the sources of server2 aren't included in the build context.

For the local development workflow, it's similar but with docker-compose.yaml being used instead:

  server1:
    build:
      context: .
      dockerfile: config/docker/dev/Server1.Dockerfile
    working_dir: /code
    volumes:
      - ./apps/server1:/code/apps/server1
      - ./Cargo.toml:/code/Cargo.toml
      - ./Cargo.lock:/code/Cargo.lock
      - server1_cargo_target:/code/target/
      - server1_cargo:/usr/local/cargo
    ports:
      - "8080:8080"
    command: ["cargo", "watch", "-x", "run", "-p", "server1"] <--- Fails: missing server2 member

Running docker compose up server1 fails.


This is related to docker caching

Ah, caching isn't the concern here (we've long given up on getting caching to work correctly in CI 😅, at least for now).
We can accept slower builds as long as it at least builds, but with this issue the builds aren't even possible.

This came up because our current setup does not use cargo workspaces and instead treats each binary separately, with its own target/ and Cargo.lock. As the monorepo grew this became less than ideal, so we decided to switch to cargo workspaces, and hit this issue.

Are there issues that we're not considering that would come from building a specific binary in a workspace without the presence of the other binaries? Our thinking was that this wouldn't cause issues since each binary is unrelated to the others.

https://crates.io/crates/cargo-chef is the primary place for docker layer caching experiments. I'd recommend seeing if it meets your needs or if it could be made to meet your needs.

Since caching isn't the issue here, we're a bit hesitant to take on a dependency to solve this. As a workaround we're currently thinking of having a config/cargos/ directory that would store a different Cargo.toml for each necessary binary (e.g. Cargo-server1.toml, Cargo-server2.toml); during the mount we'd pick the specific Cargo-xyz.toml needed for the build. But this extra maintenance would feel like a hack that goes against the point of a workspace.
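
For illustration, such a trimmed-down manifest might look roughly like this (purely a sketch of the workaround we're considering, not something we've actually set up):

File: config/cargos/Cargo-server1.toml (mounted as Cargo.toml when building server1)

[workspace]
resolver = "2"

members = [
  "apps/server1",
  # plus whichever libs/ members server1 actually depends on
]

The obvious downside is keeping each of these per-binary manifests in sync with the real root Cargo.toml by hand, which is the part that feels like a hack.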

@epage
Contributor

epage commented Sep 19, 2024

So this would also be a problem for caching iiuc, even if that isn't your problem.

This came up because our current setup does not use cargo workspaces and instead treats each binary separately, with its own target/ and Cargo.lock. As the monorepo grew this became less than ideal, so we decided to switch to cargo workspaces, and hit this issue.

Maybe I missed it but could you expand on why you don't mount the entire source?

@jephthia
Author

So this would also be a problem for caching iiuc, even if that isn't your problem.

Hmm yes, most likely

Maybe I missed it but could you expand on why you don't mount the entire source?

This is a recommended Docker best practice as far as I know. We have a monorepo which contains many unrelated things: an Android app, an iOS app, non-Rust libraries, many Rust binaries, etc. Our common workflow is to simply exclude everything and only include the project being built.

From: https://docs.docker.com/build/concepts/context/#dockerignore-files

This helps avoid sending unwanted files and directories to the builder, improving build speed, especially when using a remote builder.

@Rusty-Weasel

Rusty-Weasel commented Sep 20, 2024

Why don't you use something like this?

[workspace]
resolver = "2"

members = [
    "apps/*",
    "libs/*",
]

If the folder doesn't exist, there is no member;
if the folder exists, there is a member.

@jephthia
Author

Why don't you use something like this?

We have a monorepo which contains many unrelated things: an Android app, an iOS app, non-Rust libraries, many Rust binaries, etc.

[workspace]
resolver = "2"

members = [
   "apps/*",
   "libs/*",
]

This works when you have a simple flat structure and "Rust only" apps/libs.
It fails as soon as you have anything else: non-Rust directories that do not contain a Cargo.toml, nested structures, etc., which then requires finding the right combination of glob patterns to exclude, which isn't even possible right now.

It is much simpler and easier to list all members directly, which is how cargo new works by default: it automatically adds the new member to the list, so we don't even need to do this ourselves.

@epage
Contributor

epage commented Sep 20, 2024

non-rust directories that do not contain a Cargo.toml

I wonder if we should loosen this restriction. #4593 is the issue for this and #11405 is a wider discussion of this feature.

nested structure

If we had #4593, would this then be resolved? Would anything in #11405 help?

It is much simpler and easier to list all members directly, which is how cargo new works by default: it automatically adds the new member to the list, so we don't even need to do this ourselves.

But if you use globs, then that isn't a problem anyway.

@jephthia
Author

If we had #4593, would this then be resolved? Would anything in #11405 help?

If #4593 were resolved, with the addition of #6009 to exclude cargo directories when needed, then yes, using the glob approach would work. But for us to consider that approach a "clean" solution, we'd then want a way to disable the default behavior of cargo new, because right now whenever we add new binaries or libraries with cargo new the member is automatically added to the list with its full path.

i.e, if we had

members = [
    "apps/*",
    "libs/*",
]

and ran cargo new apps/binA we would end up with

members = [
    "apps/*",
    "libs/*",
    "apps/binA",
]

each time something is added, so you'd now have to remember to go to the root Cargo.toml and remove the new entry that was added. That would make this approach feel slightly "broken", and we can imagine developers coming to the project (especially if they are new to Rust) wondering why we don't just use the default approach of letting cargo new add the new member as a full path.

This is what we're used to in other ecosystems like yarn workspaces (JavaScript), where a workspace such as:
File: package.json

{
    "private": true,
    "workspaces": [
        "apps/web_admin",
        "doesnt_exist"
    ]
}

can still be built even if a member isn't present.

May I ask what the rationale is behind the restriction of refusing to build if a listed member is not present?
If the member is missing but not referenced in this build, wouldn't it just be a no-op, whereas,
if the member is missing but referenced, the build would end up failing appropriately once it gets to the place where the member is referenced but cannot be found?

I.e., isn't the outcome the expected one in either case? Is there something we're not considering?

@Rusty-Weasel

Is there something we're not considering?

Yes, this -> We have a monorepo which contains many unrelated things: an Android app, an iOS app, non-Rust libraries, many Rust binaries, etc.

On the one hand, you wish to have a clean solution;
on the other hand, you expect things from Cargo that it is not built for.

Cargo is for Rust and its structures, and is not there to handle other unrelated things outside the scope of Rust.

When you mix different programming-language tools/package managers/projects/etc., it is always a challenge
if they are not cleanly separated and the configs properly adjusted. For example:

IDE: RustRover + IntelliJ
Repo: mono
Difficulty: 8/9 of 10 😄
Fun Level: 👎

Rust workspaces with Rust libs/bins + Flutter (Android/iOS apps) + vcpkg + Docker + GitHub workflows can work smoothly together,
if you manually take care of different points in many places.

Then you've got a monorepo with Rust + Dart + Kotlin + C/C++ and perhaps some useful Python scripts,
but all of this is not a Cargo issue; it's a project structure or tool compatibility issue.

The more different tools or languages you have in use, the higher the difficulty factor grows.

My best solution for this is to reorganise your monorepo and perhaps adjust many configs
to get a clean and compatible structure; that, in my eyes, is your real challenge.

we'd then want a way to disable the default behavior of cargo new because right now whenever we add new binaries or libraries with cargo new the member is automatically added to the list with its full path.

With RustRover the problem does not occur. Which IDE do you use? Or are you using vim/neovim?

@jephthia
Author

on the other hand, you expect things from Cargo that it is not built for.
Cargo is for Rust and its structures, and is not there to handle other unrelated things outside the scope of Rust.

I'm a bit unsure how else to describe this issue beyond what I've already described. This issue isn't about mixing different languages; the presence of non-Rust code is just one additional factor. The main issue is that one cannot build a Rust package if any member is missing from the workspace, even if the missing member is not needed at all to build that package.

To go back to the initial example: even if the project only contained Rust, with the following stripped-down structure as described earlier:

project/
├── apps/
│   ├── server1/
│   │   ├── src/
│   │   │   └── main.rs
│   │   └── Cargo.toml
│   └── server2/
│       ├── src/
│       │   └── main.rs
│       └── Cargo.toml
├── config/
│   └── docker/
│       ├── dev/
│       └── prod/
│           ├── Server1.Dockerfile
│           ├── Server1.Dockerfile.dockerignore
│           ├── Server2.Dockerfile
│           └── Server2.Dockerfile.dockerignore
├── Cargo.toml
├── Cargo.lock
└── docker-compose.yaml

the issue is still present: you cannot mount server1 in Docker and build it without also including the source of all your other Rust binaries/libraries when using full member paths.

To work around this you must then use glob patterns for your members, which then breaks as soon as you have a folder that doesn't contain a Cargo.toml; i.e., just adding a folder like docs or resources under apps would break this, or grouping your libs into groups such as libs/utils/libA would break this, requiring #4593 to be resolved as mentioned.
And then, if you need to exclude certain directories, #6009 also needs to be resolved in order to easily exclude directories using glob patterns.
Finally, even then, cargo new [package] edits your members list and adds the full path of the new member, requiring you to manually go remove it, so a way to disable this default behavior would also be needed.
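
For concreteness, with today's Cargo that roughly means something like the following (a sketch assuming the current [workspace] exclude field, which takes literal paths; glob patterns there are what #6009 asks for, and apps/docs / apps/resources are just placeholder names for the kind of non-Rust folders mentioned above):

[workspace]
resolver = "2"

members = ["apps/*", "libs/*"]
# every non-package directory matched by the globs has to be listed out by hand
exclude = ["apps/docs", "apps/resources"]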

My best solution for this is to reorganise your monorepo and perhaps adjust many configs,
to get a clean and compatible structure; that, in my eyes, is your real challenge.

With RustRover the problem does not occur. Which IDE do you use? Or are you using vim/neovim?

VSCode

I appreciate your time looking into this with me to come up with a solution. Though, am I understanding correctly that your recommended solution is to forgo groupings (nested structures), remove any folder without a Cargo.toml, whether that's documentation, configs, etc., and finally to ask our developers to switch IDEs from VSCode to RustRover?

When you mix different programming-language tools/package managers/projects/etc., it is always a challenge,

Although, as mentioned, this issue isn't about mixing different languages, I'd just like to note that this workflow didn't pose any challenge for yarn workspaces: a missing member doesn't cause the whole build to fail.

Which brings me back to my question: @weihanglo or @epage, would you be able to shed some light on the rationale behind the restriction of refusing to build if a listed member is not present, even if the missing member is not needed for the build? Would this feature request of allowing the build to continue, as long as all required members are present, cause issues?

@epage
Contributor

epage commented Sep 26, 2024

i.e, if we had

members = [
    "apps/*",
    "libs/*",
]

and ran cargo new apps/binA we would end up with

members = [
    "apps/*",
    "libs/*",
    "apps/binA",
]

each time something is added, so you'd now have to remember to go to the root Cargo.toml and remove the new entry that was added. That would make this approach feel slightly "broken", and we can imagine developers coming to the project (especially if they are new to Rust) wondering why we don't just use the default approach of letting cargo new add the new member as a full path.

This is a bug, please open an issue for this. We have a test for a simple glob but it seems other globs aren't working.

Before cargo new

[workspace]
resolver = "2"
members = ["crates/*"]

After cargo new

[workspace]
resolver = "2"
members = ["crates/*"]

@epage
Contributor

epage commented Sep 26, 2024

Which brings me back to my question: @weihanglo or @epage, would you be able to shed some light on the rationale behind the restriction of refusing to build if a listed member is not present, even if the missing member is not needed for the build? Would this feature request of allowing the build to continue, as long as all required members are present, cause issues?

The answer to this was given earlier

Every member is relevant and contributes to the workspace's Cargo.lock. Removing any of them may affect the dependency resolution.

Note that you said you were running with

$ cargo build -p bin1 --locked --release

The --locked flag means "fail if the lockfile would change". The lockfile contains every workspace member and their dependencies. Operating without some workspace members would remove those, causing that command to fail. Say you run without the flag: in my experience, developers are likely to commit changed files, even if the change is not relevant to their code (and we kind of teach that with Cargo.lock), which means you would likely end up with an unhappy developer workflow, as people would find out late in the process that they broke things and need to fix them. If they then added dependencies in this state, it wouldn't take the missing members into account, which could then be another source of breakage later down the line.
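
To illustrate the lockfile point (an invented excerpt, not taken from this project): every workspace member gets its own [[package]] entry in Cargo.lock, roughly

[[package]]
name = "server2"
version = "0.1.0"
# ...plus entries for each of its dependencies

so resolving while server2's source is absent would drop that entry (and any dependencies only it uses), and --locked turns that pending lockfile change into a hard error.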

@jephthia
Author

This is a bug, please open an issue for this. We have a test for a simple glob but it seems other globs aren't working.

So I tried it again and you're right, it doesn't add the full-path member when running cargo new. But since I remember that it had happened when trying it that one time, I went back to my command history to see what exact steps were done, and found:

 9892  cd downloads
 9893  mkdir project
 9894  cd project
 9895  code . <-- opens VSCode
 9896  cd apps <-- added this folder in VSCode
 9897  cargo new server1
 9898  cargo new server2
 9899  cd ..
 9900  cd libs/group <-- added this folder in VSCode
 9901  cargo new --lib libA
 9902  cargo new --lib libB
 9903  cd ../..
 9904  cargo new --lib libs/group/libC

Before creating libC I updated the members from:
members = ["apps/server1", "apps/server2", "libs/group/libA", "libs/group/libB"]
to
members = ["apps/*", "libs/group/*"]
and after adding libC, it did update the members list and added the full path,

members = ["apps/*",
    "libs/group/*", "libs/group/libC"]

The weird thing is that I deleted all files to try it again to be sure, but this time it didn't do it; adding libC kept the members as
members = ["apps/*", "libs/group/*"]

I tried it again a few more times but in the end I only got it to happen 1 more time after many tries.

I'm still not sure why it only happened twice (well, 3 times if counting the original one last week), but I'm concluding that it's a non-issue since I doubt anyone would realistically run into this.

@jephthia
Author

Every member is relevant and contributes to the workspace's Cargo.lock. Removing any of them may affect the dependency resolution.

The --locked flag means "fail if the lockfile would change". The lockfile contains every workspace member and their dependencies. Operating without some workspace members would remove those, causing that command to fail. Say you run without the flag: in my experience, developers are likely to commit changed files, even if the change is not relevant to their code (and we kind of teach that with Cargo.lock), which means you would likely end up with an unhappy developer workflow, as people would find out late in the process that they broke things and need to fix them. If they then added dependencies in this state, it wouldn't take the missing members into account, which could then be another source of breakage later down the line.

Yeah, we're on the same page on that; removing the --locked flag would be a deal breaker for us when building our release binaries, even though it would work, for the reason you listed plus some more.

Alright, then if I understand correctly, the root issue with our desired setup is wanting to build our binaries in a container by only mounting the source of the binary (+ the source of its "internal" dependencies, a.k.a. libs). Which means that in the end even the glob pattern solution "apps/*" would still not work, since even if we just had:

project/
├── apps/
│   ├── server1/
│   └── server2/
├── Cargo.toml
└── Cargo.lock

and used

members = ["apps/*"]

even though we would now be able to run cargo run -p server1 locally, we still wouldn't be able to build for production with
cargo run -p server1 --locked --release, but this time the issue would be that it can't build with the --locked flag, instead of the original issue of Cargo failing to build because of the missing source of server2?

@epage
Contributor

epage commented Sep 27, 2024

I'm still not sure why it only happened twice (well, 3 times if counting the original one last week), but I'm concluding that it's a non-issue since I doubt anyone would realistically run into this.

Did you update rust in that time frame? We might have had bugs around that that were fixed.

@epage
Contributor

epage commented Sep 27, 2024

even though we would now be able to run cargo run -p server1 locally, we still wouldn't be able to build for production with
cargo run -p server1 --locked --release, but this time the issue would be that it can't build with the --locked flag, instead of the original issue of Cargo failing to build because of the missing source of server2?

Yes, that's correct.

@jephthia
Author

Did you update rust in that time frame? We might have had bugs around that that were fixed.

No updates in between; it happened twice in about 50 or so tries, so it can be ignored imo.

Yes, that's correct.

Okay then we'll be sticking with the current setup of not using a workspace for the time being, thank you for your time!
