Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo may potentially publish sensitive information to registries when --allow-dirty is passed #9398

Open
landaire opened this issue Apr 23, 2021 · 9 comments
Labels
C-bug Category: bug Command-package S-triage Status: This issue is waiting on initial triage.

Comments

@landaire
Copy link

landaire commented Apr 23, 2021

Problem

Cargo may include sensitive information when publishing a crate in a git directory with dirty or untracked files when using cargo publish --allow-dirty.

The way I tend to operate in my projects includes occasionally doing cargo run > output.txt or moving sample files into my project root and not including these temporary files in my .gitignore. When publishing a crate to a registry Cargo will warn the user if the git working directory is dirty or contains untracked files. This looks something like:

❯ cargo publish
Updating crates.io <http://crates.io> index
error: 6 files in the working directory contain changes that were not yet committed into git:

file1
file2
file3
file4
file5
file6

to proceed despite this and include the uncommitted changes, pass the `--allow-dirty` flag

I've seen this error a few times and was under the assumption that it was more of a warning that the version I'm publishing to crates.io may be out of sync with the version I'm going to publish to my Git repository.

There's some key wording here that you really need to pay attention to and think about: and include the uncommitted changes.

For untracked files this wording seems safe, since the files aren't even tracked by my git repo. It turns out cargo-publish does not include files ignored by .gitignore, but includes everything else -- including untracked files. Since Cargo operates similarly to git and respects the .gitignore, I initially did not believe this to be the case until I did some testing. This is a pretty serious footgun that can result in private information being accidentally disclosed to the world.

Steps

  1. Create a new Cargo project cargo new --lib test (lib or bin doesn't matter)
  2. Do not commit anything
  3. Run cargo publish to see the below output.
  4. Run cargo publish --allow-dirty then use a tool like cargo-download to download this crate and examine its contents with tar -tvf crate.tar.gz
❯ cargo publish
    Updating crates.io index
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
error: 2 files in the working directory contain changes that were not yet committed into git:

Cargo.toml
src/lib.rs

to proceed despite this and include the uncommitted changes, pass the `--allow-dirty` flag
zsh: exit 101   cargo publish

Possible Solution(s)

Cargo should explicitly ignore untracked files by default, but warn about dirty files. This flag should also be split and changed to --include-dirty and --include-untracked to reflect that the dirty/untracked files will be included in the tarball submitted to the registry

Notes

Output of cargo version: cargo 1.52.0-nightly (90691f2bf 2021-03-16)

Related discussion which introduced this option: #2063

@landaire landaire added the C-bug Category: bug label Apr 23, 2021
landaire added a commit to landaire/cargo that referenced this issue Apr 24, 2021
@landaire
Copy link
Author

I just pushed a commit to my own branch (landaire@546c069) which implements the solution I proposed. There are some issues with this right now such as failing tests and the documentation has not been updated, but it works as intended. Tests need to be added for subrepositories and a pass needs to be done to cleanup code.

Is this a good enough direction to create a draft PR?

@joshtriplett
Copy link
Member

@landaire We really appreciate you taking the time to recognize and report this. This does seem like something that could be missed, and the risk of unintentionally disclosing something seems high enough to motivate a little more care.

I personally think a name with include in it would be more obvious than allow, and I think it'd be reasonable to add such an option; I don't think we'd want to hard-deprecate the old option, but adding a new option and referencing that new option in the help message makes sense.

That said, I don't think we should split this into two options. I think we should have one option that allows and includes all non-checked-in files, including dirty versions of tracked files, because I don't think there are likely to be circumstances in which someone wants one but not the other. It's more that people should be aware of both.

We'd be happy to accept a PR that changes the text of that warning message to make it clearer.

We may also be willing to accept a PR that changes the name of the option and hides the old option, if there's a single clear option name we can use that makes the full behavior clear.

@landaire
Copy link
Author

landaire commented May 4, 2021

Thank you @joshtriplett for taking the time to review this issue.

That said, I don't think we should split this into two options. I think we should have one option that allows and includes all non-checked-in files, including dirty versions of tracked files, because I don't think there are likely to be circumstances in which someone wants one but not the other. It's more that people should be aware of both.

I see your point, but this is exactly the behavior I want today. I leave untracked files in my working directory and am prevented from publishing a new version of my crate until I resolve the issue or just let everything through.

Resolution from cargo's perspective would either be tracking these files in my VCS and committing the change or explicitly ignoring them. As the maintainer this either doesn't make sense (temp files shouldn't be committed), is unnecessary (temp files shouldn't need to be in my .gitignore), or just doesn't fit my personal workflow (put temp files in /tmp/).

On the contrary, the argument can be made that Cargo doesn't and shouldn't know what files are used by your application and it should assume everything not ignored by Git is used.

I personally expect a package on crates.io to match some commit in the manifest's package.repository even though neither crates.io nor Cargo explicitly make this guarantee. With the changes on my branch you will be warned if you are attempting to publish dirty tracked files but you can still match the old behavior if you really want to. This loosely matches git push behavior, except you're warned on every push if you haven't fully committed your changes to tracked files.

I would be happy to create a PR to just include a new CLI option with clearer language and update the error message text. I do think the behavior as it stands today of error-by-default for untracked files is a poor UX and should be re-evalulated.

@joshtriplett
Copy link
Member

@landaire Can you clarify something? Are you saying you'd like to be able to have untracked and unignored files in your crate's working directory and have Cargo include them, or are you saying you'd like Cargo to silently ignore those files rather than warning, or is there some third behavior you want?

I've personally been saved from publishing a broken crate several times by Cargo making sure I don't have any untracked files. I've added a new xyz.rs file, thoroughly tested it with cargo test, gotten as far as git commit and git tag, and failed to git add the new file. cargo publish saved me from publishing a broken crate. (That's one example; I've seen a few others as well.)

@landaire
Copy link
Author

landaire commented May 5, 2021

are you saying you'd like Cargo to silently ignore those files rather than warning

Exactly this. Sometimes I move files into my working directory so I can just type the name instead of a full filesystem path or pipe the output of multiple runs to a temp file in my working directory. These are used just by me for my analysis or testing. I would like Cargo to always ignore these files, but warn me when files that are tracked by my VCS are dirty (this is useful for me to make sure there's no test code left behind).

I've personally been saved from publishing a broken crate several times by Cargo making sure I don't have any untracked files.

Perhaps this could be handled by doing smarter logic in PathSource::list_files and have that API return only files used by Cargo/rustc and files included in the VCS? This API is actually the root cause of the problem since it returns more than is needed to build the project.

With the above changes to PathSource you wouldn't even even need to change check_repo_state as I did. This would fix the CLI option splitting, cover files used at runtime/compile-time via include_bytes!() since they presumably need to be checked in to VCS unless they're in your .gitignore... but the implications of a change in PathSource I think are a little wider-reaching than just the publish and package commands. I'm also not sure how if an API for "give me the bare minimum files needed to build" exists in cargo today.

@joshtriplett
Copy link
Member

I don't think changing the list of files would necessarily help; anything that would ignore a new file would potentially cause breakage. There might be a potential solution there, but I think it'd need some further design and evaluation of tradeoffs.

For the moment, at least, I'd request that you start with a PR updating the help text, along with a proposal for what a combined option could be called.

Separately from that, if you're interested in changing the behavior further, I'd suggest making a list of use cases, soliciting additional use cases from others, and then showing how proposed solutions would behave for each of those use cases. Such use cases could include, for instance, "adding a new source file and forgetting to git add", or "having a top-level notes file you don't want to ship in the package", or any number of other things.

@arvidn
Copy link

arvidn commented Nov 28, 2022

I want to second this issue. I'm surprised that random files that happen to be in my working dir are included in the package. The cargo-fuzz corpus is created in the working tree, and I learned that cargo publish tried to include it by exceeding the crates.io limit, failing to publish.

What is the recommended way of publishing a package from a repository with additional files, like the fuzzing corpus or other local development files? I want to keep those files, so git clean isn't what I'm looking for.

Ideally, I would like to only include the source files required to build the targets in the crate.

@ChrisDenton
Copy link
Member

There's the include and exclude fields in Cargo.toml which may help: https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields

@briansmith
Copy link

briansmith commented Jan 3, 2024

In my case, I would like to have an include-untracked list of files/patterns next to include/exclude in Cargo.toml so everybody knows which files are intentionally not in the VCS, which would ideally accomplish the following:

  • Anybody who reads Cargo.toml would know which files aren't checked into VCS and which aren't (in my case, because they are prebuilt during pulishing)
  • Allow publishing untracked files without using --allow-dirty, so that I don't accidentally publish uncommitted changes to files that are tracked in the VCS.
  • Have Cargo include the VCS info using the git commit hash for the files that are tracked in VCS, which is one of the top requests for my project.
  • Prevent me from publishing if an explicitly-listed (no wildcards) file does not exist.
  • (Least important) Allow me to put those files in .gitignore so I don't accidentally add them to version control. (Currently I have to leave them out of .gitignore so cargo publish --allow-dirty doesn't skip them.)

briansmith added a commit to briansmith/ring that referenced this issue Mar 1, 2024
`--allow-dirty` doesn't allow us to be as granular as we'd like;
see rust-lang/cargo#9398 (comment).

This isn't foolproof but it gets us closer to what we want.

The new `mk/publish.sh` is a step towards more robust publishing with
tagging, etc.
briansmith added a commit to briansmith/ring that referenced this issue Mar 1, 2024
`--allow-dirty` doesn't allow us to be as granular as we'd like;
see rust-lang/cargo#9398 (comment).

This isn't foolproof but it gets us closer to what we want.

The new `mk/publish.sh` is a step towards more robust publishing with
tagging, etc.
@epage epage added the S-triage Status: This issue is waiting on initial triage. label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug Command-package S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

7 participants