package: canonicalize tar headers for crate packages
Currently, when reading a file from disk, we include several pieces of
data from the on-disk file, including the user and group names and IDs,
the device major and minor, the mode, and the timestamp.  This means
that our archives differ between systems, sometimes in unhelpful ways.

In addition, most users probably did not intend to share information
about their user and group settings, operating system and disk type, and
umask.  While these aren't huge privacy leaks, cargo doesn't use them
when extracting archives, so there's no value to including them.

Since using consistent data means that our archives are reproducible and
don't leak user data, both of which are desirable features, let's
canonicalize the header to strip out identifying information.

We set the user and group IDs to 0 and the user and group names to
root, since that's the only user that's typically consistent among Unix
systems.  Setting these values doesn't create a security risk since tar
can't change the ownership of files when it's running as a normal
unprivileged user.

Similarly, we set the device major and minor to 0.  There is no useful
value here that's portable across systems, and it does not affect
extraction in any way.
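
As a rough illustration of why fixed ownership and device values make
headers comparable across machines, here is a minimal sketch.  The
`OwnerFields` struct and `canonicalize_owner` function are hypothetical
stand-ins invented for this example; they are not the `tar` crate's
`Header` type, which the actual change operates on.

```rust
/// Simplified stand-in for the ownership and device fields of a tar header.
/// NOTE: hypothetical type for illustration only, not `tar::Header`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct OwnerFields {
    username: String,
    groupname: String,
    uid: u64,
    gid: u64,
    device_major: u32,
    device_minor: u32,
}

/// Overwrite every machine-specific field with a fixed value, so that
/// headers built on different systems end up identical.
fn canonicalize_owner(fields: &mut OwnerFields) {
    fields.username = "root".to_string();
    fields.groupname = "root".to_string();
    fields.uid = 0;
    fields.gid = 0;
    fields.device_major = 0;
    fields.device_minor = 0;
}
```

Two values that start out with different users, groups, and device
numbers compare equal after `canonicalize_owner` has been applied to
both.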

We also set the timestamp to the same one that we use for generated
files.  This is probably the biggest loss of relevant data, but
considering that cargo doesn't otherwise use it and honoring it makes
the archives unreproducible, we canonicalize it as well.

Finally, we canonicalize the mode of an item we're storing by looking at
the executable bit and using mode 755 if it's set and mode 644 if it's
not.  We already use 644 as the default for generated files, and this is
the same algorithm that Git uses to determine whether a file should be
considered executable.  The tests don't test this case because there's
no portable way to create executable files on Windows.
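
The mode rule described above can be sketched in isolation as a pure
function.  The name `canonical_mode` is an assumption made for this
example; in the change itself the same check is done inline inside
`canonicalize_header`.

```rust
/// Hypothetical helper mirroring the rule in the commit message: look only
/// at the owner-executable bit and map the mode to one of two fixed values.
fn canonical_mode(on_disk_mode: u32) -> u32 {
    // 0o100 is the owner-execute permission bit.
    if on_disk_mode & 0o100 != 0 {
        0o755
    } else {
        0o644
    }
}
```

Under this rule a private script stored as 0o700 is archived as 0o755,
and a group-writable file stored as 0o664 is archived as 0o644, so the
packager's umask no longer shows up in the archive.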
bk2204 committed Nov 16, 2020
1 parent 436b9eb commit e46ca84
Showing 2 changed files with 63 additions and 0 deletions.
20 changes: 20 additions & 0 deletions src/cargo/ops/cargo_package.rs
@@ -484,6 +484,23 @@ fn timestamp() -> u64 {
.as_secs()
}

fn canonicalize_header(header: &mut Header) {
    // Let's not include information about the user or their system here.
    header.set_username("root").unwrap();
    header.set_groupname("root").unwrap();
    header.set_uid(0);
    header.set_gid(0);
    header.set_device_major(0).unwrap();
    header.set_device_minor(0).unwrap();

    let mode = if header.mode().unwrap() & 0o100 != 0 {
        0o755
    } else {
        0o644
    };
    header.set_mode(mode);
}

fn tar(
ws: &Workspace<'_>,
ar_files: Vec<ArchiveFile>,
@@ -524,6 +541,8 @@ fn tar(
format!("could not learn metadata for: `{}`", disk_path.display())
})?;
header.set_metadata(&metadata);
header.set_mtime(time);
canonicalize_header(&mut header);
header.set_cksum();
ar.append_data(&mut header, &ar_path, &mut file)
.chain_err(|| {
@@ -540,6 +559,7 @@ fn tar(
header.set_mode(0o644);
header.set_mtime(time);
header.set_size(contents.len() as u64);
canonicalize_header(&mut header);
header.set_cksum();
ar.append_data(&mut header, &ar_path, contents.as_bytes())
.chain_err(|| format!("could not archive source file `{}`", rel_str))?;
43 changes: 43 additions & 0 deletions tests/testsuite/package.rs
@@ -6,8 +6,10 @@ use cargo_test_support::registry::{self, Package};
use cargo_test_support::{
basic_manifest, cargo_process, git, path2url, paths, project, symlink_supported, t,
};
use flate2::read::GzDecoder;
use std::fs::{self, read_to_string, File};
use std::path::Path;
use tar::Archive;

#[cargo_test]
fn simple() {
@@ -1917,3 +1919,44 @@ src/main.rs
))
.run();
}

#[cargo_test]
fn reproducible_output() {
    let p = project()
        .file(
            "Cargo.toml",
            r#"
            [project]
            name = "foo"
            version = "0.0.1"
            authors = []
            exclude = ["*.txt"]
            license = "MIT"
            description = "foo"
        "#,
        )
        .file("src/main.rs", r#"fn main() { println!("hello"); }"#)
        .build();

    // Timestamp is arbitrary and is the same used by git format-patch.
    p.cargo("package")
        .env("SOURCE_DATE_EPOCH", "1000684800")
        .run();
    assert!(p.root().join("target/package/foo-0.0.1.crate").is_file());

    let f = File::open(&p.root().join("target/package/foo-0.0.1.crate")).unwrap();
    let decoder = GzDecoder::new(f);
    let mut archive = Archive::new(decoder);
    for ent in archive.entries().unwrap() {
        let ent = ent.unwrap();
        let header = ent.header();
        assert_eq!(header.mode().unwrap(), 0o644);
        assert_eq!(header.uid().unwrap(), 0);
        assert_eq!(header.gid().unwrap(), 0);
        assert_eq!(header.mtime().unwrap(), 1000684800);
        assert_eq!(header.username().unwrap().unwrap(), "root");
        assert_eq!(header.groupname().unwrap().unwrap(), "root");
        assert_eq!(header.device_major().unwrap().unwrap(), 0);
        assert_eq!(header.device_minor().unwrap().unwrap(), 0);
    }
}
