Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

touch some files when we use them #6477

Merged
merged 5 commits into from
Jan 16, 2019
Merged

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Dec 22, 2018

This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229.

how it works?

This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable then the current practices of looking at the accessed time. accessed time is not available or disabled on many operating systems, and is routinely set by arbitrary other programs.

is this enough to be useful?

The current implementation of cargo sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update accessed time.

This also allows a crude script to clean some of the largest subfolders based on each files modification time.

is this worth adding, or should we just build clean --outdated into cargo?

I would love to see a clean --outdated in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo teams stamp of approval. Especially as an in tree version will have to work with many use cases some of witch are yet to be designed (like distributed builds). Even just including cargo-sweeps existing functionality opens a full bike shop about what arguments to take, and in what form (cargo-sweep takes a days argument, but maybe we should have a minutes or a ISO standard time or ...). This PR, or equivalent, allows out of tree experimentation with all different interfaces, and is basically required for any LRU based system. (For example Crater wants a GC that cleans files in an LRU manner to maintain a target folder below a target size. This is not a use case that is widely enough needed to be worth adding to cargo but one supported by this PR.)

what are the downsides?

  1. There are legitimate performance concerns about writing so many small files during a NOP build.
  2. There are legitimate concerns about unnecessary wrights on read-only filesystems.
  3. If we add this, and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out of tree experimentation.)
  4. This may not be an efficient way to store the data. (It does have the advantage of not needing different cargos to manipulate the same file. But if you have a better idea please make a suggestion.)

@rust-highfive
Copy link

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

@Eh2406

This comment has been minimized.

@ehuss
Copy link
Contributor

ehuss commented Dec 23, 2018

I think this would be useful. I'm not sure if it really needs a dedicated file. I did some tests on an old Windows machine, and I didn't see a noticeable performance difference. However, creating lots of small files generally isn't desirable. Would it be possible to just adjust the mtime of one of the other files? Touching dep-KIND-PKG-HASH may make #2426 worse, so that may not be a good idea, but I think the other two files would be possible options. It should probably get a test, too.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Dec 23, 2018

Good points.

It would be nice if there was a reliable way to know if the fingerprint was made with a cargo that is maintaining the mtime or if falling back to the atime is required.
Cargo can decide not to have a clear signal.

  • If the tools always uses max(atime,mtime) then it will be safely conservative, but I fear some system that always returns 'now' for atime breaking the tooling.
  • If the tool always uses mtime then it will be too aggressive for older Cargos.

The tooling can decide to be to conservative, not support older cargos, or do version number based feature detection. (assuming we don't add a way to opt out.) A separate file makes it very clear if Cargo is making this data available to the tool. (file mtime iif file exists else max atime.) If there is another signale you would prefer, let me know.

I will add a test when we decided what we want the behavior to be and next time I look at the code.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Dec 27, 2018

Thinking about it more, I don't think there is a reasonable way for a cleaner script to handle when cargo is sometimes opting out of this. Version number detection sounds approximately good enough. What is the best way to cross platform touch a file?

@alexcrichton
Copy link
Member

Could we perhaps touch the mtime of the artifacts themselves? (like all the rlibs, binaries, etc). I think that Cargo doesn't compare the mtime on those artifacts (even historical Cargos) and that may make it easiest for other tools too? IIRC we already check for existence of all artifacts on nop builds, so opening up a file handle to set the mtime on it may not take too much longer. If this becomes a performance concern we could always add configuration/env vars to turn it off, but it seems reasonable to me to have it on by default.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 2, 2019

that may make it easiest for other tools too?

In what way? It would make a simple tool, like the published version of cargo-sweep, that does not know about the structure of the target dir closer to being correct but not really useable. I think a simple tool would start leaving the outputs, and delling the fingerprints leading to a complete rebuild. A tool that knows about the structure, like my branch of cargo-sweep, doesn't really care which file we pick as the authoritative one.

@alexcrichton
Copy link
Member

Oh I was just thinking that if the output artifacts had mtimes on them then a tool could just delete any artifact older than N days, and the .fingerprint metadata would need cleaning eventually but it's not really large enough to warrant lots of scrutiny

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 2, 2019

Ok, I am convinced that that could be a good next step!
What is the best way to cross platform touch a file?

let t = FileTime::from_system_time(SystemTime::now());
filetime::set_file_times(path, t, t):

Feels kinda dependent on the SystemTime being the same clock as the File System.

@alexcrichton
Copy link
Member

Hm that's what I would naively say we should do, but I'll admit I have no idea how the filesystem clock and SystemTime clocks are related...

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 9, 2019

@alexcrichton where do we check for existence of all artifacts on nop builds?

@alexcrichton
Copy link
Member

I think here

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 10, 2019

This has been updated. I rebased. I also switch to touching existing files instead of adding a new one. Tuch was imped with SystemTime. After our discussion in #2426, I am not convinced that there is a better alternative, nor that the problems are all that severe. In addition none of this change is for cargos correctness.

The files being touched are:

  • the hash file in the fingerprint. This allows cargo-sweep to clean all of target on systems that do not support atime, with no change to its code.
  • the output artifacts. This allows a simple shell script to clean all files in deps and examples older than a target.

when CI is green, I will update the title and op.

@Eh2406 Eh2406 changed the title adds a timestamp file in the fingerprint folder touch some files when we use them Jan 10, 2019
@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 13, 2019

Added tests. Do people have thoughts, or is this good to go?

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good. Alex?

@alexcrichton
Copy link
Member

@bors: r=ehuss

👍

@bors
Copy link
Collaborator

bors commented Jan 16, 2019

📌 Commit 97363ca has been approved by ehuss

@bors
Copy link
Collaborator

bors commented Jan 16, 2019

⌛ Testing commit 97363ca with merge 513d230...

bors added a commit that referenced this pull request Jan 16, 2019
touch some files when we use them

This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229.

how it works?
--------

This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable then the current practices of looking at the `accessed` time. `accessed` time is not available or disabled on many operating systems, and is routinely set by arbitrary other programs.

is this enough to be useful?
--------

The current implementation of cargo sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update `accessed` time.

This also allows a crude script to clean some of the largest subfolders based on each files modification time.

is this worth adding, or should we just build `clean --outdated` into cargo?
------
I would love to see a `clean --outdated` in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo teams stamp of approval. Especially as an in tree version will have to work with many use cases some of witch are yet to be designed (like distributed builds). Even just including `cargo-sweep`s existing functionality opens a full bike shop about what arguments to take, and in what form (`cargo-sweep` takes a days argument, but maybe we should have a minutes or a ISO standard time or ...). This PR, or equivalent, allows out of tree experimentation with all different interfaces, and is basically required for any `LRU` based system. (For example [Crater](rust-lang/crater#346) wants a GC that cleans files in an `LRU` manner to maintain a target folder below a target size. This is not a use case that is widely enough needed to be worth adding to cargo but one supported by this PR.)

what are the downsides?
----

1. There are legitimate performance concerns about writing so many small files during a NOP build.
2. There are legitimate concerns about unnecessary wrights on read-only filesystems.
3. If we add this, and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out of tree experimentation.)
4. This may not be an efficient way to store the data. (It does have the advantage of not needing different cargos to manipulate the same file. But if you have a better idea please make a suggestion.)
@bors
Copy link
Collaborator

bors commented Jan 16, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: ehuss
Pushing 513d230 to master...

@bors bors merged commit 97363ca into rust-lang:master Jan 16, 2019
bors added a commit to rust-lang/rust that referenced this pull request Jan 19, 2019
Update cargo

Unblocks #56884  cc @euclio

6 commits in 2b4a5f1f0bb6e13759e88ea9512527b0beba154f..ffe65875fd05018599ad07e7389e99050c7915be
2019-01-12 04:13:12 +0000 to 2019-01-17 23:57:50 +0000
- Better error message for bad manifest with `cargo install`. (rust-lang/cargo#6560)
- relax rustdoc output assertion (rust-lang/cargo#6559)
- touch some files when we use them (rust-lang/cargo#6477)
- Add documentation for new package/publish feature flags. (rust-lang/cargo#6553)
- Update chat link to Discord. (rust-lang/cargo#6554)
- Fix typo (rust-lang/cargo#6552)

r? @alexcrichton
bors added a commit that referenced this pull request Jan 20, 2019
Put mtime-on-use behind a feature flag.

This places #6477 behind the `-Z mtime-on-use` feature flag.

The change to update the mtime each time a crate is used has caused a performance regression on the rust playground (rust-lang/rust#57774). It is using about 241 pre-built crates in a Docker container. Due to the copy-on-write nature of Docker, it can take a significant amount of time to update the timestamps (over 10 seconds on slower systems).

cc @Mark-Simulacrum
@Eh2406 Eh2406 deleted the add-a-timestamp-file branch January 23, 2019 02:08
@ehuss ehuss added this to the 1.34.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants