From 9b9a0f4c0c0d603888d6f69789d0ed31bbef187d Mon Sep 17 00:00:00 2001 From: Elliot Saba Date: Tue, 19 May 2020 18:29:39 -0700 Subject: [PATCH] Implement Scratchspaces This implements functionality and tests for a new `Spaces` subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this provides an abstraction for a mutable datastore that can be explicitly lifecycled to an owning package, or shared among multiple packages. Closes https://github.com/JuliaLang/Pkg.jl/issues/796 --- docs/make.jl | 1 + docs/src/api.md | 18 +- docs/src/glossary.md | 2 +- docs/src/scratch.md | 150 ++++++++++ docs/src/toml-files.md | 2 +- src/API.jl | 269 +++++++++++++----- src/Pkg.jl | 1 + src/Scratch.jl | 227 +++++++++++++++ src/Types.jl | 7 +- test/runtests.jl | 2 +- test/scratch.jl | 156 ++++++++++ test/test_packages/ScratchUsage/Project.toml | 7 + .../ScratchUsage/src/ScratchUsage.jl | 26 ++ .../ScratchUsage/test/runtests.jl | 4 + 14 files changed, 796 insertions(+), 76 deletions(-) create mode 100644 docs/src/scratch.md create mode 100644 src/Scratch.jl create mode 100644 test/scratch.jl create mode 100644 test/test_packages/ScratchUsage/Project.toml create mode 100644 test/test_packages/ScratchUsage/src/ScratchUsage.jl create mode 100644 test/test_packages/ScratchUsage/test/runtests.jl diff --git a/docs/make.jl b/docs/make.jl index 193b300578..d64b4d907b 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -37,6 +37,7 @@ makedocs( "compatibility.md", "registries.md", "artifacts.md", + "scratch.md", # "faq.md", "glossary.md", "toml-files.md", diff --git a/docs/src/api.md b/docs/src/api.md index 15814b193c..702f965f92 100644 --- a/docs/src/api.md +++ b/docs/src/api.md @@ -1,4 +1,4 @@ -# [**12.** API Reference](@id API-Reference) +# [**13.** API Reference](@id API-Reference) This section describes the function interface, or "API mode", for interacting with Pkg.jl. 
The function API is recommended @@ -87,3 +87,19 @@ Pkg.Artifacts.ensure_all_artifacts_installed Pkg.Artifacts.@artifact_str Pkg.Artifacts.archive_artifact ``` + +## [Scratch Space API Reference](@id Scratch-Space-Reference) + +!!! compat "Julia 1.6" + Pkg's Scratch space API requires at least Julia 1.6. + +```@docs +Pkg.Scratch.get_scratch! +Pkg.Scratch.@get_scratch! +Pkg.Scratch.delete_scratch! +Pkg.Scratch.clear_scratchspaces! +Pkg.Scratch.with_scratch_directory +Pkg.Scratch.scratch_dir +Pkg.Scratch.scratch_path +Pkg.Scratch.track_scratch_access +``` diff --git a/docs/src/glossary.md b/docs/src/glossary.md index f632140de3..2d5d840b45 100644 --- a/docs/src/glossary.md +++ b/docs/src/glossary.md @@ -1,4 +1,4 @@ -# [**9.** Glossary](@id Glossary) +# [**10.** Glossary](@id Glossary) **Project:** a source tree with a standard layout, including a `src` directory for the main body of Julia code, a `test` directory for testing the project, diff --git a/docs/src/scratch.md b/docs/src/scratch.md new file mode 100644 index 0000000000..726b147a35 --- /dev/null +++ b/docs/src/scratch.md @@ -0,0 +1,150 @@ +# [**9.** Scratch Spaces](@id Scratch-Spaces) + +!!! compat "Julia 1.6" + Pkg's Scratch Spaces functionality requires at least Julia 1.6. + +`Pkg` can manage and automatically garbage collet scratch spaces of temporary or readily-recreatable data. +These spaces can contain datasets, text, binaries, or any other kind of data that would be convenient to store, but which is non-fatal to have garbage collected if it has not been accessed recently, or if the owning package has been uninstalled. +As compared to [Artifacts](@ref), these containers of data are mutable and should be treated as ephemeral; all usage of scratch spaces should assume that the data stored within them could be gone by the next time your code is run. 
+In the current implementation, scratch spaces are removed during Pkg garbage collection if the scratch space has not been accessed for a period of time (see the `scratch_cleanup_period` keyword argument to [Pkg.gc](@ref)), or if the owning package has been removed. +Users can also request a full wipe of all scratch spaces to clean up unused disk space through `Pkg.Scratch.clear_scratchspaces!()`. + +## API overview + +Scratch space usage is performed primarily through one function: `get_scratch!()`. +It provides a single interface for creating and getting previously-created spaces, either tied to a package by its UUID, or as a global scratch space that can be accessed by any package. +Here is an example where a package creates a scratch space that is namespaced to its own UUID: + +```julia +module ScratchExample +using Pkg, Pkg.Scratch + +# This will be filled in inside `__init__()` +download_cache = "" + +# Downloads a resource, stores it within a scratchspace +function download_dataset(url) + fname = joinpath(download_cache, basename(url)) + if !isfile(fname) + download(url, fname) + end + return fname +end + +function __init__() + global download_cache = @get_scratch!("downloaded_files") +end + +end # module ScratchExample +``` + +Note that we initialize the `download_cache` within `__init__()` so that our packages are as relocatable as possible; we typically do not want to bake absolute paths into our precompiled files. +This makes use of the `@get_scratch!()` macro, which is identical to the `get_scratch!()` method, except it automatically determines the UUID of the calling module, if possible. 
The user can manually pass in a `Module` as well for a slightly more verbose incantation: +```julia +function __init__() + global download_cache = get_scratch!("downloaded_files", @__MODULE__) +end +``` + +If a user wishes to manually delete a scratch space, the method `delete_scratch!(key, parent_pkg)` is the natural analog to `get_scratch!()`. In general, however, users will not need to do so, as scratch spaces are garbage collected by `Pkg` automatically. + +For a full listing of docstrings and methods, see the [Scratch Space API Reference](@ref Scratch-Space-Reference) section. + +## Use cases + +Good use cases for a Pkg scratch space include: + +* Caching downloads of files that must be routinely accessed and modified by a package. Files that must be modified are a bad fit for the immutable [Artifacts](@ref) abstraction, and files can always be re-downloaded if the cache is wiped by the user. + +* Generated data that depends on the characteristics of the host system. Examples are compiled binaries, fontcache system font folder inspection output, generated CUDA bitcode files, etc. Objects that would be difficult to compute off of the user's machine, and that can be recreated without user intervention, are a great fit. + +* Directories that should be shared between multiple packages in a single depot. The scratch space keying mechanism makes it simple to provide scratch spaces that can be shared between different versions of a package, or even between different packages. This allows packages to provide a scratch space where other packages can easily find the generated data. However, the typical race condition warnings apply here; always design your access patterns assuming another process could be reading or writing to this scratch space at any time. + +Bad use cases for a Pkg scratch space include (but are not limited to): + +* Anything that requires user input to regenerate. 
Because scratch spaces can disappear, it is a bad experience for the user to need to answer questions at seemingly random times when the space must be rebuilt. + +* Storing data that is write-once, read-many times. We suggest you use [Artifacts](@ref) for that, as they are much more persistent and are built to become portable (so that other machines do not have to generate the data, they can simply make use of the artifact by downloading it from a hosted location). Scratch spaces generally should follow a write-many read-many access pattern. + +## Tips and Tricks + +> Can I trigger data regeneration if the scratch space is found to be empty/files are missing? + +Yes, this is quite simple; just check the contents of the directory when you first call `get_scratch!()`, and if it's empty, run your generation function: + +```julia +using Pkg, Pkg.Scratch + +function get_dataset_dir() + dataset_dir = @get_scratch!("dataset") + if isempty(readdir(dataset_dir)) + perform_expensive_dataset_generation(dataset_dir) + end + return dataset_dir +end +``` + +> Can I create a scratch space that is not shared across versions of my package? + +Yes! Make use of the `key` parameter and Pkg's ability to look up the current version of your package at compile-time: + +```julia +module VersionSpecificExample +using Pkg, Pkg.Scratch + +# Get the current version at compile-time; that's fine, it's not going to change. ;) +const pkg_version = Pkg.API.get_version(Pkg.API.get_uuid(@__MODULE__)) + +# This will be filled in by `__init__()`; it might change if we get deployed somewhere +const version_specific_scratch = Ref{String}() + +function __init__() + # This space will be unique between versions of my package that differ in major or + # minor version, but allows patch releases to share the same space. 
+ scratch_name = "data_for_version-$(pkg_version.major).$(pkg_version.minor)" + version_specific_scratch[] = @get_scratch!(scratch_name) +end + +end # module +``` + +> Can I use a scratch space as a temporary workspace, then turn it into an Artifact? + +Yes! Once you're satisfied with your dataset that has been cooking inside a space, and you're ready to share it with the world as an immutable artifact, you can use `create_artifact()` to create an artifact from the space, `archive_artifact()` to get a tarball that you can upload somewhere, and `bind_artifact!()` to write out an `Artifacts.toml` that allows others to download and use it: + +```julia +using Pkg, Pkg.Scratch, Pkg.Artifacts + +function export_scratch(scratch_name::String, github_repo::String) + scratch_dir = @get_scratch!(scratch_name) + + # Copy space directory over to an Artifact + hash = create_artifact() do artifact_dir + rm(artifact_dir) + cp(scratch_dir, artifact_dir) + end + + # Archive artifact out to a tarball. Since `upload_tarball()` is not a function that + # exists, users must either write it themselves (uploading to whatever hosting + # provider they prefer), or run each line of this `do`-block manually, upload the + # tarball manually, record its URL, and pass that to `bind_artifact!()`. + mktempdir() do upload_dir + tarball_path = joinpath(upload_dir, "$(scratch_name).tar.gz") + tarball_hash = archive_artifact(hash, tarball_path) + + # Upload tarball to a hosted site somewhere. Note: this function does not + # exist, it's put here simply to show the flow of events. + tarball_url = upload_tarball(tarball_path) + + # Bind artifact to an Artifacts.toml file in the current directory; this file can + # be used by others to download and use your newly-created Artifact! 
+ bind_artifact!( + joinpath(@__DIR__, "./Artifacts.toml"), + scratch_name, + hash; + download_info=[(tarball_url, tarball_hash)], + force=true, + ) + end +end +``` \ No newline at end of file diff --git a/docs/src/toml-files.md b/docs/src/toml-files.md index d79f9ef53a..7ae8d626b0 100644 --- a/docs/src/toml-files.md +++ b/docs/src/toml-files.md @@ -1,4 +1,4 @@ -# [**10.** `Project.toml` and `Manifest.toml`](@id Project-and-Manifest) +# [**11.** `Project.toml` and `Manifest.toml`](@id Project-and-Manifest) Two files that are central to Pkg are `Project.toml` and `Manifest.toml`. `Project.toml` and `Manifest.toml` are written in [TOML](https://github.com/toml-lang/toml) (hence the diff --git a/src/API.jl b/src/API.jl index c1619b6dc8..8b28faea3c 100644 --- a/src/API.jl +++ b/src/API.jl @@ -329,15 +329,24 @@ function test(ctx::Context, pkgs::Vector{PackageSpec}; end """ - gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) + gc(ctx::Context=Context(); collect_delay::Period=Day(7), + scratch_cleanup_period::Period=Day(21), + kwargs...) Garbage-collect package and artifact installations by sweeping over all known `Manifest.toml` and `Artifacts.toml` files, noting those that have been deleted, and then -finding artifacts and packages that are thereafter not used by any other projects. This -method will only remove package versions and artifacts that have been continually un-used -for a period of `collect_delay`; which defaults to seven days. +finding artifacts and packages that are thereafter not used by any other projects, +marking them as "orphaned". This method will only remove orphaned objects (package +versions, artifacts, and scratch spaces) that have been continually un-used for a period +of `collect_delay`; which defaults to seven days. + +This method will automatically mark as orphaned any scratch spaces that have not been +accessed for at least `scratch_cleanup_period`, which defaults to twenty-one days. 
The +orphaned spaces will then be removed after the typical `collect_delay` time period. """ -function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) +function gc(ctx::Context=Context(); collect_delay::Period=Day(7), + scratch_cleanup_period::Period=Day(21), + kwargs...) Context!(ctx; kwargs...) env = ctx.env @@ -352,61 +361,83 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) manifest_usage_by_depot = Dict{String, Dict{String, DateTime}}() artifact_usage_by_depot = Dict{String, Dict{String, DateTime}}() + # Collect both last known usage dates, as well as parent projects for each scratch space + scratch_usage_by_depot = Dict{String, Dict{String, DateTime}}() + scratch_parents_by_depot = Dict{String, Dict{String, Set{String}}}() + # Load manifest files from all depots for depot in depots() # When a manifest/artifact.toml is installed/used, we log it within the # `manifest_usage.toml` files within `write_env_usage()` and `bind_artifact!()` - function collect_usage!(usage_data::Dict, usage_filepath) + function reduce_usage!(f::Function, usage_filepath) if !isfile(usage_filepath) - return usage_data + return end for (filename, infos) in TOML.parse(String(read(usage_filepath))) - # If this file was already listed in this index, update it with the later - # information - for info in infos - usage_data[filename] = max( - get(usage_data, filename, DateTime(0)), - DateTime(info["time"]), - ) - end + f.(Ref(filename), infos) end - return usage_data end # Extract usage data from this depot, (taking only the latest state for each # tracked manifest/artifact.toml), then merge the usage values from each file # into the overall list across depots to create a single, coherent view across # all depots. 
- manifest_usage_by_depot[depot] = Dict{String, DateTime}() - artifact_usage_by_depot[depot] = Dict{String, DateTime}() - collect_usage!( - manifest_usage_by_depot[depot], - joinpath(logdir(depot), "manifest_usage.toml"), - ) - collect_usage!( - artifact_usage_by_depot[depot], - joinpath(logdir(depot), "artifact_usage.toml"), - ) + usage = Dict{String, DateTime}() + reduce_usage!(joinpath(logdir(depot), "manifest_usage.toml")) do filename, info + # For Manifest usage, store only the last DateTime for each filename found + usage[filename] = max(get(usage, filename, DateTime(0)), DateTime(info["time"])) + end + manifest_usage_by_depot[depot] = usage + + usage = Dict{String, DateTime}() + reduce_usage!(joinpath(logdir(depot), "artifact_usage.toml")) do filename, info + # For Artifact usage, store only the last DateTime for each filename found + usage[filename] = max(get(usage, filename, DateTime(0)), DateTime(info["time"])) + end + artifact_usage_by_depot[depot] = usage + + # Track last-used time and parent projects for each scratch space + usage = Dict{String, DateTime}() + parents = Dict{String, Set{String}}() + reduce_usage!(joinpath(logdir(depot), "scratch_usage.toml")) do filename, info + # For scratch usage, store only the last DateTime for each directory found + usage[filename] = max(get(usage, filename, DateTime(0)), DateTime(info["time"])) + if !haskey(parents, filename) + parents[filename] = Set{String}() + end + for parent in info["parent_projects"] + push!(parents[filename], parent) + end + end + scratch_usage_by_depot[depot] = usage + scratch_parents_by_depot[depot] = parents end - # Next, figure out which files are still extant - all_index_files = vcat( - unique(f for (_, files) in manifest_usage_by_depot for f in keys(files)), - unique(f for (_, files) in artifact_usage_by_depot for f in keys(files)), - ) - all_index_files = Set(filter(Pkg.isfile_nothrow, all_index_files)) + # Next, figure out which files are still existent + all_manifest_tomls = unique(f for (_, files) in manifest_usage_by_depot for 
f in keys(files)) + all_artifact_tomls = unique(f for (_, files) in artifact_usage_by_depot for f in keys(files)) + all_scratch_dirs = unique(f for (_, dirs) in scratch_usage_by_depot for f in keys(dirs)) + all_scratch_parents = Set{String}() + for (depot, parents) in scratch_parents_by_depot + for parent in values(parents) + union!(all_scratch_parents, parent) + end + end + #all_scratch_parents = union!(all_scratch_parents, (union(values(parents)...) for (_, parents) in scratch_parents_by_depot)...) - # Immediately write this back as condensed manifest_usage.toml files - function write_condensed_usage(usage_by_depot, fname) - for (depot, usage) in usage_by_depot - # Keep only the keys of the files that are still extant - usage = filter(p -> p[1] in all_index_files, usage) + all_manifest_tomls = Set(filter(Pkg.isfile_nothrow, all_manifest_tomls)) + all_artifact_tomls = Set(filter(Pkg.isfile_nothrow, all_artifact_tomls)) + all_scratch_dirs = Set(filter(Pkg.isdir_nothrow, all_scratch_dirs)) + all_scratch_parents = Set(filter(Pkg.isfile_nothrow, all_scratch_parents)) - # Expand it back into a dict of arrays-of-dicts - usage = Dict(k => [Dict("time" => v)] for (k, v) in usage) + # Immediately write these back as condensed toml files + function write_condensed_toml(f::Function, usage_by_depot, fname) + for (depot, usage) in usage_by_depot + # Run through user-provided filter/condenser + usage = f(depot, usage) - # Write it out to disk within this depot + # Write out the TOML file for this depot usage_path = joinpath(logdir(depot), fname) if !isempty(usage) || isfile(usage_path) open(usage_path, "w") do io @@ -415,13 +446,44 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) 
end end end - write_condensed_usage(manifest_usage_by_depot, "manifest_usage.toml") - write_condensed_usage(artifact_usage_by_depot, "artifact_usage.toml") - # Next, we will process the manifest.toml and artifacts.toml files separately, - # extracting from them the paths of the packages and artifacts that they reference. - all_manifest_files = filter(f -> endswith(f, "Manifest.toml"), all_index_files) - all_artifacts_files = filter(f -> !endswith(f, "Manifest.toml"), all_index_files) + # Write condensed Manifest usage + write_condensed_toml(manifest_usage_by_depot, "manifest_usage.toml") do depot, usage + # Keep only manifest usage markers that are still existent + filter!(((k,v),) -> k in all_manifest_tomls, usage) + + # Expand it back into a dict-of-dicts + return Dict(k => [Dict("time" => v)] for (k, v) in usage) + end + + # Write condensed Artifact usage + write_condensed_toml(artifact_usage_by_depot, "artifact_usage.toml") do depot, usage + filter!(((k,v),) -> k in all_artifact_tomls, usage) + return Dict(k => [Dict("time" => v)] for (k, v) in usage) + end + + # Write condensed scratch space usage + write_condensed_toml(scratch_usage_by_depot, "scratch_usage.toml") do depot, usage + # Keep only scratch directories that still exist + filter!(((k,v),) -> k in all_scratch_dirs, usage) + + # Expand it back into a dict-of-dicts + expanded_usage = Dict{String,Vector{Dict}}() + for (k, v) in usage + # Drop scratch spaces whose parents are all nonexistent + parents = scratch_parents_by_depot[depot][k] + filter!(p -> p in all_scratch_parents, parents) + if isempty(parents) + continue + end + + expanded_usage[k] = [Dict( + "time" => v, + "parent_projects" => collect(parents), + )] + end + return expanded_usage + end function process_manifest_pkgs(path) # Read the manifest in @@ -454,7 +516,7 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) 
function process_artifacts_toml(path, packages_to_delete) # Not only do we need to check if this file doesn't exist, we also need to check # to see if it this artifact is contained within a package that is going to go - # away. This places an inherent ordering between marking packages and marking + # away. This places an implicit ordering between marking packages and marking # artifacts; the package marking must be done first so that we can ensure that # all artifacts that are solely bound within such packages also get reaped. if any(startswith(path, package_dir) for package_dir in packages_to_delete) @@ -482,7 +544,39 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) return artifact_path_list end - # Mark packages/artifacts as active or not by calling the appropriate + function process_scratchspace(path, packages_to_delete) + # Find all parents of this path and its latest access time + parents = String[] + last_access = DateTime(0) + + # It is slightly awkward that we need to reach out to our `*_by_depot` + # datastructures here; that's because unlike Artifacts and Manifests we're not + # parsing a TOML file to find paths within it here, we're actually doing the + # inverse, finding files that point to this directory. 
+ for (depot, parent_map) in scratch_parents_by_depot + if haskey(parent_map, path) + append!(parents, parent_map[path]) + end + if haskey(scratch_usage_by_depot[depot], path) + last_access = max(last_access, scratch_usage_by_depot[depot][path]) + end + end + + # Look to see if all parents are packages that will be removed + filter!(p -> !any(startswith(p, package_dir) for package_dir in packages_to_delete), parents) + if isempty(parents) + return nothing + end + + # Check to see what the last access time was; if it's farther back than our + # `scratch_cleanup_period`, then do not mark this path + if now() - last_access > scratch_cleanup_period + return nothing + end + return [path] + end + + # Mark packages/artifacts as active or not by calling the appropriate user function function mark(process_func::Function, index_files; do_print=true) marked_paths = String[] for index_file in index_files @@ -528,7 +622,7 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) # Scan manifests, parse them, read in all UUIDs listed and mark those as active printpkgstyle(ctx, :Active, "manifests:") - packages_to_keep = mark(process_manifest_pkgs, all_manifest_files) + packages_to_keep = mark(process_manifest_pkgs, all_manifest_tomls) # Do an initial scan of our depots to get a preliminary `packages_to_delete`. packages_to_delete = String[] @@ -557,8 +651,10 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) # `packages_to_delete`, as `process_artifacts_toml()` uses it internally to discount # `Artifacts.toml` files that will be deleted by the future culling operation. 
printpkgstyle(ctx, :Active, "artifacts:") - artifacts_to_keep = mark(x -> process_artifacts_toml(x, packages_to_delete), all_artifacts_files) - repos_to_keep = mark(process_manifest_repos, all_manifest_files; do_print=false) + artifacts_to_keep = mark(x -> process_artifacts_toml(x, packages_to_delete), all_artifact_tomls) + repos_to_keep = mark(process_manifest_repos, all_manifest_tomls; do_print=false) + printpkgstyle(ctx, :Active, "scratchspaces:") + spaces_to_keep = mark(x -> process_scratchspace(x, packages_to_delete), all_scratch_dirs) # Collect all orphaned paths (packages, artifacts and repos that are not reachable). These # are implicitly defined in that we walk all packages/artifacts installed, then if @@ -566,6 +662,7 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) packages_to_delete = String[] artifacts_to_delete = String[] repos_to_delete = String[] + spaces_to_delete = String[] for depot in depots() # We track orphaned objects on a per-depot basis, writing out our `orphaned.toml` @@ -574,8 +671,8 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) depot_orphaned_packages = String[] depot_orphaned_artifacts = String[] depot_orphaned_repos = String[] + depot_orphaned_scratchspaces = String[] - # ??: This code block is identical to one a bit above packagedir = abspath(depot, "packages") if isdir(packagedir) for name in readdir(packagedir) @@ -615,6 +712,21 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) 
end end + scratchdir = abspath(depot, "scratchspaces") + if isdir(scratchdir) + for uuid in readdir(scratchdir) + uuid_dir = joinpath(scratchdir, uuid) + !isdir(uuid_dir) && continue + for space in readdir(uuid_dir) + space_dir = joinpath(uuid_dir, space) + !isdir(space_dir) && continue + if !(space_dir in spaces_to_keep) + push!(depot_orphaned_scratchspaces, space_dir) + end + end + end + end + # Read in this depot's `orphaned.toml` file: orphanage_file = joinpath(logdir(depot), "orphaned.toml") new_orphanage = Dict{String, DateTime}() @@ -629,6 +741,7 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) merge_orphanages!(new_orphanage, depot_orphaned_packages, packages_to_delete, old_orphanage) merge_orphanages!(new_orphanage, depot_orphaned_artifacts, artifacts_to_delete, old_orphanage) merge_orphanages!(new_orphanage, depot_orphaned_repos, repos_to_delete, old_orphanage) + merge_orphanages!(new_orphanage, depot_orphaned_scratchspaces, spaces_to_delete, old_orphanage) # Write out the `new_orphanage` for this depot if !isempty(new_orphanage) || isfile(orphanage_file) @@ -675,6 +788,7 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) package_space_freed = 0 repo_space_freed = 0 artifact_space_freed = 0 + scratch_space_freed = 0 for path in packages_to_delete package_space_freed += delete_path(path) end @@ -684,6 +798,9 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) for path in artifacts_to_delete artifact_space_freed += delete_path(path) end + for path in spaces_to_delete + scratch_space_freed += delete_path(path) + end # Prune package paths that are now empty for depot in depots() @@ -699,27 +816,41 @@ function gc(ctx::Context=Context(); collect_delay::Period=Day(7), kwargs...) 
end end + # Prune scratch space UUID folders that are now empty + for depot in depots() + scratch_dir = abspath(depot, "scratchspaces") + !isdir(scratch_dir) && continue + + for uuid in readdir(scratch_dir) + uuid_dir = joinpath(scratch_dir, uuid) + !isdir(uuid_dir) && continue + if isempty(readdir(uuid_dir)) + Base.rm(uuid_dir) + end + end + end + ndel_pkg = length(packages_to_delete) - ndel_repos = length(repos_to_delete) + ndel_repo = length(repos_to_delete) ndel_art = length(artifacts_to_delete) + ndel_space = length(spaces_to_delete) - if ndel_pkg > 0 - s = ndel_pkg == 1 ? "" : "s" - bytes_saved_string = pretty_byte_str(package_space_freed) - printpkgstyle(ctx, :Deleted, "$(ndel_pkg) package installation$(s) ($bytes_saved_string)") - end - if ndel_repos > 0 - s = ndel_repos == 1 ? "" : "s" - bytes_saved_string = pretty_byte_str(repo_space_freed) - printpkgstyle(ctx, :Deleted, "$(ndel_repos) repo$(s) ($bytes_saved_string)") - end - if ndel_art > 0 - s = ndel_art == 1 ? "" : "s" - bytes_saved_string = pretty_byte_str(artifact_space_freed) - printpkgstyle(ctx, :Deleted, "$(ndel_art) artifact installation$(s) ($bytes_saved_string)") + function print_deleted(ndel, freed, name) + if ndel <= 0 + return + end + + s = ndel == 1 ? 
"" : "s" + bytes_saved_string = pretty_byte_str(freed) + printpkgstyle(ctx, :Deleted, "$(ndel) $(name)$(s) ($bytes_saved_string)") end - if ndel_pkg == 0 & ndel_art == 0 && ndel_repos == 0 - printpkgstyle(ctx, :Deleted, "no artifacts, repos or packages") + print_deleted(ndel_pkg, package_space_freed, "package installation") + print_deleted(ndel_repo, repo_space_freed, "repo") + print_deleted(ndel_art, artifact_space_freed, "artifact installation") + print_deleted(ndel_space, scratch_space_freed, "scratchspace") + + if ndel_pkg == 0 && ndel_art == 0 && ndel_repo == 0 && ndel_space == 0 + printpkgstyle(ctx, :Deleted, "no artifacts, repos, packages or scratchspaces") end return diff --git a/src/Pkg.jl b/src/Pkg.jl index 7c1b997972..f7da06be5f 100644 --- a/src/Pkg.jl +++ b/src/Pkg.jl @@ -49,6 +49,7 @@ include("BinaryPlatforms.jl") include("Types.jl") include("Resolve/Resolve.jl") include("Artifacts.jl") +include("Scratch.jl") include("Operations.jl") include("API.jl") include("Registry.jl") diff --git a/src/Scratch.jl b/src/Scratch.jl new file mode 100644 index 0000000000..089b132cbc --- /dev/null +++ b/src/Scratch.jl @@ -0,0 +1,227 @@ +module Scratch +import ...Pkg +import Base: UUID +using ...Pkg.TOML, Dates + +export with_scratch_directory, scratch_dir, get_scratch!, delete_scratch!, clear_scratchspaces!, @get_scratch! + +const scratch_dir_OVERRIDE = Ref{Union{String,Nothing}}(nothing) +""" + with_scratch_directory(f::Function, scratch_dir::String) + +Helper function to allow temporarily changing the scratch space directory. When this is +set, no other directory will be searched for spaces, and new spaces will be created +within this directory. Similarly, removing a scratch space will only effect the given +scratch directory. +""" +function with_scratch_directory(f::Function, scratch_dir::String) + try + scratch_dir_OVERRIDE[] = scratch_dir + f() + finally + scratch_dir_OVERRIDE[] = nothing + end +end + +""" + scratch_dir(args...) 
+ +Returns a path within the current depot's `scratchspaces` directory. This location can +be overridden via `with_scratch_directory()`. +""" +function scratch_dir(args...) + if scratch_dir_OVERRIDE[] === nothing + return abspath(Pkg.depots1(), "scratchspaces", args...) + else + # If we've been given an override, use _only_ that directory. + return abspath(scratch_dir_OVERRIDE[], args...) + end +end + +""" + scratch_path(key, pkg_uuid) + +Common utility function to return the path of a scratch space, keyed by the given +parameters. Users should use `get_scratch!()` for most user-facing usage. +""" +function scratch_path(key::AbstractString, pkg_uuid::Union{UUID,Nothing} = nothing) + # If we were not given a UUID, we use the "global space" UUID: + if pkg_uuid === nothing + pkg_uuid = UUID(UInt128(0)) + end + + return scratch_dir(string(pkg_uuid), key) +end + +# Session-based space access time tracker +scratch_access_timers = Dict{String,Float64}() +""" + track_scratch_access(pkg_uuid, scratch_path) + +We need to keep track of who is using which spaces, so we know when it is advisable to +remove them during a GC. We do this by attributing accesses of spaces to `Manifest.toml` +files in much the same way that package versions themselves are logged upon install, only +instead of having the manifest information implicitly available, we must rescue it out +from the currently-active Pkg Env. If we cannot do that, it is because someone is doing +something weird like opening a space for a Pkg UUID that is not loadable, which we will +simply not track; that space will be reaped after the appropriate time in an orphanage. + +If `pkg_uuid` is explicitly set to `nothing`, this space is treated as belonging to the +default global manifest next to the global project at `Base.load_path_expand("@v#.#")`. 
+ +While package and artifact access tracking can be done at `add()`/`instantiate()` time, +we must do it at access time for spaces, as we have no declarative list of spaces that +a package may or may not access throughout its lifetime. To avoid building up a +ludicrously large number of accesses through programs that e.g. call `get_scratch!()` in a +loop, we only write out usage information for each space once per day at most. +""" +function track_scratch_access(pkg_uuid::Union{UUID,Nothing}, scratch_path::AbstractString) + # Don't write this out more than once per day within the same Julia session. + curr_time = time() + if get(scratch_access_timers, scratch_path, 0.0) >= curr_time - 60*60*24 + return + end + + function find_project_file(pkg_uuid) + # The simplest case (`pkg_uuid` == `nothing`) simply attributes the space to + # the global depot environment, which will never cause the space to be GC'ed + # because it has been removed, as long as the depot itself is intact. + if pkg_uuid === nothing + return Base.load_path_expand("@v#.#") + end + + # The slightly more complicated case inspects the currently-loaded Pkg env + # to find the project file that we should tie our lifetime to. If we can't + # find it, we'll return `nothing` and skip tracking access. 
+ ctx = Pkg.Types.Context() + + # Check to see if the UUID is the overall project itself: + if ctx.env.pkg !== nothing && ctx.env.pkg.uuid == pkg_uuid + return ctx.env.project_file + end + + # Finally, check to see if the package is loadable from the current environment + if haskey(ctx.env.manifest, pkg_uuid) + pkg_entry = ctx.env.manifest[pkg_uuid] + pkg_path = Pkg.Operations.source_path( + ctx, + Pkg.Types.PackageSpec( + name=pkg_entry.name, + uuid=pkg_uuid, + tree_hash=pkg_entry.tree_hash, + path=pkg_entry.path, + ) + ) + project_path = joinpath(pkg_path, "Project.toml") + if isfile(project_path) + return project_path + end + end + + # If we couldn't find anything to attribute the space to, return `nothing`. + return nothing + end + + # We must decide which manifest to attribute this space to. + project_file = find_project_file(pkg_uuid) + + # If we couldn't find one, skip out. + if project_file === nothing + return + end + + entry = Dict( + "time" => now(), + "parent_projects" => [abspath(project_file)], + ) + Pkg.Types.write_env_usage(abspath(scratch_path), "scratch_usage.toml", entry) + + # Record that we did, in fact, write out the space access time + scratch_access_timers[scratch_path] = curr_time +end + + +const VersionConstraint = Union{VersionNumber,AbstractString,Nothing} + +""" + get_scratch!(key::AbstractString, parent_pkg = nothing) + +Returns the path to a scratch space, creating it if it does not already exist. + +If `parent_pkg` is given (either as a `UUID` or as a `Module`), the scratch space is +namespaced with that package's UUID, so that it will not conflict with any other space +with the same name but a different parent package UUID. The space's lifecycle is tied +to that parent package, allowing the space to be garbage collected if all versions of the +package that used it have been removed. + +If `parent_pkg` is not defined, or is a `Module` without a root UUID (e.g. `Main`, +`Base`, an anonymous module, etc...)
a global scratch space that does not have any +explicit parent will be created. + +In the current implementation, scratch spaces (both parented and global) are removed if +they have not been accessed for a predetermined amount of time. Parented scratch spaces +can be removed sooner if their parent package has been garbage collected. See `Pkg.gc()` +and `track_scratch_access()` for more details. + +!!! note + Scratch spaces should never be treated as persistent storage; all content within them + must be nonessential or easily recreatable. All lifecycle guarantees set a maximum + lifetime for the space, never a minimum. +""" +function get_scratch!(key::AbstractString, parent_pkg::Union{UUID,Nothing} = nothing) + # Calculate the path and create the containing folder + path = scratch_path(key, parent_pkg) + mkpath(path) + + # We need to keep track of who is using which spaces, so we track usage in a log + track_scratch_access(parent_pkg, path) + return path +end +function get_scratch!(key::AbstractString, parent_pkg::Module) + return get_scratch!(key, Base.PkgId(parent_pkg).uuid) +end + +""" + delete_scratch!(key, parent_pkg) + +Explicitly deletes a scratch space created through `get_scratch!()`. +""" +function delete_scratch!(key::AbstractString, parent_pkg::Union{UUID,Nothing} = nothing) + path = scratch_path(key, parent_pkg) + rm(path; force=true, recursive=true) + delete!(scratch_access_timers, path) + return nothing +end +function delete_scratch!(key::AbstractString, parent_pkg::Module) + return delete_scratch!(key, Base.PkgId(parent_pkg).uuid) +end + +""" + clear_scratchspaces!() + +Delete all scratch spaces in the current depot. +""" +function clear_scratchspaces!() + rm(scratch_dir(); force=true, recursive=true) + empty!(scratch_access_timers) + return nothing +end + +""" + @get_scratch!(key) + +Convenience macro that gets/creates a scratch space with the given key and parented to +the package the calling module belongs to. 
If the calling module does not belong to a +package (e.g. it is `Main`, `Base`, an anonymous module, etc...), the UUID will be taken +to be `nothing`, creating a global scratchspace. +""" +macro get_scratch!(key) + # Note that if someone uses this in the REPL, the UUID lookup will return `nothing`, + # and thereby create a global scratch space. + uuid = Base.PkgId(__module__).uuid + return quote + get_scratch!($(esc(key)), $(esc(uuid))) + end +end + +end # module Scratch \ No newline at end of file diff --git a/src/Types.jl b/src/Types.jl index 610c4ec9ff..2e9ca5da38 100644 --- a/src/Types.jl +++ b/src/Types.jl @@ -386,16 +386,17 @@ function Context!(ctx::Context; kwargs...) return ctx end -function write_env_usage(source_file::AbstractString, usage_filepath::AbstractString) +function write_env_usage(source_file::AbstractString, usage_filepath::AbstractString, + entry::Dict = Dict("time" => now())) # Don't record ghost usage - !isfile(source_file) && return + !ispath(source_file) && return # Ensure that log dir exists !ispath(logdir()) && mkpath(logdir()) # Generate entire entry as a string first entry = sprint() do io - TOML.print(io, Dict(source_file => [Dict("time" => now())])) + TOML.print(io, Dict(source_file => [entry])) end # Append entry to log file in one chunk diff --git a/test/runtests.jl b/test/runtests.jl index 7901f79d5b..cf646c9a7f 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -15,6 +15,7 @@ include("api.jl") include("registry.jl") include("subdir.jl") include("artifacts.jl") +include("scratch.jl") include("binaryplatforms.jl") include("platformengines.jl") include("sandbox.jl") @@ -22,5 +23,4 @@ include("resolve.jl") # clean up locally cached registry rm(joinpath(@__DIR__, "registries"); force = true, recursive = true) - end # module diff --git a/test/scratch.jl b/test/scratch.jl new file mode 100644 index 0000000000..ebab3880cd --- /dev/null +++ b/test/scratch.jl @@ -0,0 +1,156 @@ +module ScratchTests +# Ensure we are using the correct Pkg,
and that we get our testing utils +import ..Pkg +using Pkg.Scratch, Test, Dates +using ..Utils + + +function install_test_ScratchUsage(project_path::String, version::VersionNumber) + # Clear out any previously-installed ScratchUsage versions + rm(joinpath(project_path, "ScratchUsage"); force=true, recursive=true) + + # Copy ScratchUsage into our temporary project path + copy_test_package(project_path, "ScratchUsage") + + # Overwrite the version with the given version (So that we can test our version-specific + # code within `ScratchUsage`) + fpath = joinpath(project_path, "ScratchUsage", "Project.toml") + write(fpath, replace(read(fpath, String), "1.2.3" => string(version))) + + # dev() that path, to add it to the environment, then test it! + Pkg.develop(path=joinpath(project_path, "ScratchUsage")) + Pkg.test("ScratchUsage") +end + +@testset "Spaces Basics" begin + # Run everything in a separate depot, so that we can test GC'ing and whatnot + temp_pkg_dir() do project_path + # Create a global scratch space, ensure it exists and is writable + dir = get_scratch!("test") + @test isdir(dir) + @test startswith(dir, scratch_dir()) + touch(joinpath(dir, "foo")) + @test readdir(dir) == ["foo"] + + # Test that this created a `scratch_usage.toml` file, and that accessing it + # again does not increase the size of the scratch_usage.toml file, since we + # only mark usage once every so often per julia session. + usage_path = joinpath(Pkg.logdir(), "scratch_usage.toml") + @test isfile(usage_path) + size = filesize(usage_path) + dir = get_scratch!("test") + @test size == filesize(usage_path) + + # But accessing it from a new Julia instance WILL increase its size: + code = "import Pkg; Pkg.Scratch.get_scratch!(\"test\")" + run(setenv( + `$(Base.julia_cmd()) --project=$(dirname(@__DIR__)) -e $code`, + "JULIA_DEPOT_PATH" => Pkg.depots1(), + )) + @test size < filesize(usage_path) + + # Delete the scratch space, ensure it's gone. 
+ delete_scratch!("test") + @test !isdir(dir) + end +end + +@testset "Spaces Namespacing" begin + temp_pkg_dir() do project_path + # Add this Pkg so that any usage of `Pkg` by a julia started in this + # environment will use it. + add_this_pkg() + su_uuid = "93485645-17f1-6f3b-45bc-419db53815ea" + global_uuid = string(Base.UUID(UInt128(0))) + + # Touch the spaces of a ScratchUsage v1.0.0 + install_test_ScratchUsage(project_path, v"1.0.0") + + # Ensure that the files were created for v1.0.0 + @test isfile(scratch_dir(su_uuid, "1.0.0", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(su_uuid, "1.0.0"))) == 1 + @test isfile(scratch_dir(su_uuid, "1", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(su_uuid, "1"))) == 1 + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(global_uuid, "GlobalSpace"))) == 1 + + # Next, do the same but for more versions + install_test_ScratchUsage(project_path, v"1.1.0") + install_test_ScratchUsage(project_path, v"2.0.0") + + # Check the spaces were shared when they should have been, and not when they shouldn't + @test isfile(scratch_dir(su_uuid, "1.0.0", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(su_uuid, "1.0.0"))) == 1 + @test isfile(scratch_dir(su_uuid, "1.1.0", "ScratchUsage-1.1.0")) + @test length(readdir(scratch_dir(su_uuid, "1.1.0"))) == 1 + @test isfile(scratch_dir(su_uuid, "2.0.0", "ScratchUsage-2.0.0")) + @test length(readdir(scratch_dir(su_uuid, "2.0.0"))) == 1 + @test isfile(scratch_dir(su_uuid, "1", "ScratchUsage-1.0.0")) + @test isfile(scratch_dir(su_uuid, "1", "ScratchUsage-1.1.0")) + @test length(readdir(scratch_dir(su_uuid, "1"))) == 2 + @test isfile(scratch_dir(su_uuid, "2", "ScratchUsage-2.0.0")) + @test length(readdir(scratch_dir(su_uuid, "2"))) == 1 + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.0.0")) + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.1.0")) + @test 
isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-2.0.0")) + @test length(readdir(scratch_dir(global_uuid, "GlobalSpace"))) == 3 + end +end + + +@testset "Spaces Lifecycling" begin + temp_pkg_dir() do project_path + # First, install ScratchUsage + add_this_pkg() + su_uuid = "93485645-17f1-6f3b-45bc-419db53815ea" + global_uuid = string(Base.UUID(UInt128(0))) + install_test_ScratchUsage(project_path, v"1.0.0") + + # Ensure that a few files were created + @test isfile(scratch_dir(su_uuid, "1.0.0", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(su_uuid, "1.0.0"))) == 1 + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.0.0")) + @test length(readdir(scratch_dir(global_uuid, "GlobalSpace"))) == 1 + + # Test that a gc() doesn't remove anything, and that there is no orphanage + Pkg.gc() + orphaned_path = joinpath(Pkg.logdir(), "orphaned.toml") + @test isfile(scratch_dir(su_uuid, "1.0.0", "ScratchUsage-1.0.0")) + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.0.0")) + @test !isfile(orphaned_path) + + # Remove ScratchUsage, which causes the package (but not the scratch dirs) + # to move to the orphanage + Pkg.rm("ScratchUsage") + rm(joinpath(project_path, "ScratchUsage"); force=true, recursive=true) + Pkg.gc() + + @test isfile(scratch_dir(su_uuid, "1.0.0", "ScratchUsage-1.0.0")) + @test isfile(scratch_dir(global_uuid, "GlobalSpace", "ScratchUsage-1.0.0")) + @test isfile(orphaned_path) + orphanage = Pkg.TOML.parse(String(read(orphaned_path))) + @test haskey(orphanage, scratch_dir(su_uuid, "1.0.0")) + @test haskey(orphanage, scratch_dir(su_uuid, "1")) + @test !haskey(orphanage, scratch_dir(global_uuid, "GlobalSpace")) + + # Run a GC, forcing collection to ensure that everything in the ScratchUsage + # namespace gets removed (but still appears in the orphanage) + sleep(0.2) + Pkg.gc(;collect_delay=Millisecond(100)) + @test !isdir(scratch_dir(su_uuid)) + @test isdir(scratch_dir(global_uuid, "GlobalSpace")) + orphanage
= Pkg.TOML.parse(String(read(orphaned_path))) + @test haskey(orphanage, scratch_dir(su_uuid, "1.0.0")) + @test haskey(orphanage, scratch_dir(su_uuid, "1")) + @test !haskey(orphanage, scratch_dir(global_uuid, "GlobalSpace")) + + # Finally, run a GC with the `scratch_cleanup_period` set low, to force + # the global space to get orphaned and reaped immediately: + Pkg.gc(;scratch_cleanup_period=Second(0),collect_delay=Second(0)) + orphanage = Pkg.TOML.parse(String(read(orphaned_path))) + @test haskey(orphanage, scratch_dir(global_uuid, "GlobalSpace")) + @test !isdir(scratch_dir(global_uuid, "GlobalSpace")) + end +end + +end # module ScratchTests \ No newline at end of file diff --git a/test/test_packages/ScratchUsage/Project.toml b/test/test_packages/ScratchUsage/Project.toml new file mode 100644 index 0000000000..9c79afb93c --- /dev/null +++ b/test/test_packages/ScratchUsage/Project.toml @@ -0,0 +1,7 @@ +name = "ScratchUsage" +uuid = "93485645-17f1-6f3b-45bc-419db53815ea" +# This version will get automatically replaced within `test/scratch.jl` +version = "1.2.3" + +[deps] +Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" diff --git a/test/test_packages/ScratchUsage/src/ScratchUsage.jl b/test/test_packages/ScratchUsage/src/ScratchUsage.jl new file mode 100644 index 0000000000..fc981210f5 --- /dev/null +++ b/test/test_packages/ScratchUsage/src/ScratchUsage.jl @@ -0,0 +1,26 @@ +module ScratchUsage +using Pkg, Pkg.Scratch + +const my_uuid = Pkg.API.get_uuid(@__MODULE__) +const my_version = Pkg.API.get_version(my_uuid) + +# This function will create a bevy of spaces here +function touch_scratch() + # Create an explicitly version-specific space + private_space = get_scratch!( + string(my_version.major, ".", my_version.minor, ".", my_version.patch), + my_uuid, + ) + touch(joinpath(private_space, string("ScratchUsage-", my_version))) + + # Create a space shared between all instances of the same major version, + # using the `@get_scratch!` macro which automatically looks up
the UUID + major_space = @get_scratch!(string(my_version.major)) + touch(joinpath(major_space, string("ScratchUsage-", my_version))) + + # Create a global space that is not locked to this package at all + global_space = get_scratch!("GlobalSpace") + touch(joinpath(global_space, string("ScratchUsage-", my_version))) +end + +end # module ScratchUsage \ No newline at end of file diff --git a/test/test_packages/ScratchUsage/test/runtests.jl b/test/test_packages/ScratchUsage/test/runtests.jl new file mode 100644 index 0000000000..eb94421185 --- /dev/null +++ b/test/test_packages/ScratchUsage/test/runtests.jl @@ -0,0 +1,4 @@ +using ScratchUsage + +# Touch the spaces and call it good +ScratchUsage.touch_scratch()