Skip to content

Commit

Permalink
Implement Scratchspaces
Browse files Browse the repository at this point in the history
This implements functionality and tests for a new `Spaces`
subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this
provides an abstraction for a mutable datastore that can be explicitly
lifecycled to an owning package, or shared among multiple packages.

Closes #796
  • Loading branch information
staticfloat committed May 22, 2020
1 parent 6679131 commit 9b9a0f4
Show file tree
Hide file tree
Showing 14 changed files with 796 additions and 76 deletions.
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ makedocs(
"compatibility.md",
"registries.md",
"artifacts.md",
"spaces.md",
# "faq.md",
"glossary.md",
"toml-files.md",
Expand Down
18 changes: 17 additions & 1 deletion docs/src/api.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# [**12.** API Reference](@id API-Reference)
# [**13.** API Reference](@id API-Reference)

This section describes the function interface, or "API mode",
for interacting with Pkg.jl. The function API is recommended
Expand Down Expand Up @@ -87,3 +87,19 @@ Pkg.Artifacts.ensure_all_artifacts_installed
Pkg.Artifacts.@artifact_str
Pkg.Artifacts.archive_artifact
```

## [Scratch Space API Reference](@id Scratch-Space-Reference)

!!! compat "Julia 1.6"
Pkg's Scratch space API requires at least Julia 1.6.

```@docs
Pkg.Scratch.get_scratch!
Pkg.Scratch.@get_scratch!
Pkg.Scratch.delete_scratch!
Pkg.Scratch.clear_scratchspaces!
Pkg.Scratch.with_scratch_directory
Pkg.Scratch.scratch_dir
Pkg.Scratch.scratch_path
Pkg.Scratch.track_scratch_access
```
2 changes: 1 addition & 1 deletion docs/src/glossary.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# [**9.** Glossary](@id Glossary)
# [**10.** Glossary](@id Glossary)

**Project:** a source tree with a standard layout, including a `src` directory
for the main body of Julia code, a `test` directory for testing the project,
Expand Down
150 changes: 150 additions & 0 deletions docs/src/scratch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# [**9.** Scratch Spaces](@id Scratch-Spaces)

!!! compat "Julia 1.6"
Pkg's Scratch Spaces functionality requires at least Julia 1.6.

`Pkg` can manage and automatically garbage collet scratch spaces of temporary or readily-recreatable data.
These spaces can contain datasets, text, binaries, or any other kind of data that would be convenient to store, but which is non-fatal to have garbage collected if it has not been accessed recently, or if the owning package has been uninstalled.
As compared to [Artifacts](@ref), these containers of data are mutable and should be treated as ephemeral; all usage of scratch spaces should assume that the data stored within them could be gone by the next time your code is run.
In the current implementation, scratch spaces are removed during Pkg garbage collection if the scratch space has not been accessed for a period of time (see the `scratch_cleanup_period` keyword argument to [Pkg.gc](@ref)), or if the owning package has been removed.
Users can also request a full wipe of all scratch spaces to clean up unused disk space through `Pkg.Scratch.clear_scratchspaces!()`.

## API overview

Scratch space usage is performed primarily through one function: `get_scratch!()`.
It provides a single interface for creating and getting previously-created spaces, either tied to a package by its UUID, or as a global scratch space that can be accessed by any package.
Here is an example where a package creates a scratch space that is namespaced to its own UUID:

```julia
module ScratchExample
using Pkg, Pkg.Scratch

# This will be filled in inside `__init__()`
download_cache = ""

# Downloads a resource, stores it within a scratchspace
function download_dataset(url)
fname = joinpath(download_cache, basename(url))
if !isfile(fname)
download(url, fname)
end
return fname
end

function __init__()
global download_cache = @get_scratch!("downloaded_files")
end

end # module ScratchExample
```

Note that we initialize the `download_cache` within `__init__()` so that our packages are as relocatable as possible; we typically do not want to bake absolute paths into our precompiled files.
This makes use of the `@get_scratch!()` macro, which is identical to the `get_scratch!()` method, except it automatically determines the UUID of the calling module, if possible. The user can manually pass in a `Module` as well for a slightly more verbose incantation:
```julia
function __init__()
global download_cache = get_scratch!("downloaded_files", @__MODULE__)
end
```

If a user wishes to manually delete a scratch space, the method `delete_scratch!(key; pkg_uuid)` is the natural analog to `get_scratch!()`, however in general users will not need to do so, the scratch spaces will be garbage collected by `Pkg` automatically.

For a full listing of docstrings and methods, see the [Scratch Space Reference](@ref) section.

## Use cases

Good use cases for a Pkg scratch space include:

* Caching downloads of files that must be routinely accessed and modified by a package. Files that must be modified are a bad fit for the immutable [Artifacts](@ref) abstraction, and files can always be re-downloaded if the cache is wiped by the user.

* Generated data that depends on the characteristics of the host system. Examples are compiled binaries, fontcache system font folder inspection output, generated CUDA bitcode files, etc... Objects that would be difficult to compute off of the user's machine, and that can be recreated without user intervention are a great fit.

* Directories that should be shared between multiple packages in a single depot. The scratch space keying mechanism makes it simple to provide scratch spaces that can be shared between different versions of a package, or even between different packages. This allows packages to provide a scratch space where other packages can easily find the generated data, however the typical race condition warnings apply here; always design your access patterns assuming another process could be reading or writing to this scratch space at any time.

Bad use cases for a Pkg scratch space include (but are not limited to):

* Anything that requires user input to regenerate. Because scratch spaces can disappear, it is a bad experience for the user to need to answer questions at seemingly random times when the space must be rebuilt.

* Storing data that is write-once, read-many times. We suggest you use [Artifacts](@ref) for that, as they are much more persistent and are built to become portable (so that other machines do not have to generate the data, they can simple make use of the artifact by downloading it from a hosted location). Scratch spaces generally should follow a write-many read-many access pattern.

## Tips and Tricks

> Can I trigger data regeneration if the scratch space is found to be empty/files are missing?
Yes, this is quite simple; just check the contents of the directory when you first call `get_scratch!()`, and if it's empty, run your generation function:

```julia
using Pkg, Pkg.Scratch

function get_dataset_dir()
dataset_dir = @get_scratch!("dataset")
if isempty(readdir(dataset_dir))
perform_expensive_dataset_generation(dataset_dir)
end
return dataset_dir
end
```

> Can I create a scratchs pace that is not shared across versions of my package?
Yes! Make use of the `key` parameter and Pkg's ability to look up the current version of your package at compile-time:

```julia
module VersionSpecificExample
using Pkg, Pkg.Scratch

# Get the current version at compile-time, that's fine it's not going to change. ;)
const pkg_version = Pkg.API.get_version(Pkg.API.get_uuid(@__MODULE__))

# This will be filled in by `__init__()`; it might change if we get deployed somewhere
version_specific_scratch = Ref{String}()

function __init__()
# This space will be unique between versions of my package that different major and
# minor versions, but allows patch releases to share the same.
scratch_name = "data_for_version-$(pkg_version.major).$(pkg_version.minor)"
global version_specific_scratch[] = @get_scratch!(scratch_name)
end

end # module
```

> Can I use a scratch space as a temporary workspace, then turn it into an Artifact?
Yes! Once you're satisfied with your dataset that has been cooking inside a space, and you're ready to share it with the world as an immutable artifact, you can use `create_artifact()` to create an artifact from the space, `archive_artifact()` to get a tarball that you can upload somewhere, and `bind_artifact!()` to write out an `Artifacts.toml` that allows others to download and use it:

```julia
using Pkg, Pkg.Scratch, Pkg.Artifacts

function export_scratch(scratch_name::String, github_repo::String)
scratch_dir = @get_scratch!(scratch_name)

# Copy space directory over to an Artifact
hash = create_artifact() do artifact_dir
rm(artifact_dir)
cp(scratch_dir, artifact_dir)
end

# Archive artifact out to a tarball. Since `upload_tarball()` is not a function that
# exists, users must either write it themselves (uploading to whatever hosting
# provider they prefer), or run each line of this `do`-block manually, upload the
# tarball manually, record its URL, and pass that to `bind_artifact!()`.
mktempdir() do upload_dir
tarball_path = joinpath(upload_dir, "$(scratch_name).tar.gz")
tarball_hash = archive_artifact(hash, tarball_path)

# Upload tarball to a hosted site somewhere. Note; this function does not
# exist, it's put here simply to show the flow of events.
tarball_url = upload_tarball(tarball_path)

# Bind artifact to an Artifacts.toml file in the current directory; this file can
# be used by others to download and use your newly-created Artifact!
bind_artifact!(
joinpath(@__DIR__, "./Artifacts.toml"),
scratch_name,
hash;
download_info=[(tarball_url, tarball_hash)],
force=true,
)
end
end
```
2 changes: 1 addition & 1 deletion docs/src/toml-files.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# [**10.** `Project.toml` and `Manifest.toml`](@id Project-and-Manifest)
# [**11.** `Project.toml` and `Manifest.toml`](@id Project-and-Manifest)

Two files that are central to Pkg are `Project.toml` and `Manifest.toml`. `Project.toml`
and `Manifest.toml` are written in [TOML](https://github.com/toml-lang/toml) (hence the
Expand Down
Loading

0 comments on commit 9b9a0f4

Please sign in to comment.