Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Scratch Spaces #1833

Closed
wants to merge 2 commits into from
Closed

Implement Scratch Spaces #1833

wants to merge 2 commits into from

Conversation

staticfloat
Copy link
Sponsor Member

This implements functionality and tests for a new Spaces
subsystem in Pkg; analogous to the Artifacts added in 1.3, this
provides an abstraction for a mutable datastore that can be explicitly
lifecycled to an owning package, or shared among multiple packages.

Closes #796

Copy link
Sponsor Member

@KristofferC KristofferC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the docs

docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/spaces.md Outdated Show resolved Hide resolved
docs/src/api.md Outdated Show resolved Hide resolved
@staticfloat
Copy link
Sponsor Member Author

Thanks for the review guys, I think I have addressed all comments so far.

These functions provide a nice interface for determining the
currently-running package's UUID and VersionNumber, identified by loaded
`Module`.  We explicitly do not include a lookup by name, in expectation
of potential future naming clashes due to having multiple packages with
the same name but different UUIDs.
@staticfloat staticfloat force-pushed the sf/spaces branch 2 times, most recently from 9b9a0f4 to f8856ab Compare May 22, 2020 00:08
@staticfloat staticfloat changed the title Implement Scratchspaces Implement Scratch Spaces May 22, 2020
docs/make.jl Outdated Show resolved Hide resolved
@staticfloat
Copy link
Sponsor Member Author

This is good to go on my end. :)

@tkf
Copy link
Member

tkf commented May 27, 2020

Can this be used as a stable place for external "virtual environment" for each Julia project? I'm thinking to optionally create a conda environment or Python venv for each Project.toml path. Would something like this

project_slug = Base.slug(Base._crc32c(Base.active_project()), 5)
scratch_name = "$other_things-$project_slug"
scratch_path = @get_scratch!(scratch_name)

work?

docs/src/scratchspaces.md Outdated Show resolved Hide resolved

* Caching downloads of files that must be routinely accessed and modified by a package. Files that must be modified are a bad fit for the immutable [Artifacts](@ref) abstraction, and files can always be re-downloaded if the cache is wiped by the user.

* Generated data that depends on the characteristics of the host system. Examples are compiled binaries, fontcache system font folder inspection output, generated CUDA bitcode files, etc... Objects that would be difficult to compute off of the user's machine, and that can be recreated without user intervention are a great fit.
Copy link
Contributor

@fingolfin fingolfin May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"compiled binaries" is the usecase I'd have in mind; but I am somewhat worried about the eager GC behavior, as recompilation can take a non-trivial amount of time. I guess I could print a message like "parts of this package were garbage collected and need to be regenerated, please wait" ?

UPDATE: OK, sorry, I should have read more first: I guess in that case, I might be better of creating an artifact with the compiled data on the fly

@staticfloat
Copy link
Sponsor Member Author

Would something like this work?

Yes! You can see that I actually do this in my compile time preferences package.

@staticfloat
Copy link
Sponsor Member Author

I think the main sticking point on this PR is the GC time; I move that we discuss this on the next Pkg call (this coming Tuesday, at 2:15pm EST) and figure out what we do and don't want to support.

Possibilities include:

  • Keep the current behavior; things get GC'ed after a set period of inactivity.
  • Extend the timeline
  • Allow for per-scratchspace control over when things get GC'ed by default.
  • Only GC scratch spaces once the owning package has been removed (and in the case of no owning package; never)

@StefanKarpinski
Copy link
Sponsor Member

Only GC scratch spaces once the owning package has been removed

This seems like it might be the right default.

(and in the case of no owning package; never)

When would there be no owning package?

@staticfloat
Copy link
Sponsor Member Author

When would there be no owning package?

In the event that someone uses @get_scratch!() from a module like Main, or Base. E.g. in the REPL.

@tkf
Copy link
Member

tkf commented May 29, 2020

  • Only GC scratch spaces once the owning package has been removed (and in the case of no owning package; never)

What is "owning package" in this context? For example, if

  1. PkgA 1.0 creates a scratch space (which is an optional behavior)
  2. PkgA 2.0 is installed in another project
  3. PkgA 1.0 is removed

then, does PkgA own a scratch space?

Also, why not let projects own the scratch space? Wouldn't it work with above scenario? It'd mean ~/.julia/logs/manifest_usage.toml becomes the ultimate GC root, I guess? I don't know the primary usecase of the log file, though.

@staticfloat
Copy link
Sponsor Member Author

Right, so that means the scratch space would persist until ALL versions of PkgA are removed.

We don't want scratch spaces to be project-specific unless the user wants them to be; that would unnecessarily cripple them. The files within the logs directory do end up being kind of like a GC root though. :P

@tkf
Copy link
Member

tkf commented May 30, 2020

We don't want scratch spaces to be project-specific unless the user wants them to be; that would unnecessarily cripple them.

I don't know the implementation detail but, in principle, isn't it possible to reference the same scratch space from multiple projects? With Julia as an analogy, I'm imagining something like

                                      # Analogy
scratch = Ref(:scratch)               # @get_scratch!(scratch_name) from PkgA
julia_logs = Dict()                   # ~/.julia/logs/manifest_usage.toml
julia_logs[:Project_A] = []           # PATH/TO/Project_A/Manifest.toml with PkgA
push!(julia_logs[:Project_A],         # PkgA uses/creates the scratch space
      :PkgA => scratch)
julia_logs[:Project_B] = []           # PATH/TO/Project_B/Manifest.toml with PkgA
push!(julia_logs[:Project_B],         # PkgA re-uses the scratch space
      :PkgA => scratch)
julia_logs[:Project_C] = []           # PATH/TO/Project_C/Manifest.toml with PkgA
# push!(julia_logs[:Project_C],       # (PkgA does not use the scratch space inside
#       :PkgA => scratch)             # Project C.)
scratch = nothing

So, at this point, reference graph is:

julia_logs -> Project_A -> scratch
julia_logs -> Project_B -> scratch
julia_logs -> Project_C

So, removing Project A and B would remove the scratch space:

pop!(julia_logs, :Project_A)          # Project A removed
GC.gc()                               # => `scratch` still exists
pop!(julia_logs, :Project_B)          # Project B removed
GC.gc()                               # => `scratch` can be CG'ed

This is in contrast with the model where PkgA directly owns the scratch space:

PkgA = [Ref(:scratch)]
julia_logs = Dict()
julia_logs[:Project_A] = [PkgA]
julia_logs[:Project_B] = [PkgA]
julia_logs[:Project_C] = [PkgA]

pop!(julia_logs, :Project_A)          # Project A removed
GC.gc()                               # => `scratch` still exists
pop!(julia_logs, :Project_B)          # Project B removed
GC.gc()                               # => `scratch` still exists due to :Project_C

@staticfloat
Copy link
Sponsor Member Author

We discussed on the Pkg call how to make the lifecycle more reliable here, and we came to the decision that it would be best to just persist scratch spaces for as long as the owning manifest exists. This makes it easy and should reduce fear of package authors, while still allowing old versions of packages whose scratch spaces are no longer used by newer versions of packages to get GC'ed.

This implements functionality and tests for a new `Scratch`
subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this
provides an abstraction for a mutable datastore that can be explicitly
lifecycled to an owning package, or shared among multiple packages.

Closes #796
@staticfloat
Copy link
Sponsor Member Author

I have altered the docs and implementation, so this is ready for final review.

@staticfloat staticfloat closed this Aug 7, 2020
@DilumAluthge DilumAluthge deleted the sf/spaces branch March 19, 2021 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sequester mutable state outside of package directories
5 participants