Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifacts: disallow downloading via reflection hacks on Pkg #37844

Merged
merged 1 commit into from
Nov 23, 2020

Conversation

vtjnash
Copy link
Sponsor Member

@vtjnash vtjnash commented Oct 1, 2020

Packages should never access Base.loaded_modules() to call functions from it. This avoids #37731 (in Artifacts, though the new LazyArtifacts test still would have triggered it), though doesn't fix the Pkg bug that causes it (which is now fixed).

Example:

julia> using Artifacts; artifact"c_simple"
ERROR: Artifact "c_simple" is missing. Try `using Pkg; Pkg.build("Main"); Pkg.Artifacts.ensure_all_artifacts_installed("/Users/jameson/julia/stdlib/Artifacts/test/Artifacts.toml")`?
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] _artifact_str(__module__::Module, artifacts_toml::String, name::SubString{String}, path_tail::String, artifact_dict::Dict{String, Any}, hash::Base.SHA1, platform::Base.BinaryPlatforms.Platform)
   @ Artifacts ~/julia/usr/share/julia/stdlib/v1.6/Artifacts/src/Artifacts.jl:495
 [3] invokelatest(::Any, ::Any, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./essentials.jl:709
 [4] invokelatest(::Any, ::Any, ::Vararg{Any, N} where N)
   @ Base ./essentials.jl:708
 [5] top-level scope
   @ ~/julia/usr/share/julia/stdlib/v1.6/Artifacts/src/Artifacts.jl:610

@staticfloat
Copy link
Sponsor Member

We can’t do this; this is necessary functionality for lazy artifacts.

@staticfloat
Copy link
Sponsor Member

What is the issue with calling Base.loaded_modules()? One other possibility is to do what Distributed does and Pkg = require(PkgId(pkg_uuid, "Pkg")), but this will cause some strangeness when running the Pkg tests.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Oct 1, 2020

We simply cannot be depending on the presence of Pkg at runtime. And that example is for LinearAlgebra, which should also not be doing that, but it's for peakflops which nobody cares about, and is legacy so it is not currently a priority.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 1, 2020

How about if Artifacts allow one to set a "download backed" and Pkg sets that and if it isn't set it errors? Same way how Base gets a handle to the REPL.

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Oct 1, 2020

Nothing should depend on the existence of REPL at runtime either (aside from the REPL haha), so that is possibly not ideal, but also not a similar situation (except in that no package should be trying to access REPL through this hook, as it may not be set if you aren't at the REPL).

@staticfloat
Copy link
Sponsor Member

We simply cannot be depending on the presence of Pkg at runtime.

I assume this is because we want to prepare for a future where Pkg is not available due to PackageCompiler trimming it out, or something like that. So in that case, why not just error out if Pkg is not available?

e.g.

Pkg_modules = filter(p-> p[1].name == "Pkg", Base.loaded_modules)
if isempty(Pkg_modules)
    error(...)
end
return jointail(first(Pkg_modules)[2].Artifacts.ensure_artifact_installed(string(name), artifacts_toml; platform), path_tail)

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Oct 1, 2020

I'm planning for a future (hopefully near, since we've talked about it every month this summer on our internal compiler call), where Pkg is never available, except if you explicitly load it (and, unless you are Pkg or the REPL, take the corresponding performance hit). So that suggestion is identical to this one.

@staticfloat
Copy link
Sponsor Member

So in that case, the Pkg = require(PkgId(pkg_uuid, "Pkg")) would be better, because it would explicitly load Pkg (and incur the necessary performance hit). It's fine to be slow on this path since we're going to have to hit the network anyway.

@vtjnash vtjnash added the triage This should be discussed on a triage call label Oct 22, 2020
@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Oct 22, 2020

I'd like to get this merged for the v1.6 release, before anyone is depending on this hack, as this is preparation for various future improvements (as demonstrated in #38119 for example) where Pkg cannot be accessed by packages during compilation. As can be seen there, this PR is necessary first to be able to continue to pass tests in such as world, where Pkg cannot be assumed to be reliably accessed.

@Keno
Copy link
Member

Keno commented Oct 22, 2020

From triage: We should have artifact_str expand to explicitly do an isdefined(:Pkg) and pass in the package module. That way the user of the lazy artifact has to explicitly depend on package to make it work and the error message can be actionable for the developer.

@staticfloat
Copy link
Sponsor Member

So the behavior is that if you want to support lazy artifacts, you import Pkg in your package and artifact"lazy_artifact" will continue to "just work", but if you don't, you'll get a descriptive error message? Sounds good to me.

@Keno
Copy link
Member

Keno commented Oct 22, 2020

yes, that's the idea

@vtjnash vtjnash force-pushed the jn/artifacts-remove-download-hack branch from 9999d5d to 6552d7a Compare October 28, 2020 16:45
@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Oct 28, 2020

Yes, that's what this PR should do now

Pkg = first(filter(p-> p[1].name == "Pkg", Base.loaded_modules))[2]
return jointail(Pkg.Artifacts.ensure_artifact_installed(string(name), artifacts_toml; platform), path_tail)
if ensure_artifact_installed === nothing
error("Artifact $(repr(name)) was not installed correctly. Try `using Pkg; Pkg.build(\"$__module__\"); Pkg.Artifacts.ensure_all_artifacts_installed($(repr(artifacts_toml)))`?")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this error tell the package author to add using Pkg to their package?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be telling them to prefer worse behavior. However, I like staticfloat's suggestion of examining the toml to figure out which was probably intended.

@@ -605,14 +607,17 @@ macro artifact_str(name, platform=nothing)
# Invalidate calling .ji file if Artifacts.toml file changes
Base.include_dependency(artifacts_toml)

# Check if the user has run `using Pkg.Artifacts: ensure_artifact_installed`, and thus supports lazy artifacts
ensure_artifact_installed = isdefined(__module__, :ensure_artifact_installed) ? GlobalRef(__module__, :ensure_artifact_installed) : nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to be this explicit? Why not just using Pkg or using Pkg.Artifacts?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of looking for :ensure_artifact_installed I think it's better to look for :Pkg, so that users can use import Pkg to avoid polluting their namespace.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that initially, but eventually felt this polluted the namespace the least and gives the most forward-compatibility. using Pkg.Artifacts is a non-starter (since it'll conflict with using Artifacts), while Pkg is just a rather short name.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensure_artifact_installed is a private API, and we can change that name at any time. It would be better to ask the user to import something that will not change, like Pkg.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm wrong, ensure_artifact_installed is not private, it's exported. I still think it's better (in the event that we switch names in the future, after an appropriate deprecation period) to depend on Pkg though.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm wrong, ensure_artifact_installed is not private, it's exported.

Well, it should really have been private.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It's not exported from Pkg, but it is exported from Artifacts because it's used in other parts of Pkg.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing using Pkg: Pkg also pollutes the namespace, but uses a more common variable name than using Pkg: Artifacts.ensure_artifact_installed uses. Since the existing lazy feature here already means that changing the name of this would be breaking (once we release Artifacts v1.6 and users are depending on it), so you've already committed yourself to having Pkg provide this function. So the question here is whether you want the API to be requiring the existence of a global variable named Pkg to signal the existence of Pkg.Artifactsand Pkg.Artifacts.ensure_artifact_installed (with possibility for there to be additional functions added later that this macro may select instead), or whether the user should be more explicit about the name ensure_artifact_installed (with possibility that the user can provide other implementations of this function in the future, and thus aren't forced to always depend upon Pkg in the future).

Or as a combination, we could rename Pkg.Artifacts to Pkg.ArtifactsDownloader (or some other name of your liking), so that it doesn't conflict with the name of this package, and thus gives you the flexibility to make this also a toplevel package in the future (of the same name, or with using ArtifactsNext: ArtifactsNext as ArtifactsDownloader)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it's a clear choice: looking at Pkg gives something that users will most likely already have (much more likely than ensure_artifact_installed, for instance) and also gives us the flexibility in code to polymorph to future Pkg releases where we re-organize things. It pushes the work of backwards-compatibility onto Artifacts instead of onto user code, which is always a good thing when it has no performance impact.

@@ -605,14 +607,17 @@ macro artifact_str(name, platform=nothing)
# Invalidate calling .ji file if Artifacts.toml file changes
Base.include_dependency(artifacts_toml)

# Check if the user has run `using Pkg.Artifacts: ensure_artifact_installed`, and thus supports lazy artifacts
ensure_artifact_installed = isdefined(__module__, :ensure_artifact_installed) ? GlobalRef(__module__, :ensure_artifact_installed) : nothing
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of looking for :ensure_artifact_installed I think it's better to look for :Pkg, so that users can use import Pkg to avoid polluting their namespace.

Pkg = first(filter(p-> p[1].name == "Pkg", Base.loaded_modules))[2]
return jointail(Pkg.Artifacts.ensure_artifact_installed(string(name), artifacts_toml; platform), path_tail)
if ensure_artifact_installed === nothing
error("Artifact $(repr(name)) was not installed correctly. Try `using Pkg; Pkg.build(\"$__module__\"); Pkg.Artifacts.ensure_all_artifacts_installed($(repr(artifacts_toml)))`?")
Copy link
Sponsor Member

@staticfloat staticfloat Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error("Artifact $(repr(name)) was not installed correctly. Try `using Pkg; Pkg.build(\"$__module__\"); Pkg.Artifacts.ensure_all_artifacts_installed($(repr(artifacts_toml)))`?")
meta = artifact_meta(name, artifact_dict, artifacts_toml; platform)
if meta["lazy"]
error("Artifact $(repr(name)) is a lazy artifact; package developers must `import Pkg` before using lazy artifacts.")
else
error("Artifact $(repr(name)) was not installed correctly. Try `using Pkg; Pkg.instantiate()` to re-install all missing resources.")
end

There are two cases where this can go wrong; the first is something terrible happened during Pkg.add() and we're in an inconsistent state. As far as I know, this has never happened, but in the event that it does, Pkg.build() isn't the right fix, Pkg.instantiate() is, I think.

The second is that this is a package developer adding an artifact to their package and being surprised when lazy artifacts don't work.

We can disambiguate the two by looking to see if the corresponding entry in artifact_dict marks the artifact as lazy.

with_artifacts_directory(tempdir) do
ex = @test_throws ErrorException artifact"socrates"
@test startswith(ex.value.msg, "Artifact \"socrates\" was not installed correctly.")
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test both the "not installed correctly" branch and the "must import Pkg" branch, and the branch that actually succeeds in downloading the lazy artifact.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm only testing the new code here, but someone should write/have written a test for the lazy code too

@staticfloat
Copy link
Sponsor Member

If we go with the LazyArtifacts approach, doesn't this break all packages that currently exist that use lazy artifacts? Wasn't that the benefit with just looking to see if Pkg has been imported instead; that it would continue to work for packages written in the v1.3-v1.5 era?

@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Nov 18, 2020

Ah, right, I forget from triage that we were intending to add a deprecation warning for that situation

@vtjnash vtjnash force-pushed the jn/artifacts-remove-download-hack branch from c5d4168 to 4379411 Compare November 20, 2020 19:49
@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Nov 20, 2020

Okay, added support for the old solution of loading all of Pkg. We don't want to require people to always load all of Pkg, so it's marked depwarn. I think this now fully covers what we decided on the triage call?

@vtjnash vtjnash added this to the 1.6 features milestone Nov 20, 2020
@StefanKarpinski
Copy link
Sponsor Member

I've lost track, what's the recommended way to use artifacts now?

Packages should never access Base.loaded_modules() to call functions
from it, as it can be brittle and create future incompatibilities, so
instead we require the user to explicitly declare a dependency on the
lazy-download machinery, if they requiring the ability to use it (for
lazy artifacts).

As a deprecation, if the user has `using Pkg`, that will be used
instead, with a depwarn.
@vtjnash vtjnash force-pushed the jn/artifacts-remove-download-hack branch from 4379411 to 2ef080e Compare November 20, 2020 22:27
@vtjnash
Copy link
Sponsor Member Author

vtjnash commented Nov 21, 2020

There's no major changes for typical users. Lazy users now need to explicitly add LazyArtifacts to your Manifest, rather than just hoping it's available at runtime.

@vtjnash vtjnash merged commit 7d5dbf5 into master Nov 23, 2020
@vtjnash vtjnash deleted the jn/artifacts-remove-download-hack branch November 23, 2020 17:56
@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Jan 7, 2021
vchuravy pushed a commit to JuliaPackaging/LazyArtifacts.jl that referenced this pull request Oct 2, 2023
…JuliaLang/julia#37844)

Packages should never access Base.loaded_modules() to call functions
from it, as it can be brittle and create future incompatibilities, so
instead we require the user to explicitly declare a dependency on the
lazy-download machinery, if they requiring the ability to use it (for
lazy artifacts).

As a deprecation, if the user has `using Pkg`, that will be used
instead, with a depwarn.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants