Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Artifacts and BinaryPlatforms into Base #37320

Merged
merged 2 commits into from
Sep 12, 2020
Merged

Conversation

staticfloat
Copy link
Member

This is the fundamental infrastructure needed for the 1.6 Artifacts rework.

  • Imports Pkg.BinaryPlatforms to Base.BinaryPlatforms. I didn't think it was useful enough to be exported from Base, so it's currently usable through using Base.BinaryPlatforms. It has been more or less completely rewritten to be more flexible, more compiler-friendly and to power us into the age of Expand artifact selection beyond Platform/CompilerABI Pkg.jl#1780

  • Imports Pkg.Artifacts to its own stdlib, Artifacts. Includes locating pieces of Pkg.Artifacts, not the installation pieces, which are still within Pkg. Scripts that want to generate artifacts should still using Pkg.Artifacts, but scripts that just want to use artifacts that have already been generated are free to simply using Artifacts, which should cut down on time to first load by a nice chunk. If a package attempts to @artifact"foo" on an artifact that does not exist, the Artifacts module will dynamically load Pkg and invoke the downloading routines through an inference barrier.

@staticfloat staticfloat force-pushed the sf/artifacts_rework branch 3 times, most recently from 42fd8ef to 0a80c34 Compare September 1, 2020 18:49
base/binaryplatforms.jl Outdated Show resolved Hide resolved
base/binaryplatforms.jl Outdated Show resolved Hide resolved
NEWS.md Show resolved Hide resolved
end

# Other `Platform` types can override this (I'm looking at you, `AnyPlatform`)
tags(p::Platform) = p.tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to use @timholy s "interface" trick like in

48fc8b4

here. Basically

function Base.getproperty(platform::AbstractPlatform, fieldname::Symbol)
    if fieldname === :tags
        return getfield(platform, :state)::Dict{String, String}
    else
        getfield(platform, fieldname)
    end
end

and then use .tags everywhere. That ensures inference know that .tags has to be a Dict{String, String}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this kind of overriding of getproperty() was frowned upon? Is that an outdated Julia style?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't about adding OOP-like behavior; it's really about fixing inference problems you have abstract types but can't infer the specific concrete subtype. (E.g., if you extract dict[key] when dict::Dict{String,Platform}, you know the result is a Platform but not the concrete subtype.) Inference will generally return ::Any as the inferred type of field accesses to abstract types. Defining a getproperty lets you add the type-asserts to fix it. See https://timholy.github.io/SnoopCompile.jl/stable/snoopr/#Inferrable-field-access-for-abstract-types-1 for more explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused then because there are no abstract types here. Platform is concrete, and tags is also concrete, it's a Dict{String,String}. I don't want to define a tags() for other AbstractPlatform types because they will likely do something completely different (e.g. AnyPlatform won't have any tags, it will just return a constant empty set, and other objects like the compatibility Linux type will sub out to a tags() on a wrapped Platform object).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it doesn't matter. I was going by the current implementation for which

julia> isabstracttype(Pkg.BinaryPlatforms.Platform)
true

Copy link
Member

@KristofferC KristofferC Sep 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to define a tags() for other AbstractPlatform types because they will likely do something completely different

I assumed you would since there is a bunch of code that calls tags on AbstractPlatform e.g.

 Base.haskey(p::AbstractPlatform, k::String) = haskey(tags(p), k)

and

# Simple equality definition; for compatibility testing, use `platforms_match()`
Base.:(==)(a::AbstractPlatform, b::AbstractPlatform) = tags(a) == tags(b)

etc. Should these be restricted to Platform then?

The reason I suggested this in the first place was indeed due to the tags call on AbstractPlatform.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it doesn't matter. I was going by the current implementation for which

Gotcha! Makes sense.

I assumed you would since there is a bunch of code that calls tags on AbstractPlatform

So the semantics of what tags does is stable; tags(p::AbstractPlatform) can be assumed to return a Dict{String,String}, but how it generates that dict is completely up to the AbstractPlatform object. For an AnyPlatform object it will likely return an empty Dict. For a Linux object (a compatibility wrapper for older BinaryProvider releases and such) it will sub out to an internal Platform object, etc... So I still want to call tags() on other AbstractPlatforms, but I'm leaving it up to those other AbstractPlatform types to define just what needs to happen inside of tags().

@KristofferC
Copy link
Member

KristofferC commented Sep 1, 2020

One general comment is that it is quite hard for me to understand what is the public API and what is internals.

For example, there is the function split_artifact_slash which has full documentation with docstrings and everything but it seems to just be called internally in another function. Is that really something that is going to be a public API and subject to non-breakage etc?

I had some comments about possible things to do to reduce the vulnerability to invalidations but I removed them since I felt they got a bit noisy. Those things can be tweaked later (assuming it is easy to know what is internals and can be more or less freely changed).

@staticfloat
Copy link
Member Author

One general comment is that it is quite hard for me to understand what is the public API and what is internals.

For example, there is the function split_artifact_slash which has full documentation with docstrings and everything but it seems to just be called internally in another function. Is that really something that is going to be a public API and subject to non-breakage etc?

In my opinion, anything that is not completely trivial should have docstrings to aid others (and, honestly, myself) when editing this file in the future. The demarcator for public/private isn't how well documented it is, it's whether it's exported or not. Since both of these modules had certain exports in Pkg, I've defaulted to mostly exporting the same things from them (So that e.g. users can switch from Pkg.Artifacts to Artifacts with as little disruption as possible) but especially in the case of BinaryPlatforms there's not much need for that.

@KristofferC
Copy link
Member

KristofferC commented Sep 4, 2020

The demarcator for public/private isn't how well documented it is, it's whether it's exported or not.

This is not really true. There are many things that are public that are not exported (and cannot be exported due to e.g. name clashes). For example Profile.print, TOML.parse etc. Also, in packages, you might now want to export something that has a very usual name that is likely to be too generic to be understandable on its own. What matters for public/private is documentation. With few exceptions, only public-facing API in the Julia repo has docstrings.

However, I don't really worry so much about extra docstrings but when reading this file it is just very hard to know what one needs to be careful to keep backward compat with and what is used just internally. It also makes it hard to review since you might not care about the internals so much but more about the interface you expose. You would have to comb through the doc reference and cross-check what has been put there to know. Say if a lot of these things cause invalidations, what can we narrow types on and what can we not due to someone calling it with some weird abstract type?

Since both of these modules had certain exports in Pkg, I've defaulted to mostly exporting the same things from the

What we should be backward compatible with is the stuff fin https://julialang.github.io/Pkg.jl/dev/api/#Artifacts-Reference-1. For example, BinaryPlatforms is not mentioned anywhere in the Pkg documentation so from that point of view you can do whatever you want with it.

We really should just try to expose as little as possible that is not strictly necessary as a public interface. If that means BB has to use some stuff that is considered "internal" that is fine because at least we can change it and just upgrade BB. I guess it is mostly what the jll packages use that is important to really nail because those guys we likely want to keep working.

@staticfloat
Copy link
Member Author

This is not really true. There are many things that are public that are not exported (and cannot be exported due to e.g. name clashes). For example Profile.print, TOML.parse etc.

These are exceptions due to name clashes, it's not good practice to have public APIs that are unexported. In fact, it is then even more important to have documentation stating that TOML.parse is a public API.

Also, in packages, you might now want to export something that has a very usual name that is likely to be too generic to be understandable on its own.

I'm not sure what you're saying here; but naming of documented, non-exported symbols can be changed before exporting.

What matters for public/private is documentation. With few exceptions, only public-facing API in the Julia repo has docstrings.

I strongly disagree. While it may be the case that developers document public APIs more than internal ones, I argue that's a bad situation we accept because of limited developer time, not an ideal to be strived for. We should never look down on more comments and documentation to help understand code. Documentation is a tool to help people understand code, it is never desirable to make code less easy to understand.

@timholy
Copy link
Member

timholy commented Sep 4, 2020

FWIW I like docstrings on internal methods too.

@giordano
Copy link
Contributor

giordano commented Sep 4, 2020

I heard a few times the claim that public methods are those that have the docstring. Don't we really have a better way to identify public and internal methods than having or not a docstring?

@timholy
Copy link
Member

timholy commented Sep 4, 2020

Yeah, it seems silly to distinguish them that way. When I write Documenter docs, I only "publish" the ones I expect people to call, and I ignore the warnings about those internal functions that I've written a docstring for but which are really the internal interface. But even that is pretty lame as a means of distinguishing "things that shouldn't change without a depwarn" from things that I might change without warning.

@KristofferC
Copy link
Member

KristofferC commented Sep 7, 2020

These are exceptions due to name clashes, it's not good practice to have public APIs that are unexported.

Where does this notion come from? export is used to put something in the global namespace which is a shared resource and as you use more packages it will inherently start erroring. Many packages break as soon as we export something more from Base.
Also, for code readers, it makes it impossible to locally (syntactically) see where a symbol is coming from without help from an IDE which is awful when you are trying to get familiar with a new codebase. Anyway, this is a side discussion.

I strongly disagree. While it may be the case that developers document public APIs more than internal ones, I argue that's a bad situation we accept because of limited developer time, not an ideal to be strived for. We should never look down on more comments and documentation to help understand code. Documentation is a tool to help people understand code, it is never desirable to make code less easy to understand.

When I said "documentation determines public/private", I should have been more specific and said, "public-facing documentation determines public/private". Adding a docstring to something indeed doesn't make it public, putting it into the manual does.

@KristofferC
Copy link
Member

KristofferC commented Sep 7, 2020

I mean, the reason why I bring this up is that I see quite a few places which e.g. could have problems with invalidations (and one of the main reasons this code movement is being done is AFAIU due to compilation time). However, I don't want to nitpick every single place when it might not be a problem in practice. Therefore, if we have the freedom to change things at a later point, then it is probably better to merge, and to these fixes later.

But now e.g. parse_dl_name_version(path::AbstractString, platform::AbstractPlatform) is exported with a nice docstring specifically saying that the types are AbstractString, so if we find that there are a bunch of invalidations from that and want to tightent it to path::Union{String, SubString{String}} are we then breaking the interface? It's exported so maybe? But the module Base.BinaryPlatforms itself is not so maybe not? But the jll packages will use BinaryPlatforms so I guess it is effectively public? But the jll packages might only call this with String? So is it ok to change? Etc. If I search on JuliaHub for parse_dl_name_version it seems to only be used withing BB (but BB also exports it so it is a part of the public API for BB).

If you tell me nothing here is sacred then I'm much calmer, heh.

base/binaryplatforms.jl Outdated Show resolved Hide resolved
@KristofferC
Copy link
Member

KristofferC commented Sep 7, 2020

Anyway, I didn't mean for my attempt at a review to start this type of a "heated" side discussion about stuff that is not directly related to the functional code in the PR. I just jotted down some of my thoughts while reading the code and it was probably mostly a reflection of me being unfamiliar with the code and how it is being used in combination with that it didn't really use the standard documentation style as the rest of Base / stdlibs. In hindsight, I should probably have taken a bit different strategy.

The best way forward is probably to try to get stuff merged without too much nitpicking, get stuff in place, regenerate some jlls that can be used to test things out on, and then do eventual adjustments based on that.

@giordano giordano mentioned this pull request Sep 8, 2020
@staticfloat staticfloat force-pushed the sf/artifacts_rework branch 3 times, most recently from 7b22e7c to 0ca9993 Compare September 10, 2020 07:26
@staticfloat staticfloat marked this pull request as ready for review September 10, 2020 07:27
@staticfloat staticfloat changed the title [WIP] Move Artifacts and BinaryPlatforms into Base Move Artifacts and BinaryPlatforms into Base Sep 10, 2020
@staticfloat staticfloat force-pushed the sf/artifacts_rework branch 3 times, most recently from b7563f1 to 44db8d8 Compare September 10, 2020 16:14
@staticfloat
Copy link
Member Author

Alright, I've dropped the commit that bumps Pkg; this is now ready to merge. Once this is merged, I'll merge the PR against Pkg, then we can bump Pkg in a future PR which will switch Julia over to a Pkg version that drops its duplicate Artifacts and BinaryPlatforms code.

Adds a new `Platform` object that represents a computing platform as a
collection of tags mapping names to values.  Typical tags are things
such as processor `arch`, operating system, which libc is being used on
the platform, etc...  However more specific tags such as what version of
libgfortran the current Julia is linked against or what version of CUDA
is supported by the host are also representable.

Certain tags may require custom comparisons (such as Julia version
number, which should match using only the major and minor version
numbers) and so custom comparators are supported.
Imports `Pkg.Artifacts` into its own stdlib, Artifacts.  Includes the loading
and locating pieces of `Pkg.Artifacts`, but not the pieces for downloading and
installing artifacts, which still reside within Pkg.  Scripts that want to
generate artifacts should still use `Pkg.Artifacts`, but scripts that just want
to use artifacts that have already been generated are free to simply
`using Artifacts`, which should cut down on time to first load by a nice chunk.
If a package attempts to `@artifact"foo"` on an artifact that does not exist
the `Artifacts` module will dynamically load `Pkg` and invoke the downloading
routines through an inference barrier.

Also adds two new (minor) features:

* Slash-indexing: by popular demand, supports indexing into files within
  artifacts in one convenient syntax: `@artifact"FFMPEG/bin/ffmpeg"`
  will automatically perform the `joinpath()` for you after looking up
  the `FFMPEG` artifact.

* Compile-time constancy; when `@artifact_str` is given a constant
  `String`, it optimizes the generated code by looking up the
  appropriate hash at compile-time and embedding just that hash into the
  resultant code, saving on both lookup time and objects that must be
  serialized to the eventual precompilation `.ji` file.
@stevengj
Copy link
Member

stevengj commented Feb 9, 2021

a git bisect seems to indicate that this PR is triggering JuliaLang/IJulia.jl#968 … any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants