what should be in LOAD_PATH, DEPOT_PATH #25709

Closed
StefanKarpinski opened this issue Jan 23, 2018 · 13 comments
Labels
design Design of APIs or of the language itself packages Package management and loading

@StefanKarpinski

LOAD_PATH

In #25455, I tried to keep things as compatible as possible, leaving the contents of LOAD_PATH as a superset of what was in it previously. The current default contents are:

[ [ Base.CurrentEnv(),
    Base.NamedEnv("v0.7.0"),
    Base.NamedEnv("v0.7"),
    Base.NamedEnv("v0"),
    Base.NamedEnv("default"),
    Base.NamedEnv("v0.7", create=true) ],
  Base.Pkg.dir,
  "/Users/stefan/projects/julia/usr/local/share/julia/site/v0.7",
  "/Users/stefan/projects/julia/usr/share/julia/site/v0.7" ]

This has the effect of looking in the following places:

  1. The first of the following which succeeds:
    1. parent directories of the current working directory with a Project.toml file, if any
    2. joinpath(DEPOT_PATH[1], "environments", "v0.7.0") if it exists
    3. joinpath(DEPOT_PATH[1], "environments", "v0.7") if it exists
    4. joinpath(DEPOT_PATH[1], "environments", "v0") if it exists
    5. joinpath(DEPOT_PATH[1], "environments", "default") if it exists
    6. joinpath(DEPOT_PATH[1], "environments", "v0.7") even if it doesn't exist
  2. The Pkg2 package directory indicated by JULIA_PKGDIR
  3. The local site package directory (architecture-specific) – currently not created by default
  4. The site package directory (architecture-independent) – this is where stdlib packages live

The first entry can be replaced with any number of items by setting the JULIA_LOAD_PATH environment variable. The other three LOAD_PATH entries are added no matter what you do, which is not ideal since we would like users to be able to control their load path entirely. However, giving users that control does mean that if the user does not include the stdlib environment in their load path, they will not be able to use stdlib packages.
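The behavior described here can be sketched roughly as follows. This is only an illustration, not the actual initialization code: `load_path_sketch`, the `:env_stack` placeholder, and the bracketed directory names are hypothetical, and `:` as the separator is an assumption based on Unix-style path lists.

```julia
# Sketch of the 0.7-era behavior: entries from JULIA_LOAD_PATH replace the
# first default entry, and the fixed entries are appended regardless.
# All names here are hypothetical; ':' as separator is an assumption.
function load_path_sketch(env_value, first_entry, fixed_entries)
    if env_value === nothing
        head = Any[first_entry]
    else
        head = Any[String(p) for p in split(env_value, ':'; keepempty=false)]
    end
    return vcat(head, fixed_entries)
end

fixed = ["<Pkg2 dir>", "<local site dir>", "<site dir>"]
unset = load_path_sketch(nothing, :env_stack, fixed)
custom = load_path_sketch("/opt/envs/a:/opt/envs/b", :env_stack, fixed)
```

In both cases the three fixed entries end up at the tail, which is the part users cannot currently opt out of.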

The second entry is a legacy thing that allows us to continue loading Pkg2-installed packages. We should deprecate this whenever we replace Pkg2 with Pkg3: at that point, we can have a DeprecatedEnv loader that allows this to continue working but prints a warning message if the target directory actually exists. That way people can keep using Pkg2-installed packages during the 0.7 transition, but they'll get a warning that they should reinstall packages using Pkg3.

DEPOT_PATH

The new code loading system separates how to resolve what package names mean from where to look for installed versions of packages. The LOAD_PATH is involved in deciding what import Foo means in various contexts. A package name can mean one thing in your code while meaning a different thing to each of your dependencies; this is essential in Pkg3 since there is no longer a single global namespace of package names: METADATA is replaced with a federated system of package registries, which can be public or private.

Once the identity and version of Foo is resolved by looking in the LOAD_PATH, finding the code for that version of the package is a separate step, which involves looking through the DEPOT_PATH in the packages directory of each until an installed copy of the package is found (or isn't). Currently, the DEPOT_PATH defaults to only containing joinpath(homedir(), ".julia") – i.e. your "home depot" (I did not anticipate this pun when choosing the name "depot" but there it is). However, we'll want to support looking for installed packages in multiple different places, most likely including:

  1. Your home depot, typically ~/.julia
  2. A platform-specific shared system directory, e.g. /usr/local/share/julia/site/
  3. A platform-independent shared system directory, e.g. /usr/share/julia/site/
  4. The standard library for the Julia binary you're using, maybe /usr/share/julia/stdlib?

I'm increasingly unsure what these paths should be, so input and thoughts would be helpful here.
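The "where" step described above can be sketched as a first-match search over depots. This is a minimal illustration, not the real loader: `find_installed` is a hypothetical helper, and the `packages/<name>/<slug>` layout (with `slug` standing in for whatever identifies a specific installed version) is an assumption.

```julia
# Search each depot's `packages` directory, in order, for an installed copy.
# The `packages/<name>/<slug>` layout is assumed for illustration only.
function find_installed(depots, name::AbstractString, slug::AbstractString)
    for depot in depots
        path = joinpath(depot, "packages", name, slug)
        isdir(path) && return path
    end
    return nothing  # not installed in any depot
end

home_depot = mktempdir()
system_depot = mktempdir()
depots = [home_depot, system_depot]

# Installed only in the system depot: found there.
mkpath(joinpath(system_depot, "packages", "Foo", "abc123"))
from_system = find_installed(depots, "Foo", "abc123")
missing_pkg = find_installed(depots, "Bar", "abc123")

# Once the home depot also has a copy, it takes precedence.
mkpath(joinpath(home_depot, "packages", "Foo", "abc123"))
from_home = find_installed(depots, "Foo", "abc123")
```

Because the search stops at the first hit, the order of depots decides which copy wins when several depots hold the same version.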

LOAD_PATH defaults

Since "what" is now addressed by LOAD_PATH while "where" is addressed by DEPOT_PATH, we generally want far fewer things in LOAD_PATH. In fact, when testing a project, you generally only want that project's environment – and maybe a "test environment" – in the LOAD_PATH so that if you try to load anything that isn't recorded and identified in the project's Project.toml file, you'll get a failure. However, it's handy when developing to be able to augment the project environment with tools like debuggers, profilers, Revise.jl, etc.

What should be in the default load path in 1.0? Maybe just this:

[ [ Base.CurrentEnv(),
    Base.NamedEnv("v0.7.0"),
    Base.NamedEnv("v0.7"),
    Base.NamedEnv("v0"),
    Base.NamedEnv("default"),
    Base.NamedEnv("v0.7", create=true) ],
  "/usr/share/julia/stdlib" ]

If you start Julia in a project directory, then you'll only be able to load the project's dependencies and standard libraries – which, incidentally, should probably also be recorded as project dependencies. Or maybe we want something more like this:

[ Base.CurrentEnv(),
  "/usr/share/julia/stdlib",
  [ Base.NamedEnv("v0.7.0"),
    Base.NamedEnv("v0.7"),
    Base.NamedEnv("v0"),
    Base.NamedEnv("default"),
    Base.NamedEnv("v0.7", create=true) ] ]

That would mean that you can load whatever's in the current project if it exists, the standard library, and whatever's in your default named environment, which would presumably include all your dev tools and other favorite packages.

Controlling the LOAD_PATH

We'll want some convenient ways to manipulate the LOAD_PATH, e.g. to run with only the current environment visible. Design ideas are welcomed here, but my thought was that we'd have some command-line options to control this such as:

julia --env=@ # just the current environment
julia --env=@devtools # just the named "devtools" environment
julia --env+@devtools # add the named "devtools" environment to the load path

It's a little weird to call this option --env when it manipulates the LOAD_PATH but maybe that's ok. We could also write it julia --load-path=... but I still like --env as a name for this since the LOAD_PATH is a list of environments to look in for dependencies.
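To make the proposal concrete, here is a rough sketch of how the two forms might act on a load path. These options are only proposed in this thread, so `apply_env_option` is hypothetical, entries are kept as plain `"@name"` strings for simplicity, and treating `--env+` as "append" (rather than "prepend") is an assumption.

```julia
# Hypothetical handling of the proposed options; not an existing Julia flag.
# `--env=X` replaces the load path, `--env+X` is assumed to append to it.
function apply_env_option(load_path::Vector{String}, arg::AbstractString)
    if startswith(arg, "--env=")
        return [String(arg[7:end])]
    elseif startswith(arg, "--env+")
        return push!(copy(load_path), String(arg[7:end]))
    end
    return load_path  # unrelated argument: leave the load path unchanged
end

base = ["@", "@stdlib"]
only_current = apply_env_option(base, "--env=@")
only_devtools = apply_env_option(base, "--env=@devtools")
with_devtools = apply_env_option(base, "--env+@devtools")
```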

@StefanKarpinski StefanKarpinski added packages Package management and loading design Design of APIs or of the language itself labels Jan 23, 2018
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Jan 23, 2018
@StefanKarpinski

cc @KristofferC, @stevengj (among others)

@StefanKarpinski

StefanKarpinski commented Jan 23, 2018

Note: for convenience and compatibility reasons, if you have a directory with package entry-points in the LOAD_PATH then code will be loaded from that directory without needing to be installed anywhere in the DEPOT_PATH. This allows people to continue to use LOAD_PATH as they previously have and is just generally convenient since sometimes you just want to have a directory with packages in it and be able to use them without having to go through the process of putting specific versions (identified by git SHA-1 tree hash) in the right place in a package depot. This is, in fact, exactly how package checkout works in Pkg3: there's a JULIA_DEVDIR location – joinpath(DEPOT_PATH[1], "dev") by default – where the package manager checks out packages. That should probably also be in the default LOAD_PATH so we need to consider that as well, in which case we have a potential default LOAD_PATH something like this:

[ Base.CurrentEnv(),
  joinpath(DEPOT_PATH[1], "dev"),
  [ Base.NamedEnv("v0.7.0"),
    Base.NamedEnv("v0.7"),
    Base.NamedEnv("v0"),
    Base.NamedEnv("default"),
    Base.NamedEnv("v0.7", create=true) ],
  "/usr/share/julia/stdlib" ]

This would mean package name resolution would look in this order:

  1. current project (if it exists)
  2. checked out development packages
  3. default named environment
  4. stdlib packages

Reasoning for the order: the project should override anything it wants; if you're developing some package, that should come next; if you have standard things you like to use, that comes next, and finally whatever the system supplies by default should come last.

Alternately, when you do pkg> checkout Foo you check it out in joinpath(DEPOT_PATH[1], "dev") and then modify the first environment in which that package appears to point to the checked out copy. That way the dev dir doesn't need to be in the load path since checking something out modifies load path environments to find the checked out package instead.
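The resolution order listed above amounts to "first environment that knows the name wins". A minimal sketch, modeling each environment as a dictionary from package name to some identity; `resolve_name`, the labels, and the string identities are all hypothetical stand-ins:

```julia
# Walk the environments in load-path order and return the first match.
# Environments are modeled as Dicts; real ones carry UUIDs and versions.
function resolve_name(envs, name)
    for (label, deps) in envs
        haskey(deps, name) && return (label, deps[name])
    end
    return nothing
end

envs = [
    "project" => Dict("Foo" => "Foo (project copy)"),
    "dev"     => Dict("Foo" => "Foo (checked out)", "Bar" => "Bar (checked out)"),
    "default" => Dict("Revise" => "Revise (default env)"),
    "stdlib"  => Dict("Test" => "Test (stdlib)"),
]

foo = resolve_name(envs, "Foo")     # the project overrides the dev checkout
bar = resolve_name(envs, "Bar")     # falls through to the dev directory
test = resolve_name(envs, "Test")   # only the stdlib provides it
nope = resolve_name(envs, "Nope")   # not resolvable anywhere
```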

@JeffBezanson

What do nested arrays in LOAD_PATH mean? Is it just a convenient way to insert and remove several items at once?

@StefanKarpinski

StefanKarpinski commented Jan 23, 2018

It resolves to a single item – the first one for which Base.find_env doesn't return nothing. So the

[ Base.NamedEnv("v0.7.0"),
  Base.NamedEnv("v0.7"),
  Base.NamedEnv("v0"),
  Base.NamedEnv("default"),
  Base.NamedEnv("v0.7", create=true) ]

entry evaluates to:

  • ~/.julia/environments/v0.7.0 if it exists, or
  • ~/.julia/environments/v0.7 if it exists, or
  • ~/.julia/environments/v0 if it exists, or
  • ~/.julia/environments/default if it exists, or
  • ~/.julia/environments/v0.7 whether it exists or not (that's what create=true means)

@JeffBezanson

How does that differ from the behavior you'd get if the arrays were flattened?

@StefanKarpinski

If ~/.julia/environments/v0.7.0 and ~/.julia/environments/v0.7 both exist, it doesn't matter what the contents of ~/.julia/environments/v0.7 are, since that directory will never be looked in. Think of the top level of the array as AND and the next level as OR: you use all of the entries in LOAD_PATH to resolve names (the first takes precedence), but you only use one of the entries in an array entry to resolve names.
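The AND/OR reading can be sketched like this. The types and functions below are hypothetical stand-ins, not the real `Base.NamedEnv` or `Base.find_env`; in particular, treating a plain string entry as "used only if the directory exists" is an assumption.

```julia
# Illustrative stand-in for a named environment entry.
struct NamedEnvSketch
    name::String
    create::Bool
end
NamedEnvSketch(name; create=false) = NamedEnvSketch(name, create)

# A named environment resolves if it exists, or always when create=true.
function find_env_sketch(env::NamedEnvSketch, depot::AbstractString)
    path = joinpath(depot, "environments", env.name)
    return (env.create || isdir(path)) ? path : nothing
end

# A plain directory entry (assumed to count only if it exists).
find_env_sketch(dir::AbstractString, depot::AbstractString) =
    isdir(dir) ? dir : nothing

# Nested array = OR: only the first alternative that resolves is used.
function find_env_sketch(alternatives::AbstractVector, depot::AbstractString)
    for alt in alternatives
        path = find_env_sketch(alt, depot)
        path === nothing || return path
    end
    return nothing
end

# Top level = AND: every entry that resolves is searched, in order.
function resolve_load_path(load_path::AbstractVector, depot::AbstractString)
    resolved = String[]
    for entry in load_path
        path = find_env_sketch(entry, depot)
        path === nothing || push!(resolved, path)
    end
    return resolved
end

depot = mktempdir()
mkpath(joinpath(depot, "environments", "v0.7.0"))
mkpath(joinpath(depot, "environments", "v0.7"))
stdlib = mktempdir()

load_path = [
    [NamedEnvSketch("v0.7.0"), NamedEnvSketch("v0.7"),
     NamedEnvSketch("v0.7", create=true)],
    stdlib,
]
resolved = resolve_load_path(load_path, depot)
```

Here both `v0.7.0` and `v0.7` exist, yet only `v0.7.0` appears in `resolved`, followed by the stdlib directory. Flattening the array would have put `v0.7` on the path as well.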

@StefanKarpinski

Now that I've written up how the new code loading system works, it would be great if people could read that and then help me figure out the answer to this issue.

@stevengj

Note that /*/share directories are supposed to only contain architecture-independent files. .jl files qualify, but not binary dependencies (which should go in /*/lib or similar). Currently, Julia packages just install binary dependencies into their source directories under deps. Is there a mechanism for separating these two in the depot path?

@JeffBezanson

I think the priority order for LOAD_PATH should be

Base.CurrentEnv()
0.7 named environment(s)
some system-wide package directory
stdlib

Then DEPOT_PATH only needs to contain ~/.julia. IIUC, the purpose of system-wide packages is to provide things available via using X by default, so the system needs to provide some sort of environment but not necessarily a depot.

How about --pkgenv for the proposed command-line option?

@StefanKarpinski

StefanKarpinski commented Apr 13, 2018

Ok, that's good thinking—thanks for the input. I think that DEPOT_PATH should have a system location in it as well since you want the sysadmin to be able to install a bunch of packages at known-good versions so that users don't have to. In fact, if system packages are provided through the depot path, it's unclear to me that we even need the system-wide package directory. Although I guess they address separate problems:

  • A system-wide depot in the default depot path means that users can use the package versions in there without needing to download or install anything—they're already installed and working, just add them to a project and use them.

  • A system-wide package directory in the default load path provides tools that people can use by default in the REPL and other top-level code (scripts, etc.).

The reason I'm not sure about the latter is that the user can add those things to their own default named environment. Having a system-wide package directory at all just means that different people have different things available by default in the REPL, which I'm not sure is a good thing. I feel like the real reason sysadmins have pre-installed versions is so that everyone doesn't need to download, compile, install those things themselves since having many copies is a waste of space and they may be hard to compile and install correctly. The system-wide depot path entry already addresses that issue.

@StefanKarpinski

A related concern is that we may want the package version resolver to favor versions that are already installed and available. Otherwise it won't really matter that there are pre-installed system versions of packages because people will do pkg> add XYZ and get the latest available version of XYZ instead of the one that's already on the system.
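One way to picture that preference, as a sketch only: among the versions allowed by the constraints, pick the newest one that is already installed, and fall back to the newest available otherwise. `pick_version` and its arguments are hypothetical, and this is not how the actual resolver is implemented.

```julia
# Prefer an already-installed version when one satisfies the constraints;
# otherwise fall back to the newest allowed version. Illustrative only.
function pick_version(available::Vector{VersionNumber},
                      installed::Vector{VersionNumber},
                      allowed)
    candidates = filter(allowed, available)
    isempty(candidates) && return nothing
    preinstalled = filter(v -> v in installed, candidates)
    return isempty(preinstalled) ? maximum(candidates) : maximum(preinstalled)
end

available = [v"1.0.0", v"1.1.0", v"2.0.0"]
system_installed = [v"1.1.0"]

reuse = pick_version(available, system_installed, v -> true)
fresh = pick_version(available, VersionNumber[], v -> true)
forced = pick_version(available, system_installed, v -> v.major == 2)
```

With a system copy of 1.1.0 present, an unconstrained request reuses it instead of fetching 2.0.0; a constraint that excludes it still gets the newest allowed version.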

@JeffBezanson

Ok, I can get on board with that. We just say that the purpose of system-wide packages is basically to download things for you (and de-duplicate), and not to muck with your environment. In that case LOAD_PATH is

Base.CurrentEnv()
0.7 named environment(s)
stdlib

and DEPOT_PATH is

~/.julia
system-wide depot

@StefanKarpinski

StefanKarpinski commented Apr 13, 2018

Yes, I think that's the right answer. Excellent. I'll make a PR now that we have a decision.

StefanKarpinski added a commit that referenced this issue Apr 13, 2018
also:

- rename "site" directory to "stdlib" since that's what it is now

- use JULIA_LOAD_PATH as-is instead of unconditionally appending the
  system package and stdlib directories to it

- change default DEPOT_PATH to include system paths: one for
  arch-specific and one for arch-independent packages

- delete comment about bundled code going in a versioned directory
  as it no longer applies since installed packages can and should
  be shared across different Julia versions
StefanKarpinski added a commit that referenced this issue Apr 18, 2018
also:

- rename "site" directory to "stdlib" since that's what it is now

- use JULIA_LOAD_PATH as-is instead of unconditionally appending the
  system package and stdlib directories to it

- change default DEPOT_PATH to include system paths: one for
  arch-specific and one for arch-independent packages

- delete comment about bundled code going in a versioned directory
  as it no longer applies since installed packages can and should
  be shared across different Julia versions

- update Pkg3 and tests to work correctly when the stdlib directory
  isn't always included in LOAD_PATH

fix failing tests
mbauman added a commit that referenced this issue Apr 19, 2018
* origin/master: (22 commits)
  separate `isbitstype(::Type)` from `isbits` (#26850)
  bugfix for regex matches ending with non-ASCII (#26831)
  [NewOptimizer] track inbounds state as a per-statement flag
  change default LOAD_PATH and DEPOT_PATH (#26804, fix #25709)
  Change url scheme to https (#26835)
  [NewOptimizer] inlining: Refactor todo object
  inference: enable CodeInfo method_for_inference_limit_heuristics support (#26822)
  [NewOptimizer] Fix _apply elision (#26821)
  add test case from issue #26607, cfunction with no args (#26838)
  add `do` in front-end deparser. fixes #17781 (#26840)
  Preserve CallInst metadata in LateLowerGCFrame pass.
  Improve differences from R documentation (#26810)
  reserve syntax that could be used for computed field types (#18466) (#26816)
  Add support for Atomic{Bool} (Fix #26542). (#26597)
  Remove argument restriction on dims2string and inds2string (#26799) (#26817)
  remove some unnecessary `eltype` methods (#26791)
  optimize: ensure merge_value_ssa doesn't drop PiNodes
  inference: improve tmerge for Conditional and Const
  ensure more iterators stay type-stable
  code loading docs (#26787)
  ...
3 participants