
Executable File Format, Yes or No? #10

Closed
dstufft opened this issue Nov 21, 2016 · 60 comments

@dstufft
Member

dstufft commented Nov 21, 2016

There have been a number of issues/comments that tend to trace back to a single question: should Pipfile be executable or not?

The relevant comments are:

@defnull said in #8 (comment)
Parsing and editing Pipfile, on the other hand, is significantly more complicated than requirements.txt. IDEs will most likely not support Pipfile editing for a long time.

@jayfk said in #8 (comment)
On top of that, please don't forget server-side tools working with dependencies (pyup.io, requires.io, etc.). There's no way to support Pipfiles if they allow running arbitrary Python code. Local developer tools might have a chance to establish a working solution over time; server-side tools don't.

@defnull said in #9 (comment)
Build or install tools (pip) cannot check whether Pipfile and Pipfile.lock are out of sync without executing the Pipfile (which is bad, see #7), and so cannot warn the user.

@defnull said in #7 (comment)
Executable build descriptions are a bad idea. See https://www.reddit.com/r/Python/comments/5e2vci/pipfile_a_new_and_much_better_way_to_declare/da9c2ku/ or countless blog posts and articles all over the Internet for very good examples. It should be common knowledge by now; still, the same errors are made again and again.

Pipfile has a nice hybrid approach in that it allows developers to generate a static Pipfile.lock from a dynamic description. Tools can parse the static Pipfile.lock and know exactly what to do, while developers can work with the more convenient Pipfile and can be as smart or lazy as they want to. This approach might actually work very well.
For this to work, and for others like me to accept this idea, I think the following point should be stressed more: tools work with Pipfile.lock exclusively. A build system should never automatically execute a Pipfile to generate a missing or outdated Pipfile.lock. The pipfile module should never be a build requirement.

@takluyver said in #6 (comment)
I don't like specifying metadata in executable files in general. #8 gives one reason why not - it's very hard to reliably modify scripts programmatically.

There's also some reaction on https://www.reddit.com/r/Python/comments/5e2vci/pipfile_a_new_and_much_better_way_to_declare/

@dstufft
Member Author

dstufft commented Nov 21, 2016

Since the idea for this and the syntax came from me, here are my thoughts:

The first thing to remember is that this is not a replacement for setup.py, and it is not a mechanism by which to build a package. The sole purpose of this file is the same as requirements.txt: to make it possible to (re)create an environment with a particular set of installed packages. The most typical case where you'd want to do this is if you have a particular instance of a Python application that you want to deploy. Thus any project that is publishing sdists and wheels to PyPI will not have a mandatory Pipfile (though they may have one that they use for, say, a development environment that includes additional development utilities). It would be a good idea to read Yehuda Katz's Clarifying the Roles of the .gemspec and Gemfile blog post and just mentally replace Gemfile with Pipfile, Gemfile.lock with Pipfile.lock, and .gemspec with setup.py.

There are a few major problems with requirements.txt currently. The ones that I can think of off the top of my head are:

  • The file format is incredibly simplistic, which makes the simple case very simple but the complex case quite annoying. For instance, trying to shoehorn the new hashes in pip 8 into it very quickly leads to multi-line disasters like Warehouse's requirements/main.txt. You can also see this in how we attempt to cram more and more information into the URL structure of a non-named dependency (e.g. git+https://github.com/dstufft/pipfile.git@branch#egg=pipfile-1.0&subdirectory=mysubdir), in how we handle things like project-specific build parameters (e.g. --install-option), and in our environment markers. The current file format does not lend itself to a sane way to express optional, named parameters for a dependency, and that leads to some very crazy lines in the current format.
  • The format exists in a weird in-between state: sometimes people use it with broad, generic dependencies that represent the expected version ranges of the top-level items they care about in recreating a specific environment, and sometimes people use it with extremely locked-down versions in order to exactly recreate a specific environment. Both have issues, though.
    • If you're simply providing your top-level requirements, then you're likely not using requirements.txt to its full benefit. This can make sense if you've got something that you expect people to only clone and then create a brand new environment from, but that shouldn't be your default case. This is the thing that requirements.txt makes easiest.
    • If you're version-pinning every dependency (which you should in the common case), you're losing out on pip doing resolution for you. You have to go and manually manage dependencies, know when to remove some sub-dependency because it is no longer required, etc. There are a few mechanisms people have used to try to work around this, like installing and then running pip freeze (although if you don't use a fresh environment you risk pulling in unrelated stuff you just happened to have installed), or using something like pip-compile.
  • The requirements.txt format appears deceptively simple to parse, leading people to do things like read it in their setup.py, or to develop tools like pbr that do that but don't (and cannot) actually support everything requirements files support, leaving those projects to either error out or silently ignore directives.
  • It is difficult to add new features to requirements.txt, such as the ability to have named groups of dependencies, because the overly simplistic file format does not provide a reasonable way to add them.

Given all of the above, I set out to design a new format that would solve those issues. At the time I came up with three distinct options:

# Proposal #1
#  This attempts to closely match the Gemfiles from Bundler, which might mean that
#   people would find it easier to pick up. However, it could also mean that people
#   will look down on it as a "copy", and the differences might be noticed more since
#   the syntax matches more closely.
source "https://simple.crate.io"
source "https://pypi.python.org/simple/"

dist "requests"
dist "Django", "~>1.4"
dist "pinax", git = "git://github.com/pinax/pinax.git", branch = "1.4"
dist "crate", path = "~/blech"
dist "test", group = "development"
dist "test2", group = ["development", "testing"]

group "development":
    dist "blah"
# Proposal #2
#  This proposal takes the ideas from a Gemfile, and makes them more "Pythonic".
#   Instead of creating a DSL, simple functions are used. This shares a lot of the
#   same benefits as #1 in that people might find it easier to pick up as some will
#   already know Gemfiles. An obvious negative is that it tends to be a bit more verbose
#   than #1.
source("https://simple.crate.io")
source("https://pypi.python.org/simple/")

dist("requests")
dist("Django", "~>1.4")
dist("pinax", git="git://github.com/pinax/pinax.git", branch="1.4")
dist("crate", path="~/blech")
dist("test", group="development")
dist("test2", group=["development", "testing"])


group("development",
    dist("blah"),
)
# Proposal #3
# This proposal takes the power introduced via 1 and 2, but uses YAML to create a
#  (hopefully) easy-to-parse set of requirements. The given example makes heavy
#  use of the inline style of declaration so that one line == one dependency. One area of
#  concern is how to handle the "default" group; it will need to exist somehow. As a simpler
#  approach, here the default group is defined outside of a groups: tag, but the question
#  remains what that tag should be called.
sources:
  - https://simple.crate.io
  - https://pypi.python.org/simple/

require:
  - requests
  - [Django, ~>1.4]
  - [Pinax, {git: "git://github.com/pinax/pinax.git", branch: 1.4}]
  - [crate, {path: ~/blech}]
  - [test, {group: development}]
  - [test2, {group: [development, testing]}]

groups:
  development:
    - blah

My thoughts at the time were:

After writing these down and looking at them, my thoughts so far:

  1. No. This doesn't look like it belongs anywhere near a Python application.
  2. I'm on the fence about this one. It definitely looks like Python, and it would be easy to implement; however, I worry that the fact that it is/looks like Python would make people try to program in things that don't belong in this kind of file. On the other hand, this file shouldn't ever get used as something like PyPI metadata, so maybe the extra power would be useful.
  3. I like how clean this looks. I worry about finding a good name for "require" (maybe "require" is fine?). I don't know exactly how I feel about the inline syntax, but there's no denying it looks cleaner than (2).

Since then I picked up the second option and refined it to look like this:

# This is designed to be edited by hand by Python developers
# --index-url and friends look like command-line options: non-intuitive, and no extra metadata is available
source("https://simple.crate.io/")
source("https://pypi.python.org/simple/", verify_ssl=False)


# Design:
#  - Use real data structures, make things clearer
#  - Separate the package name from the version, using real strings.
#      Django==1.4 is a valid PyPI package name, uninstallable from current files
#  - Using kwargs creates a great way to provide optional options on a line-by-line basis
#     that Python programmers are already familiar with
#  - People should only have to worry about the things they depend on; the installer
#      should do the right thing
#  - People have different dependencies based on environment
#  - Allow advanced usage for "weird use cases" or "patterns not anticipated"

# Concerns:
#  - Using a Python file might cause the same problems as setup.py
#    - This file is not designed to be directly executed
#    - setup.py is typically sent to other people; requirements are typically for internal use
#    - The final result is still deterministic
#  - Somewhat more verbose
#    - Uses a syntax familiar to all Python programmers


dist("requests")  # Install the latest version of requests, and all dependency's
dist("Django", "==1.4")  # Install a version of Django matching the "==1.4" version spec and all dependencies
dist("pinax", git="git://github.com/pinax/pinax.git", branch="1.4") # Install pinax, using git and the 1.4 branch
dist("crate", path="~/blech", editable=True) # Install crate from the supplied path, and install it as an editable
dist("test", group="development")  # install test, but only if the development group is passed
dist("test2", group=["development", "testing"])  # install test2, but only if the development or testing group is passed

with group("development"):  # start a context that is equivilant to passing everything a certain group
    dist("another thing")   # install ``another thing`` if the development group is passed

with source("https://inhouse.example.com/"):
    # Things in here MUST be installed from the above source, the idea being that if you have forked, e.g. django-model-utils
    # you can uplaod to an inhouse server and force django-model-utils to install from this source, even in the case
    # there is another version higher then yours, or even the same version.
    # Additionally if packages installed inside of a with source(): have dependencies their deps are searched for
    # on source, and installed from there if possible. However if they are not available dependencies will fall back
    # to the global source()es.
    dist("django-model-utils")  # Not shown in the json lockfile.

# Details
#  - This file does NOT cause anything to directly get installed; the program uses it to build up an internal list of requirements
#  - All dist()'s are considered for the requirements, even if they are not going to get installed (e.g. they are in a group that is not currently active)
#    - This will allow things to work smoothly where a requirement in a group might modify the final installed version (e.g. a group might pin to Django<1.4)
#  - This file will be "compiled" down to json.
#    - When the compiled json exists, there is no need to do any sort of version resolving. We know exactly what it is we want to install. (e.g. --no-deps etc)
#    - We still need to find out where to download the packages from? (Unless we encode that in the json. Might be risky)
#  - If there's a corner case not thought of, the file is still Python and allows people to easily extend it
#  - You don't need to pin to exact versions, you just need to pin to what you want to support. E.g. if a package follows semver, you could do package>1.2.1,<1.3 (you first started using the 1.2.1 version, and you expect 1.2.X to always be good for you.)
#    - Exact versions are pinned automatically in the json file

This is basically what you see as the idea of Pipfile now, modulo any adjustments that @kennethreitz has made.

My reasoning for going with Python instead of one of the other options was:

  • The syntax is something that's going to be familiar to all Python developers, so other than learning the "API" used in the file they don't need to learn anything special outside of that.
  • It provides good support for optional, named parameters that don't feel janky (e.g. adding them to the YAML option requires switching to an inline list or dictionary or something, or else making the common case much harder).
  • It's more flexible, which means things like with source() and with group() are more easily done, and they can be more easily combined without having to add an explicit mechanism for it.
  • This file is generally not a security-sensitive one; it's not something you're going to get from PyPI, and it's not something that you're likely to be fetching from random people and executing.
    • At the time I originally spec'd this out, projects like pyup.io, requires.io, etc. didn't really exist, so the only time this file was getting used was in an installation context.
  • Making it "Just Python" allows people to shim in things that we maybe forgot to add and then we can look at baking in more proper support down the road.

So where do we go from here? I see a few possible answers:

  • We just forge on ahead with the current assumptions and make a few of the not-primary cases harder (with the benefit that most of those cases are centered in other tools that can solve the problem once for their users, while requirements.txt's current problems push the problems onto the end users of this file).
  • We come up with something that is not Python-like, using YAML or something like it.
  • We make something that is Python-like, but isn't actually Python. I think this would end up being a restricted subset of Python that is "safe". By safe I mean not executing the file at all, but parsing it with the AST parser and using that to extract the information we need (see the sketch after this list).
    • This gets tricky because the AST isn't going to provide us any of the runtime pieces of Python. Things like declaring a variable and then using it later on would need to either be coded ourselves to try to mimic what Python does, or be unsupported as not part of our restricted subset of Python code.
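For concreteness, here is a minimal sketch of what that AST-based extraction could look like, using the standard library's ast module. The dist() name follows the examples in this thread; the function name, the returned structure, and the lack of error handling are illustrative assumptions, not a proposed implementation:

import ast

def extract_dists(source):
    """Collect dist(...) calls from Pipfile source without executing it."""
    dists = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "dist"):
            # literal_eval only handles literals: any variable, loop, or
            # import in the file is invisible to this approach.
            args = [ast.literal_eval(arg) for arg in node.args]
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            dists.append((args, kwargs))
    return dists

print(extract_dists('dist("requests")\ndist("Django", "==1.4")'))
# [(['requests'], {}), (['Django', '==1.4'], {})]

The with group()/with source() blocks from the sketches above would need extra handling (walking ast.With nodes), which is exactly the kind of runtime-free bookkeeping described in the last bullet.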

Personally, I think the correct way forward right now is to just roll with the Python syntax to nail down the API and execute the file in order to get the information out of it. However, we should do the things mentioned by @ncoghlan in #6 to discourage people from doing things that make this file non-declarative. Before we make this a final thing that people are actually expected to use, we can refactor out the exec() step and move to an AST-based parser if we want to do that.

@defnull

defnull commented Nov 21, 2016

Thanks for the clarification (your post should be a must-read for everyone participating in this discussion), but you did not address any of the issues in #8 and #9.

I also think that a system as powerful and flexible as the current idea of Pipfiles will be used to replace setup.py eventually, and no number of blog posts will prevent that.

Your YAML proposal actually looks way better to me than the executable approach, limited or not (#6).

@seemethere

I agree with @defnull; the YAML approach does look better to me as well. Having to learn a whole new syntax just for a Pipfile seems a little out there.

@dstufft
Member Author

dstufft commented Nov 21, 2016

@defnull It can't replace setup.py. It has no mechanism for doing anything other than listing dependencies to install. People can't list the .py files in their own project to install, they can't select C files to compile, they can't list classifiers or a long description, and it won't produce any sort of artifact they can upload to PyPI. When they run pip install . or pip install <thing>, pip will not read this file for dependencies or anything else. It only works at the top level, when passed explicitly to pip.

@defnull

defnull commented Nov 21, 2016

It actually sounds like you are going with the Python-like syntax because it's easier for now. Please remember that removing features is way more difficult than adding features. Starting powerful and restricting it later is a very risky approach.

@dstufft
Member Author

dstufft commented Nov 21, 2016

@defnull I'm going with the Python-like syntax because I think it presents the best API out of the current options, and going with a simple exec-based parser for now in lieu of a more complex AST-based parser (which would be completely safe) because it's easier at the moment.

@defnull

defnull commented Nov 21, 2016

@dstufft So you suggest that Pipfiles should look like Python, be parsed by the Python AST parser, but not be executed at all? Local variables, loops, and other constructs won't work? Imports won't work?

Then you have a safe DSL (which is nice) that is hard to parse (by anything but Python), non-standard, and has no advantage over existing formats like YAML or TOML.

@kennethreitz
Contributor

It's important to remember that nothing is final until this lands in Pip proper — this is just a library to prototype the idea out and move forward with an implementation as opposed to debating forever :)

@kennethreitz
Contributor

Static analysis should only be encouraged on the generated lock file. The Pipfile is simply a tool to generate that in a friendly manner (which doesn't exist today).
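To make that concrete: a minimal sketch of a lock-file-only sync check, assuming, purely for illustration, that Pipfile.lock is JSON and records a hash of the Pipfile it was generated from (the real lock file structure had not been finalized at this point, and the key name below is hypothetical):

import hashlib
import json

def lock_is_current(pipfile="Pipfile", lockfile="Pipfile.lock"):
    # Answers defnull's out-of-sync question from #9 without ever
    # executing the Pipfile: only the static lock file is parsed.
    with open(lockfile) as f:
        lock = json.load(f)
    with open(pipfile, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    # "pipfile-sha256" is a hypothetical key name, not part of any spec.
    return lock.get("pipfile-sha256") == digest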

@takluyver
Member

In support of @defnull: the advantages of using Python syntax seem to be:

  1. Easy to 'parse' by executing it.
  2. Allows arbitrary logic in generating dependencies

A safe AST-based parser would throw away both of those advantages.

1 is not much of an advantage in any case if the alternative is a common format like TOML/YAML/JSON, because there are already parsers for those.

@takluyver
Member

Reading back through @dstufft's long post in more detail:

  • The syntax is something that's going to be familiar to all Python developers, so other than learning the "API" used in the file they don't need to learn anything special outside of that.

Fair enough. I agree that TOML and YAML are both less familiar, though many developers are used to them in some other context. JSON is probably familiar to a lot of people, but its lack of comments makes it a bad choice for human-written files.

  • It provides good support for optional, named parameters that don't feel janky (e.g. adding them to the YAML option requires switching to an inline list or dictionary or something, or else making the common case much harder).
  • It's more flexible, which means things like with source() and with group() are more easily done, and they can be more easily combined without having to add an explicit mechanism for it.

These points seem to amount to a preference for Python syntax for declaring data. That's valid, and I agree that most declarative formats don't allow the mixture of positional and named values that function signatures do. But if this leads to having to scrape data out of a Python AST, I don't think the syntactic nicety is worth using a nonstandard format and special tooling.

  • This file is generally not a security-sensitive one; it's not something you're going to get from PyPI, and it's not something that you're likely to be fetching from random people and executing.
    • At the time I originally spec'd this out, projects like pyup.io, requires.io, etc. didn't really exist, so the only time this file was getting used was in an installation context.

This counters an argument against using Python, but it doesn't really provide a reason to pick Python.

  • Making it "Just Python" allows people to shim in things that we maybe forgot to add and then we can look at baking in more proper support down the road.

If we're considering parsing data out of the AST instead of executing it, this advantage would presumably go away.

@jayfk

jayfk commented Nov 21, 2016

Another point is that writing to a file with Python-like syntax will be a lot harder than using something battle-tested like YAML, TOML, or JSON, where parsers are widely available.

Since the Pipfile also contains the SpecifierSet, simply writing to the resulting lockfile won't be enough. There has to be a way to write to the Pipfile directly, or we'll have to give up a lot of the tooling that has evolved around dependency management (auto-pinning, updates, sub-dependency resolution, etc.).

@dstufft
Member Author

dstufft commented Nov 21, 2016

I agree that if we move to an AST-based parser then we're essentially just looking at a DSL that happens to be a restricted subset of Python. I care a whole lot less about how hard it is to write the pipfile library than I do about how hard it is to write an actual Pipfile, for the simple reason that a small number of people will ever touch the library, while a massive number of people will write Pipfiles.

So we can restrict this to simply trying to figure out what the best DSL or file format for this is, and shelve the execution question for right now.

I'm going to exclude JSON right off the bat because a lack of comments makes this unsuitable for a human writable file IMO.

When I look at YAML I can't help but feel like it's simply not powerful enough of a format. For instance, looking at my original sketch of:

sources:
  - https://simple.crate.io
  - https://pypi.python.org/simple/

require:
  - requests
  - [Django, ~>1.4]
  - [Pinax, {git: "git://github.com/pinax/pinax.git", branch: 1.4}]
  - [crate, {path: ~/blech}]
  - [test, {group: development}]
  - [test2, {group: [development, testing]}]

groups:
  development:
    - blah

The problem here is that composing becomes difficult. For instance, how do I require something that comes from a specific source in my development group? Currently the only ways I can see are things like:

require:
    - [mything, {group: development, source: "https://internal/"}]

groups:
  development:
    - [mything, {source: "https://internal/"}]

from_source:  # This can't be "sources" because that's already used
  "https://internal":
    - [mything, {group: development}]

Whereas in the Python-inspired DSL, you can do any of:

package("mything", group="development", "source="https://internal/")

with group("development"):
    package("mything", source="https://internal/")

with source("https://internal/"):
    package("mything", group="development")

with group("development"):
    with source("https://internal/"):
        package("mything")

This flexibility lets you make things a lot cleaner and easier to read in a lot of situations, whereas the YAML/JSON options (I haven't tried to spec out a TOML version) feel like a slightly improved version of what we already have, where it becomes really ugly to do anything beyond the simple case. The DSL, however, provides a much nicer ability to add even more utility beyond what's in my simple example.

For example, what if there is a set of libraries that I want to install on Windows only?

package("library", environment="sys_platform == 'win32'")

with group("development", environment="sys_platform=='win32'"):
    package("library")

with source("https://internal/", environment="sys_platform == 'win32'"):
    package("library")

with environment("sys_platform == 'win32'"):
    package("library")

    with group("developlment"):
        package("other-library")

You can even imagine some baked-in things like:

with environment(env.Windows):
    pass

Where env.Windows could just be a stand-in for sys_platform == 'win32'. This too becomes harder to do in a format like YAML/JSON/TOML, because the only real way of doing it is to either diverge from the spec, use something like YAML's !!type support, or just include a magic string that is supposed to be interpreted as a stand-in for something else.
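Such a stand-in would be nearly free to provide in the DSL library. A sketch, where everything beyond the sys_platform == 'win32' marker from the example above is an illustrative assumption:

class _Env:
    """Named stand-ins for common PEP 508 environment markers."""
    Windows = "sys_platform == 'win32'"
    # Additional names are illustrative, not part of any proposal:
    MacOS = "sys_platform == 'darwin'"

env = _Env()

# usage, as in the example above:
#     with environment(env.Windows):
#         package("library")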

I can imagine this being extended to other things as well: wanting to select binary versus source, turning on --pre for the entire set (or for just a subset), etc.

As one of the people who has to actually implement and maintain the thing that interprets this file, I am sympathetic to the goal of making it as easy as possible to implement*. However, I think that making these files easy and pleasant for a human to actually read and write matters far more. I will trade implementation complexity for a better end-user experience every day of the week. Now, I could be totally wrong and end users will hate using a Python-inspired DSL, but certainly for my own use cases the YAML/JSON options feel like something I would not be very enthused about using.

* One thing I'd mention in addition is that I'm not entirely sure the off-the-shelf libraries are even going to be particularly useful here. For instance, if you're writing a tool that will submit a PR that updates a Pipfile, then you're going to want something that can round-trip comments and such correctly (inline versus not, quoted string versus not, etc.). As best I can tell, pyYAML does not support this, and the fork ruamel.yaml only supports it if you're not using the C loader/writer. I'm struggling to find information about YAML parsers in other languages and what they support, but a quick search suggests that round-tripping YAML files is the exception rather than the rule. If that's true, then the "off the shelf" factor of the libraries only matters for tooling that needs to simply read the file and not write it at all, and afaik all of the tooling indicated thus far includes features that would cause it to need to write out the file. If folks are generally stuck using ruamel.yaml in Python in order to get round-tripping of Pipfile, then it's not really much of a burden to expect them to just use pipfile in Python instead.
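For what it's worth, here is what that round trip looks like with ruamel.yaml's round-trip API (pure-Python mode); the document is a toy, but the point is that the comments survive a programmatic edit:

import sys
from ruamel.yaml import YAML  # third-party: pip install ruamel.yaml

yaml = YAML()  # round-trip mode is the default; comments are preserved
doc = yaml.load("""\
sources:
  - https://pypi.python.org/simple/  # default index
require:
  - requests  # HTTP for humans
""")
doc["require"].append("Django")
yaml.dump(doc, sys.stdout)  # the inline comments survive the edit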

@jayfk

jayfk commented Nov 21, 2016

I tend to agree with you here, @dstufft. Bundling a YAML parser with pip is also going to be a nightmare on Windows-based platforms.

@defnull

defnull commented Nov 21, 2016

YAML surely has its own drawbacks (that is why Cargo uses TOML instead), but you can simplify the structure a lot if you allow for some repetition, or just rearrange some things. The structure can also be dynamic (e.g. allow the same value to be a literal, a list of literals or a map, depending on the requirements) to make it more convenient for the easy case, but also support complex cases.

sources:
  default: https://pypi.python.org/simple/ # implicit default, may be skipped.
  internal:
    - "https://internal/"
    - "https://internal.backup/"
    - {url: "git://github.com/pinax/pinax.git", branch: "1.4"}
groups:
  develop:
    source: internal # This is now the default for members of that group
require:
  mything: ">1.4"
  "my-other-thing": {group:"bleeding-edge", source:"git://github.com/pinax/pinax.git", branch: "1.4"}

It's a stupid example, but I want to show that YAML can be very flexible. A source definition may be a single string, a list of strings, a map, or even a list of mixed types, if it fits the use case. Duck typing is considered Pythonic, after all.

Edit: Parsing YAML is not easy, but at least it's platform- and language-independent, and there are a lot of battle-tested libraries out there. There is a reason why YAML is the de facto standard for complex configuration nowadays. (I'd still prefer TOML, by the way.)

@tony

tony commented Nov 21, 2016

I think I bring some experience with the pains of YAML at scale and the pleasures of Bundler to the table. I've also done some wacky customized bootstrap files, and I've used pip-style URLs in vcspull; check out how I handle many VCSs in my .vcspull.yaml file.

YAML is a pain to scale. It's easy to mess up the indentation of YAML and get unexpected results, too. Salt states are a pretty tricked-out form of YAML (adding Jinja2). Handling requirements in the declarations can be very error-prone and hard to debug.

See, in vcspull I took the habit of being very flexible (like Salt was) in how you can use shorthand declarations in the YAML to make it look more brief. This leads to a lot of inferred behavior and complication in parsing the YAML, which creates confusion.

It would be a good idea to read Yehuda Katz's Clarifying the Roles of the .gemspec and Gemfile blog post and just mentally replace Gemfile with Pipfile, Gemfile.lock with Pipfile.lock, and .gemspec with setup.py.

Having done some Ruby / Rails (after 2+ years of straight Python), Gemfiles felt like a dream. Will Pipfile also handle virtualenv-type stuff? Will bundle exec-style commands be possible?

I prefer example #2, @dstufft. I think Pipfile should resemble something close to Python, even if it's parsed in some custom way.

edit: after considering the post below, I'm also open to JSON / YAML. Nothing wrong with JSON either. It should make it easier to export freezes/locks and snapshot stuff.

@tony

tony commented Nov 21, 2016

There is a plus side to using a declarative format like YAML or JSON, though: in https://github.com/tony/tmuxp and https://github.com/tony/vcspull we're able to use JSON or YAML, since it pretty much is a Python dictionary in the end. See https://github.com/emre/kaptan.

So while I'm sure you don't want the overhead of Pipfile having both JSON and YAML to deal with, perhaps it's true that packages will end up being represented internally as (a list of) dictionaries, which end up getting exported to JSON/YAML anyway.

Also, JSON is part of the standard library. No external dependencies, like YAML has with pyyaml.

Note: I'm not making suggestions or trying to sway anything, just relaying info. I would love to help this effort however possible (testing, docs, etc.).

@takluyver
Member

@dstufft you say we shouldn't worry about how hard it is to parse this format. But that only holds if the one parser you write is the only one anyone ever uses. If, e.g., PyCharm wanted to implement support for automatically adding dependencies to a Pipfile, they would either have to code their own machinery in Java or shell out to a Python tool to deal with it. And would your tool aim to support programmatic modification, or only parsing?

Standard formats mean that parsing is a solved problem for most mainstream languages, and there are some tools already out there to do non-destructive modification. I think that's worth using a slightly less convenient syntax for.

@brianbruggeman

I feel obligated to ask about a methodology/thought for identifying system-level packages. There are more than a few packages with actual system dependencies, not just Python dependencies. If you want massive adoption, adding this feature will be a huge net positive. I also understand how difficult this ask is.

+1 to extending Python and building out a better setup.py, and not another *file, because quite frankly, declarative system installs are pretty terrible as solutions once you have a moving install target. I know everyone wants to make it simple, but it's just not.

-1 on developing yet another DSL with its own set of rules when you have Python available.
-1 on using an esoteric and not widely adopted format such as TOML.
-1 on JSON, for the same obvious reasons.

+1 on YAML. The same arguments about indentation can be made about Python; YAML doesn't really change this aspect. Presumably, the vast majority of full-time Python developers have indentation down. I'd argue some of the other problems attributed to YAML have more to do with poor or non-existent error handling within the YAML tooling and less to do with the format itself. Decomposing and composing YAML has been done in both Salt and Ansible. They're not shining examples, but given the other options, I really think YAML is second best, after improving setup.py directly. Just like code, YAML can be written well or not.

@tony

tony commented Nov 21, 2016

How far could we realistically push JSON or YAML?

What would be the shortcomings?

Admittedly, a “Gemfile” type of Pipfile is much prettier IMO.

But pure JSON/YAML is rugged, utilitarian, and ubiquitous, and has none of the downsides of
a custom parser.

You can just export a vanilla dictionary to pretty JSON/YAML.

Also, would it be worth it to create a Google Doc to weigh pros / cons
(sort of like how Golang does it)?

@nchammas
Contributor

@brianbruggeman

+1 to extending Python and building out a better setup.py, and not another *file, because quite frankly, declarative system installs are pretty terrible as solutions once you have a moving install target. I know everyone wants to make it simple, but it's just not.

Pipfile is not a better setup.py. It's a better requirements.txt. The difference between setup.py and requirements.txt is explained in this post by Donald.

The proposed replacement for setup.py is something completely different. It's pyproject.toml, and you can read about it in PEP 518.

@dstufft
Member Author

dstufft commented Nov 21, 2016

@takluyver pip will want to support programmatic modification for things like --save and such, so we're going to end up supporting it one way or another.

My point isn't that "nobody is ever going to have to implement this because I'm going to implement the one true parser and then everyone can use that!". My point is that the number of people who will ever have to implement or work with the internals of an implementation is fairly low, possibly (probably?) fewer than 100 total people in the world. But even if it's 1000 people, that's still VASTLY fewer people than will be writing, reading, and editing these files, so I care a lot more about what presents the best interface for them than about what makes it easier on the handful of people who will do the work here.

So folks doing things in Python? Most (hopefully all?) of them can just use the exact same library that pip itself uses. Folks doing things in other languages? Well someone in that language will need to write that parser, but that's a limited number of people. Presumably PyCharm already has some mechanism for parsing Python AST given that it needs to do that to do things like syntax highlighting, auto completion, etc, but at the very worst, yes they can also shell out to Python if they really need to.

@dstufft
Member Author

dstufft commented Nov 21, 2016

@tony In the end this gets turned into some dictionaries (and in fact, our lockfile is currently proposed to be JSON, so it gets "compiled" into JSON either way, although our lockfile's structure is not designed with humans in mind but with computers). This means it would be possible to have a DSL, a YAML, and a JSON representation of the same data and let end users pick. I don't think that is super valuable here, though, other than to me, so I don't have to make a decision ;).

I think that supporting N formats ends up making it even harder on people (now they need to track down or write N different parsing/emitting libraries that can round-trip successfully), and most folks are going to settle on a single format anyways.

@tony

tony commented Nov 21, 2016

and most folks are going to settle on a single format anyways.

Yep, agreed. Saltstack technically allows more than just Salt's version of YAML, but in practice the community almost always uses it, even with the opportunity to plug in JSON and other stuff.

Even with the vcspull and tmuxp examples I used above, despite me offering the choice of JSON too, people tend to just want YAML.

So agreed, the community tends to agree on a single format in the end. 😄

and in fact, our lockfile is currently proposed to be JSON, so it gets "compiled" into JSON either way, although our lockfile's structure is not designed with humans in mind but with computers

got it.

That said, the Python-inspired DSL (such as that in proposal 2) is just beautiful. Very human-friendly.

@brianbruggeman

@nchammas I disagreed with that PEP when I saw it in May, and I still do here in November. The arguments made there seemed rather biased, coming from a very personal viewpoint and not from a well-thought-out set of technical ones. That makes me think there really isn't much discussion or push-back at all when there really should be. This discussion feels the same way.

@flying-sheep

flying-sheep commented Nov 21, 2016

Let's go for TOML:

  1. A Python-based syntax is:
     1. Unnecessarily complex: it was created to support more than this needs.
     2. More to learn: you need to know what works and what doesn't compared to Python.
     3. Not well documented: we would need to write the grammar and docs ourselves.
     4. Confusing: people already think it's executable, on Reddit and in this thread.
  2. YAML is:
     1. Complex: parsing is hard, so people usually wrap a C-based parser, which means
     2. Bundling it with pip is harder. Other than that it's pretty good.
  3. TOML is:
     1. Not well known, but
     2. Its relative INI is well known.
     3. Very easy to learn and use.
     4. Easy to parse and well documented (by far the best spec and documentation of any option).
     5. Cleanly usable for both the lock file and the config file, and so on.

For me, TOML is by far the best option; its being little known is easily offset by its maturity and its simplicity in use and scope.

@dstufft
Member Author

dstufft commented Nov 21, 2016

This is not an appropriate venue to (re)litigate PEP 518 or the fundamental difference between setup.py/pyproject.toml and requirements.txt/Pipfile. The appropriate place for that is distutils-sig. As of now, the successor to setup.py is going to occur via the mechanisms started in PEP 518, and there is a fundamental difference between the two systems.

@coderanger

coderanger commented Nov 21, 2016

FWIW, on the Ruby side of life it has been very useful to me over the years to have Gemfiles be Ruby code. It allowed implementing a version of Gemfile inheritance (via instance_eval) years before it was added to Bundler itself, and I still use this snippet in basically all of my Gemfiles to simplify development setup and Travis testing.

@HolgerPeters

I strongly suggest not making Pipfile a Python file, nor a subset of Python parsed by a specific DSL parser. Making it a pure-Python file opens the door to all the hacks we already have in the setup.py field in Python (it might have been a good idea when conceived, but it led to so many problems; let https://docs.scipy.org/doc/numpy/reference/distutils.html be a hint). Using a subset of Python as a DSL feels like reinventing the wheel, and it would mean that I, as a user of Python packaging tools, have to learn a DSL that is surely similar to Python but subtly different. And I won't be able to apply it to any other project.

I am convinced that PyPA should look into Rust's Cargo.toml as an inspiration. Rust's packaging infrastructure is not "right by accident", but by careful research and design. Then either use TOML, YAML, or ConfigParser formats, but please do not roll your own DSL or make it a programmatic Python file.

@FRidh

FRidh commented Nov 22, 2016

What makes specifying Python environments different from that of other languages? Are there differences? If so, do those differences warrant yet another format? How is this going to handle environment requirements that are not Python packages?

With Nix we have a DSL that is used for specifying packages and environments. While the Nix package manager and OS isn't a solution for everybody, I think using the expression language could be beneficial.

@HolgerPeters it seems Cargo.toml and Cargo.lock are also inspired by Gemfile and Gemfile.lock.

Anyway, please try to reuse another format when that's possible.

@k0nserv

k0nserv commented Nov 22, 2016

Regarding

Parsing and editing Pipfile, on the other hand, is significantly more complicated than requirements.txt. IDEs will most likely not support Pipfile editing for a long time.

I don't agree that requirements.txt is simpler; when trying to manually edit requirements.txt, I often find myself having to look up the syntax to get it right. This is why I think a Python executable file is a better idea. In a Python executable file, the syntax doesn't need to be learned, aside from the API. The benefit of having a language-level API is that it can be loaded into a documentation browser such as Dash, and configured IDEs will autocomplete and let the user browse the docs directly in their editor. A well-designed Python API is simpler and more discoverable than a YAML/TOML/JSON-based syntax.

@HolgerPeters

HolgerPeters commented Nov 22, 2016

@HolgerPeters it seems Cargo.toml and Cargo.lock are also inspired by Gemfile and Gemfile.lock.

@FRidh It is, but IIRC it also incorporates lessons from what wasn't optimal in Gemfile and Gemfile.lock, i.e. it improves upon them.

@sirex

sirex commented Nov 22, 2016

I think Pipfile should use the configparser format, and instead of a separate Pipfile, the existing setup.cfg file should be reused, so that all package metadata and dependencies would be stored in one place.

A Pipfile section in the setup.cfg file could look like this:

[requirements]
sources =
  https://simple.crate.io
  https://pypi.python.org/simple/

require =
  requests
  Django ~= 1.4
  Pinax = git://github.com/pinax/pinax.git?branch=1.4
  crate = ~/blech
  test = development
  test2 = development, testing

groups.development =
  blah

groups.testing =
  pytest

See also: https://pypi.org/project/d2to1/
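As a rough feasibility check, the stdlib's configparser already reads a section like the one above without any new dependency; a minimal sketch (the key names follow the example above, and the per-value parsing is of course ad hoc):

import configparser

cfg = configparser.ConfigParser()
cfg.read("setup.cfg")

req = cfg["requirements"]
sources = req["sources"].split()
requires = [line.strip() for line in req["require"].splitlines() if line.strip()]
dev_group = req["groups.development"].split()
print(sources, requires, dev_group)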

@FRidh

FRidh commented Nov 22, 2016

Here's an example with an expression written in a functional language, inspired by Nix (actually, it is valid Nix).

{pythonTools}:

with pythonTools;

let
  pandas = package {name = "pandas"; version = "0.19.1"; src = "https://github.com/pandas-dev/pandas/archive/v0.19.1.tar.gz";};
  django = package {name = "django";};
  cffi = package {name = "cffi";};
  enum34 = package {name = "enum34";};
  pytest = package {name = "pytest";};
in rec {
  dev = env {name = "devEnv"; packages = [ pandas django ] ++ (if isPyPy then [] else [ cffi ]) ++ pythonOlder "3.4" [ enum34 ];};
  testing = env {name = "testEnv"; packages = dev.packages ++ [ pytest ];};
}

So what do we have here? Our file is actually a function returning a set of environments. It takes one argument, pythonTools, which is a set (like a Python dictionary) containing helper functions. With the with statement, we bring these functions into scope.

In the let expression we define 5 packages. We have a function package to indicate that something is a package, and the function takes a set as its argument.

We then use these items in another set (which is the return value of this function) to define two environments. A function env is used to indicate/construct an environment. An environment has a name (which I think we could drop) and a list of packages. cffi is built into the PyPy interpreter, so it is not always needed; enum34 is also only needed for certain Python versions. The second env requires all the packages the first one has as well, so we refer to the packages of that env. This is possible because we've defined a recursive (rec) set.

A functional language (like Nix) allows more than a serialization language (like JSON) while still being entirely declarative and requiring only a single file.

@takluyver
Member

To summarise the suggestions so far:

  1. Full executable Python code
  2. Python syntax with data parsed from the AST, not executable
  3. Nix functional syntax
  4. TOML
  5. TOML with an optional executable Python Pipfile.in (Nick's idea)
  6. YAML
  7. Configparser INI

So far, no suggestions for RDF or Lisp S-expressions. ;-)

@flying-sheep

There was also a suggestion on Reddit: strict YAML.

@domenkozar

domenkozar commented Nov 22, 2016

Please, let's not repeat our mistakes, born of our bad habit of having too much power when specifying dependencies.

Take a look at this example from Haskell's Stack to see how fully declarative, easily parsable, and simple it can be:

flags: {}

packages:
- '.'
- location:
    git: https://github.com/serokell/universum
    commit: 8e33495fd58c5443e0c2b0f1b6646516a47bd8d6
  extra-dep: true
- location:
    git: https://github.com/serokell/time-warp.git
    commit: 1758ce25ab96f01e8979379e66dea3c7dae6c8c4
  extra-dep: true
- location:
    git: https://github.com/serokell/log-warper.git
    commit: 5de577c3ab25e6f9a4350a9646050a88b2b8996e
  extra-dep: true
- location:
    git: https://github.com/serokell/acid-state.git
    commit: 95fce1dbada62020a0b2d6aa2dd7e88eadd7214b
  extra-dep: true
- location:
    git: https://github.com/input-output-hk/pvss-haskell.git
    commit: 1b898a222341116d210f2d3a5566028e14a335ae
  extra-dep: true
- location:
    git: https://github.com/serokell/kademlia.git
    commit: 062053ed11b92c8e25d4d61ea943506fd0482fa6
  extra-dep: true


extra-deps:
- pqueue-1.3.2
- data-msgpack-0.0.8
- time-units-1.0.0
- aeson-extra-0.4.0.0
- recursion-schemes-5
- QuickCheck-2.9.2
- cryptonite-openssl-0.2
- UtilityTM-0.0.4
- serokell-util-0.1.1.1

resolver: lts-7.9

The key part is resolver, which points to what upstream defines as a list of packages that work together. Something like requirements.txt, but crowdsourced. extra-deps then just overrides that list.

I've failed to find a good reason in this discussion (besides conditionals, which can be part of the syntax) for a full-blown language, and we'll regret this decision in 10 years. On paper the Python syntax looks clean, but as soon as people start being creative you'll get the real world. A mess always starts simple and clean.

my 2c.

@FRidh

FRidh commented Nov 22, 2016

As a developer you want to specify which packages you need, but you might want to leave the exact version open because that changes over time. Even so, you want to share your current environment with others. Therefore, you want to take your initial spec (Pipfile) and then resolve the dependencies to arrive at your final spec, Pipfile.lock.

@domenkozar, am I correct that in your example stack.yaml would correspond to Pipfile.lock? How would you then deal with the case where versions and such haven't been resolved yet (Pipfile)? That's the resolver then, right? We don't have such a thing for Python yet.

@domenkozar

domenkozar commented Nov 22, 2016

@FRidh I'm talking about the lock file indeed. That's the part which matters for any use, including development. If you have a static lock file, your tool can update the file when you install a new package.

The Pipfile to me really looks like setup.py without some metadata. As soon as you're adding your dependencies there, people won't want to also duplicate that in setup.py, meaning you'll either end up having a next-generation setup.py in some non-standard format (or, even worse, Python), or people will come up with their own ad-hoc solutions.

Let's say you want to add Django as a dependency. Running pip2 add Django would update the Pipfile with the Django dependency and also write Django==1.6.8 to the Pipfile.lock. There, you're already benefiting from the static file yourself. As soon as it's a language, you can't do that.
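A sketch of that flow, with a JSON representation of both files assumed purely for illustration (neither the file names nor the structure here are part of any actual proposal):

import json

def pip_add(name, resolved_version):
    # Hypothetical `pip2 add`: record the abstract dependency in the
    # static Pipfile, pin the resolved version in the lock file.
    with open("Pipfile.json") as f:  # hypothetical declarative Pipfile
        pipfile = json.load(f)
    pipfile.setdefault("require", {})[name] = "*"
    with open("Pipfile.json", "w") as f:
        json.dump(pipfile, f, indent=2)

    with open("Pipfile.lock") as f:
        lock = json.load(f)
    lock.setdefault("default", {})[name] = {"version": "==" + resolved_version}
    with open("Pipfile.lock", "w") as f:
        json.dump(lock, f, indent=2)

pip_add("Django", "1.6.8")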

@FRidh

FRidh commented Nov 22, 2016

Let's say you want to add Django as a dependency. Running pip2 add Django would update the Pipfile with the Django dependency and also write Django==1.6.8 to the Pipfile.lock. There, you're already benefiting from the static file yourself. As soon as it's a language, you can't do that.

How would you deal with interpreter versions and operating systems other than the ones you used for pip2 add (see #20)? E.g. when you want cffi in your environment, which is provided by PyPy but has to be installed separately when using CPython. Or you need enum34 in your env. For a reproducible environment you need the exact package set, and the exact package set depends on the interpreter (version) you're using.

@ncoghlan
Member

There are only two real options on the table for the abstract dependency declarations: the Python-based DSL and TOML. Other config languages are out at the ecosystem level due either to simple inconsistency with the decision in PEP 518, or else for the specific reasons that PEP 518 chose TOML over the other alternatives that were considered.

As pretty as I think the Python-based DSL is, I'm struggling to see it ending well given the other factors at play (most notably, the fact that we're dealing with a 25+ year old language ecosystem, and an 18+ year old packaging ecosystem):

  • if we don't restrict it, then we can expect lots of folks to treat it the way they treat setup.py today, with all of the problems that entails (i.e. the 18+ year old packaging ecosystem problem)
  • if we do restrict it, then we break the intuitions of a lot of folks that are going to expect a Python-like configuration syntax used in a Python project to actually be Python (i.e. the 25+ year old language ecosystem problem)

User inertia is an incredibly powerful force, and we've seen plenty of ecosystem improvement efforts run into problems by failing to grant it adequate respect (most notably distutils2).

The other problem I haven't seen being considered so far with the Python-based DSL is "Which version of Python?". The examples given so far have all been Python 2/3 compatible, but what if someone wanted to write:

INTERNAL_HOST = "https://our.repository.manager.example.com/python"
source(f"{INTERNAL_HOST}/production")
source(f"{INTERNAL_HOST}/staging")
source(f"{INTERNAL_HOST}/dev")

People doing things like that would mean that not only does a consumer of the file need to be using Python 3.6 to generate Pipfile.lock, they would also need to be using it for any analysis of the abstract dependencies at all.

By contrast, if the baseline format for abstract dependency declarations is TOML 0.4.0, then folks that really want an imperative format can still have one (e.g. by defining a Pipfile.in or Pipfile.py file), but the clear expectation would be that they still need to generate a declarative Pipfile and check it in. Since the decision to use such tools (or not) would be made on a project-by-project basis, it would be up to the tool authors to decide whether they maintained sole control over the generated Pipfile, or whether they supported interoperability with both other automated tools and manual editing.

(Note that to facilitate the latter, we could potentially borrow a concept from pyproject.toml and require tools to put their automatically managed entity declarations under a tool.<name> namespace. It would require folks actually processing the raw Pipfile to merge the different tool contributions together, but anyone solely consuming Pipfile.lock still wouldn't need to worry about where the different dependencies originated)
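A sketch of the merging described in that last paragraph, assuming a TOML Pipfile read with the third-party toml package; the table names (require, tool.pyup) are invented for illustration:

import toml  # third-party; TOML is not in the stdlib

pipfile = toml.loads("""
[require]
requests = "*"

[tool.pyup.require]
Django = "~=1.4"
""")

# Start from the manually edited table, then fold in each tool's
# automatically managed contributions from its tool.<name> namespace.
merged = dict(pipfile.get("require", {}))
for contributions in pipfile.get("tool", {}).values():
    merged.update(contributions.get("require", {}))
print(merged)  # {'requests': '*', 'Django': '~=1.4'}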

@Profpatsch

Profpatsch commented Nov 22, 2016

From the Haskell perspective:

Cabal files:

  • - own syntax ↦ only the Cabal library can parse them correctly
  • + not Turing-complete ↦ we can parse them, and their semantics are well specified
  • As an escape hatch, there are Setup.hs files, which call the Cabal library but could do arbitrary computations. In practice nobody does this, though, except for a very few outliers.

This means that in nixpkgs we are actually able to generate build instructions for all of Hackage in one fell swoop; only a few dozen packages need overrides, which is absolutely amazing.

Stack files:

  • + they use YAML, so they are immediately parsable in a lot of languages
  • + this enables every developer to trivially write them without having to look up syntax
  • + the specification is very clear and small

By the way, XML had all of these properties twenty years ago, but sadly the syntax was too verbose for developers to like it. Now that there are formats like YAML, maybe we can get it right this time. It should also be possible to do schema verification with XML tools by translating the YAML files to an XML tree and specifying a RelaxNG schema for it. Free candy for everyone.

Edit: All points about YAML apply to TOML as well, of course. They are basically the same from a high-level view.

@domenkozar

@ncoghlan as you mentioned, for the 0.1% of packages needing such complexity, they can either:

a) have hooks to override the staticness and call a function to change the flow of the logic

b) generate the config file in any way they want

I think the important bit here is: do we want to design something easy for 99.9% of packages and have knobs for the rest, or do we say let's support everything via Turing completeness?

Developing a DSL sounds like a very bad idea to me. Maybe an EDSL, if that were possible in Python. It's hard to get traction, and it's not trivial to implement and maintain. Also, you'll have to fight feature-creep requests for years to come.

@k0nserv

k0nserv commented Nov 22, 2016

@ncoghlan regarding the Python version used: don't all Python projects already have an explicit/implicit requirement/contract for what version(s) of Python they support? Thus, if you have a 3.6 project, then 3.6 syntax would be okay in the Pipfile, because that is the target version of the project.

@Profpatsch

For your consideration, a discussion we just had on #nixos.
We are very interested in what happens in language tooling, and we have strong opinions. :)

Profpatsch | Even with the current pythonPackages in nixpkgs cross-testing is a charm.
Profpatsch | I remember cross-testing requests2 and 3 in multiple versions against python 2.7.x, 2.7.y and two version 3s.
Profpatsch | And it took five minutes.
Profpatsch | It was awesome.
Profpatsch | clever: YES
Profpatsch | That’s the whole frigging problem.
Profpatsch | 1. Turing complete configuration.
Profpatsch | 2. Side-effecting configuration
Profpatsch | It just hits us harder with nix.
    clever | Profpatsch: here is half of what i had to do, to get a single package to compile under a nix sandbox:
           | https://github.com/mcpkg/mcpkg-server/blob/master/root.nix#L110-L146
    clever | the gradle stuff also can specify "the latest version within 1.2.x" as a dependency
    clever | nix cant handle that at all
Profpatsch | And it’s also the reason why people think nix is not worth the effort.
    clever | Profpatsch: and here is the entire dep tree, https://github.com/mcpkg/mcpkg-server/blob/master/default.nix
  goibhniu | having declarative pipfiles on pypi could be wonderful for generating nix expressions
    clever | i had to do all of that by hand
    clever | and it only supports a single package
Profpatsch | clever: As long as you can generate that automatically, it is okay.
Profpatsch | But if you have to check every part manually, it gets tedious.
    clever | Profpatsch: that was the plan, but there are no tools to load a gradle file and spit the dep tree out recursively
Profpatsch | That’s why it would be an *absolute disaster* if the pip guys settled on a custom DSL and reinvented all their tooling.
Profpatsch | Because that’s what you get.
Profpatsch | What’s so hard in using TOML and creating a schema file for that?!?
Profpatsch | And why is that even a question?
     ronny | also toml is appearantly a mess
Profpatsch | clever: Is it okay with you if I cite that discussion here verbatim in the github issue?
    clever | Profpatsch: go ahead :)

@dstufft
Copy link
Member Author

dstufft commented Nov 22, 2016

Pipfiles will not go on PyPI; they are not a replacement for setup.py. It would be rare for something like Nix or Debian or RedHat to actually consume them (pretty much only in the case where someone released software that happened to be written in Python and didn't bother to actually package it up).

@sseg
Copy link

sseg commented Nov 22, 2016

The procedural Python-style format implies a state which is being updated, which allows for syntactically valid but semantically conflicting declarations:

package("requests", ">=2.10", group="staging_env_1")
package("requests", group="staging_env_2")
package("requests", "<2", group="staging_env_1")

By making groupings a higher-level property which collects package declarations the state becomes explicitly declared and conflicts are more obvious:

[groups.staging_env_1.requests]
version = ">=2.10"

[groups.staging_env_2.requests]

[groups.staging_env_1.requests]  # invalid, raises a decode error in toml (duplicate keys error in strict yaml)
version = "<2"

It makes sense to use this arrangement while taking advantage of the built-in set semantics of the StrictYAML or TOML parsers: keys must be unique, so conflicts surface at parse time.
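
A minimal sketch of that failure mode, assuming the third-party toml package:

import toml  # third-party `toml` package

conflicting = """
[groups.staging_env_1.requests]
version = ">=2.10"

[groups.staging_env_1.requests]
version = "<2"
"""

try:
    toml.loads(conflicting)
except toml.TomlDecodeError as exc:
    # The duplicate table is rejected at parse time rather than silently merged.
    print("conflict caught at parse time:", exc)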

The verbosity introduced by nesting objects in TOML leaves a little to be desired, though. Perhaps an example in StrictYAML would be more readable.

@asottile
Copy link

My 2c from maintaining some pre-commit hooks that deal with requirements.txt

We maintain several tools which make dealing with application requirements easier:

  • A pre-commit hook (linked above) which sorts requirements to make merge conflicts much less likely
  • A git merge-tool (sorry, closed source for now) which can automatically resolve conflicts in requirements
  • Some additional tools (which will hopefully be open source soon!) that assert pinning, encourage pruning, and assist in upgrading.

For the most part, these tools would be easy to adapt to the proposed lockfile, as it is an easy-to-parse format (JSON) in a regular structure (and in an ideal world, it probably obsoletes the sorting hook, since pip would only generate sorted JSON in a regular structure).

In the current world, these tools are very simple, mostly owing to the dead-simple nature of requirements.txt files: easy to parse, easy to manipulate, trivial syntax.

As soon as we move to a Python syntax (or Python-like syntax), a number of issues arise:

  • There is no standard-library way to round-trip parse and rewrite syntax (lib2to3 comes to mind but falls short for a number of reasons, and this gap has led to a bunch of different implementations of AST rewriting, all seemingly with different goals and issues).
  • Automatic reordering is not necessarily a safe operation in the presence of arbitrary code (nor even possible?)

I think the wheel format made some great decisions (and really should be a strong precedent) in not making things executable. When comparing to setup.py, preferring environment markers over arbitrary code execution keeps metadata as just that: data.
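
For illustration, a minimal sketch of an environment marker doing the job that conditional code would otherwise do, assuming the third-party packaging library (the enum34 requirement is just an example):

from packaging.requirements import Requirement  # third-party `packaging`

# A declarative marker replaces `if sys.version_info` branching in setup.py.
req = Requirement('enum34; python_version < "3.4"')
print(req.name)               # enum34
print(req.marker)             # python_version < "3.4"
print(req.marker.evaluate())  # True/False for the running interpreter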

@jo-sm
Copy link

jo-sm commented Nov 22, 2016

I think it's important to consider the IDE and human-readability implications of the Pipfile. In terms of IDEs, if it's a Python-like syntax, the IDE (or a text editor like Sublime) would have to embed an interpreter if it doesn't already have one, and would have to hope that the syntax stays valid for the foreseeable future (if, for example, there was a bug due to different Python versions, an end user would be very confused if their IDE gave them errors). This is probably not much of a concern, but something to be aware of. I think one big benefit of making the Pipfile easily parseable is that an IDE could infer requirements and alert the user that they're attempting to require something that isn't in their Pipfile.

In terms of human readability, I know this may be contentious, but I believe that a config file like Pipfile should be easily human-readable, and I don't think having it use a Python-like syntax would accomplish this. It would mean adding another syntax for something that's already been solved over and over (YAML, TOML, INI...) and would just be more overhead for users who want to develop in Python on a bigger project. The Pipfile.lock can be in whatever format necessary, because end users aren't really going to be touching it, but the Pipfile should be easily edited.

Whichever config language is chosen, I think choosing a standard one like TOML or YAML would really benefit not only programmers but also project maintainers who will have to deal with Pipfile issues over the course of the project lifetime.

Also, slightly off topic: please make importing from a git repo dead simple. I love that in package.json you just point it at a GitHub repo and it basically works, but in requirements.txt you have to prepend the repo with -e and add some egg info for it not to try to reinstall every time you install from requirements.txt. It's a minor concern, but because I push to CI multiple times per day, that installation time matters, and onboarding someone who doesn't know about the -e flag and needs to import a private library can be headache-prone and cause wasted time during CI and deployments.
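
For reference, the requirements.txt spelling being described looks something like this (the repo URL and egg name are illustrative):

-e git+https://github.com/example/private-lib.git#egg=private-lib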

@FFX01
Copy link

FFX01 commented Nov 22, 2016

I like the idea of using TOML. I've been playing around with Rust a lot lately and I find its package management really quite pleasant.

I think that the Pipfile should be 100% declarative and in a pre-existing format. Allowing it to be executable will lead to a slippery slope of dirty hacks to get unsupported use cases to work. Creating a new syntax, whether Python-inspired or not, reduces readability and increases cognitive load on the developer.

In my personal experience, the standard requirements.txt approach works just fine, so improving the current workflow is a good idea. Maybe just make requirements.txt more flexible and require it to be written in a common format (like TOML). It would be nice to be able to easily separate dev dependencies from deployment/release dependencies, and perhaps to allow per-environment and per-Python-version dependencies.

Something along the lines of:

# ~/requirements.TOML

[meta]
name = "my-application"
authors = ["John Doe <jdoe@email.com", "Jane Doe <janed@email.com>"]
license = "BSD2"
version = "1.1.0"

[default]
py-version = ">=3.4"

    [default.production]
    django = ">=1.9"

    [default.dev]
    requests = "latest"
    django-debug-toolbar = "latest"

[legacy]
py-version = "<=2.7"

    [legacy.production]
    django = ">=1.8"

    [legacy.dev]
    requests = "<=2.0.0"
    django-debug-toolbar = "<=1.5.0"

# Possible system dependency declarations? Though, I don't know how we would be
# able to reliably resolve these.
[system-deps]
ffmpeg = "any"
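
A hypothetical consumer of a layout like the above might look like this, assuming the third-party toml package (the section names just mirror the example and aren't an agreed standard):

import sys
import toml  # third-party `toml` package

config = toml.load("requirements.TOML")
# Pick the dependency set matching the running interpreter.
section = "default" if sys.version_info >= (3, 4) else "legacy"
deps = config[section]["production"]
print(deps)  # e.g. {'django': '>=1.9'}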

@ncoghlan
Copy link
Member

Regarding redistributor consumption of these kinds of files:

  • distros don't just ship libraries and frameworks, we ship applications and services as well, so these kinds of files may be consumed directly when creating layered Linux application containers from upstream repositories and/or tarballs. In such cases, we need to extract the abstract dependency information in order to automate respinning and retesting those containers for security updates.
  • through things like the Language Server Protocol we aim to make it easier for IDEs to support a wide range of programming languages, and that includes teaching language server implementations to understand the dependency declaration formats used in different ecosystems
  • we (and services like libraries.io) also pull deployment-centric files from public and private source control repositories in order to analyse and report on real-world component usage.

While we can (and do) handle language specific dependency declaration formats by analysing them with tools written in those languages, it's much nicer (and typically more reliable) when we don't have to and can instead just use standard file format parsers (with TOML and JSON being preferred to XML and YAML due to the security problems arising with the latter pair in their default parsing configurations).

That said, I don't think making our lives easier should be the primary consideration here - as open source consumer facing service providers we pull in the largest share of open source funding, and are hence better equipped than most to adapt to whatever upstream communities decide is in their best interests.

Instead, I think the key point is the one raised by @asottile and others: that regardless of the format chosen Python developers are likely to want to write custom tools to help them manage their Pipfile, just as they've written tools to help manage requirements.txt files.

With TOML, the knowledge of how to perform safe automated file manipulations would be applicable to both Pipfile and pyproject.toml, whereas a top-to-bottom imperative format would categorically prohibit many automated transformations and make others significantly more difficult (and Python-version dependent).
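
As a minimal sketch of the kind of safe automated edit a declarative format permits, assuming the third-party tomlkit package, which round-trips comments and formatting (the file contents are illustrative):

import tomlkit  # third-party; preserves comments and layout on round-trip

source = '# dev tools\n[packages]\nrequests = ">=2.10"\n'
doc = tomlkit.parse(source)
doc["packages"]["requests"] = ">=2.12"  # programmatic version bump
print(tomlkit.dumps(doc))               # comment and formatting survive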

@pradyunsg
Copy link
Member

pradyunsg commented Jan 11, 2017

This issue has been decided: the Pipfile is going to be a non-executable file.

So, maybe

  • close this issue
  • open a new issue to discuss the exact format of the Pipfile - a DSL, TOML, or something else?

Oh, and I'm a +1 to getting a TOML format Pipfile.

  • Not Executable (as decided above)
  • It'll make the packaging UX consistent
    • using the same file format for the human-editable files - pyproject.toml (once "complete") and Pipfile.
  • TOML is well specified and also used by Rust.
    • It has its issues (with datetimes and all), but considering the whole Rust community is using it, there's enough weight behind it.
  • If someone wants to generate their Pipfile, TOML is pretty easy to generate, and there are open-source packages for handling TOML files already.
    • It strikes a nice balance between human-editable and machine-parseable/generatable, so anyone who wants to be more "nice" can build their own tooling on top of it.

(Damn, I pressed comment too early, this should have been 2 separate comments.)

@pradyunsg
Copy link
Member

And let's not forget:

It's important to remember that nothing is final until this lands in Pip proper — this is just a library to prototype the idea out and move forward with an implementation as opposed to debating forever :)

@pradyunsg
Copy link
Member

I took the liberty of opening an issue (#46) for discussion on the format of Pipfile. Hopefully, no one minds that.

@kennethreitz
Copy link
Contributor

closing this for #46
