Add support for TOML config files (`pants.toml`) #9052

Eric-Arellano · 2020-02-02T05:27:58Z

Problem

INI files have some issues that make them frustrating for users:

Gotcha with when to use quotes:
- If the option is a string, you cannot use quotes, e.g. version: ipython==4.8.
- If the value appears in a list, you must use quotes, e.g. config: [".isort.cfg"]
Gotcha with indenting lists and dict values:
- The 7th paragraph of our instructions for new users warns:

Note that the formatting of the plugins list is important; all lines below the plugins: line must be indented by at least one white space to form logical continuation lines. This is standard for Python ini files. See Options for a guide on modifying your pants.ini.

Awkward to append to and filter from a list option at the same time:

backend_packages: +[
  'pants.backend.scala.lint.scalafmt'],-[
  'pants.backend.codegen.antlr.java',
  'pants.backend.codegen.antlr.python',
  'pants.backend.codegen.wire.java']

No defined standard for INI. Users may try using INI features not supported or interpreted differently by Python's configparser.
No syntax highlighting or validation from editors, given that there's no standard.
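Here is a minimal sketch of the indentation gotcha, using Python's standard configparser. The section and plugin names are only illustrative. The exact error message differs between Python versions, so the sketch relies only on the exception type.

```python
import configparser

# Every line after "plugins: [" is indented, so configparser treats those lines
# as continuations of the same value.
INDENTED = """\
[GLOBAL]
plugins: [
    'pantsbuild.pants.contrib.go',
    'pantsbuild.pants.contrib.node',
  ]
"""

# The same value with flush-left continuation lines: configparser now tries to
# read each line as a new "key: value" pair and reports a parsing error.
FLUSH_LEFT = """\
[GLOBAL]
plugins: [
'pantsbuild.pants.contrib.go',
]
"""


def read_plugins(text: str) -> str:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("GLOBAL", "plugins")


print(read_plugins(INDENTED))  # continuation lines are stripped and joined with "\n"
try:
    read_plugins(FLUSH_LEFT)
except configparser.ParsingError as err:
    print("ParsingError:", err)
```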

Why TOML?

PEP 518's thoughts on file formats

Python core developers recently researched modern config formats to choose the format for pyproject.toml. They compiled their results into PEP 518, comparing INI, JSON, YAML, and TOML.

Reasons they rejected INI:

No standard.

Reasons they rejected JSON:

Difficult for humans to edit, e.g. no support for comments.

Reasons they rejected YAML:

Specification is 86 pages when printed!
Not safe by default due to arbitrary execution of code.
PyYAML is large and has a C extension.

Reasons they liked TOML:

Simple while still being flexible enough.
Rust core developers said they have been "quite happy" with using TOML for Cargo.

Reasons for us to use TOML

Very similar syntax to INI, as TOML was inspired by INI.
- Less churn for users.
Simpler than YAML.
- I don't think we want users using anchors/aliases, for example.
Alignment with Rust and Python ecosystems.

Solution

Implement a config._TomlValues class that behaves identically to _IniValues. Both are subclasses of _ConfigValues, meaning that the two config formats may be interchanged without any file outside of config.py knowing which config format was originally used.

We use extremely comprehensive unit tests to ensure that _TomlValues behaves identically to _IniValues.
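Here is a rough sketch of the shape of this abstraction. ConfigValues, IniValues, and TomlValues are hypothetical, simplified stand-ins for the private classes in config.py. The TOML input is assumed to be the nested dict that a TOML parser returns. The only point is that callers get the same strings back from either format.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Union

TomlPrimitive = Union[bool, int, float, str]
TomlValue = Union[TomlPrimitive, List[TomlPrimitive]]


class ConfigValues(ABC):
    """Format-agnostic view of one config file: every value is returned as a str."""

    @abstractmethod
    def sections(self) -> List[str]: ...

    @abstractmethod
    def get_value(self, section: str, option: str) -> str: ...


class IniValues(ConfigValues):
    def __init__(self, sections: Dict[str, Dict[str, str]]) -> None:
        # configparser already yields strings, so nothing needs converting.
        self._sections = sections

    def sections(self) -> List[str]:
        return list(self._sections)

    def get_value(self, section: str, option: str) -> str:
        return self._sections[section][option]


class TomlValues(ConfigValues):
    def __init__(self, parsed: Dict[str, Dict[str, TomlValue]]) -> None:
        # `parsed` is the nested dict produced by a TOML parser.
        self._parsed = parsed

    def sections(self) -> List[str]:
        return list(self._parsed)

    def get_value(self, section: str, option: str) -> str:
        value = self._parsed[section][option]
        if isinstance(value, list):
            # Quote string members so the result matches the INI spelling.
            members = ", ".join(f'"{m}"' if isinstance(m, str) else str(m) for m in value)
            return f"[{members}]"
        return str(value)


ini = IniValues({"isort": {"config": '[".isort.cfg"]'}})
toml = TomlValues({"isort": {"config": [".isort.cfg"]}})
print(toml.get_value("isort", "config"))
```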

How we implement interpolation

TOML does not have native support for string interpolation, i.e. replacing %(foo)s with the DEFAULT value foo. So, we provide our own implementation that behaves identically.
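One way to do this, as a sketch. The placeholder pattern is the one from the diff, but interpolate and _INTERPOLATION are hypothetical names. DEFAULT values are assumed to already be strings. Unlike configparser, this does a single pass, with no nested references and no %% escape.

```python
import re
from typing import Mapping

# Same placeholder syntax as configparser: %(name)s
_INTERPOLATION = re.compile(r"%\((?P<name>[a-zA-Z_0-9]*)\)s")


def interpolate(value: str, defaults: Mapping[str, str]) -> str:
    """Replace each %(name)s with defaults[name], leaving all other text untouched."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group("name")
        if name not in defaults:
            raise KeyError(f"No DEFAULT value for {name!r} referenced in {value!r}")
        return defaults[name]

    return _INTERPOLATION.sub(substitute, value)


print(interpolate("%(buildroot)s/dist", {"buildroot": "/repo"}))
```

Substituting through a callback, rather than rewriting to {name} and calling str.format, means literal braces in dict-valued strings are left alone.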

How we implement list options

TOML has native list support, but does not have syntax for +[] and -[].

We could require users to use multiline strings, like config = """+[".isort.cfg"],-["bad.cfg"]""". This will work. In fact, it is the format we convert into in config.py so that the rest of Pants understands the value.

But config = """+[".isort.cfg"],-["bad.cfg"]""" is clunky, so we introduce syntactic sugar:

[jvm]
options = ["-Xmx1g"]

[isort]
config.add = [".isort.cfg"]
config.remove = ["bad.cfg"]

This sugar allows users to use native TOML lists (more ergonomic than strings) and makes it easy to both add to and remove from a list option at the same time.
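In TOML, the dotted keys config.add and config.remove parse into a nested table, {"config": {"add": [...], "remove": [...]}}. Here is a sketch of turning that table back into the +[...],-[...] string. desugar_list_option is a hypothetical name, and the member quoting follows the stringification shown later in this thread.

```python
from typing import Dict, List, Union

TomlPrimitive = Union[bool, int, float, str]


def _list_literal(members: List[TomlPrimitive]) -> str:
    # Quote strings and leave numbers bare, e.g. [".isort.cfg", 1] -> '[".isort.cfg", 1]'.
    return "[" + ", ".join(f'"{m}"' if isinstance(m, str) else str(m) for m in members) + "]"


def desugar_list_option(
    value: Union[List[TomlPrimitive], Dict[str, List[TomlPrimitive]]],
) -> str:
    """Convert a native TOML list, or an {"add": [...], "remove": [...]} table."""
    if isinstance(value, list):
        return _list_literal(value)  # plain list: replaces the default
    parts = []
    if "add" in value:
        parts.append("+" + _list_literal(value["add"]))
    if "remove" in value:
        parts.append("-" + _list_literal(value["remove"]))
    return ",".join(parts)


# Parsed form of: [isort] config.add = [".isort.cfg"]  config.remove = ["bad.cfg"]
print(desugar_list_option({"add": [".isort.cfg"], "remove": ["bad.cfg"]}))
```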

How we implement dict options

TOML has native support for dicts through both Tables and Inline Tables.

However, we already use Tables to implement the distinct option scopes. If we also used Tables to implement dictionary values, we would introduce ambiguity as to whether a table refers to an option scope or a dictionary value. This PR originally took that approach, but quickly dropped support for using Tables because it dramatically complicates the solution and would cause a leaky API.
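Here is a small illustration of that ambiguity. After parsing, an option scope header and a dict-valued option produce the same nested-dict shape. naive_scopes is a hypothetical walker that treats every nested table as a scope, and the parsed dict is hand-written to mirror what a TOML parser would return.

```python
from typing import Any, Dict

# The nested dict a TOML parser returns for:
#
#   [cache.java]
#   timeout = 10
#
#   [jvm-platform.platforms]
#   java8 = { source = 8, target = 8, args = [] }
PARSED: Dict[str, Any] = {
    "cache": {"java": {"timeout": 10}},
    "jvm-platform": {
        "platforms": {"java8": {"source": 8, "target": 8, "args": []}},
    },
}


def naive_scopes(table: Dict[str, Any], prefix: str = "") -> Dict[str, Dict[str, Any]]:
    """Treat every nested table as an option scope and every non-table value as an option."""
    scopes: Dict[str, Dict[str, Any]] = {}
    leaves = {k: v for k, v in table.items() if not isinstance(v, dict)}
    if prefix and leaves:
        scopes[prefix] = leaves
    for key, value in table.items():
        if isinstance(value, dict):
            scopes.update(naive_scopes(value, f"{prefix}.{key}" if prefix else key))
    return scopes


print(sorted(naive_scopes(PARSED)))
```

The dict option platforms comes out as a bogus scope, jvm-platform.platforms.java8. Telling the two apart would require config parsing to know which options are dict-valued.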

Instead, users should use a multiline string for dictionary options:

[jvm-platform]
default_platform = "java8"
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

(Note the indentation: this would error in INI because the braces are not indented, but it works in TOML 🎉)
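To a TOML parser, the platforms value above is just a string, because TOML drops the newline right after the opening quotes. The sketch below uses ast.literal_eval only to show that the string is a valid Python dict literal, trailing comma and indentation included. How Pants itself parses dict strings is not shown here.

```python
import ast

# The string value a TOML parser returns for the platforms = """...""" example.
PLATFORMS = (
    "{\n"
    '  "java7": {"source": 7, "target": 7, "args": []},\n'
    '  "java8": {"source": 8, "target": 8, "args": []},\n'
    '  "java9": {"source": 9, "target": 9, "args": []},\n'
    "}\n"
)

# literal_eval accepts only literals, so this never executes arbitrary code.
platforms = ast.literal_eval(PLATFORMS)
print(platforms["java8"])
```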

Remaining followup

Update our own usage to use TOML for dogfooding.
- We'll also update Toolchain's config.
Create a script that will automatically convert 80%-90% of INI files to TOML.
- See Add migrate_to_toml_config.py script to automatically update INI config files to TOML #9054.
Update our docs to use TOML.
Deprecate INI.
- Will allow us to remove a lot of awkwardness around the TOML implementation.
- Will allow us to confidently generate suggested config values in help/deprecation/error messages, whereas now we have to deal with things like indentation of lists.

This reverts commit 008bacc.

Eric-Arellano

Reviewers: I recommend starting by looking at 008bacc as an example of what our config files look like with TOML. (Those work now when loaded with --pants-config-files - I only reverted them for a smaller diff.)

Then, move on to config_test.py to see what the PR "Solution" section means by being able to swap between _IniValues and _TomlValues without the rest of Pants having any knowledge of what format was used. I tried to be incredibly exhaustive but possibly missed some edge cases. Please feel free to propose more tests if you can think of any!

Finally, move on to config.py to see how we implement this all. I tried to keep it as readable as possible, but the implementation is a bit tricky due to how TOML stores config as a single nested dictionary + all the edge cases we must handle to behave identically to _IniValues. Lots of recursion and special casing.

Even though the TOML implementation is fairly involved, I hope that does not lead us to decide against adding TOML support. I tried to encapsulate the complexity, and I'd argue that it is justified by the benefits this will bring our users.

Eric-Arellano · 2020-02-02T05:33:27Z

src/python/pants/option/config.py

  def defaults(self) -> Mapping[str, str]:
    return self.parser.defaults()


+_TomlPrimitve = Union[bool, int, float, str]


We don't (yet?) account for TOML's native support of date times. We don't have any options that use that type so I left it off.

Eric-Arellano · 2020-02-02T05:34:25Z

src/python/pants/option/config.py

  def defaults(self) -> Mapping[str, str]:
    return self.parser.defaults()


+_TomlPrimitve = Union[bool, int, float, str]
+_TomlValue = Union[_TomlPrimitve, List[_TomlPrimitve]]


This could also technically be a Dict to represent native Tables, but I left this off because we restrict Tables to solely being used for distinct option scopes/sections, rather than also for dict values. See the PR description.

Eric-Arellano · 2020-02-02T05:36:57Z

src/python/pants/option/config.py

+        pattern=r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s",
+        repl=r"{\g<interpolated>}",


Generally, it's best practice to use re.compile. But it looks like that's not necessary in modern Python due to automatic caching? https://docs.python.org/3/library/re.html#re.compile

Lmk if I should use re.compile and any tips for where to put that - I'm not sure if it needs to be declared as a module constant, rather than a variable belonging to the method.

If python3 automatically caches regexes, and these regexes are only used once on options parsing and then not needed for the rest of the run, then it's probably best not to re.compile at the module level, so I think this is good as-is.
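For reference, here are both options side by side. The re module docs note that compiled versions of the most recent patterns passed to module-level functions like re.sub are cached, so the two forms behave the same here. The function names and _INTERPOLATION_RE are hypothetical.

```python
import re


# Option 1: pass the pattern string straight to re.sub and rely on re's internal cache.
def to_format_placeholders(value: str) -> str:
    return re.sub(
        pattern=r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s",
        repl=r"{\g<interpolated>}",
        string=value,
    )


# Option 2: compile once as a module-level constant.
_INTERPOLATION_RE = re.compile(r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s")


def to_format_placeholders_compiled(value: str) -> str:
    return _INTERPOLATION_RE.sub(r"{\g<interpolated>}", value)


print(to_format_placeholders("%(buildroot)s/dist"))
```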

Eric-Arellano · 2020-02-02T05:38:44Z

src/python/pants/option/config.py

+  def _stringify_val(
+    self, raw_value: _TomlValue, *, interpolate: bool = True, list_prefix: Optional[str] = None,
+  ) -> str:
+    """For parity with configparser, we convert all values back to strings, which allows us to
+    avoid upstream changes to files like parser.py.
+
+    This is clunky. If we drop INI support, we should remove this and use native values."""


Originally, I did not take this approach and preserved the parsed data types. But, I realized for the initial prototype that this complicates things too much because it means we have a leaky API. For example, it would be really hard to preserve the my_list_option.append and my_list_option.filter information.

If we end up removing INI, we could remove this all.

Eric-Arellano · 2020-02-02T05:40:07Z

src/python/pants/option/config.py

+
+  def has_option(self, section: str, option: str) -> bool:
+    try:
+      self.get_value(section, option)


Technically, has_option is doing more work than necessary by relying on the get_value() implementation, e.g. it will interpolate values and _stringify them. But, I think that's fine to make this method much simpler. Otherwise, it has extremely high duplication of get_value().
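Here is a sketch of the delegation pattern. TomlValuesSketch is hypothetical, and raising configparser's NoSectionError and NoOptionError from get_value is an assumption made for parity with INI. The real exception handling is not visible in the diff excerpt.

```python
import configparser
from typing import Dict


class TomlValuesSketch:
    """Hypothetical stand-in showing has_option delegating to get_value."""

    def __init__(self, parsed: Dict[str, Dict[str, object]]) -> None:
        self._parsed = parsed

    def get_value(self, section: str, option: str) -> str:
        if section not in self._parsed:
            raise configparser.NoSectionError(section)
        if option not in self._parsed[section]:
            raise configparser.NoOptionError(option, section)
        # The real method would also interpolate and stringify here.
        return str(self._parsed[section][option])

    def has_option(self, section: str, option: str) -> bool:
        # Reuse get_value instead of duplicating its lookup rules; the extra
        # interpolation/stringification work is the accepted cost.
        try:
            self.get_value(section, option)
        except (configparser.NoSectionError, configparser.NoOptionError):
            return False
        return True


values = TomlValuesSketch({"isort": {"config": ".isort.cfg"}})
print(values.has_option("isort", "config"), values.has_option("isort", "missing"))
```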

Eric-Arellano · 2020-02-02T05:42:15Z

src/python/pants/option/config_test.py

+  [a]
+  list: [1, 2, 3, %(answer)s]
+  list2: +[7, 8, 9]
+  list3: -["x", "y", "z"]


This is new. It's meant to test:

Filter syntax

Correctly quoting list members. Things don't work with -[x, y, z].

Eric-Arellano · 2020-02-02T05:42:46Z

src/python/pants/option/config_test.py

+  list: [1, 2, 3, %(answer)s]
+  list2: +[7, 8, 9]
+  list3: -["x", "y", "z"]
+  list4: +[0, 1],-[8, 9]


This is new. It's meant to test that we correctly handle both appending to and filtering from a list option at the same time.
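To show what a value like +[0, 1],-[8, 9] means, here is a sketch that applies the components to a default list, left to right. apply_list_modifications is hypothetical and is not Pants' own list-option parser. Its regex assumes list members contain no "]", it ignores the bare [...] replace form, and it is only an assumption that Pants orders additions and removals this way.

```python
import ast
import re
from typing import List

# Matches one "+[...]" or "-[...]" component.
_COMPONENT = re.compile(r"([+-])(\[[^\]]*\])")


def apply_list_modifications(default: List[object], spec: str) -> List[object]:
    """Apply a spec such as '+[0, 1],-[8, 9]' to a default list, left to right."""
    result = list(default)
    for sign, literal in _COMPONENT.findall(spec):
        # literal_eval only accepts literals, so an unquoted -[x, y, z] fails here.
        members = ast.literal_eval(literal)
        if sign == "+":
            result.extend(members)
        else:
            result = [item for item in result if item not in members]
    return result


print(apply_list_modifications([7, 8, 9], "+[0, 1],-[8, 9]"))
```

In this sketch, the unquoted form raises ValueError, which echoes why the test above quotes its list members.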

Eric-Arellano · 2020-02-02T05:45:10Z

src/python/pants/option/config_test.py

+  @property
+  def default_file1_values(self):
+    return {**super().default_file1_values, "disclaimer": "Let it be known\nthat."}
+
+  @property
+  def expected_file1_options(self):
+    return {
+      **super().expected_file1_options,
+      "a": {
+        **super().expected_file1_options["a"], "list": '["1", "2", "3", "42"]',
+      },
+      "b.nested": {
+        "dict": '{\n  "a": 1,\n  "b": "42",\n  "c": ["42", "42"],\n}'
+      },
+    }


These only vary because of formatting of the multiline strings. We could change the TOML value to mirror the INI values, but I wanted the test TOML file to look like how we'd actually configure things in the wild, i.e. with sane indentation.

src/python/pants/option/config.py

…se `.append` and `.filter`

Without this change, users would need to explicitly set `--pants-config-files=pants.toml` every time. If `pants.toml` exists, we will ignore the default of `pants.ini` unless the user explicitly adds `pants.ini` via `--pants-config-files`.
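Here is a sketch of that discovery rule. default_config_files is a hypothetical helper, not the actual Pants function, and it ignores the case where the user passes --pants-config-files explicitly.

```python
from pathlib import Path
from typing import List


def default_config_files(buildroot: Path) -> List[Path]:
    """Prefer pants.toml when it exists; otherwise fall back to pants.ini."""
    toml_path = buildroot / "pants.toml"
    if toml_path.exists():
        return [toml_path]
    return [buildroot / "pants.ini"]
```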

illicitonion · 2020-02-03T16:34:20Z

The list and dict stuff here is pretty weird and magical...

The list stuff is particularly weird - rather than inventing a new mechanism, I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

For dicts, is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files? If we were designing this from scratch, what would dicts look like?

Eric-Arellano · 2020-02-03T16:49:04Z

I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

I don't follow what you mean. Pants has a long history of allowing you to either replace, append, or filter a list value, including from the command line, env vars, and config files, or a mix of the three. See https://www.pantsbuild.org/options.html#list-options

What do you mean by generating a file? From what I can tell, we must keep support for appending to and filtering from a list option in the config file, even if we don't think that feature is a good one because of "un-coordinated modifications to accumulated state across files".

We could make users wrap appends and filters in quotes:

[jvm]
options = ["-Xmx1g"]

[isort]
config = """+[".isort.cfg"],-["bad.cfg"]"""

But that seems much worse to me. We lose the benefits of native support of TOML lists (like syntax highlighting), and there's a gotcha in that you use quotes for appending and filtering but not for replacing.

is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files?

Yes, there is, through tables and inline tables. That was the original approach I took. But it means that we have no structured way to disambiguate between option scopes vs. dict values, because tables would be used for both purposes. We could make this work, but it means the Config API will become very leaky.

The other reason I didn't like using tables for dict values is that it creates ambiguity for the human reader. Right now, the human reader knows unambiguously that a table header like [cache.java] refers to the option scope cache.java and not the option --java belonging to the subsystem cache. If we also use tables for dict values, then they'd have table headers like [gen.scrooge.service_deps] and [jvm-platform.platforms]. How do you know from a quick scan whether either of these is a dict option or a distinct option scope?

[jvm-platform.platforms]
java7 = { source = 7, target = 7, args = [] }
java8 = { source = 8, target = 8, args = [] }
java9 = { source = 9, target = 9, args = [] }

Worth repeating that the TOML dict support is still an improvement from the INI dict support because of indentation. Possible in TOML:

[jvm-platform]
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

You could also use the below in TOML. In INI, you must use the below:

[jvm-platform]
platforms = {
    "java7": {"source": 7, "target": 7, "args": []},
    "java8": {"source": 8, "target": 8, "args": []},
    "java9": {"source": 9, "target": 9, "args": []},
  }

benjyw · 2020-02-03T19:44:18Z

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue to work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

And at that point, I'm not sure we need the append/filter functionality. It's just more ways to do things. Also, I think +/- works for dict-valued options as well?

benjyw · 2020-02-03T19:49:42Z

Re a conversion script, wouldn't this be easy to get right basically 100% of the time, just by parsing the ini file using the standard python ini parser and writing out the resulting data as properly formatted TOML? Why the lowball estimate of 65%-75%?

Eric-Arellano · 2020-02-03T19:49:50Z

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue to work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

FWIW, there's nothing stopping a user from using config = """+['foo'],-['filter']""". In fact, that is what our TOML implementation converts the config.append and config.filter values to in config.py before they are passed to the rest of Pants.

config.append and config.filter are simply sugar to allow using native lists, because the quotes are clunky.

Also, I think +/- works for dict-valued options as well?

I wasn't sure of this, as it's not mentioned in our docs. https://www.pantsbuild.org/options.html#dict-options

Regardless, there's no need to offer syntactic sugar for dict appending/filtering, because we always use strings to represent dict values, given the ambiguity of sections vs. dict values. So, users can still use this without issue:

[jvm-platform]
platforms = """
+{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

Eric-Arellano · 2020-02-03T19:51:59Z

Re a conversion script, wouldn't this be easy to get right basically 100% of the time, just by parsing the ini file using the standard python ini parser and writing out the resulting data as properly formatted TOML? Why the lowball estimate of 65%-75%?

We could do this, but we would end up stripping all comments and whitespace.
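To make that trade-off concrete, here is a sketch of the naive round trip through configparser. naive_ini_to_toml is hypothetical and is not the #9054 script. Every value comes out as a TOML string, and the comment is silently dropped.

```python
import configparser
import json

INI = """\
[GLOBAL]
# Pin the version so CI is reproducible.
pants_version: 1.26.0

[isort]
config: [".isort.cfg"]
"""


def naive_ini_to_toml(ini_text: str) -> str:
    """Round-trip through configparser; every value is emitted as a TOML string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    lines = []
    for section in parser.sections():
        lines.append(f"[{section}]")
        for key, value in parser.items(section, raw=True):
            # json.dumps yields a double-quoted, escaped string that is also valid TOML.
            lines.append(f"{key} = {json.dumps(value)}")
        lines.append("")
    return "\n".join(lines)


print(naive_ini_to_toml(INI))
```

The list option ends up as a quoted string instead of a native TOML list, and the comment does not survive.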

See #9054 for the implementation. After running it on all of Pants' INI files, I estimate that it converts 80-85%, rather than 60-70%. The remaining issues are pretty simple to fix thanks to https://www.toml-lint.com/ and validation from editors like PyCharm.

benjyw · 2020-02-03T19:52:32Z

Regardless of niggles on the list stuff, this change is overall awesome! .ini is awful.

benjyw · 2020-02-03T19:55:31Z

I'm also fine not having such a script, TBH. It is very little work to manually port an ini file to toml. I wouldn't put too much effort into perfecting it.


Eric-Arellano · 2020-02-03T19:57:56Z

I'm also fine not having such a script, TBH. It is very little work to manually port an ini file to toml. I wouldn't put too much effort into perfecting it.

Hence settling for 80% conversion, rather than trying to get to 100% conversion. Pareto principle.

Originally, I converted all of our INI files by hand and it was frustratingly tedious. Took probably 10-15 minutes and lots of copy and paste + find and replace. With the script in #9054, took about 3 minutes and only fixing a couple small, remaining issues with multiline options.

codealchemy

👍

illicitonion · 2020-02-04T13:00:57Z

(Really important note: I'm not saying we shouldn't do the things currently in this PR, and getting away from ini is a great thing, I'm just trying to explore the design space - the easiest way to not have to support and document and teach custom conventions is to simply not have them!)

I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

I don't follow what you mean. Pants has a long history of allowing you to either replace, append, or filter a list value, including from the command line, env vars, and config files, or a mix of the three. See https://www.pantsbuild.org/options.html#list-options

What do you mean by generating a file? From what I can tell, we must keep support for appending to and filtering from a list option in the config file, even if we don't think that feature is a good one because of "un-coordinated modifications to accumulated state across files".

We could make users wrap appends and filters in quotes:
[jvm]
options = ["-Xmx1g"]

[isort]
config = """+[".isort.cfg"],-["bad.cfg"]"""
But that seems much worse to me. We lose the benefits of native support of TOML lists (like syntax highlighting), and there's a gotcha in that you use quotes for appending and filtering but not for replacing.

Yeah, I understand that we have that history. I'm trying to look at this from a perspective of "If I approached Pants for the first time (without the history of knowing about how we had ini files), and I was reading the documentation of how to write a pants.toml file, how would I feel?" and this feels to me like pretty magical non-standard syntax... I've not used another tool which added its own DSL for flexible list manipulation like this inside config files, and I'm not sure what's unique about Pants in this respect. So my real question here is: Is this feature so important that we want to diverge from the standard for it?

If we were designing this from scratch, for a completely new system, I'm not sure what we would choose it to look like; personally, I'd probably be proposing that we read a standard file with no custom DSLs, we can read multiple files if they obviously combine in a standard way (either replacing or extending lists) and that if someone wanted to combine two files in a non-standard way, they should use an (ideally simple) programming language to read the two files and merge them in their non-standard way, outputting a file they can pass to pants (I'd generally recommend something like https://jsonnet.org/ - I suspect Danny would probably recommend coffeescript), rather than inventing a DSL to serialise inside the config file.

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue to work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

And at that point, I'm not sure we need the append/filter functionality. It's just more ways to do things. Also, I think +/- works for dict-valued options as well?

This sounds like a pretty reasonable middle-ground of encoding a DSL in magic values (which is something we already document/teach), without modifying the structure of keys, if it works :)

is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files?

Yes, there is, through tables and inline tables. That was the original approach I took. But it means that we have no structured way to disambiguate between option scopes vs. dict values, because tables would be used for both purposes. We could make this work, but it means the Config API will become very leaky.

The other reason I didn't like using tables for dict values is that it creates ambiguity for the human reader. Right now, the human reader knows unambiguously that a table header like [cache.java] refers to the option scope cache.java and not the option --java belonging to the subsystem cache. If we also use tables for dict values, then they'd have table headers like [gen.scrooge.service_deps] and [jvm-platform.platforms]. How do you know from a quick scan whether either of these is a dict option or a distinct option scope?
[jvm-platform.platforms]
java7 = { source = 7, target = 7, args = [] }
java8 = { source = 8, target = 8, args = [] }
java9 = { source = 9, target = 9, args = [] }
Worth repeating that the TOML dict support is still an improvement from the INI dict support because of indentation. Possible in TOML:
[jvm-platform]
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""
You could also use the below in TOML. In INI, you must use the below:
[jvm-platform]
platforms = {
    "java7": {"source": 7, "target": 7, "args": []},
    "java8": {"source": 8, "target": 8, "args": []},
    "java9": {"source": 9, "target": 9, "args": []},
  }

Yeah, there's definitely not a clear winner between the two in my mind... I think I prefer leaning into the toml-native way of doing this, rather than using magic multi-line strings, but I haven't convinced myself either way...

I guess the multiline-strings approach is aiming to provide a form of consistency when reading ("I can easily grok what the sections are") at the expense of familiarity when reading ("I understand the general format I'm reading"). The toml-standard way feels like it's better for writing ("I understand what to write"), but between reading and writing, we should probably be optimising for reading. So I guess the question is: Do we want to favour people reading the file as toml, or people reading the file as a set of pants option scopes?

Eric-Arellano · 2020-02-04T17:01:02Z

(Really important note: I'm not saying we shouldn't do the things currently in this PR, and getting away from ini is a great thing, I'm just trying to explore the design space - the easiest way to not have to support and document and teach custom conventions is to simply not have them!)

Thanks for saying this and, more importantly, for asking great questions! This is a big decision and I appreciate you making sure we do due diligence.

So my real question here is: Is this feature so important that we want to diverge from the standard for it?

We're not diverging very far from the specification. In fact, TOML recommends using this my_option.append and my_option.filter syntax (or my_option.add and my_option.remove) in this issue: toml-lang/toml#644 (comment)

This sounds like a pretty reasonable middle-ground of encoding a DSL in magic values (which is something we already document/teach), without modifying the structure of keys, if it works :)

It's fine to allow users to use my_list_option = "+[1],-[0]", given that that's what we convert the .append and .filter sugar to anyway. I do strongly advocate providing the .append and .filter sugar for those who want it, though, as native TOML lists are far more ergonomic than quoted strings.

I guess the multiline-strings approach is aiming to provide a form of consistency when reading ("I can easily grok what the sections are") at the expense of familiarity when reading ("I understand the general format I'm reading"). The toml-standard way feels like it's better for writing ("I understand what to write"), but between reading and writing, we should probably be optimising for reading. So I guess the question is: Do we want to favour people reading the file as toml, or people reading the file as a set of pants option scopes?

Good analysis! I do agree with erring on the side of favoring readers over writers. Generally, only one person or team writes to pants.toml, whereas likely every engineer at the org will end up reading pants.toml at least once.

I think "reading the file as a set of Pants option scopes" is the correct mental model, here. It's ~an implementation detail that this config is implemented in TOML. Really, what our config is is a way to set options for their corresponding option scopes.

I remember when learning Pants I was a bit confused with the idea of options subscopes, especially because command line args make them appear ambiguous: does --cache-java-timeout mean --java-timeout belonging to the scope cache or --timeout belonging to the scope cache.java? Currently, a big advantage of our config files is that they unambiguously distinguish between option scopes vs. options. I'm afraid to lose that distinction.

benjyw · 2020-02-04T18:30:16Z

Now that you mention it, I do like add and remove more than append and filter, because append implies that the value must be a single item, but we allow it to be a list I think, and filter is a rather vague term (are we filtering in or filtering out?)

Eric-Arellano · 2020-02-04T18:32:37Z

add and remove sound great to me! @cosmicexplorer had proposed that last week over DM. I was only using append and filter due to the language used in our Options docs, which we can change, of course.

Eric-Arellano · 2020-02-06T00:22:36Z

With us now on 1.26.x, this is now ready to review (and hopefully to land).

stuhood · 2020-02-06T17:46:27Z

So my real question here is: Is this feature so important that we want to diverge from the standard for it?

I think that this kind of "deviation from the standard" (similar to putting # keys in json!) is a convention rather than a deviation from the standard, and relatively harmless. See toml-lang/toml#36 where the toml people refer to this kind of feature. The real challenge is just ensuring that you can document it.

The original motivating usecases for adding/removing still stand: allowing for adding or removing from a default value rather than replacing it is useful (...particularly adding... I could maybe do without removing).

I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Thanks!

Eric-Arellano · 2020-02-06T21:17:49Z

I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Agreed. I tried to make the tests pretty comprehensive in that PR, along with manually testing on Pants and Toolchain. It'd be very helpful if someone from Twitter could try running it on your config files and confirm that conversion takes less than 5-10 minutes.

stuhood · 2020-02-07T01:35:58Z

I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Agreed. I tried to make the tests pretty comprehensive in that PR, along with manually testing on Pants and Toolchain. It'd be very helpful if someone from Twitter could try running it on your config files and confirm that conversion takes less than 5-10 minutes.

Realistically, this won't be before next week. But yes: will do.

blorente

Feel free to ignore the naming nits!

Great work, thanks!

blorente · 2020-02-07T16:07:10Z

src/python/pants/option/config.py

@@ -182,10 +193,11 @@ def get_source_for_option(self, section: str, option: str) -> Optional[str]:
 class _ConfigValues(ABC):
  """Encapsulates resolving the actual config values specified by the user's config file.

-  Beyond providing better encapsulation, this allows us to support alternative config file formats
-  in the future if we ever decide to support formats other than INI.
+  Due to encapsulation, this allows us to support both TOML and INI config files without any of


Futureproofing that actually worked! Woohoo!

illicitonion

The general approach here looks solid to me, though I'm unlikely to have time to closely review any of the code :) Thanks!

Borja suggested creating File1 and File2 classes. This works well. It centralizes the raw config content with the expected parsed values. It also makes it easier to add a test case for TOML being the main config and INI being optional.

jsirois · 2020-02-21T15:52:28Z

Noting a small inaccuracy:

TOML does not have native support for string interpolation, i.e. replacing %(foo)s with the DEFAULT value foo. So, we provide our own implementation that behaves identically.

Our impl only behaves identically for string values - no others. ConfigParser supports substitution of any value since it treats all values as strings. I ran into this in the PEX upgrade PR where I had factored a shared parallelism integer value up into [DEFAULT]. In toml I'll just copypasta.

Eric-Arellano · 2020-02-21T16:02:40Z

Our impl only behaves identically for string values - no others. ConfigParser supports substitution of any value since it treats all values as strings. I ran into this in the PEX upgrade PR where I had factored a shared parallelism integer value up into [DEFAULT]. In toml I'll just copypasta.

This is true. But, this should work:

```toml
[DEFAULT]
num_workers = 4

[python-setup]
parallelism = "%(num_workers)s"
```

Why? At the moment, we stringify all TOML values before passing them to parser.py, so that TOML behaves identically to INI, where every value is a string.

pants/src/python/pants/option/config.py

Lines 350 to 385 in d690ab2

```python
def _stringify_val(
  self,
  raw_value: _TomlValue,
  *,
  option: str,
  section: str,
  section_values: Dict,
  interpolate: bool = True,
  list_prefix: Optional[str] = None,
) -> str:
  """For parity with configparser, we convert all values back to strings, which allows us to avoid
  upstream changes to files like parser.py.

  This is clunky. If we drop INI support, we should remove this and use native values (although we
  must still support interpolation).
  """
  possibly_interpolate = partial(
    self._possibly_interpolate_value,
    option=option,
    section=section,
    section_values=section_values,
  )
  if isinstance(raw_value, str):
    return possibly_interpolate(raw_value) if interpolate else raw_value

  if isinstance(raw_value, list):
    def stringify_list_member(member: _TomlPrimitve) -> str:
      if not isinstance(member, str):
        return str(member)
      interpolated_member = possibly_interpolate(member) if interpolate else member
      return f'"{interpolated_member}"'

    list_members = ", ".join(stringify_list_member(member) for member in raw_value)
    return f"{list_prefix or ''}[{list_members}]"

  return str(raw_value)
```
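Here is a minimal sketch of why Eric's workaround works. It assumes the parsed TOML is a plain dict and uses Python's `%`-formatting for brevity. The real code uses its own regex-based interpolation, and `stringify` and `get_option` are hypothetical helpers that do not handle literal `%` characters. The native integer in `[DEFAULT]` is stringified, substituted into the string-valued option, and only then converted by the option's declared type:

```python
from typing import Any, Dict

# Hypothetical parsed TOML, as a TOML library would return it: note the native int.
parsed_toml: Dict[str, Dict[str, Any]] = {
    "DEFAULT": {"num_workers": 4},
    "python-setup": {"parallelism": "%(num_workers)s"},
}


def stringify(value: Any) -> str:
    """Mimic INI: every value becomes a string before the option parser sees it."""
    return value if isinstance(value, str) else str(value)


def get_option(section: str, option: str) -> str:
    defaults = {key: stringify(val) for key, val in parsed_toml["DEFAULT"].items()}
    raw = stringify(parsed_toml[section][option])
    # %-style substitution against the stringified DEFAULT values (sketch only).
    return raw % defaults


# The option's type (int here) is applied after interpolation, so an
# integer in [DEFAULT] can still feed an integer option.
parallelism = int(get_option("python-setup", "parallelism"))
```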

jsirois · 2020-02-21T16:13:50Z

Erm. OK - thanks. I'm not honestly convinced the toml switch was a win - our use of it is now just about as unintuitive / unspecified as ini.

cosmicexplorer · 2020-02-21T16:18:08Z

What do you think might be a more appropriate reason for, or feature of, toml for us to use? Apologies if it's stated above; I attempted to check the thread.

jsirois · 2020-02-21T16:21:25Z

I think Eric did the best you can do with the impedance mismatch between an untyped config format and a typed one. The tradeoff is that our use of toml no longer conforms to the types you expect in the toml file itself when you want to substitute. That use case is probably rare, so it's probably a fine tradeoff. I just meant to point out that our use of toml now deviates from standard toml files enough that any argument for switching to toml for its standardization benefits is probably washed away.

Onward.

As decided in #9052, TOML brings several benefits over INI config files (despite some weirdness around things like dict values). This updates our docs to refer to `pants.toml`. A followup will deprecate `pants.ini`.

Eric-Arellano added 2 commits February 1, 2020 21:01

Add .toml config files

008bacc

Revert "Add .toml config files"

052cd22

This reverts commit 008bacc.

Eric-Arellano requested review from stuhood, jsirois, benjyw, illicitonion, cosmicexplorer, codealchemy and blorente February 2, 2020 05:27

Eric-Arellano commented Feb 2, 2020

Add support for TOML

7306439

Eric-Arellano force-pushed the pants-toml branch from 51a4bea to 7306439 February 2, 2020 06:12

gshuflin reviewed Feb 2, 2020

src/python/pants/option/config.py (outdated, resolved)

Eric-Arellano added 3 commits February 2, 2020 10:26

Update docstring for _ConfigValues

d0db31b

Fix and test a new edge case: a section with only list options that use `.append` and `.filter`

43cd952

Use pants.toml as default config if it exists

fb49453

Without this change, users would need to explicitly set `--pants-config-files=pants.toml` every time. If `pants.toml` exists, we will ignore the default of `pants.ini` unless the user explicitly adds `pants.ini` via `--pants-config-files`.
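The rule in this commit message can be sketched as follows. The function names are hypothetical, and `explicitly_added` stands in for entries the user appends via `--pants-config-files`. This is an illustration of the described behavior, not the actual Pants option-bootstrapping code:

```python
from pathlib import Path
from typing import List, Sequence


def default_config_files(buildroot: Path) -> List[Path]:
    """Prefer pants.toml when it exists; otherwise fall back to the legacy pants.ini default."""
    toml_path = buildroot / "pants.toml"
    if toml_path.is_file():
        return [toml_path]
    return [buildroot / "pants.ini"]


def config_files(buildroot: Path, explicitly_added: Sequence[str] = ()) -> List[Path]:
    """Defaults plus anything the user explicitly adds (modeling --pants-config-files)."""
    files = default_config_files(buildroot)
    for name in explicitly_added:
        path = buildroot / name
        if path not in files:
            files.append(path)
    return files
```

With `pants.toml` present, `pants.ini` is read only if the user opts in, which matches the migration path of TOML as the main config and INI as optional.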

Eric-Arellano force-pushed the pants-toml branch from f50b6ec to fb49453 February 2, 2020 18:27

codealchemy approved these changes Feb 3, 2020

Eric-Arellano mentioned this pull request Feb 4, 2020

Add migrate_to_toml_config.py script to automatically update INI config files to TOML #9054

Merged

Use list.add and list.remove, rather than list.append and list.filter

cfe295f
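After this commit, TOML list options use `.add` and `.remove` keys. One way to picture the translation is sketched below. It assumes a TOML library parses a dotted key like `backend_packages.add = [...]` into a nested `{"add": [...]}` table, as standard TOML does. It also assumes the result is rendered back into the `+[...],-[...]` string syntax from the INI example in the PR description. `stringify_list_option` is a hypothetical helper, loosely mirroring the `list_prefix` parameter of `_stringify_val`:

```python
from typing import Dict, List, Union

# A TOML list option is either a plain list or a table with "add"/"remove" keys, e.g.
#   [GLOBAL]
#   backend_packages.add = ["pants.backend.scala.lint.scalafmt"]
#   backend_packages.remove = ["pants.backend.codegen.antlr.java"]
TomlListOption = Union[List[str], Dict[str, List[str]]]


def _render(members: List[str], prefix: str = "") -> str:
    """Render members as a quoted, comma-separated list literal."""
    return prefix + "[" + ", ".join(f'"{m}"' for m in members) + "]"


def stringify_list_option(value: TomlListOption) -> str:
    """Translate a TOML list option into the INI-style string the list parser understands."""
    if isinstance(value, list):
        # In the INI syntax, a plain list replaces the default outright.
        return _render(value)
    parts = []
    if "add" in value:
        parts.append(_render(value["add"], prefix="+"))
    if "remove" in value:
        parts.append(_render(value["remove"], prefix="-"))
    return ",".join(parts)
```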

Eric-Arellano added 2 commits February 5, 2020 19:53

Merge branch 'master' of github.com:pantsbuild/pants into pants-toml

5b4044d

Fix bad leftover list.filter code

e87f156

blorente approved these changes Feb 7, 2020

illicitonion approved these changes Feb 7, 2020

Eric-Arellano added 2 commits February 7, 2020 12:01

Review comments on config.py

1f5d8e9

Refactor and expand config_test.py

5469e27

Borja suggested creating File1 and File2 classes. This works well: it keeps the raw config content alongside the expected parsed values, and it makes it easier to add a test case where TOML is the main config and INI is optional.

gshuflin approved these changes Feb 7, 2020

Eric-Arellano merged commit 4365d03 into pantsbuild:master Feb 7, 2020

Eric-Arellano deleted the pants-toml branch February 7, 2020 21:04

Eric-Arellano mentioned this pull request Feb 11, 2020

Use pants.toml internally #9090

Merged

Eric-Arellano mentioned this pull request Feb 23, 2020

Update docs to use pants.toml #9165

Merged

Eric-Arellano mentioned this pull request Feb 27, 2020

Designate pants.ini as legacy, but still supported, in docs #9194

Merged

		pattern=r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s",
		repl=r"{\g<interpolated>}",
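The two lines above are `re.sub` arguments that turn configparser-style `%(name)s` references into `str.format` fields. Below is a self-contained sketch of that approach. The `interpolate` function and its brace-escaping step are assumptions for illustration, not code copied from config.py. As jsirois notes, this only ever operates on string values:

```python
import re
from typing import Dict

# The pattern/repl pair quoted above: rewrite %(name)s into {name}.
_INTERPOLATION = re.compile(r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s")


def interpolate(raw_value: str, defaults: Dict[str, str]) -> str:
    """Translate configparser-style references into str.format fields, then fill them."""
    # Escape literal braces first so str.format only sees the fields we create.
    escaped = raw_value.replace("{", "{{").replace("}", "}}")
    format_str = _INTERPOLATION.sub(r"{\g<interpolated>}", escaped)
    return format_str.format(**defaults)
```

For example, `interpolate("%(buildroot)s/dist", {"buildroot": "/repo"})` yields `"/repo/dist"`.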

Add support for TOML config files (pants.toml) #9052

Eric-Arellano commented Feb 2, 2020 (edited)

Problem

Why TOML?

PEP 518's thoughts on file formats

Reasons for us to use TOML

Solution

How we implement interpolation

How we implement list options

How we implement dict options

Remaining followup

Eric-Arellano left a comment

illicitonion commented Feb 3, 2020

Eric-Arellano commented Feb 3, 2020 (edited)

benjyw commented Feb 3, 2020

benjyw commented Feb 3, 2020

Eric-Arellano commented Feb 3, 2020

Eric-Arellano commented Feb 3, 2020

benjyw commented Feb 3, 2020

benjyw commented Feb 3, 2020 via email

Eric-Arellano commented Feb 3, 2020

codealchemy left a comment

illicitonion commented Feb 4, 2020

Eric-Arellano commented Feb 4, 2020

benjyw commented Feb 4, 2020

Eric-Arellano commented Feb 4, 2020

Eric-Arellano commented Feb 6, 2020

stuhood commented Feb 6, 2020

Eric-Arellano commented Feb 6, 2020

stuhood commented Feb 7, 2020

blorente left a comment

illicitonion left a comment

jsirois commented Feb 21, 2020

Eric-Arellano commented Feb 21, 2020

jsirois commented Feb 21, 2020

cosmicexplorer commented Feb 21, 2020 via email

jsirois commented Feb 21, 2020
