Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TOML config files (pants.toml) #9052

Merged
merged 11 commits into from
Feb 7, 2020

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Feb 2, 2020

Problem

INI files have some issues that make them frustrating for users:

  • Gotcha with when to use quotes:
    • If the option is a string, you cannot use quotes, e.g. version: ipython==4.8.
    • If the value appears in a list, you must use quotes, e.g. config: [".isort.cfg"]
  • Gotcha with indenting lists and dict values:
    • The 7th paragraph of our instructions for new users warns:

Note that the formatting of the plugins list is important; all lines below the plugins: line must be indented by at least one white space to form logical continuation lines. This is standard for Python ini files. See Options for a guide on modifying your pants.ini.

  • Awkard to append and filter to a list option at the same time:
backend_packages: +[
  'pants.backend.scala.lint.scalafmt'],-[
  'pants.backend.codegen.antlr.java',
  'pants.backend.codegen.antlr.python',
  'pants.backend.codegen.wire.java']
  • No defined standard for INI. Users may try using INI features not supported or interpreted differently by Python's configparser.
  • No syntax highlighting or validation from editors, given that there's no standard.

Why TOML?

PEP 518's thoughts on file formats

Python core developers recently researched modern config values to choose the format for pyproject.toml. They compiled their results into PEP 518, comparing INI, JSON, YAML, and TOML.

Reasons they rejected INI:

  • No standard.

Reasons they rejected JSON:

  • Difficult for humans to edit, e.g. no support for comments.

Reasons they rejected YAML:

  • Specification is 86 pages when printed!
  • Not safe by default due to arbitrary execution of code.
  • PyYAML is large and has a C extension.

Reasons they liked TOML:

  • Simple while still being flexible enough.
  • Rust core developers said they have been "quite happy" with using TOML for Cargo.

Reasons for us to use TOML

  • Very similar syntax to INI, as TOML was inspired by INI.
    • Less churn for users.
  • Simpler than YAML.
    • I don't think we want users using anchors/aliases, for example.
  • Alignment with Rust and Python ecosystems.

Solution

Implement a config._TomlValues class that behaves identically to _IniValues. Both of these are subclassses of _ConfigValues, meaning that the two config formats may be interchanged without any file outside of config.py knowing which config format was originally used.

We use extremely comprehensive unit tests to ensure that _TomlValues behaves identically to _IniValues.

How we implement interpolation

TOML does not have native support for string interpolation, i.e. substituting %(foo)s for the DEFAULT value foo. So, we provide our own implementation that behaves identically.

How we implement list options

TOML has native list support, but does not have syntax for +[] and -[].

We could require users to use multiline strings, like config = """+[".isort.cfg"],-["bad.cfg"]""". This will work, and in fact the format we convert into in config.py so that the rest of Pants understands the value.

But, config = """+[".isort.cfg"],-["bad.cfg"]""" is clunky, so we introduce syntatic sugar:

[jvm]
options = ["-Xmx1g"]

[isort]
config.add= [".isort.cfg"]
config.remove = ["bad.cfg"]

This sugar allows users to use native TOML lists (more ergonomic than strings) and makes it easy to both add to and remove from a list option at the same time.

How we implement dict options

TOML has native support for dicts through both Tables and Inline Tables.

However, we already use Tables to implement the distinct option scopes. So, if we also used Tables to implement dictionary values, we would introduce an ambiguity whether a table refers to an options scope or a dictionary value. Originally, this PR tried to take this approach, but quickly rejected support for using Tables because it dramatically complicates the solution and would cause a leaky API.

Instead, users should use a multiline string for dictionary options:

[jvm-platform]
default_platform = "java8"
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

(Note the indentation - this would error in INI because of being indented too much to the left, but it works in TOML 🎉)

Remaining followup

  1. Update our own usage to use TOML for dogfooding.
    • We'll also update Toolchain's config.
  2. Create a script that will automatically convert 80%-90% of INI files to TOML.
  3. Update our docs to use TOML.
  4. Deprecate INI.
    • Will allow us to remove a lot of awkwardness around the TOML implementation.
    • Will allow us to confidently generate suggested config values in help/deprecation/error messages, whereas now we have to deal with things like indentation of lists.

Copy link
Contributor Author

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: I recommend starting by looking at 008bacc as an example of what our config files look like with TOML. (Those work now when loaded with --pants-config-files - I only reverted them for a smaller diff.)

Then, move on to config_test.py to see what the PR "Solution" section means by being able to swap between _IniValues and _TomlValues without the rest of Pants having any knowledge of what format was used. I tried to be incredibly exhaustive but possibly missed some edge cases. Please feel free to propose more tests if you can think of any!

Finally, move on to config.py to see how we implement this all. I tried to keep it as readable as possible, but the implementation is a bit tricky due to how TOML stores config as a single nested dictionary + all the edge cases we must handle to behave identically to _IniValues. Lots of recursion and special casing.

Even though the TOML implementation is fairly involved, I hope that that does not result in us deciding to not add TOML support. I tried to encapsulate the complexity, and argue that the complexity in our implementation is justified by the benefits this will bring our users.

def defaults(self) -> Mapping[str, str]:
return self.parser.defaults()


_TomlPrimitve = Union[bool, int, float, str]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't (yet?) account for TOML's native support of date times. We don't have any options that use that type so I left it off.

def defaults(self) -> Mapping[str, str]:
return self.parser.defaults()


_TomlPrimitve = Union[bool, int, float, str]
_TomlValue = Union[_TomlPrimitve, List[_TomlPrimitve]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also technically be a Dict to represent native Tables, but I left this off because we restrict Tables to solely being used for distinct option scopes/sections, rather than also for dict values. See the PR description.

Comment on lines +296 to +297
pattern=r"%\((?P<interpolated>[a-zA-Z_0-9]*)\)s",
repl=r"{\g<interpolated>}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it's best practice to use re.compile. But it looks like that's not necessary in modern Python due to automatic caching? https://docs.python.org/3/library/re.html#re.compile

Lmk if I should use re.compile and any tips for where to put that - I'm not sure if it needs to be declared as a module constant, rather than a variable belonging to the method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If python3 automatically caches regexes, and these regexes are only used once on options parsing and then not needed for the rest of the run, then it's probably best not to re.compile at the module level, so I think this is good as-is.

Comment on lines +312 to +317
def _stringify_val(
self, raw_value: _TomlValue, *, interpolate: bool = True, list_prefix: Optional[str] = None,
) -> str:
"""For parity with configparser, we convert all values back to strings, which allows us to
avoid upstream changes to files like parser.py.

This is clunky. If we drop INI support, we should remove this and use native values."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I did not take this approach and preserved the parsed data types. But, I realized for the initial prototype that this complicates things too much because it means we have a leaky API. For example, it would be really hard to preserve the my_list_option.append and my_list_option.filter information.

If we end up removing INI, we could remove this all.


def has_option(self, section: str, option: str) -> bool:
try:
self.get_value(section, option)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, has_option is doing more work than necessary by relying on the get_value() implementation, e.g. it will interpolate values and _stringify them. But, I think that's fine to make this method much simpler. Otherwise, it has extremely high duplication of get_value().

[a]
list: [1, 2, 3, %(answer)s]
list2: +[7, 8, 9]
list3: -["x", "y", "z"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new. It's meant to test:

  1. Filter syntax
  2. Correctly quoting list members. Things don't work with -[x, y, z].

list: [1, 2, 3, %(answer)s]
list2: +[7, 8, 9]
list3: -["x", "y", "z"]
list4: +[0, 1],-[8, 9]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new. It's meant to test that we correctly handle both appending and filtering to a list option at the same time.

Comment on lines 324 to 338
@property
def default_file1_values(self):
return {**super().default_file1_values, "disclaimer": "Let it be known\nthat."}

@property
def expected_file1_options(self):
return {
**super().expected_file1_options,
"a": {
**super().expected_file1_options["a"], "list": '["1", "2", "3", "42"]',
},
"b.nested": {
"dict": '{\n "a": 1,\n "b": "42",\n "c": ["42", "42"],\n}'
},
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These only vary because of formatting of the multiline strings. We could change the TOML value to mirror the INI values, but I wanted the test TOML file to look like how we'd actually configure things in the wild, i.e. with sane indentation.

Without this change, users would need to explicitly set `--pants-config-files=pants.toml` every time.

If `pants.toml` exists, we will ignore the default of `pants.ini` unless the user explicitly adds `pants.ini` via `--pants-config-files`.
@illicitonion
Copy link
Contributor

The list and dict stuff here is pretty weird and magical...

The list stuff is particularly weird - rather than inventing a new mechanism, I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

For dicts, is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files? If we were designing this from scratch, what would dicts look like?

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Feb 3, 2020

I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

I don't follow what you mean. Pants has a long history of allowing you to either replace, append, or filter a list value, including from the command line, env vars, and config files, or a mix of the three. See https://www.pantsbuild.org/options.html#list-options

What do you mean by generating a file? From what I can tell, we must keep support for appending to and filtering from a list option in the config file, even if we don't think that feature is a good one because of "un-coordinated modifications to accumulated state across files".

We could make users wrap appends and filters in quotes:

[jvm]
options = ["-Xmx1g"]

[isort]
config = """+[".isort.cfg"],-["bad.cfg"]"""

But that seems much worse to me. We lose the benefits of native support of TOML lists (like syntax highlighting) and there's a gotcha why you use quotes for appending and filtering but not for replacing.


is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files?

Yes, there is, through tables and inline tables. That was the original approach I took. But it means that we have no structured way to disambiguate between option scopes vs. dict values, because tables would be used for both purposes. We could make this work, but it means the Config API will become very leaky.

The other reason I didn't like using tables for dict values is that it creates ambiguity for the human reader. Right now, the human reader knows unambiguously that a table header like [cache.java] refers to the option scopes cache.java and not the option --java belonging to the subsystem cache. If we also use tables for dict values, then they'd have table headers like [gen.scrooge.service_deps]and [jvm-platform.platforms]. How do you know from a quick scan if either of these are dict options or are distinct option scopes?

[jvm-platform.platforms]
java7 = { source = 7, target = 7, args = [] }
java8 = { source = 8, target = 8, args = [] }
java9 = { source = 9, target = 9, args = [] }

Worth repeating that the TOML dict support is still an improvement from the INI dict support because of indentation. Possible in TOML:

[jvm-platform]
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

You could also the below in TOML. In INI, you must use the below :

[jvm-platform]
platforms = {
    "java7": {"source": 7, "target": 7, "args": []},
    "java8": {"source": 8, "target": 8, "args": []},
    "java9": {"source": 9, "target": 9, "args": []},
  }

@benjyw
Copy link
Contributor

benjyw commented Feb 3, 2020

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

And at that point, I'm not sure we need the append/filter functionality. It's just more ways to do things. Also, I think +/- works for dict-valued options as well?

@benjyw
Copy link
Contributor

benjyw commented Feb 3, 2020

Re a conversion script, wouldn't this be easy to get right basically 100% of the time, just by parsing the ini file using the standard python ini parser and writing out the resulting data as properly formatted TOML? Why the lowball estimate of 65%-75%?

@Eric-Arellano
Copy link
Contributor Author

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

FWIT, there's nothing stopping a user from using config: """+['foo'],-['filter']""". In fact, that is what our TOML implementation converts the config.append and config.filter values to in config.py before being passed to the rest of Pants.

config.append and config.filter are simply sugar to allow using native lists, because the quotes are clunky.


Also, I think +/- works for dict-valued options as well?

I wasn't sure of this, as it's not mentioned in our docs. https://www.pantsbuild.org/options.html#dict-options

Regardless, it's irrelevant to offer syntatic sugar for dict appending/filtering because we always use strings to represent dict values, given the ambiguity of sections vs. dict values. So, users can still use this without issue:

[jvm-platform]
platforms = """
+{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

@Eric-Arellano
Copy link
Contributor Author

Re a conversion script, wouldn't this be easy to get right basically 100% of the time, just by parsing the ini file using the standard python ini parser and writing out the resulting data as properly formatted TOML? Why the lowball estimate of 65%-75%?

We could do this, but we would end up stripping all comments and whitespace.

See #9054 for the implementation. After running it on all of Pants' INI files, I estimate it at converting 80-85%, rather than 60-70%. The remaining issues are pretty simple to fix thanks to https://www.toml-lint.com/ and validation from editors like Pycharm.

@benjyw
Copy link
Contributor

benjyw commented Feb 3, 2020

Regardless of niggles on the list stuff, this change is overall awesome! .ini is awful.

@benjyw
Copy link
Contributor

benjyw commented Feb 3, 2020 via email

@Eric-Arellano
Copy link
Contributor Author

I'm also fine not having such a script, TBH. It is very little work to
manually port an ini file to toml. I wouldn't put too much effort into
perfecting it.

Hence settling for 80% conversion, rather than trying to get to 100% conversion. Pareto principle.

Originally, I converted all of our INI files by hand and it was frustratingly tedious. Took probably 10-15 minutes and lots of copy and paste + find and replace. With the script in #9054, took about 3 minutes and only fixing a couple small, remaining issues with multiline options.

Copy link
Contributor

@codealchemy codealchemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@illicitonion
Copy link
Contributor

(Really important note: I'm not saying we shouldn't do the things currently in this PR, and getting away from ini is a great thing, I'm just trying to explore the design space - the easiest way to not have to support and document and teach custom conventions is to simply not have them!)

I'd be inclined to say that if people want to be doing layered modification of data structures, they should probably be generating a file, rather than doing un-coordinated modifications to accumulated state across files...

I don't follow what you mean. Pants has a long history of allowing you to either replace, append, or filter a list value, including from the command line, env vars, and config files, or a mix of the three. See https://www.pantsbuild.org/options.html#list-options

What do you mean by generating a file? From what I can tell, we must keep support for appending to and filtering from a list option in the config file, even if we don't think that feature is a good one because of "un-coordinated modifications to accumulated state across files".

We could make users wrap appends and filters in quotes:

[jvm]
options = ["-Xmx1g"]

[isort]
config = """+[".isort.cfg"],-["bad.cfg"]"""

But that seems much worse to me. We lose the benefits of native support of TOML lists (like syntax highlighting) and there's a gotcha why you use quotes for appending and filtering but not for replacing.

Yeah, I understand that we have that history. I'm trying to look at this from a perspective of "If I approached Pants for the first time (without the history of knowing about how we had ini files), and I was reading the documentation of how to write a pants.toml file, how would I feel?" and this feels to me like pretty magical non-standard syntax... I've not used another tool which added its own DSL for flexible list manipulation like this inside config files, and I'm not sure what's unique about Pants in this respect. So my real question here is: Is this feature so important that we want to diverge from the standard for it?

If we were designing this from scratch, for a completely new system, I'm not sure what we would choose it to look like; personally, I'd probably be proposing that we read a standard file with no custom DSLs, we can read multiple files if they obviously combine in a standard way (either replacing or extending lists) and that if someone wanted to combine two files in a non-standard way, they should use an (ideally simple) programming language to read the two files and merge them in their non-standard way, outputting a file they can pass to pants (I'd generally recommend something like https://jsonnet.org/ - I suspect Danny would probably recommend coffeescript), rather than inventing a DSL to serialise inside the config file.

The +[".isort.cfg"],-["bad.cfg"] syntax will need to continue work with flags and env vars, so it should probably still be supported in config files, for uniformity. The +/- thing is processed outside of config parsing, in the options processing itself, so I think it should still "just work" here, no?

And at that point, I'm not sure we need the append/filter functionality. It's just more ways to do things. Also, I think +/- works for dict-valued options as well?

This sounds like a pretty reasonable middle-ground of encoding a DSL in magic values (which is something we already document/teach), without modifying the structure of keys, if it works :)

is there a way we could do dictionaries to be more toml-native and less backwards-compatible with ini files?

Yes, there is, through tables and inline tables. That was the original approach I took. But it means that we have no structured way to disambiguate between option scopes vs. dict values, because tables would be used for both purposes. We could make this work, but it means the Config API will become very leaky.

The other reason I didn't like using tables for dict values is that it creates ambiguity for the human reader. Right now, the human reader knows unambiguously that a table header like [cache.java] refers to the option scopes cache.java and not the option --java belonging to the subsystem cache. If we also use tables for dict values, then they'd have table headers like [gen.scrooge.service_deps]and [jvm-platform.platforms]. How do you know from a quick scan if either of these are dict options or are distinct option scopes?

[jvm-platform.platforms]
java7 = { source = 7, target = 7, args = [] }
java8 = { source = 8, target = 8, args = [] }
java9 = { source = 9, target = 9, args = [] }

Worth repeating that the TOML dict support is still an improvement from the INI dict support because of indentation. Possible in TOML:

[jvm-platform]
platforms = """
{
  "java7": {"source": 7, "target": 7, "args": []},
  "java8": {"source": 8, "target": 8, "args": []},
  "java9": {"source": 9, "target": 9, "args": []},
}
"""

You could also the below in TOML. In INI, you must use the below :

[jvm-platform]
platforms = {
    "java7": {"source": 7, "target": 7, "args": []},
    "java8": {"source": 8, "target": 8, "args": []},
    "java9": {"source": 9, "target": 9, "args": []},
  }

Yeah, there's definitely not a clear winner between the two in my mind... I think I prefer leaning into the toml-native way of doing this, rather than using magic multi-line strings, but I haven't convinced myself either way...

I guess the multiline-strings approach is aiming to provide a form of consistency when reading ("I can easily grok what the sections are") at the expense of familiarity when reading ("I understand the general format I'm reading"). The toml-standard way feels like it's better for writing ("I understand what to write"), but between reading and writing, we should probably be optimising for reading. So I guess the question is: Do we want to favour people reading the file as toml, or people reading the file as a set of pants option scopes?

@Eric-Arellano
Copy link
Contributor Author

(Really important note: I'm not saying we shouldn't do the things currently in this PR, and getting away from ini is a great thing, I'm just trying to explore the design space - the easiest way to not have to support and document and teach custom conventions is to simply not have them!)

Thanks for saying this and, more importantly, for asking great questions! This is a big decision and I appreciate you making sure we do due diligence.

So my real question here is: Is this feature so important that we want to diverge from the standard for it?

We're not diverging very far from the specification. In fact, TOML recommends using this my_option.append and my_option.filter syntax (or my_option.add and my_option.remove) in this issue: toml-lang/toml#644 (comment)

This sounds like a pretty reasonable middle-ground of encoding a DSL in magic values (which is something we already document/teach), without modifying the structure of keys, if it works :)

It's fine to allow users to use my_list_option: "+[1],-[0]", given that that's what we convert the .append and .filter sugar to anyways. I do strongly advocate providing the .append and .filter sugar for those who want it, though, as native TOML lists are far more ergonomic than quoted strings.

I guess the multiline-strings approach is aiming to provide a form of consistency when reading ("I can easily grok what the sections are") at the expense of familiarity when reading ("I understand the general format I'm reading"). The toml-standard way feels like it's better for writing ("I understand what to write"), but between reading and writing, we should probably be optimising for reading. So I guess the question is: Do we want to favour people reading the file as toml, or people reading the file as a set of pants option scopes?

Good analysis! I do agree with erring on the side of favoring readers over writers. Generally, only one person or team writes to pants.toml, whereas likely every engineer at the org will end up reading pants.toml at least once.

I think "reading the file as a set of Pants option scopes" is the correct mental model, here. It's ~an implementation detail that this config is implemented in TOML. Really, what our config is is a way to set options for their corresponding option scopes.

I remember when learning Pants I was a bit confused with the idea of options subscopes, especially because command line args make them appear ambiguous: does --cache-java-timeout mean --java-timeout belonging to the scope cache or --timeout belonging to the scope cache.java? Currently, a big advantage of our config files is that they unambiguously distinguish between option scopes vs. options. I'm afraid to lose that distinction.

@benjyw
Copy link
Contributor

benjyw commented Feb 4, 2020

Now that you mention it, I do like add and remove more than append and filter, because append implies that the value must be a single item, but we allow it to be a list I think, and filter is a rather vague term (are we filtering in or filtering out?)

@Eric-Arellano
Copy link
Contributor Author

add and remove sound great to me! @cosmicexplorer had proposed that last week over DM. I only was using append and filter due to the language used in our Options docs, which we can change, of course.

@Eric-Arellano
Copy link
Contributor Author

With us now on 1.26.x, this is now ready to review (and hopefully to land).

@stuhood
Copy link
Member

stuhood commented Feb 6, 2020

So my real question here is: Is this feature so important that we want to diverge from the standard for it?

I think that this kind of "deviation from the standard" (similar to putting # keys in json!) is a convention rather than a deviation from the standard, and relatively harmless. See toml-lang/toml#36 where the toml people refer to this kind of feature. The real challenge is just ensuring that you can document it.

The original motivating usecases for adding/removing still stand: allowing for adding or removing from a default value rather than replacing it is useful (...particularly adding... I could maybe do without removing).


I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Thanks!

@Eric-Arellano
Copy link
Contributor Author

I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Agreed. I tried to make the tests pretty comprehensive in that PR, along with manually testing on Pants and Toolchain. It'd be very helpful if someone from Twitter could try running it on your config files and confirm that conversion takes less than 5-10 minutes.

@stuhood
Copy link
Member

stuhood commented Feb 7, 2020

I think that I'm ok with moving toward toml, but I think it's important to track how much pain we're causing for existing users, and to minimize it with automation. So making sure that #9054 gets well tested will be critical.

Agreed. I tried to make the tests pretty comprehensive in that PR, along with manually testing on Pants and Toolchain. It'd be very helpful if someone from Twitter could try running it on your config files and confirm that conversion takes less than 5-10 minutes.

Realistically, this won't be before next week. But yes: will do.

Copy link
Contributor

@blorente blorente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore the naming nits!

Great work, thanks!

@@ -182,10 +193,11 @@ def get_source_for_option(self, section: str, option: str) -> Optional[str]:
class _ConfigValues(ABC):
"""Encapsulates resolving the actual config values specified by the user's config file.

Beyond providing better encapsulation, this allows us to support alternative config file formats
in the future if we ever decide to support formats other than INI.
Due to encapsulation, this allows us to support both TOML and INI config files without any of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Futureproofing that actually worked! Wooho!

src/python/pants/option/config.py Outdated Show resolved Hide resolved
src/python/pants/option/config_test.py Outdated Show resolved Hide resolved
src/python/pants/option/config_test.py Outdated Show resolved Hide resolved
src/python/pants/option/config.py Show resolved Hide resolved
src/python/pants/option/config.py Outdated Show resolved Hide resolved
Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general approach here looks solid to me, though I'm unlikely to have time to closely review any of the code :) Thanks!

Borja suggested created a File1 and File2 class. This works well. It centralizes the raw config content with the expected parsed values.

It also makes it easier to add a test case for TOML being the main config and INI being optional.
@Eric-Arellano Eric-Arellano merged commit 4365d03 into pantsbuild:master Feb 7, 2020
@Eric-Arellano Eric-Arellano deleted the pants-toml branch February 7, 2020 21:04
@jsirois
Copy link
Contributor

jsirois commented Feb 21, 2020

Noting a small inaccuracy:

TOML does not have native support for string interpolation, i.e. substituting %(foo)s for the DEFAULT value foo. So, we provide our own implementation that behaves identically.

Our impl only behaves identically for string values - no others. ConfigParser supports substitution of any value since it treats all values as strings. I ran into this in the PEX upgrade PR where I had factored a shared parallelism integer value up into [DEFAULT]. In toml I'll just copypasta.

@Eric-Arellano
Copy link
Contributor Author

Our impl only behaves identically for string values - no others. ConfigParser supports substitution of any value since it treats all values as strings. I ran into this in the PEX upgrade PR where I had factored a shared parallelism integer value up into [DEFAULT]. In toml I'll just copypasta.

This is true. But, this should work:

[DEFAULT]
num_workers = 4

[python-setup]
parallelism = "%(num_workers)"

Why? We stringify all TOML values before passing to parser.py, at the moment, so that TOML behaves identically to INI, where every value is a string.

def _stringify_val(
self,
raw_value: _TomlValue,
*,
option: str,
section: str,
section_values: Dict,
interpolate: bool = True,
list_prefix: Optional[str] = None,
) -> str:
"""For parity with configparser, we convert all values back to strings, which allows us to avoid
upstream changes to files like parser.py.
This is clunky. If we drop INI support, we should remove this and use native values (although we
must still support interpolation).
"""
possibly_interpolate = partial(
self._possibly_interpolate_value,
option=option,
section=section,
section_values=section_values,
)
if isinstance(raw_value, str):
return possibly_interpolate(raw_value) if interpolate else raw_value
if isinstance(raw_value, list):
def stringify_list_member(member: _TomlPrimitve) -> str:
if not isinstance(member, str):
return str(member)
interpolated_member = possibly_interpolate(member) if interpolate else member
return f'"{interpolated_member}"'
list_members = ", ".join(stringify_list_member(member) for member in raw_value)
return f"{list_prefix or ''}[{list_members}]"
return str(raw_value)

@jsirois
Copy link
Contributor

jsirois commented Feb 21, 2020

Erm. OK - thanks. I'm not honestly convinced the toml switch was a win - our use of it is now just about as unintuitive / unspecified as ini.

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Feb 21, 2020 via email

@jsirois
Copy link
Contributor

jsirois commented Feb 21, 2020

I think Eric did the best you can do with the impedance mismatch between an untyped config format and a typed one. The tradeoff is our use of toml no longer conforms to the types you expect in the toml file itself when you want to substitute. That use case is probably rare, so its probably a fine tradeoff. I just meant to point out that our use of toml deviates from standard toml files enough at this point that any argument for switching to toml for standardization benefits is probably washed away at this point.

Onward.

Eric-Arellano added a commit that referenced this pull request Feb 27, 2020
As decided in #9052, TOML brings several benefits over INI config files (despite some weirdness around things like dict values).

This updates our docs to refer to `pants.toml`. A followup will deprecate `pants.ini`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants