Add `migrate_to_toml_config.py` script to automatically update INI config files to TOML #9054

Eric-Arellano · 2020-02-03T18:17:05Z

This script removes 80% of the tedium of migrating from INI config files to TOML config files.

It does not attempt to handle the remaining 20%:

multiline options
comments
nested lists and nested dictionaries

Users will be able to manually fix the remaining issues through either the validation from their editor or the site https://www.toml-lint.com.

--

Similar to fix_deprecated_globs_usage.py, this is not officially shipped with the pantsbuild.pants wheel, but we keep it in this repository for version control, tests, and to have a stable URL that we can instruct users to curl.

…nfig files to TOML

Turns out that we already did, but worth formalizing this.

cosmicexplorer · 2020-02-04T08:16:23Z

This script removes 80% of the tedium

Yes!

but we keep it in this repository for version control, tests, and to have a stable URL that we can instruct users to curl.

Where will this be documented?

cosmicexplorer

I'm totally fine accepting this if we're very confident the regex solution will work 80%, but I feel like the ini parser route might not be hard to drop in instead, and to me it feels much more likely that it would work for whatever our users' code might be, and while it's very good that we've identified the toml lint service to clean up the result, it still seems like there's a lot of value in trying to make the automated park work as much as possible.

cosmicexplorer · 2020-02-04T08:19:28Z

build-support/migration-support/migrate_to_toml_config.py

+
+
+def generate_new_config(config: Path) -> List[str]:
+  original_text = config.read_text()


You were entirely correct that I would love the regex, and I am aware that this is an 80% script, but it seems like using a python ini config parser wouldn't be a huge amount of effort, because we'll then be working with parsed objects instead of raw strings. Is there a reason we decided to go the regex route here over an ini parser? I would love to toss up an example ini -> toml parser or pair on it if that seems like the right way to go.

cosmicexplorer · 2020-02-04T08:20:53Z

build-support/migration-support/migrate_to_toml_config_test.py

+  assert result == expected.splitlines()
+
+
+def test_fully_automatable_config() -> None:


Thank you for writing great tests for this script and acknowledging that everything we ship to users should be tested!!! This makes me happy.

blorente · 2020-02-04T13:09:27Z

I'm on the same boat as @cosmicexplorer. Maybe, if using the full ini parser is too much of a hassle[1], we can fail visibly on the lines we leave alone (e.g. line 45, "opt1: False # Good Riddance" was not converted because it had a comment in it.).

Also on the same boat with regards to tests. Thanks for them!

[1]: Or if it swallows comments, which I'm not sure it does, but if it does it's a dealbreaker.

Eric-Arellano · 2020-02-04T16:43:32Z

Is there a reason we decided to go the regex route here over an ini parser?

Yes. See #9052 (comment). The issue with using a parser is that you lose information meaningful to the user, such as comments and whitespace between sections. Parsers throw away information like a section having 4 blank lines between the next section vs. only 1 blank line, and IMO we should do our best to preserve that.

We could possibly combine the parser with a regex-based approach. We do that in fix_deprecated_globs_usages.py: use ast to parse the globs() function and then use a simple if # in line check to nope out of the change if it has a comment. But, the allowable symbols are much more complex here than in a globs() function, so we would still need to keep the pre-existing regex to detect if there are comments. In other words, we'd need both the regex and the configparser. At that point, I'm not sure configparser would bring much value?

Outside of converting multiline options, comments, and nested lists/dictionaries, are there any use cases this script does not do that you think it should? Or one of those 3 that you think are important to automatically fix?

Maybe, if using the full ini parser is too much of a hassle[1], we can fail visibly on the lines we leave alone

I agree that this is nice to have, but did not pursue generating warnings—unlike with fix_deprecated_globs_usages.py—because a) the scope updating 1-3 config files is much smaller than updating hundreds of BUILD files and b) there are good tools to draw attention already to what the remaining issues are (editors and https://www.toml-lint.com/).

--

Generally, with this script, my main pitch is the Pareto principle. It would be neat if we could get this script converting things 100% perfectly, but that would mean far more time into this script than I have budgeted.

I don't expect users to have more than 50-500 lines of config(s). If this converts 80%, that leaves them 10-50 lines to manually fix. They'll be able to use all these tools to figure out how to fix those remaining issues:

the example provided by the 80% we did convert
our email explaining the highlights, especially with list options and dict options
their editors
https://www.toml-lint.com/
the TOML specification

So, I am not too concerned about getting us from 80% to 100%.

cosmicexplorer · 2020-02-04T20:57:25Z

Parsers throw away information like a section having 4 blank lines between the next section vs. only 1 blank line, and IMO we should do our best to preserve that.

This should probably read "the existing set of ini parsers". "Good" parsers will expose every part of the language to the user, including comments. I haven't immediately been able to find one that does this and I've ruled out writing one from scratch, but it might be possible to dive into the ConfigParser code and make some subclass which does retain this information.

Outside of converting multiline options, comments, and nested lists/dictionaries, are there any use cases this script does not do that you think it should?

I'm concerned that this script might e.g. accidentally miss a compiler warnings option and secretly cause people to start missing bugs. This should be an extremely important concern for us to avoid, imho. The twitter repo makes heavy use of compiler_option_sets, which you'll note is an option that has both compiler warnings options and is also a nested dict.

I'm a big fan of @blorente's suggestion to scream loudly if something is recognized as a construct that's not supported by this script. If it did something like:

[compile.zinc]
####### UNTRANSLATED INI CONTENT -- MUST BE HAND-CONVERTED TO TOML
compiler_option_sets: {
  'asdf': [...],
  ...
}
####### END UNTRANSLATED INI CONTENT

Eric-Arellano · 2020-02-04T22:44:57Z

I'm concerned that this script might e.g. accidentally miss a compiler warnings option and secretly cause people to start missing bugs.

I don't think this is possible. The TOML file will not load correctly when running ./pants if it is not entirely valid TOML. They can't silently have bugs - the bugs are loud.

suggestion to scream loudly if something is recognized as a construct that's not supported by this script

That makes this script much harder to implement. For example, the regex currently will not match if there is a comment, it's a multiline option, or there's a nested list. We don't know which of these 3 causes it is, only that it's not possible to safely fix. Now, we would have to change the regex to check which of those 3 situations is happening so that we can generate the correct warning message.

I don't currently have the time to get that last 20%, especially given that users will see the errors in the 5-50 lines that we could not convert through all the different mechanisms described in #9054 (comment).

Perhaps we could land this and then you can add the extra 20%?

blorente · 2020-02-05T12:17:15Z

I'm fine with merging without the following, but my additional 2 cents:

We don't know which of these 3 causes it is, only that it's not possible to safely fix.

For me, it would be enough to output "I couldn't fix line x, refer to for a list of possible reasons"
Let me spend a little bit of time to prototype what it would look like.

blorente · 2020-02-05T12:25:24Z

build-support/migration-support/migrate_to_toml_config.py

+      option, value = parsed_dict_line.groups()
+      updated_text_lines[i] = f'{option} = """{value}"""'
+      continue
+


maybe I'll go on a limb here and suggest some plaintext code:
This is supposed to go after the last if and before the return

Suggested change

print(f'We couldn't automatically convert line {i}: {line}. Please refer to https://github.com/pantsbuild/pants/pull/9054 for a description of what might be the case.', file=sys.stderr)

Thank you for the code snippet!

The one challenge with this snippet is that it will trigger on every non-option line, such as every single one of these lines, except for the very last one:

# Comment not attached to any option [cache.java] args: [ 'foo', 'bar' ] next_option: foo

I think that's more output than we would want. We could, of course, teach the script how to handle those different things like what a section header looks like, but then we're getting to the complexity I wanted to avoid.

--

Favor: can you please pull down this PR and run build-support/migration-support/migrate_to_toml_config.py pants.ini pants.travis-ci.ini pants.remote.ini, then open up the new TOML files in your editor? (Or, do this on Twitter's config files).

I suspect you'll find that there aren't many remaining issues and that those errors are fairly intuitive to fix, thanks to the editor warnings. This is why I'm pushing back on going from 80-100%; after trying out the script, it doesn't seem worth it to make it even better.

@benjyw's point is that we don't even need to provide a script in the first place - it's not very difficult to convert from INI to TOML, only a little tedious. We're providing this script as a courtesy to users, and I don't think we need to have it be perfect for them to be happy.

I still think I'd prefer to be overly verbose, but I agree that nobody is going to die for figuring out 20 manual changes. Particularly, because "users" here most likely means the same people that are seeing these changes happen, not actual end-users.

So not going to block on that, thanks for putting the effort to consider this point :)

build-support/migration-support/migrate_to_toml_config.py

Eric-Arellano · 2020-02-07T20:36:19Z

Looks like TOML will land soon after 1-2 more reviews—i.e. we are going to move forward with TOML support—so I'm going to merge this. This PR works without official TOML support as it's only a conversion script. I want to merge now so that we aren't blocked by CI later in the day.

## Problem INI files have some issues that make them frustrating for users: * Gotcha with when to use quotes: * If the option is a string, you cannot use quotes, e.g. `version: ipython==4.8`. * If the value appears in a list, you _must_ use quotes, e.g. `config: [".isort.cfg"]` * Gotcha with indenting lists and dict values: * The 7th paragraph of our instructions for new users warns: > Note that the formatting of the plugins list is important; all lines below the plugins: line must be indented by at least one white space to form logical continuation lines. This is standard for Python ini files. See Options for a guide on modifying your pants.ini. * Awkard to append and filter to a list option at the same time: ```ini backend_packages: +[ 'pants.backend.scala.lint.scalafmt'],-[ 'pants.backend.codegen.antlr.java', 'pants.backend.codegen.antlr.python', 'pants.backend.codegen.wire.java'] ``` * No defined standard for INI. Users may try using INI features not supported or interpreted differently by Python's `configparser`. * No syntax highlighting or validation from editors, given that there's no standard. ## Why TOML? ### PEP 518's thoughts on file formats Python core developers recently researched modern config values to choose the format for `pyproject.toml`. They compiled their results into [PEP 518](https://www.python.org/dev/peps/pep-0518/#other-file-formats), comparing INI, JSON, YAML, and TOML. Reasons they rejected INI: * No standard. Reasons they rejected JSON: * Difficult for humans to edit, e.g. no support for comments. Reasons they rejected YAML: * Specification is 86 pages when printed! * Not safe by default due to arbitrary execution of code. * PyYAML is large and has a C extension. Reasons they liked TOML: * Simple while still being flexible enough. * Rust core developers said they have been "quite happy" with using TOML for Cargo. ### Reasons for us to use TOML * Very similar syntax to INI, as TOML was inspired by INI. * Less churn for users. * Simpler than YAML. * I don't think we want users using anchors/aliases, for example. * Alignment with Rust and Python ecosystems. ## Solution Implement a `config._TomlValues` class that behaves identically to `_IniValues`. Both of these are subclassses of `_ConfigValues`, meaning that the two config formats may be interchanged without any file outside of `config.py` knowing which config format was originally used. We use extremely comprehensive unit tests to ensure that `_TomlValues` behaves identically to `_IniValues`. ### How we implement interpolation TOML does not have native support for string interpolation, i.e. substituting `%(foo)s` for the `DEFAULT` value `foo`. So, we provide our own implementation that behaves identically. ### How we implement list options TOML has native list support, but does not have syntax for `+[]` and `-[]`. We could require users to use multiline strings, like `config = """+[".isort.cfg"],-["bad.cfg"]"""`. This will work, and in fact the format we convert into in `config.py` so that the rest of Pants understands the value. But, `config = """+[".isort.cfg"],-["bad.cfg"]"""` is clunky, so we introduce syntatic sugar: ```toml [jvm] options = ["-Xmx1g"] [isort] config.add= [".isort.cfg"] config.remove = ["bad.cfg"] ``` This sugar allows users to use native TOML lists (more ergonomic than strings) and makes it easy to both add to and remove from a list option at the same time. ### How we implement dict options TOML has native support for dicts through both [Tables](https://github.com/toml-lang/toml#user-content-table) and [Inline Tables](https://github.com/toml-lang/toml#inline-table). However, we already use Tables to implement the distinct option scopes. So, if we also used Tables to implement dictionary values, we would introduce an ambiguity whether a table refers to an options scope or a dictionary value. Originally, this PR tried to take this approach, but quickly rejected support for using Tables because it dramatically complicates the solution and would cause a leaky API. Instead, users should use a multiline string for dictionary options: ```toml [jvm-platform] default_platform = "java8" platforms = """ { "java7": {"source": 7, "target": 7, "args": []}, "java8": {"source": 8, "target": 8, "args": []}, "java9": {"source": 9, "target": 9, "args": []}, } """ ``` (Note the indentation - this would error in INI because of being indented too much to the left, but it works in TOML.) ## Remaining followup 1. [ ] Update our own usage to use TOML for dogfooding. * We'll also update Toolchain's config. 1. [x] Create a script that will automatically convert 80%-90% of INI files to TOML. * See #9054. 1. [ ] Update our docs to use TOML. 1. [ ] Deprecate INI. * Will allow us to remove a lot of awkwardness around the TOML implementation. * Will allow us to confidently generate suggested config values in help/deprecation/error messages, whereas now we have to deal with things like indentation of lists.

Add migrate_to_toml_config.py script to automatically update INI co…

531fdcc

…nfig files to TOML

Eric-Arellano requested review from stuhood, cosmicexplorer, codealchemy and blorente February 3, 2020 18:17

Eric-Arellano added 2 commits February 3, 2020 11:23

Actually test build-support::. Oops!

0a63df0

Test that we ignore list and dict options with comments

0a9765c

Turns out that we already did, but worth formalizing this.

Eric-Arellano mentioned this pull request Feb 3, 2020

Add support for TOML config files (pants.toml) #9052

Merged

4 tasks

codealchemy approved these changes Feb 3, 2020

View reviewed changes

cosmicexplorer reviewed Feb 4, 2020

View reviewed changes

blorente reviewed Feb 5, 2020

View reviewed changes

build-support/migration-support/migrate_to_toml_config.py Show resolved Hide resolved

Use list.add and list.remove, rather than list.append and list.filter

a4853af

blorente self-requested a review February 7, 2020 11:36

blorente approved these changes Feb 7, 2020

View reviewed changes

Eric-Arellano merged commit 4fb7572 into pantsbuild:master Feb 7, 2020

Eric-Arellano deleted the ini-to-toml branch February 7, 2020 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `migrate_to_toml_config.py` script to automatically update INI config files to TOML #9054

Add `migrate_to_toml_config.py` script to automatically update INI config files to TOML #9054

Eric-Arellano commented Feb 3, 2020

cosmicexplorer commented Feb 4, 2020

cosmicexplorer left a comment

cosmicexplorer Feb 4, 2020

cosmicexplorer Feb 4, 2020

blorente commented Feb 4, 2020 •

edited

Loading

Eric-Arellano commented Feb 4, 2020

cosmicexplorer commented Feb 4, 2020

Eric-Arellano commented Feb 4, 2020

blorente commented Feb 5, 2020 •

edited

Loading

blorente Feb 5, 2020 •

edited

Loading

Eric-Arellano Feb 6, 2020

blorente Feb 7, 2020

Eric-Arellano commented Feb 7, 2020



		def generate_new_config(config: Path) -> List[str]:
		original_text = config.read_text()

		assert result == expected.splitlines()


		def test_fully_automatable_config() -> None:


	print(f'We couldn't automatically convert line {i}: {line}. Please refer to https://github.com/pantsbuild/pants/pull/9054 for a description of what might be the case.', file=sys.stderr)

Add migrate_to_toml_config.py script to automatically update INI config files to TOML #9054

Add migrate_to_toml_config.py script to automatically update INI config files to TOML #9054

Conversation

Eric-Arellano commented Feb 3, 2020

cosmicexplorer commented Feb 4, 2020

cosmicexplorer left a comment

Choose a reason for hiding this comment

cosmicexplorer Feb 4, 2020

Choose a reason for hiding this comment

cosmicexplorer Feb 4, 2020

Choose a reason for hiding this comment

blorente commented Feb 4, 2020 • edited Loading

Eric-Arellano commented Feb 4, 2020

cosmicexplorer commented Feb 4, 2020

Eric-Arellano commented Feb 4, 2020

blorente commented Feb 5, 2020 • edited Loading

blorente Feb 5, 2020 • edited Loading

Choose a reason for hiding this comment

Eric-Arellano Feb 6, 2020

Choose a reason for hiding this comment

blorente Feb 7, 2020

Choose a reason for hiding this comment

Eric-Arellano commented Feb 7, 2020

Add `migrate_to_toml_config.py` script to automatically update INI config files to TOML #9054

Add `migrate_to_toml_config.py` script to automatically update INI config files to TOML #9054

blorente commented Feb 4, 2020 •

edited

Loading

blorente commented Feb 5, 2020 •

edited

Loading

blorente Feb 5, 2020 •

edited

Loading