-
-
Notifications
You must be signed in to change notification settings - Fork 640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add migrate_to_toml_config.py
script to automatically update INI config files to TOML
#9054
Conversation
…nfig files to TOML
Turns out that we already did, but worth formalizing this.
Yes!
Where will this be documented? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm totally fine accepting this if we're very confident the regex solution will work 80%, but I feel like the ini parser route might not be hard to drop in instead, and to me it feels much more likely that it would work for whatever our users' code might be, and while it's very good that we've identified the toml lint service to clean up the result, it still seems like there's a lot of value in trying to make the automated park work as much as possible.
|
||
|
||
def generate_new_config(config: Path) -> List[str]: | ||
original_text = config.read_text() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were entirely correct that I would love the regex, and I am aware that this is an 80% script, but it seems like using a python ini config parser wouldn't be a huge amount of effort, because we'll then be working with parsed objects instead of raw strings. Is there a reason we decided to go the regex route here over an ini parser? I would love to toss up an example ini -> toml parser or pair on it if that seems like the right way to go.
assert result == expected.splitlines() | ||
|
||
|
||
def test_fully_automatable_config() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing great tests for this script and acknowledging that everything we ship to users should be tested!!! This makes me happy.
I'm on the same boat as @cosmicexplorer. Maybe, if using the full ini parser is too much of a hassle[1], we can fail visibly on the lines we leave alone (e.g. Also on the same boat with regards to tests. Thanks for them! [1]: Or if it swallows comments, which I'm not sure it does, but if it does it's a dealbreaker. |
Yes. See #9052 (comment). The issue with using a parser is that you lose information meaningful to the user, such as comments and whitespace between sections. Parsers throw away information like a section having 4 blank lines between the next section vs. only 1 blank line, and IMO we should do our best to preserve that. We could possibly combine the parser with a regex-based approach. We do that in Outside of converting multiline options, comments, and nested lists/dictionaries, are there any use cases this script does not do that you think it should? Or one of those 3 that you think are important to automatically fix?
I agree that this is nice to have, but did not pursue generating warnings—unlike with -- Generally, with this script, my main pitch is the Pareto principle. It would be neat if we could get this script converting things 100% perfectly, but that would mean far more time into this script than I have budgeted. I don't expect users to have more than 50-500 lines of config(s). If this converts 80%, that leaves them 10-50 lines to manually fix. They'll be able to use all these tools to figure out how to fix those remaining issues:
So, I am not too concerned about getting us from 80% to 100%. |
This should probably read "the existing set of ini parsers". "Good" parsers will expose every part of the language to the user, including comments. I haven't immediately been able to find one that does this and I've ruled out writing one from scratch, but it might be possible to dive into the
I'm concerned that this script might e.g. accidentally miss a compiler warnings option and secretly cause people to start missing bugs. This should be an extremely important concern for us to avoid, imho. The twitter repo makes heavy use of I'm a big fan of @blorente's suggestion to scream loudly if something is recognized as a construct that's not supported by this script. If it did something like: [compile.zinc]
####### UNTRANSLATED INI CONTENT -- MUST BE HAND-CONVERTED TO TOML
compiler_option_sets: {
'asdf': [...],
...
}
####### END UNTRANSLATED INI CONTENT |
I don't think this is possible. The TOML file will not load correctly when running
That makes this script much harder to implement. For example, the regex currently will not match if there is a comment, it's a multiline option, or there's a nested list. We don't know which of these 3 causes it is, only that it's not possible to safely fix. Now, we would have to change the regex to check which of those 3 situations is happening so that we can generate the correct warning message. I don't currently have the time to get that last 20%, especially given that users will see the errors in the 5-50 lines that we could not convert through all the different mechanisms described in #9054 (comment). Perhaps we could land this and then you can add the extra 20%? |
I'm fine with merging without the following, but my additional 2 cents:
For me, it would be enough to output "I couldn't fix line x, refer to for a list of possible reasons" |
option, value = parsed_dict_line.groups() | ||
updated_text_lines[i] = f'{option} = """{value}"""' | ||
continue | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'll go on a limb here and suggest some plaintext code:
This is supposed to go after the last if and before the return
print(f'We couldn't automatically convert line {i}: {line}. Please refer to https://github.com/pantsbuild/pants/pull/9054 for a description of what might be the case.', file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the code snippet!
The one challenge with this snippet is that it will trigger on every non-option line, such as every single one of these lines, except for the very last one:
# Comment not attached to any option
[cache.java]
args: [
'foo',
'bar'
]
next_option: foo
I think that's more output than we would want. We could, of course, teach the script how to handle those different things like what a section header looks like, but then we're getting to the complexity I wanted to avoid.
--
Favor: can you please pull down this PR and run build-support/migration-support/migrate_to_toml_config.py pants.ini pants.travis-ci.ini pants.remote.ini
, then open up the new TOML files in your editor? (Or, do this on Twitter's config files).
I suspect you'll find that there aren't many remaining issues and that those errors are fairly intuitive to fix, thanks to the editor warnings. This is why I'm pushing back on going from 80-100%; after trying out the script, it doesn't seem worth it to make it even better.
@benjyw's point is that we don't even need to provide a script in the first place - it's not very difficult to convert from INI to TOML, only a little tedious. We're providing this script as a courtesy to users, and I don't think we need to have it be perfect for them to be happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think I'd prefer to be overly verbose, but I agree that nobody is going to die for figuring out 20 manual changes. Particularly, because "users" here most likely means the same people that are seeing these changes happen, not actual end-users.
So not going to block on that, thanks for putting the effort to consider this point :)
Looks like TOML will land soon after 1-2 more reviews—i.e. we are going to move forward with TOML support—so I'm going to merge this. This PR works without official TOML support as it's only a conversion script. I want to merge now so that we aren't blocked by CI later in the day. |
## Problem INI files have some issues that make them frustrating for users: * Gotcha with when to use quotes: * If the option is a string, you cannot use quotes, e.g. `version: ipython==4.8`. * If the value appears in a list, you _must_ use quotes, e.g. `config: [".isort.cfg"]` * Gotcha with indenting lists and dict values: * The 7th paragraph of our instructions for new users warns: > Note that the formatting of the plugins list is important; all lines below the plugins: line must be indented by at least one white space to form logical continuation lines. This is standard for Python ini files. See Options for a guide on modifying your pants.ini. * Awkard to append and filter to a list option at the same time: ```ini backend_packages: +[ 'pants.backend.scala.lint.scalafmt'],-[ 'pants.backend.codegen.antlr.java', 'pants.backend.codegen.antlr.python', 'pants.backend.codegen.wire.java'] ``` * No defined standard for INI. Users may try using INI features not supported or interpreted differently by Python's `configparser`. * No syntax highlighting or validation from editors, given that there's no standard. ## Why TOML? ### PEP 518's thoughts on file formats Python core developers recently researched modern config values to choose the format for `pyproject.toml`. They compiled their results into [PEP 518](https://www.python.org/dev/peps/pep-0518/#other-file-formats), comparing INI, JSON, YAML, and TOML. Reasons they rejected INI: * No standard. Reasons they rejected JSON: * Difficult for humans to edit, e.g. no support for comments. Reasons they rejected YAML: * Specification is 86 pages when printed! * Not safe by default due to arbitrary execution of code. * PyYAML is large and has a C extension. Reasons they liked TOML: * Simple while still being flexible enough. * Rust core developers said they have been "quite happy" with using TOML for Cargo. ### Reasons for us to use TOML * Very similar syntax to INI, as TOML was inspired by INI. * Less churn for users. * Simpler than YAML. * I don't think we want users using anchors/aliases, for example. * Alignment with Rust and Python ecosystems. ## Solution Implement a `config._TomlValues` class that behaves identically to `_IniValues`. Both of these are subclassses of `_ConfigValues`, meaning that the two config formats may be interchanged without any file outside of `config.py` knowing which config format was originally used. We use extremely comprehensive unit tests to ensure that `_TomlValues` behaves identically to `_IniValues`. ### How we implement interpolation TOML does not have native support for string interpolation, i.e. substituting `%(foo)s` for the `DEFAULT` value `foo`. So, we provide our own implementation that behaves identically. ### How we implement list options TOML has native list support, but does not have syntax for `+[]` and `-[]`. We could require users to use multiline strings, like `config = """+[".isort.cfg"],-["bad.cfg"]"""`. This will work, and in fact the format we convert into in `config.py` so that the rest of Pants understands the value. But, `config = """+[".isort.cfg"],-["bad.cfg"]"""` is clunky, so we introduce syntatic sugar: ```toml [jvm] options = ["-Xmx1g"] [isort] config.add= [".isort.cfg"] config.remove = ["bad.cfg"] ``` This sugar allows users to use native TOML lists (more ergonomic than strings) and makes it easy to both add to and remove from a list option at the same time. ### How we implement dict options TOML has native support for dicts through both [Tables](https://github.com/toml-lang/toml#user-content-table) and [Inline Tables](https://github.com/toml-lang/toml#inline-table). However, we already use Tables to implement the distinct option scopes. So, if we also used Tables to implement dictionary values, we would introduce an ambiguity whether a table refers to an options scope or a dictionary value. Originally, this PR tried to take this approach, but quickly rejected support for using Tables because it dramatically complicates the solution and would cause a leaky API. Instead, users should use a multiline string for dictionary options: ```toml [jvm-platform] default_platform = "java8" platforms = """ { "java7": {"source": 7, "target": 7, "args": []}, "java8": {"source": 8, "target": 8, "args": []}, "java9": {"source": 9, "target": 9, "args": []}, } """ ``` (Note the indentation - this would error in INI because of being indented too much to the left, but it works in TOML.) ## Remaining followup 1. [ ] Update our own usage to use TOML for dogfooding. * We'll also update Toolchain's config. 1. [x] Create a script that will automatically convert 80%-90% of INI files to TOML. * See #9054. 1. [ ] Update our docs to use TOML. 1. [ ] Deprecate INI. * Will allow us to remove a lot of awkwardness around the TOML implementation. * Will allow us to confidently generate suggested config values in help/deprecation/error messages, whereas now we have to deal with things like indentation of lists.
This script removes 80% of the tedium of migrating from INI config files to TOML config files.
It does not attempt to handle the remaining 20%:
Users will be able to manually fix the remaining issues through either the validation from their editor or the site https://www.toml-lint.com.
--
Similar to
fix_deprecated_globs_usage.py
, this is not officially shipped with thepantsbuild.pants
wheel, but we keep it in this repository for version control, tests, and to have a stable URL that we can instruct users tocurl
.