refactor: identifiers in a file #511

henryiii · 2021-01-02T16:31:11Z

Fixes #431, should make #496 easier. I did not provide a way to override the files. I am now using importlib.resources here. ~~instead of messing with the backport, we can wait until #508 and then just use the stdlib.~~ Edit: the 3.9 version is much nicer and less intrusive, using that.

YannickJadoul · 2021-01-02T16:44:56Z

To nitpick, pinned_docker_images.cfg is read with configparser.ConfigParser. Do we have a reason to pick toml for this?

henryiii · 2021-01-02T16:51:08Z

See #496 (comment)

And TOML is a real format, while .cfg is a rather poorly defined thing. And we already require toml (YAML would have been much more natural - I'm kind of working around the fact that TOML/cfg has to have keys, while YAML could have just been a list. Really is ugly when you have the dual Python 2.7 builds with matching identifiers.

YannickJadoul · 2021-01-02T17:05:11Z

See #496 (comment)

Right! I also says

similar to the manylinux one at cibuildwheel/resources/pinned_docker_images.cfg

but fair enough ;-)

Would it then make sense to move pinned_docker_images.cfg to toml as well (maybe in a follow-up)? These formats might look too similar to both keep around, no?

Really is ugly when you have the dual Python 2.7 builds with matching identifiers.

"ugly" and "Python 2.7" in the same sentence? You must be joking! ;-)

But yes, the overview is completely gone in these files :-(

cibuildwheel/util.py

henryiii · 2021-01-02T17:12:10Z

Would it then make sense to move pinned_docker_images.cfg to toml as well

I think we should match them up, yes. If we don't want to add yaml as a dependency, what about using json?

henryiii · 2021-01-02T17:19:08Z

TOML (basically the same as cfg):

[cp39-manylinux_aarch64]
version="3.9"
path_str="/opt/python/cp39-cp39"

YAML:

- identifier: cp39-manylinux_aarch64
  version: 3.9
  path_str: /opt/python/cp39-cp39

JSON (technically valid YAML too):

{"identifier": "cp39-manylinux_aarch64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},

Currently CPython 2.7 is the only one with two builds from one build identifier (m and mu).

We could also combine the files into one file, as OS's don't overlap.

YannickJadoul · 2021-01-02T18:01:19Z

The JSON looks best to me, indeed, the nice thing being that if you have a list of those, they will immediately show you the versions without scrolling between blocks.

Funnily enough, none of them look as good as our Python namedtuple.

(This might of course also just be me, being used to the current format in Python code, and not yet to the TOML version.)

henryiii · 2021-01-02T18:10:41Z

This seems more like data to me, always seems odd to have this sitting inside the .py files, especially having to edit them for updates. And if we add the ability to update these files programmatically, that’s much better as a data file!

YAML would allow:

- {identifier: cp39-manylinux_aarch64, version: 3.9, path_str: /opt/python/cp39-cp39}

not a fan of mixing yaml and json styles but would probably be cleanest. But adds a dependency. And likely hard to auto generate for an update.

we could shorten identifier or path_str.

YannickJadoul · 2021-01-02T18:25:32Z

not a fan of mixing yaml and json styles but would probably be cleanest.

Me neither, but this looks nice.

But adds a dependency. And likely hard to auto generate for an update.

Well, with something like this we're pretty close, no?

{
"linux": [
    ...
    {"identifier": "cp39-manylinux_x86_64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
    {"identifier": "cp39-manylinux_i686", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
    ...
    {"identifier": "cp39-manylinux_aarch64", "version": "3.9", "path_str": "/opt/python/cp39-cp39"},
],
...
}

I'm indeed not sure whether adding yaml is worth it (though it should be easy enough); up to @joerick, that one, I'd say.

YannickJadoul · 2021-01-02T18:37:10Z

For the record, I did ask about json/yaml in #256, but seems we got caught up in another discussion: #256 (comment)

henryiii · 2021-01-02T19:13:53Z

Questions for @joerick:

One file or three
Format (JSON, YAML, TOML with current workaround, cfg with same workaround)
Should we change pinned_docker_images.cfg to match if we pick one of the others?

If you want it to be a vote, I ordered the format list above by my preferences, json first.

joerick · 2021-01-04T17:07:53Z

One file or three

The PythonConfiguration objects for each platform have different schemas, so I think they should be stored in different files.

Format (JSON, YAML, TOML with current workaround, cfg with same workaround)

I don't think we should be adding another runtime dependency to parse an internal config file. TOML is currently only needed for the cp35 windows workaround, so it might be dropped again after that. I'd be more in favour of configparser or JSON. Generally speaking I don't much enjoy JSON as a user-facing format, it's very brittle to edit directly.

I think we should be using a list of configs, not the dict of 'identifier': {...attributes}. Ordering is important, and as you've found, on linux there are two builds with the same identifier. That rules out configparser, because it doesn't support lists. TOML does with [[array_name]], but it's probably not worth it here.

So, it's probably back to JSON. But, for a bit of fun, I tried out https://github.com/Instagram/LibCST to see if there was a pure-python way to do it. And honestly, it's not too crazy. Here's a script that patches the urls in macos.py:

import libcst as cst


class UpgradeTransformer(cst.CSTTransformer):
    def leave_Call(self, original_node: cst.Call, updated_node: cst.Call) -> cst.BaseExpression:
        node = updated_node

        if not (isinstance(node.func, cst.Name) and node.func.value == 'PythonConfiguration'):
            return node

        for arg in node.args:
            assert arg.keyword is not None

            if arg.keyword.value == 'url':
                assert isinstance(arg.value, cst.SimpleString)
                url = arg.value.evaluated_value

                print(url)
                # patch the url here
                url += '#patched'

                node = node.with_deep_changes(arg.value, value=f"'{url}'")

        return node


macos_cst = cst.parse_module(open('cibuildwheel/macos.py').read())
macos_cst = macos_cst.visit(UpgradeTransformer())
print(macos_cst.code)

This could be used to update the NamedTuples in-place, or another option could be to use something like the above with a standalone configs file, like macos_python_configs.py.

I'm probably too close to this to tell if it's a good idea, what do you guys think?

henryiii · 2021-01-04T17:25:09Z

I'd rather a config format (JSON) over editing Python files. I'm not a fan of Python files for config; Python generally should be code, and data should be in a data format. I would expect to be able to take a Python file and add arbitrary Python code, which such a parser as you've shown would choke on.

I think we should be using a list of configs, not the dict of 'identifier': {...attributes}. Ordering is important, and as you've found, on linux there are two builds with the same identifier.

That's what I mean with "the current workaround" - I was referring to the fact that it was a dict rather than a list. TOML allows an array inside a section, yes, but the values of that array are not allowed to be dicts, I believe - only simple values. Then you'd have to parse again with some other format on the strings.

YannickJadoul · 2021-01-04T17:34:39Z

I definitely like the "fun" part, but I'm not entirely sure it's the right match here? Also, if you didn't want to introduce another dependency, libcst is a bit annoying, there ;-)

Not sure I'm convinced of that myself, but we don't need to per se follow a fancy "everything's possible" kind of standard, ofc. Something like

cp39-manylinux_x86_64,  3.9, /opt/python/cp39-cp39
cp39-manylinux_i686,    3.9, /opt/python/cp39-cp39
cp39-manylinux_aarch64, 3.9, /opt/python/cp39-cp39

might perfectly server our purpose for now to populate the existing NamedTuple, won't be anything hard to parse (some split() and strip()?), and can be easily replaced by something more fancy once we do suddenly want separators (could be ; or \t or anything else) or spaces in our strings.

henryiii · 2021-01-04T17:36:34Z

This is called csv, and it's got a parser in the standard library. ;)

YannickJadoul · 2021-01-04T17:38:29Z

I know, but I didn't want to pin us to csv per se ;-)
Also, never been a great fan of the CSV parser in the standard library, since it's still pretty verbose. But yes.

YannickJadoul · 2021-01-04T17:44:10Z

Actually, I stand corrected. It's just a oneliner:

list(csv.DictReader(open("test.csv", 'r'), fieldnames=["identifier", "version", "path_str"]))

~~(Except for those spaces behind the comma; we'd have to write cp39-manylinux_x86_64,3.9,/opt/python/cp39-cp39)~~

Nevermind, this works:

list(csv.DictReader(open("test.csv", 'r'), fieldnames=["identifier", "version", "path_str"], skipinitialspace=True))

henryiii · 2021-01-04T17:52:10Z

Actually, maybe toml does support inline dicts, I just noticed this: https://python-poetry.org/docs/dependency-specification/#expanded-dependency-specification-syntax

YannickJadoul · 2021-01-04T18:23:22Z

Seems this also works with pytoml yes:

[x86_64]
cp38 = {identifier = "cp38-manylinux_x86_64", version = "3.8", path_str = "/opt/python/cp38-cp38"}
cp39 = {identifier = "cp39-manylinux_x86_64", version = "3.9", path_str = "/opt/python/cp39-cp39"}

Just a pity of those cp38 and cp39 keys (since toml doesn't have arrays).

henryiii · 2021-01-04T18:50:35Z

Yes it does. Just not at top level.

henryiii · 2021-01-04T18:53:23Z

Try:

[linux]
vals = [
    {identifier = "cp38-manylinux_x86_64", version = "3.8", path_str = "/opt/python/cp38-cp38"},
    {identifier = "cp39-manylinux_x86_64", version = "3.9", path_str = "/opt/python/cp39-cp39"},
]

I can check later.

YannickJadoul · 2021-01-04T19:16:07Z

Yes it does. Just not at top level.

Once again, you're right. I hadn't tired that!

Well, I'd be up for TOML, then, actually, if @joerick doesn't mind the extra dependency. If not, I do kind of like the simplicity of the CSV (or similar) version.

henryiii · 2021-01-04T20:18:06Z

I like JSON, CSV, and this form of TOML pretty equally, with a minor preference on the JSON still.

Currently working on a cookie cutter and the specs for it all all JSON. It's not unusable as long as you don't want to add comments to whatever you are doing. :)

I'd probably do something like:

[tool.cibw.build-platforms]
linux = [
    {...},

That way, you could imagine them being specifiable in a single file if someone wanted to.

henryiii · 2021-01-04T22:17:46Z

CSV wouldn't work, because they are variable length. That leaves JSON and TOML. The TOML I have right now is not bad at all, IMO; I dropped the workaround. Though JSON would also work.

cibuildwheel/resources/windows.toml

YannickJadoul · 2021-01-04T23:02:35Z

CSV wouldn't work, because they are variable length.

It will if you have multiple files per OS, as @joerick said he preferred anyway? The url on Windows can just stay empty, no?

But fair enough, all my concerns about readability and getting an overview are gone, indeed, in the current form! So that just leaves the dependency concern.

joerick · 2021-01-04T23:46:19Z

This TOML is looking very nice, now :) I think I might be swayed over. I also really like the way it slots into the namedtuple using the PythonConfiguration(**items) syntax. Does this way of writing it survive a round-trip through toml.load(), toml.dump()?

If those headers are required, (I know this is me changing my mind), perhaps we should put them all in one file? Then the headers actually mean something.

henryiii · 2021-01-05T05:40:46Z

Does this way of writing it survive a round-trip

Excellent question. With some customization and a few extra spaces, it can be done. Will commit an example tomorrow.

henryiii · 2021-01-05T22:12:32Z

One benefit of json + one file here, by the way, it would make it easy to write a parser for it in javascript that would allow a user to type into a box and see the results live in the docs (never tried that, but should be possible, I think?). TOML or any other format would be harder to parse in JS (though you could always just read it with toml and stick a simple list JSON into the docs during generation, so really, not a big issue).

Going to push my toml version soon.

unit_test/test_build_ids.py

henryiii · 2021-01-05T22:40:39Z

I've put an example roundtrip (the linux one and a test for now). What do you think? Should I go with this, in a single file?

YannickJadoul · 2021-01-05T23:20:24Z

Now that we have this nice formatting, I'm up for a single file. I think it could enhance updates, when now, you need to pass 3 files. Though one file is not a must-have for me, either.

henryiii · 2021-01-06T02:44:13Z

Touched up types a bit in my last comment, adding exhaustiveness checking on the PlatStr literal, and worked around the fact that toml doesn't have py.typed or .pyi files in the package, even though it does have them in the original source. (uiri/toml#347)

joerick

Think we're nearly there with this one! Just a few minor comments on naming etc.

cibuildwheel/__main__.py

cibuildwheel/resources/build-platforms.toml

joerick · 2021-01-09T09:56:18Z

cibuildwheel/util.py

+    from importlib.resources import files
+
+
+resources_dir = files('cibuildwheel') / 'resources'


Out of curiosity, what does importlib.resources.files do that Path(__file__).parent doesn't? It can't be .egg compatibility, can it? :P

It supports directories that are zip files (see https://github.com/scikit-hep/particle/releases/tag/v0.14.0 - that .pyz file runs directly without installing, python3 particle.pyz just works), and namespace packages that are split across directories.

Though, however, I'm using it here because it's more elegant and the "right' thing to do. We are accessing a resource.

PS: There's a few tricks when running from a zip file, you can open the path or access the filename, but cannot use the file name to open the file without a little workaround which I didn't attempt to implement.

cibuildwheel/__main__.py

henryiii force-pushed the refactor/identifiers branch from 76cc801 to f3e1680 Compare January 2, 2021 16:48

henryiii force-pushed the refactor/identifiers branch from f3e1680 to ba1d73c Compare January 2, 2021 16:53

henryiii force-pushed the refactor/identifiers branch 2 times, most recently from c8e697d to 828aa8a Compare January 2, 2021 17:08

henryiii commented Jan 2, 2021

View reviewed changes

cibuildwheel/util.py Outdated Show resolved Hide resolved

henryiii force-pushed the refactor/identifiers branch from 828aa8a to bf2a023 Compare January 2, 2021 17:14

henryiii force-pushed the refactor/identifiers branch 2 times, most recently from 3152675 to f2d4de2 Compare January 4, 2021 22:11

henryiii commented Jan 4, 2021

View reviewed changes

cibuildwheel/resources/windows.toml Outdated Show resolved Hide resolved

YannickJadoul reviewed Jan 5, 2021

View reviewed changes

unit_test/test_build_ids.py Outdated Show resolved Hide resolved

henryiii force-pushed the refactor/identifiers branch from a59ac64 to c69640d Compare January 6, 2021 00:39

henryiii mentioned this pull request Jan 6, 2021

style: isort #519

Merged

henryiii force-pushed the refactor/identifiers branch 2 times, most recently from e6fa915 to 42621aa Compare January 6, 2021 01:08

henryiii added 2 commits January 6, 2021 18:16

refactor: identifiers in a file

97ddab5

chore: more typing

b1f1f23

henryiii force-pushed the refactor/identifiers branch from 13e5dfa to b1f1f23 Compare January 6, 2021 23:19

joerick reviewed Jan 8, 2021

View reviewed changes

henryiii added 6 commits January 8, 2021 21:54

refactor: PlatStr -> PlatformName

9bb8c95

refactor: load in file, use importlib.resources

b54f665

refactor: change toml structure

9090072

refactor: use common Traversable

67f59dd

fix: ensure reasonable version of importlib_resources

1d91408

style: fix extra line

4674245

joerick approved these changes Jan 9, 2021

View reviewed changes

henryiii merged commit 5eb5ea1 into pypa:master Jan 9, 2021

henryiii deleted the refactor/identifiers branch January 9, 2021 20:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: identifiers in a file #511

refactor: identifiers in a file #511

henryiii commented Jan 2, 2021 •

edited

Loading

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021

henryiii commented Jan 2, 2021 •

edited

Loading

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021 •

edited

Loading

YannickJadoul commented Jan 2, 2021

YannickJadoul commented Jan 2, 2021 •

edited

Loading

henryiii commented Jan 2, 2021 •

edited

Loading

joerick commented Jan 4, 2021

henryiii commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

joerick commented Jan 4, 2021

henryiii commented Jan 5, 2021

henryiii commented Jan 5, 2021

henryiii commented Jan 5, 2021

YannickJadoul commented Jan 5, 2021

henryiii commented Jan 6, 2021 •

edited

Loading

joerick left a comment

joerick Jan 9, 2021

henryiii Jan 9, 2021 •

edited

Loading

		from importlib.resources import files


		resources_dir = files('cibuildwheel') / 'resources'

refactor: identifiers in a file #511

refactor: identifiers in a file #511

Conversation

henryiii commented Jan 2, 2021 • edited Loading

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021

henryiii commented Jan 2, 2021 • edited Loading

YannickJadoul commented Jan 2, 2021

henryiii commented Jan 2, 2021 • edited Loading

YannickJadoul commented Jan 2, 2021

YannickJadoul commented Jan 2, 2021 • edited Loading

henryiii commented Jan 2, 2021 • edited Loading

joerick commented Jan 4, 2021

henryiii commented Jan 4, 2021 • edited Loading

YannickJadoul commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021

YannickJadoul commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021 • edited Loading

henryiii commented Jan 4, 2021 • edited Loading

YannickJadoul commented Jan 4, 2021 • edited Loading

joerick commented Jan 4, 2021

henryiii commented Jan 5, 2021

henryiii commented Jan 5, 2021

henryiii commented Jan 5, 2021

YannickJadoul commented Jan 5, 2021

henryiii commented Jan 6, 2021 • edited Loading

joerick left a comment

Choose a reason for hiding this comment

joerick Jan 9, 2021

Choose a reason for hiding this comment

henryiii Jan 9, 2021 • edited Loading

Choose a reason for hiding this comment

henryiii commented Jan 2, 2021 •

edited

Loading

henryiii commented Jan 2, 2021 •

edited

Loading

henryiii commented Jan 2, 2021 •

edited

Loading

YannickJadoul commented Jan 2, 2021 •

edited

Loading

henryiii commented Jan 2, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 4, 2021 •

edited

Loading

YannickJadoul commented Jan 4, 2021 •

edited

Loading

henryiii commented Jan 6, 2021 •

edited

Loading

henryiii Jan 9, 2021 •

edited

Loading