Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tomli for reading the lock file #6562

Merged
merged 5 commits into from
Nov 5, 2022

Conversation

dimbleby
Copy link
Contributor

@dimbleby dimbleby commented Sep 19, 2022

Pros and cons in this one, opinions are invited.

tomlkit is significantly slower than tomli / tomli-w. And also, at least in a project that uses type-checking, it's a pain in the neck.

Even for a modestly sized project like poetry itself, reading poetry.lock on my laptop takes 0.6 - 0.7 seconds (timings obtained with some time.time() statements inserted into the codebase). tomli manages it in ~0.02 seconds.

On larger projects the difference is more significant, in my experimentation that ratio of of 30-ish remains consistent.

Admittedly this is only likely to save folk a small number of seconds on any given command. But fast tools are nicer to use than slow tools.

Against this: we already have tomlkit in the dependency tree, adding tomli and tomli-w looks a bit weird.

The difference is that pyproject.toml is human-readable and it's highly desirable to maintain formatting and comments etc in interactions with it (poetry add, poetry remove). Much as I'd like to use tomli and tomli-w everywhere: they just don't offer that feature.

poetry.lock however is machine-generated and machine-read and there is no reason to pay the tomlkit performance penalty in reading and writing it.

(I've also removed the code that re-reads the lockfile after writing it, "checking lock file data consistency". Not clear to me what that was for: do you trust your toml writer or not? That same code could be removed independent of the switch away from tomlkit)

@dimbleby dimbleby force-pushed the tomli-tomliw branch 2 times, most recently from 7001419 to d20cd42 Compare September 19, 2022 12:27
@neersighted
Copy link
Member

neersighted commented Sep 19, 2022

Another argument against introducing tomli is we'll need to vendor it into poetry-core when the lock file reader is added (for locked dist/locked build support).

I think the reason for reading the lock file back is because we may write escape characters or similar in strings (since we are accepting untrusted inputs from PyPI/other package indexes + setuptools and other build-systems) that make the file invalid (even if they are valid Python strings). tomli-w encourages something similar (namely, it makes no promises that it writes valid toml and suggests reading back to validate toml), but to be fair, I'm not sure it's 100% necessary.

@dimbleby
Copy link
Contributor Author

dimbleby commented Sep 19, 2022

poetry-core doesn't need to rewrite pyproject.toml, so the other way to go would be to remove tomlkit there altogether

Edit: quite apart from being a smaller dependency, tomli is making its way to the standard library (as tomllib, PEP680). So if poetry-core cares about reducing its vendored dependencies, this is absolutely the direction to head in

@dimbleby
Copy link
Contributor Author

dimbleby commented Sep 19, 2022

per CI the way I've done this is also breaking for plugins, apparently the export plugin at least was relying on locker.lock.path which is now just locker.path.

Edit: breaking for the export plugin unittest rather than for the plugin itself, as it turns out.

@dimbleby
Copy link
Contributor Author

dimbleby commented Sep 19, 2022

comment about poetry-core and vendoring got me thinking. The full programme for tomlkit-minimization - if that is indeed a desirable path, which perhaps is what this MR is really asking - would go something like this:

  • copy poetry-core's TOMLFile into poetry
  • poetry only needs to use TOMLFile (hence tomlkit) for interactions with pyproject.toml
    • probably that doesn't matter for anything except poetry.lock, other files are small enough that I guess the performance difference is likely tiny
  • have poetry-core switch to reading pyproject.toml (and poetry.lock, if that day comes) via tomli
  • deprecate / remove the poetry-core version of the TOMLFile
  • (at this point poetry-core is vendoring tomli but not tomlkit)
  • eventually, tomllib is in the python standard library and support for python <3.11 is dropped. Then poetry-core doesn't vendor anything toml related.

@dimbleby dimbleby force-pushed the tomli-tomliw branch 2 times, most recently from f2c3518 to 5ecf4ac Compare September 19, 2022 14:16
@mkniewallner
Copy link
Member

I may not have enough knowledge to be 100% sure of that, but I am also under the assumption that poetry-core is never doing any write operations on TOML files.

So if we confirm that, I'm +1 on the plan highlighted by @dimbleby. python-poetry/poetry-core#445 would probably be good to have as well (issue it fixes is highlighted here, for the context).

@dimbleby
Copy link
Contributor Author

dimbleby commented Oct 1, 2022

I see that this one has been added to a 1.4 milestone. A couple of thoughts:

  • I don't know whether that's code for "some far distant future" or (say) "November". It would be lovely if the meaning of such labels was shared somewhere

  • In case it does mean "some far distant future", I'd note that I don't think there's anything much to be scared of in this one

@neersighted
Copy link
Member

1.4 is next next release. The reason for this is mostly because of how invasive this is and the need to make a hard breaking change in core. I'd rather make changes in core more gradually in order to avoid branching it as much as possible.

I could be convinced otherwise, but personally I'd rather try and get the new installer + pip --python landed and then worry about this after we've got those complete.

@dimbleby
Copy link
Contributor Author

dimbleby commented Oct 1, 2022

how invasive this is and the need to make a hard breaking change in core

I'm glad I spoke then, because I think we're at cross purposes. This one does not make or require a change in core, but stands alone. It's the follow-up python-poetry/poetry-core#483 and #6616 that change poetry-core

@neersighted neersighted modified the milestones: 1.4, 1.3 Oct 1, 2022
@neersighted
Copy link
Member

Ah, fair point. I was under the impression that you intended #6616 to be a replacement/successor to this PR and commented on the wrong one. Am I correct in that it's a PR of a branch rebased on top of this, and you plan to keep it rebased so this can land first?

In that case I don't have major objections -- I'll try to complete a review of this PR sooner than later.

@dimbleby
Copy link
Contributor Author

dimbleby commented Oct 1, 2022

re "1.4 is next next release". Of course I understand that 1.4 is two more than 1.2, that wasn't what I was asking! It would be good to understand whether that is more likely to be (say) November, or in some far-distant future (think of the gap betwewen 1.1 and 1.2 and double it).

Eg that would help me to decide whether it's even worth quibbling about how soon this gets merged.

(FWIW I'd like to see poetry turn out a 1.3 pretty darned soon, if only as a demonstration to the world - and to maintainers! - that shorter gaps are the new normal).

Anyway back on topic: yes, the reason I kept #6616 separate (and in two commits) was in anticipation that it might be reasonable to merge this one sooner, and that one later.

@neersighted
Copy link
Member

It's very much a fluid thing, but the intention is to make releases on the order of months. I would invite you to participate in the Discord channel that you have been invited to if you'd like to join said discussions.

Certainly, we're carrying several important fixes in master right now, and it might be very nice to cut a 1.3, maybe even land invasive changes to core (this, extras and sdist normalization) in 1.4, and then scope the installer work for a 1.5. There's plenty of ways to skin the cat and it's mostly deciding what people have the energy for given limited volunteer bandwidth.

@radoering
Copy link
Member

I know @dimbleby is going to kill me for this and the exact format of the lock file is not relevant for most people but what about using tomli for reading but keeping tomlkit for writing? (I tried a bit and it seems straight forward.)

I see at least two advantages:

  • We have more control about the format. E.g. using tomli-w will just revert locker: less verbose output for package.files in lock file 2.0 #6734. (It doesn't look like that in the test but I suppose that's only because the test uses names and hashes that are significantly shorter than in reality. In a real world example, the output of tomli-w is like in the verbose example of the mentioned PR.) Further, tomli-w will write the dependencies of a package more verbose in some cases, e.g. I don't like something like
    [package.dependencies]
    msgpack = "*"
    requests = "*"
    [package.dependencies.lockfile]
    optional = true
    version = ">=0.9"
  • Only tomli will be included in the standard library. tomli-w will remain an additional dependency.

Further, I don't think there is a relevant performance difference for writing the lock file in real world examples. (Running poetry lock on a small example takes 0.4 s on my PC, no matter if I use tomli-w or tomlkit for writing. I suppose in a larger example the time for writing the file can be neglected.)

@dimbleby
Copy link
Contributor Author

Control over the formatting is a double-edged sword, eg it's tomlkit's 'control' that exposes us to such as #6201. Personally I see no reason to care about fine-grained control of how lists are presented: both formats are plenty readable enough.

However I must admit that I feel a strong mistrust of tomlkit and consequently perhaps I'm not as rational as I'd like to be about it...!

As I said at the start of this thread, there are pros and cons, and the additional tomli-w dependency is admittedly among the cons.

I think the case is stronger over in poetry-core, where removing tomlkit in favour of something that is now in the standard library (since python 3.11) seems a clear win. Also poetry-core doesn't need to write (except in unit test) so that in some sense simplifies the discussion.

I think tomlkit is also slow at writing, though I don't have numbers to hand. I agree that it's debatable how important half a second here or there is - though I do think that if you cut such gaps in the right places, it can make a real difference to the feel of a command line tool.

So anyway my preference would be that this stands as it is.

However, I wouldn't want the perfect to be the enemy of the good: if keeping tomlkit for writing gets this over the line then let's do that, and I can try again at stepping away from it some other time...

@dimbleby
Copy link
Contributor Author

(I just want to acknowledge that in ten minutes experimentation I was unable to make tomlkit slow at writing, which modestly strengthens the case for not bothering with tomli-w)

@dimbleby
Copy link
Contributor Author

dimbleby commented Nov 2, 2022

I pushed a commit that removes the tomlkit-using re-read of the lockfile after writing it. That is slow, per top of thread. (Probably it's that which had left me with the apparently mistaken impression that writing the file with tomlkit is slow.)

@radoering
Copy link
Member

Actually, I thought this is about commands that only have to read the lock file (like poetry lock --check, poetry show or poetry install). I would not have thought that re-reading after writing the lock file really matters (in terms of perceptible performance difference) especially after resolving dependencies. However, I've never heard of an Inconsistent lock file data issue so we probably can drop it.

@dimbleby
Copy link
Contributor Author

dimbleby commented Nov 2, 2022

it maybe doesn't make much sense in the context of a solve that has likely taken much longer, but I genuinely find - at least now that I know about it! - that the gap between "Resolving dependencies" completing and "Writing lock file" is noticeable, and that it feels nicer to lose it.

(Kinda weird that we print "Writing lock file" after it has already been written, but whatever).

Anyway I also have never seen this check do anything other than pass.

radoering added a commit to python-poetry/poetry-plugin-export that referenced this pull request Nov 3, 2022
companion to python-poetry/poetry#6562

Co-authored-by: Randy Döring <30527984+radoering@users.noreply.github.com>
@neersighted neersighted changed the title use tomli and tomliw to read and write lockfile Use tomli for reading the lock file Nov 3, 2022
@radoering radoering merged commit b28339d into python-poetry:master Nov 5, 2022
@dimbleby dimbleby deleted the tomli-tomliw branch November 5, 2022 14:20
@Secrus Secrus mentioned this pull request Nov 25, 2022
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/pyproject Metadata/pyproject.toml-related kind/refactor Pulls that refactor, or clean-up code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants