Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a common class to store JSON / Simple API response data #71

Merged
merged 27 commits into from
Jul 14, 2023

Conversation

ryanking13
Copy link
Member

@ryanking13 ryanking13 commented Jul 5, 2023

This adds an intermediate class ProjectInfo (I can't think of a better name... open to other names) that parses both JSON and Simple API PyPI responses. micropip now takes this ProjectInfo class instead of a raw PyPI JSON response when searching for packages

This is a preparation to make micropip support both JSON and Simple API.

Related: #62

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ryanking13 that would be very useful indeed!

A couple of comments. Ping me when it's ready for a final review.

micropip/package_index.py Outdated Show resolved Hide resolved
micropip/package_index.py Outdated Show resolved Hide resolved
micropip/package_index.py Outdated Show resolved Hide resolved
micropip/package_index.py Outdated Show resolved Hide resolved
tests/test_package_index.py Outdated Show resolved Hide resolved
tests/test_package_index.py Show resolved Hide resolved
micropip/package_index.py Show resolved Hide resolved
micropip/_utils.py Outdated Show resolved Hide resolved
@ryanking13 ryanking13 changed the title Draft: Add a common class to store JSON / Simple API response data Add a common class to store JSON / Simple API response data Jul 8, 2023
@ryanking13
Copy link
Member Author

@rth Okay, I think this is ready to be reviewed.

I cached sys_tags and parse_wheel_filename since those functions were bottlenecks when I checked.

In terms of performance, from_simple_api is slower than from_json_api. This is because simple API does not have a version key in the response so we need to parse the wheel filename for every file. I added a few checks to optimize the speed but it is still around 2 times slower.

For example, this is a result when I tested with numpy, which has ~2000 files in the index:

from micropip.package_index import ProjectInfo
from pathlib import Path
from gzip import decompress
import json
import timeit

numpy_json = json.loads(decompress(Path("tests/test_data/pypi_response/numpy_json.json.gz").read_bytes()))
numpy_simple = json.loads(decompress(Path("tests/test_data/pypi_response/numpy_simple.json.gz").read_bytes()))
num_runs = 100

def run_json():
    ProjectInfo.from_json_api(numpy_json)

def run_simple():
    ProjectInfo.from_simple_api(numpy_simple)

print("Total number of files:", len(numpy_simple["files"]))
print(f"Parsing JSON API for {num_runs} times:", timeit.timeit(run_json, number=num_runs), "seconds")
print(f"Parsing Simple API for {num_runs} times:", timeit.timeit(run_simple, number=num_runs), "seconds")
Total number of files: 2185
Parsing JSON API for 100 times: 0.11372180000762455 seconds
Parsing Simple API for 100 times: 0.243828200007556 seconds

@ryanking13 ryanking13 marked this pull request as ready for review July 8, 2023 12:11
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring @ryanking13 ! A couple of minor comments, otherwise LGTM.

if abi_incompatible:
abis_string = ",".join({tag.abi for tag in tags})
raise ValueError(
f"Wheel abi '{abis_string}' is not supported. Supported abis are 'abi3' and 'cp{version}'."
Copy link
Member

@rth rth Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though we were not supporting abi3? I think there was a discussion either in the cibuildwheel or some other wheel related MR that abi3 didn't make sense for us.

If this was copied from existing code fine but maybe we should open a follow up issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are planning to break ABI with every Python version, then abi3 is a bit pointless. I think we can support it and just check that the interpreter version matches exactly. So it's supposed to be that cp311-abi3 works with all Pythons <= 311 but instead we check that the interpreter is exactly cp311. For tools like cibuildwheel that generate wheels, there's no reason to support abi3. But if someone uses a tool like maturin to generate an abi3 wheel we should be able to consume it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have some packages that has abi3 abi tags in Pyodide distribution.

image

micropip/_utils.py Show resolved Hide resolved
micropip/package_index.py Outdated Show resolved Hide resolved
micropip/package_index.py Outdated Show resolved Hide resolved
micropip/package_index.py Show resolved Hide resolved
@rth
Copy link
Member

rth commented Jul 11, 2023

I cached sys_tags and parse_wheel_filename since those functions were bottlenecks when I checked

Thanks it's good to know! In what remains it looks like Version.__str__ and Version.__hash__ (e.g. when using as a dict key) is also non negligible (run with pyinstrument),

Screenshot 2023-07-11 at 15 34 12

So if we want to go further one day, maybe it could be micro-optimized a bit more.. Though individually it's already quite fast,

In [1]: from packaging.version import Version

In [2]: v = Version("1.2.0")

In [3]: %timeit hash(v)
423 ns ± 2.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit str(v)
650 ns ± 2.04 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [5]: %timeit str("1.2.0")
12.2 ns ± 0.0178 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

try:
version = Version(version_str)
if str(version) != version_str:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this? As long as there is a version it's fine no? And str(version) has a non negligible runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think it is okay to remove it now.

That check was introduced at pyodide/pyodide#2752, but I think it cannot happen on wheel files (unless someone modifies the version manually).

)

releases_compatible = {
version: _compatible_wheels(files, version)
Copy link
Member

@rth rth Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to go faster (no need to do it, just an idea). We can probably also subclass Version to something that has a frozen __str__ and __hash__ so it doesn't need to be re-computed at each dict comparison..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue (#73) so that we can do it as a follow-up. I also think it is would be a good issue for new contributors if there is other sprint / tutorial events.

@ryanking13
Copy link
Member Author

Thanks it's good to know! In what remains it looks like Version.str and Version.hash (e.g. when using as a dict key) is also non negligible (run with pyinstrument),

Right, converting string <--> Version takes quite a time... I'm not sure how many people will have performance issues with this (especially given that network latency is a much larger part of it), but it's definitely something we can optimize for. By the way, pyinstrument is cool. Thanks for introducing an awesome tool :)

@ryanking13
Copy link
Member Author

Thanks for the reviews!

@ryanking13 ryanking13 merged commit 930819f into pyodide:main Jul 14, 2023
5 checks passed
@ryanking13 ryanking13 deleted the fetch-response-class branch July 14, 2023 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants