Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to separate concerns and facilitate new behavior #110

Merged
merged 5 commits into from
May 24, 2024

Conversation

jaraco
Copy link
Contributor

@jaraco jaraco commented May 24, 2024

  • Extract freeze_data and load_pip_packages to separate concerns.
  • Extract load_pip_package function.
  • Use degenerate form for fallback.
  • Extract package_item behavior.
  • Extract is_valid filter on pip package entries with no URL.

Copy link
Member

@hoodmane hoodmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jaraco! Looks good to me.

micropip/_commands/freeze.py Outdated Show resolved Hide resolved
micropip/_commands/freeze.py Show resolved Hide resolved
Comment on lines 43 to 46
return map(
package_item,
filter(None, map(load_pip_package, importlib.metadata.distributions())),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how you feel about the readability of map and filter in Python where we don't have any way to do right function application. What I'd really like to see is

importlib.metadata.distributions().map(load_pip_package).filter(None).map(package_item)

so that the logic flows from left to right and top to bottom instead of in a spiral. I tend to use comprehensions instead but they are arguably just as bad if not worse. Combining filter and map does make sense but list comprehensions do it in a strange order. I do like Rust's filter_map quite a lot.

Anyways, code is clear I'm happy to do it this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm a big fan of the general concept of filter and map (and reduce). Part of my affinity comes from my first formal language training being in Scheme. But my reason for sticking with it is how it incentivizes stateless functional programming. Comprehensions and generator expressions achieve the same thing, but often by adding unnecessary variables. If I'm working with a comprehension and it has the form [do(x) for x in input if check(x)], I will almost certainly translate that to map(do, filter(check, input)) and avoid creating the name x. I will sometimes separate the two lines into filtered = filter(check, input); map(do, filtered) if the name filtered improves readability (which it could in the case above).

I do avoid map/filter when their use feels contrived or convoluted for the application.

What I'd really like to see is

importlib.metadata.distributions().map(load_pip_package).filter(None).map(package_item)

I agree - that reads better. And other languages (Node, Ruby, ...) design that paradigm into their language. Python, on the other hand, has historically discouraged these chains of operations. So my pragmatic approach is to accept that the language constructs are somewhat inelegant, but still preferable to the alternative of eager-evaluated for/append loops.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a lot of my early programming experience was also with lisps. I particularly enjoy programming with only immutable data structures. But it's not always the best for performance.

I'm really excited about the iterator helpers spec in JavaScript which will make doing all of this lazily very ergonomic too:
https://github.com/tc39/proposal-iterator-helpers

Comment on lines +49 to +50
def package_item(entry: dict[str, Any]) -> tuple[str, dict[str, Any]]:
return canonicalize_name(entry["name"]), entry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, often the need to write a function like this instead of a lambda dissuades me from using map. The combination of spiral control flow with in line lambdas tends to be a bit unmanageable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I wouldn't object to inlining this behavior in a generator expression in load_pip_packages.

@@ -42,23 +42,23 @@ def freeze_data() -> dict[str, Any]:
def load_pip_packages() -> Iterator[tuple[str, dict[str, Any]]]:
return map(
package_item,
filter(None, map(load_pip_package, importlib.metadata.distributions())),
filter(is_valid, map(load_pip_package, importlib.metadata.distributions())),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is filter(None just a no-op? It's there to make the diff look better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter(None, ...) is the same as filter(bool, ...). It was there to suppress the distributions that returned None. I included that to maintain the prior expectation that if the url is None, no package entry is generated (the continue in the prior implementation).

I've been using that form since Python 1.5, before bool was introduced in Python 2.3. It occurs to me I should start using filter(bool, ...) for better readability.

micropip/_commands/freeze.py Outdated Show resolved Hide resolved


def is_valid(entry: dict[str, Any]) -> bool:
return entry["url"] is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return entry["url"] is not None
return entry.get("url", None) is not None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaraco if we do this change we can merge it. I'll rebase and apply this and then do a rebase merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still be interested for answers to my questions about your view on the ergonomics and readability of filter, map and other functional constructs in Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, whoops. This was a mistake. The proper key should be "file_name". Specifically,

return entry["file_name"] is not None

Since load_pip_package sets file_name unconditionally, calling .get() is unnecessary; the code can rely on that key being present, but it may be set to None.

With the patch as suggested, entry.get("url", None) will always be None and no entry will ever be valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and it appears you recognized that in the rewritten commit. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thankfully we have test coverage for this =)

@hoodmane hoodmane merged commit 7ad2963 into pyodide:main May 24, 2024
5 checks passed
@hoodmane
Copy link
Member

Thanks @jaraco!

@jaraco jaraco deleted the refactor/freeze branch May 26, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants