Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Itertools recipes improvements #116842

Closed
pochmann3 opened this issue Mar 14, 2024 · 4 comments
Closed

Itertools recipes improvements #116842

pochmann3 opened this issue Mar 14, 2024 · 4 comments
Assignees

Comments

@pochmann3
Copy link
Contributor

pochmann3 commented Mar 14, 2024

roundrobin

>>> ranges = [range(5, 1000), range(4, 3000), range(0), range(3, 2000), range(2, 5000), range(1, 3500)]
>>> collections.Counter(roundrobin(ranges)) == collections.Counter(ranges)

That looks weird. Round-robin with a single iterable? And the ranges aren't iterated? I suspect this was intended:

collections.Counter(roundrobin(*ranges)) == collections.Counter(chain(*ranges))

unique_everseen

These could be lazier by yielding first (before updating seen):

seen.add(element)
yield element

seen.add(k)
yield element

The similar "roughly equivalent" code of cycle and the factor recipe also yield as early as they can.

iter_index

>>> # For example, bytes and str perform subsequence searches

Not really subsequence searches but substring searches. For example, "fbr" is a subsequence of "foobar", but they don't search for that. See Subsequence and Substring at Wikipedia.

@rhettinger

Linked PRs

@rhettinger rhettinger self-assigned this Mar 15, 2024
@rhettinger
Copy link
Contributor

Yes to the roundrobin test update.

No to changing unique_everseen. In general, I update the loop invariant first. This helps me avoid bugs. Also, changing the order doesn't change the total amount of work. It just shifts work from when the first element is emitted to when the iterator stops.

Yes to changing the subsequence terminology in the comment. It is really pedantic but we might as well change it to "continuous subsequence". The word "substring" isn't quite right because it also applies to bytes and the user defined types that implement continuous subsequence searches.

I'll make the edits a bit later when I get home.

Thanks for looking over the code. Nice to know that someone besides me reads it.

@pochmann3
Copy link
Contributor Author

update the loop invariant first

Should the cycle code then likewise do saved.append(element) before yield element?

changing the order doesn't change the total amount of work

Depends. If the user for example only requests the first three (like take(3, unique_everseen(...))), then you add an element to the set unnecessarily.

"continuous subsequence"

"contiguous" seems more popular, 18.400 results vs 4.930.

The word "substring" isn't quite right because it also applies to bytes and the user defined types that implement continuous subsequence searches

The comment only talks about "bytes and str". And bytes are also sometimes called bytestrings, even in the Python docs (and they're definitely strings in general computer science terminology).

Thanks for looking over the code. Nice to know that someone besides me reads it.

I love itertools and have been reading/checking all changes for a long time :-). Btw do you get notified about comments on merged commits? I made one once and don't know whether you saw it (you did do the change a while later).

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Mar 15, 2024
)

(cherry picked from commit 41e844a)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
@rhettinger
Copy link
Contributor

Okay. Fixed.

@rhettinger
Copy link
Contributor

@pochmann3 You may interested in the issues I recently filed in the more-itertools docs. Minor implementation tweaks have more of a payoff there because people run that code instead of just reading it.

vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants