bpo-45250: fix docs regarding `iter` and iterators being inconsistently required by CPython #29170

brettcannon · 2021-10-22T19:39:49Z

It is now considered a historical accident that e.g. for loops and the iter() built-in function do not require the iterators they work with to define __iter__, only __next__.

https://bugs.python.org/issue45250

Doc/library/collections.abc.rst

Misc/NEWS.d/next/Documentation/2021-10-22-12-09-18.bpo-45250.Iit5-Y.rst

srittau · 2021-10-25T07:35:26Z

I'm strongly opposed to changing the meaning of a term that has had a clear definition for decades (and is implemented as collections.abc.Iterator) by an ambiguous definition. It's clear that some methods accept a super type, instead of requiring a full iterator, but that means the description of those methods should be changed, not the definition.

Addendum: As a typeshed maintainer I see the problem of ill-defined terms and protocols in Python everyday ("file-like objects" are my personal bane). Making existing clear definitions less clear is not the right way forward.

brettcannon · 2021-10-25T21:00:03Z

The issue is the definition as written down is wrong/inaccurate from the perspective of Python itself. So it isn't that this redefines a term as much as actually makes the glossary reflect the real world as to how Python itself uses and considered what iterators are (i.e. this properly reflects what PyIter_Check() actually checks for). For instance, if a pass an iterable to a for loop and it returns something that only defines __next__(), it will work. If you were to ask someone what that object returned by the iterable is, do you think people would argue that it isn't an iterator simply because it doesn't define __iter__(), or that it is an iterator because a for loop accepts it?

And I don't quite understand how the definition is ambiguous? An iterator defines __next__(). You probably should define __iter__(), but it isn't strictly necessary to meet the iterator protocol, it just so happens the commonly used ABC for showing iterator compliance provides an __iter__() definition for free. If the ABC wasn't used for type hints then I don't think anyone would care, but because the class predates protocols then people have accidentally thought __iter__() was a required part of being an iterator (along with the inaccurate glossary definitions).

What's your proposal otherwise? To create a brand new term of what e.g. a for loop expects an iterable to return that isn't an iterator? To my knowledge everyone considers that an iterator, so trying to redefine that is also a major ask and something that I personally think is less helpful than simply getting the definition to be accurate.

Do note that Guido and other folks agree with this plan in https://mail.python.org/archives/list/python-dev@python.org/thread/3W7TDX5KNVQVGT5CUHBK33M7VNTP25DZ/#3W7TDX5KNVQVGT5CUHBK33M7VNTP25DZ, so this isn't entirely without discussion and some agreement. We can ask the SC to make a final call on this if you really want to (I will abstain from voting on it if it comes to that).

srittau · 2021-10-26T11:03:13Z

(I'll use the terms "partial iterator" for objects just defining __next__() and "full iterator" for objects also defining __iter__().)

I'm not looking at this from the perspective how this is implemented in Python, but from a user's perspective. I don't think it's correct to say Python considers an iterator to have only __next__(). That may be true for the CPython core, but it's clearly not the case for the standard library (which I consider to be part of Python), where collections.abc.Iterator and typing.Iterator require the __iter__() method.

Practically, the status quo (some function, methods, statements don't require __iter__()) is a non-issue in practice. But saying "__iter__() is optional (but recommended)" turns this into a practical communication issue. Now, if I have one API that returns an iterator and a second API that accepts an iterator, I can't be sure that they are compatible. I can only hope that either the API returning the iterator follows the recommendation or the API consuming it doesn't access __iter__(). In the future, APIs need to document whether they include __iter__() or not. It also means that all existing documentation that mentions "iterators" needs to be updated to clarify whether they mean partial or full iterators.

I see two solutions to this problem:

Introduce a new term as you suggest (e.g. "partial iterator") and define a link between the "iterator" and "partial iterator" glossary entries. This has the advantage that both concepts are clearly named.
My preferred choice: Keep the definition of "iterator" as is and update the documentation where applicable, either with a note that technically __iter__() is not required, or by replacing the term "iterator" with a sentence like "an object implementing the __next__() protocol, e.g. an iterator".

brettcannon · 2021-10-26T18:22:02Z

collections.abc.Iterator and typing.Iterator require the iter() method.

I think this is where our views are differing. It's an unfortunate side-effect/bug, from my perspective, that because collections.abc.Iterator includes a default implementation for __iter__() that typing considers that method as required (tangent: this is the same reason I want to create protocols for importlib as importlib.abc has classes that are too broad for typing purposes due to the helper methods provided).

In other words I don't view collections.abc.Iterator as the official definition/type of what an iterator is, but instead an ABC that helps you implement an iterator.

Maybe this is suggesting it's time to have a lower-level protocols module or something which gets all of these interfaces correctly based on the language definition, then collections.abc can inherit from that new module? Or perhaps typing.Iterator comes back as a protocol or there's a new typing.SupportsIteration that does the right thing while collections.abc.Iterator inherits from it?

ambv · 2021-10-28T20:16:54Z

The relevant python-dev thread doesn't seem to have reached consensus and died out.

brettcannon · 2021-10-29T18:35:26Z

I was planning to ask the SC to make a call.

godlygeek · 2021-10-30T09:11:03Z

For instance, if a pass an iterable to a for loop and it returns something that only defines __next__(), it will work. If you were to ask someone what that object returned by the iterable is, do you think people would argue that it isn't an iterator simply because it doesn't define __iter__(), or that it is an iterator because a for loop accepts it?

I would absolutely make the argument that it isn't an iterator because it doesn't define __iter__(). It happens to work for a for loop anyway, despite the fact that it doesn't follow the full protocol for an iterator, but that is a perfectly natural thing to happen in a language that uses duck typed protocols: if a class implements only one part of a protocol, it may or may not work in places where that protocol is required, depending on whether or not the parts of the protocol it implements are the only parts needed.

It seems to me that you're proposing a change to the language here, not just to the documentation. As things stand today iter(iter(obj)) has a well defined, consistent meaning, because iterators are required to be iterable. If this proposed change is made, that will no longer be the case, which seems like it could easily break the assumptions of real world code.

For instance, this recipe is in the itertools documentation:

def sliding_window(iterable, n):
    # sliding_window('ABCDEFG', 4) -> ABCD BCDE CDEF DEFG
    it = iter(iterable)
    window = collections.deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for x in it:
        window.append(x)
        yield tuple(window)

As things are defined today, this function will work for all iterables, because an iterable is defined to return an iterator from __iter__, and because an iterator is defined to itself be an iterable (and therefore valid input to islice). This function would be broken if the iterable's __iter__ returned an object with __next__ but without __iter__, but as the language is defined today, that's an invalid implementation of an iterator, and so the thing that's returning something that isn't quite an iterator is itself not quite an iterable, and so it's invalid input to this function.

If we change the language definition to say that iter() need not return something iterable, then we need a new way to describe the preconditions of this function: it needs not just any iterable as input, but an iterable whose iterators are iterable.

As things stand today, if an iterator has __next__ but not __iter__, it is operating out of contract and therefore not a valid iterator, and libraries like itertools have no obligation to support it. How much code in the wild assumes that iterators are always iterable, and will break if we give people leave to begin making non-iterable iterators? The "roughly equivalent" pure Python implementations of accumulate, dropwhile, islice, and zip_longest all assume that the iterable's iterator is iterable; it's not crazy to think that real world Python code commonly makes the same assumption.

SebastiaanZ · 2021-10-30T15:34:04Z

There are more recipes in the itertools documentation that highlight how common of a pattern it is to rely on iterators having an __iter__ method, such as this one:

def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

While I'd agree that repices show usage of the language and do not define it, I can't help but think that those recipes show that solving iteration problems with patterns that rely on iterators being iterable (as-in having an __iter__ method that returns self) are commonplace.

This makes this change a breaking change and unfortunately one that makes code that uses such patterns have a hidden bug, as most iterators will have an __iter__ method (if people heed the encouragement in the docs), but not all of them, since it's no longer a requirement. Existing code may fail, but it's not going to be obvious right away.

PythonCHB · 2021-10-31T06:26:40Z

There are more recipes in the itertools documentation that highlight how common of a pattern it is to rely on iterators having an iter method, such as this one:

def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Hmm -- the name chosen for that parameter is "iterable", not "iterator". So it seems to me that it's expecting that all iterables will return an iterator when passed to iter() -- which indeed is what an iterable is. However, of course, the rest of itertools (and many builtins) also expect an iteralbe, so if you are going to chain them, like in this example, the iterators involved need to also be iterable.

I will say that it took me years to clearly "get" the distinction myself, but I think there are two things here:

An iterator returns an item when passed to next() e.g. has a __next__ method, and raises StopIteration when there are no more items.

An iterable returns an iterator when passed to the iter() function, e.g. has an __iter__ method.

In addition, it is a very common, very useful, and highly recommended convention that all iterators also be iterables, i.e. an __iter__ method that returns itself.

Personally, I think it's both correct and useful to define things this way -- that is "iterator" is about the iteration, and "iterable" is about being able to produce an iterator.

It is very, very, convenient that in most places in Python, one can pass either an iterator or iterable in places where an iterator is required -- this saves us from having to call iter() directly very often at all. But it does mean that most iterators should be iterable.

But they are still distinct protocols, we should make that clear in the docs.

All that being said: an enormous amount of code (including most of itertools) expects iterables rather than iterators, so the docs should make it very clear that an iterator that does not support iter() will be of little general utility.

SebastiaanZ · 2021-10-31T07:55:03Z

There are more recipes in the itertools documentation that highlight how common of a pattern it is to rely on iterators having an iter method, such as this one:
def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
Hmm -- the name chosen for that parameter is "iterable", not "iterator". So it seems to me that it's expecting that all iterables will return an iterator when passed to iter() -- which indeed is what an iterable is. However, of course, the rest of itertools (and many builtins) also expect an iteralbe, so if you are going to chain them, like in this example, the iterators involved need to also be iterable.

I'm sorry, I should have been clearer. Even if you pass an iterable to the grouper function in this recipe, if that iterable's __iter__ method returns an iterator that does not have an __iter__ method that returns self itself, this recipe will fail. This is because this recipe itself will implicitly call iter twice:

    # here you make an iterator from whatever iterable you pass in
    args = [iter(iterable)] * n 

    # that iterator is then passed to `zip_longest`, which will in turn
    # call `iter` on the iterator this function has already created.
    return zip_longest(*args, fillvalue=fillvalue)

Even if we agree that iterators are not always iterable and therefore not suitable for this recipe, if the actual iterable you pass to this recipe has a non-iterable iterator, this recipe will still fail. This gist demonstrates what I mean.

godlygeek · 2021-10-31T08:01:27Z

Hmm -- the name chosen for that parameter is "iterable", not "iterator". So it seems to me that it's expecting that all iterables will return an iterator when passed to iter() -- which indeed is what an iterable is.

Yes, but that's not all it's assuming. It's also assuming that the iterator returned by iter() is itself an iterable, because it passes it to zip_longest, which requires iterables. That's what @SebastiaanZ was pointing out.

As things are defined today, that's a valid assumption, because iterators are required to be iterable. If we change the documentation to say that iterators are not required to be iterable, this function will go from working for all iterables to working for only some iterables (those whose iterators are iterable). And we won't have any word to describe the subset of iterables for which the function works.

But they are still distinct protocols, we should make that clear in the docs.

The docs today clearly specify that all iterators must be iterable. This isn't clarifying the requirements for iterators, it's changing them.

…from the language reference

godlygeek · 2021-11-02T12:31:52Z

It looks like even the interpreter core assumes that all iterators are iterable.

Unpacking like a, b, *rest = iterable is done by unpack_iterable, which calls PyObject_GetIter() on the iterable, and later passes that iterator to PySequence_List, which treats it as an iterable.

So an iterable whose iterators are not iterable can be used in a for loop or consumed by list, but it can't be unpacked:

>>> class Iterator:
...     def __next__(self):
...         if hasattr(self, "consumed"):
...             raise StopIteration
...         self.consumed = True
...         return 42
...
>>> class Iterable:
...     def __iter__(self):
...         return Iterator()
...
>>> list(Iterable())
[42]
>>> first, *rest = Iterable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Iterator' object is not iterable

Which fits with my mental model that this Iterator is not really an iterator because it implements only half of the iterator protocol, and so it will be broken in contexts that require the whole protocol.

Doc/glossary.rst

Doc/library/stdtypes.rst

brettcannon · 2021-11-15T20:03:45Z

The SC discussed this and the decision was to update the definitions to say it's a CPython implementation detail that it is inconsistent in its checking whether what an iterable returns is a proper iterator or not (i.e. keep the __iter__ requirement but admit it isn't rigorously enforced by CPython as a historical accident).

I have not decided if I'm going to update this PR or close it and open a new one with those changes.

…tors to define `__iter__`

miss-islington · 2021-11-20T00:40:37Z

Thanks @brettcannon for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington · 2021-11-20T00:40:39Z

Sorry @brettcannon, I had trouble checking out the 3.10 backport branch.
Please backport using cherry_picker on command line.
cherry_picker be36e0634060c7d5dee8e8876fb888bbb53d992a 3.10

…nconsistently required by CPython (pythonGH-29170) It is now considered a historical accident that e.g. `for` loops and the `iter()` built-in function do not require the iterators they work with to define `__iter__`, only `__next__`. (cherry picked from commit be36e06) Co-authored-by: Brett Cannon <brett@python.org>

bedevere-bot · 2021-11-20T00:59:57Z

GH-29650 is a backport of this pull request to the 3.10 branch.

…nconsistently required by CPython (GH-29170) (GH-29650) It is now considered a historical accident that e.g. `for` loops and the `iter()` built-in function do not require the iterators they work with to define `__iter__`, only `__next__`. (cherry picked from commit be36e06) Co-authored-by: Brett Cannon <brett@python.org>

…tently required by CPython (pythonGH-29170) It is now considered a historical accident that e.g. `for` loops and the `iter()` built-in function do not require the iterators they work with to define `__iter__`, only `__next__`.

brettcannon added 7 commits October 22, 2021 12:02

Update the glossary

867a96e

Add a news entry

ff97128

Update the built-in functions

aaddd9f

Add some glossary links to the data model

2b11e3f

Update stdtypes.rst

5224d4d

Clarify collections.abc docs

7ee081a

Add glossary links to the C API for iterators

ad88009

brettcannon requested a review from rhettinger as a code owner October 22, 2021 19:39

bedevere-bot added the awaiting core review label Oct 22, 2021

the-knights-who-say-ni added the CLA signed label Oct 22, 2021

bedevere-bot added the docs Documentation in the Doc dir label Oct 22, 2021

brettcannon added the needs backport to 3.10 only security fixes label Oct 22, 2021

sobolevn reviewed Oct 22, 2021

View reviewed changes

Doc/library/collections.abc.rst Outdated Show resolved Hide resolved

Misc/NEWS.d/next/Documentation/2021-10-22-12-09-18.bpo-45250.Iit5-Y.rst Outdated Show resolved Hide resolved

Fix issues found via review

7dccf4b

brettcannon self-assigned this Oct 22, 2021

Clarify the docs for collections.abc.AsyncIterator

1dcceb9

brettcannon closed this Oct 28, 2021

brettcannon reopened this Oct 28, 2021

Remove a reference to __iter__() having to be defined on iterators …

15ac1e2

…from the language reference

kieferrm mentioned this pull request Nov 7, 2021

Python: Iteration Plan for November 2021 microsoft/vscode-python#17956

Closed

32 tasks

holdenweb reviewed Nov 13, 2021

View reviewed changes

Doc/glossary.rst Outdated Show resolved Hide resolved

holdenweb reviewed Nov 13, 2021

View reviewed changes

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved

brettcannon added 2 commits November 20, 2021 00:02

Revert some changes

91d86cf

Clarify that CPython is inconsistent when it comes to requiring itera…

e0286fe

…tors to define `__iter__`

brettcannon changed the title ~~bpo-45250: fix docs regarding __iter__ and iterators~~ bpo-45250: fix docs regarding __iter__ and iterators being inconsistently required by CPython Nov 20, 2021

brettcannon merged commit be36e06 into python:main Nov 20, 2021

brettcannon deleted the iterator-definition branch November 20, 2021 00:40

bedevere-bot removed the awaiting core review label Nov 20, 2021

bedevere-bot removed the needs backport to 3.10 only security fixes label Nov 20, 2021

TeamSpen210 mentioned this pull request Jun 11, 2024

Use of __iter__ and __next__ does not confer all Iterator protocol abilities python/mypy#17361

Closed

TeamSpen210 mentioned this pull request Dec 24, 2024

List comprehensions sometimes call iter(iter(v)) instead of just iter(v) #128211

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-45250: fix docs regarding `iter` and iterators being inconsistently required by CPython #29170

bpo-45250: fix docs regarding `iter` and iterators being inconsistently required by CPython #29170

brettcannon commented Oct 22, 2021 •

edited

Loading

srittau commented Oct 25, 2021 •

edited

Loading

brettcannon commented Oct 25, 2021

srittau commented Oct 26, 2021

brettcannon commented Oct 26, 2021

ambv commented Oct 28, 2021

brettcannon commented Oct 29, 2021

godlygeek commented Oct 30, 2021 •

edited

Loading

SebastiaanZ commented Oct 30, 2021 •

edited

Loading

PythonCHB commented Oct 31, 2021

SebastiaanZ commented Oct 31, 2021 •

edited

Loading

godlygeek commented Oct 31, 2021

godlygeek commented Nov 2, 2021 •

edited

Loading

brettcannon commented Nov 15, 2021

miss-islington commented Nov 20, 2021

miss-islington commented Nov 20, 2021

bedevere-bot commented Nov 20, 2021

bpo-45250: fix docs regarding __iter__ and iterators being inconsistently required by CPython #29170

bpo-45250: fix docs regarding __iter__ and iterators being inconsistently required by CPython #29170

Conversation

brettcannon commented Oct 22, 2021 • edited Loading

srittau commented Oct 25, 2021 • edited Loading

brettcannon commented Oct 25, 2021

srittau commented Oct 26, 2021

brettcannon commented Oct 26, 2021

ambv commented Oct 28, 2021

brettcannon commented Oct 29, 2021

godlygeek commented Oct 30, 2021 • edited Loading

SebastiaanZ commented Oct 30, 2021 • edited Loading

PythonCHB commented Oct 31, 2021

SebastiaanZ commented Oct 31, 2021 • edited Loading

godlygeek commented Oct 31, 2021

godlygeek commented Nov 2, 2021 • edited Loading

brettcannon commented Nov 15, 2021

miss-islington commented Nov 20, 2021

miss-islington commented Nov 20, 2021

bedevere-bot commented Nov 20, 2021

bpo-45250: fix docs regarding `iter` and iterators being inconsistently required by CPython #29170

bpo-45250: fix docs regarding `iter` and iterators being inconsistently required by CPython #29170

brettcannon commented Oct 22, 2021 •

edited

Loading

srittau commented Oct 25, 2021 •

edited

Loading

godlygeek commented Oct 30, 2021 •

edited

Loading

SebastiaanZ commented Oct 30, 2021 •

edited

Loading

SebastiaanZ commented Oct 31, 2021 •

edited

Loading

godlygeek commented Nov 2, 2021 •

edited

Loading