Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-68320, gh-88302 - Allow for pathlib.Path subclassing #31691

Merged
merged 53 commits into from
Dec 23, 2022

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Mar 5, 2022

Users may wish to define subclasses of pathlib.PurePath and Path to add or modify existing methods. Before this change, attempting to instantiate a subclass raised an exception like:

AttributeError: type object 'PPath' has no attribute '_flavour'

Previously the _flavour attribute was assigned as follows:

PurePath._flavour        = xxx not set!! xxx
PurePosixPath._flavour   = _PosixFlavour()
PureWindowsPath._flavour = _WindowsFlavour()

It's now set as follows:

PurePath._flavour        = os.path
PurePosixPath._flavour   = posixpath
PureWindowsPath._flavour = ntpath

Functionality from _PosixFlavour and _WindowsFlavour is moved into PurePath as underscored-prefixed classmethods. Flavour classes are removed.

A deeper dive into this patch can be read here: https://discuss.python.org/t/make-pathlib-extensible/3428/42

Fixes #68320 #88302

Users may wish to define subclasses of `pathlib.Path` to add or modify
existing methods. Before this change, attempting to instantiate a subclass
raised an exception like:

    AttributeError: type object 'PPath' has no attribute '_flavour'

Previously the `_flavour` attribute was assigned as follows:

    PurePath._flavour        = xxx not set!! xxx
    PurePosixPath._flavour   = _PosixFlavour()
    PureWindowsPath._flavour = _WindowsFlavour()

This commit replaces it with a `_pathmod` attribute, set as follows:

    PurePath._pathmod        = os.path
    PurePosixPath._pathmod   = posixpath
    PureWindowsPath._pathmod = ntpath

Functionality from `_PosixFlavour` and `_WindowsFlavour` is moved into
`PurePath` as underscored-prefixed classmethods. Flavours are removed.
Lib/pathlib.py Show resolved Hide resolved
@brettcannon brettcannon self-requested a review March 7, 2022 22:26
@barneygale
Copy link
Contributor Author

Hey @brettcannon, just bumping this PR in case it dropped off your radar.

@brettcannon
Copy link
Member

@barneygale it hasn't, but my PR review queue is 11 PRs deep, this is in position 7, and I have worry about getting PEP 594 done in time for 3.11b1. Plus I have some stuff I need to get done for work this month which is eating into my paid OSS time.

@barneygale barneygale changed the title bpo-24132, bpo-44136 - Fix pathlib.Path subclassing. gh-68320, gh-88302 - Fix pathlib.Path subclassing. Apr 24, 2022
@brettcannon brettcannon reopened this Apr 27, 2022
@barneygale barneygale closed this Apr 29, 2022
@barneygale barneygale reopened this Apr 29, 2022
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the comments are either code touch-ups or asking to make the code more self-describing.

Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

barneygale and others added 2 commits April 29, 2022 23:35
@brettcannon
Copy link
Member

Thanks for taking the time to see this through, @barneygale !

@barneygale
Copy link
Contributor Author

Thank you so much for your help with this Brett! Hope you have a restful break :)

@zmievsa
Copy link
Contributor

zmievsa commented Dec 24, 2022

@barneygale Congrats! You started so long ago but we finally get the results. It's an amazing effort.

iritkatriel added a commit to iritkatriel/cpython that referenced this pull request Dec 28, 2022
* Correct CVE-2020-10735 documentation (python#100306)

* pythongh-94912: Added marker for non-standard coroutine function detection (python#99247)

This introduces a new decorator `@inspect.markcoroutinefunction`,
which, applied to a sync function, makes it appear async to
`inspect.iscoroutinefunction()`.

* Docs: Don't upload CI artifacts (python#100330)

* pythongh-89727: Fix os.walk RecursionError on deep trees (python#99803)

Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.

* pythongh-69929: re docs: Add more specific definition of \w (python#92015)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

* pythongh-89051: Add ssl.OP_LEGACY_SERVER_CONNECT (python#93927)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Christian Heimes <christian@python.org>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Fixes python#89051

* pythongh-88211: Change lower-case and upper-case to match recommendations in imaplib docs (python#99625)

* pythongh-100348: Fix ref cycle in `asyncio._SelectorSocketTransport` with `_read_ready_cb` (python#100349)

* pythongh-99925: Fix inconsistency in `json.dumps()` error messages (pythonGH-99926)

* Clarify that every thread has its own default context in contextvars (python#99246)

* pythongh-99576: Fix cookiejar file that was not truncated for some classes (pythonGH-99616)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>

* pythongh-100188: Reduce misses in BINARY_SUBSCR_(LIST/TUPLE)_INT (python#100189)

Don't specialize if the index is negative.

* pythongh-99991: improve docs on str.encode and bytes.decode (python#100198)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

* pythongh-91081: Add note on WeakKeyDictionary behavior when deleting a replaced entry (python#91499)

Co-authored-by: Pieter Eendebak <P.T.eendebak@tudelft.nl>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

* pythongh-85267: Improvements to inspect.signature __text_signature__ handling (python#98796)

This makes a couple related changes to inspect.signature's behaviour
when parsing a signature from `__text_signature__`.

First, `inspect.signature` is documented as only raising ValueError or
TypeError. However, in some cases, we could raise RuntimeError.  This PR
changes that, thereby fixing python#83685.

(Note that the new ValueErrors in RewriteSymbolics are caught and then
reraised with a message)

Second, `inspect.signature` could randomly drop parameters that it
didn't understand (corresponding to `return None` in the `p` function).
This is the core issue in python#85267. I think this is very surprising
behaviour and it seems better to fail outright.

Third, adding this new failure broke a couple tests. To fix them (and to
e.g. allow `inspect.signature(select.epoll.register)` as in python#85267), I
add constant folding of a couple binary operations to RewriteSymbolics.

(There's some discussion of making signature expression evaluation
arbitrary powerful in python#68155. I think that's out of scope. The
additional constant folding here is pretty straightforward, useful, and
not much of a slippery slope)

Fourth, while python#85267 is incorrect about the cause of the issue, it turns
out if you had consecutive newlines in __text_signature__, you'd get
`tokenize.TokenError`.

Finally, the `if name is invalid:` code path was dead, since
`parse_name` never returned `invalid`.

* pythonGH-100363: Speed up `asyncio.get_running_loop` (python#100364)

* pythonGH-100133: fix `asyncio` subprocess losing `stderr` and `stdout` output (python#100154)

* pythongh-100374: Fixed a bug in socket.getfqdn() (pythongh-100375)

* pythongh-100129: Add tests for pickling all builtin types and functions (pythonGH-100142)

* Remove unused variable from `dis._find_imports` (python#100396)

* pythongh-78878: Fix crash when creating an instance of `_ctypes.CField` (python#14837)

* pythonGH-69564: Clarify use of octal format of mode argument in help(os.chmod) (python#20621)

Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>

* pythonGH-99554: Pack location tables more effectively (pythonGH-99556)

* Correct typo in typing.py (python#100423)

In the docstring of `ParamSpec`, the name of `P = ParamSpec('P')` was
mistakenly written as `'T'`.

* pythongh-99761: Add `_PyLong_IsPositiveSingleDigit` function to check for single digit integers  (python#100064)

* pythonGH-99770: Make the correct call specialization fail kind show up in the stats (pythonGH-99771)

* pythongh-78997: fix bad rebase of moved test file (python#100424)

* pythongh-100344: Add C implementation for `asyncio.current_task` (python#100345)

Co-authored-by: pranavtbhat

* pythonGH-99554: Trim trailing whitespace (pythonGH-100435)



Automerge-Triggered-By: GH:brandtbucher

* pythongh-85432: Harmonise parameter names between C and pure-Python implementations of `datetime.time.strftime`, `datetime.datetime.fromtimestamp` (python#99993)

* pythongh-57762: fix misleading tkinter.Tk docstring (python#98837)

Mentioned as a desired change by terryjreedy on the corresponding issue,
since Tk is not a subclass of Toplevel.

* pythongh-48496: Added example and link to faq for UnboundLocalError in reference (python#93068)

* Fix typo in 3.12 What's New (python#100449)

* pythongh-76963: PEP3118 itemsize of an empty ctypes array should not be 0 (pythonGH-5576)

The itemsize returned in a memoryview of a ctypes array is now computed from the item type, instead of dividing the total size by the length and assuming that the length is not zero.

* pythonGH-100459: fix copy-paste errors in specialization stats (pythonGH-100460)

* pythongh-99110: Initialize `frame->previous` in init_frame to fix segmentation fault when accessing `frame.f_back` (python#100182)

* pythongh-98712: Clarify "readonly bytes-like object" semantics in C arg-parsing docs (python#98710)

* pythongh-92216: improve performance of `hasattr` for type objects (pythonGH-99979)

* pythongh-100288: Specialise LOAD_ATTR_METHOD for managed dictionaries (pythonGH-100289)

* Revert "pythongh-100288: Specialise LOAD_ATTR_METHOD for managed dictionaries (pythonGH-100289)" (python#100468)

This reverts commit c3c7848.

* pythongh-94155: Reduce hash collisions for code objects (python#100183)

* Uses a better hashing algorithm to get better dispersion and remove commutativity.

* Incorporates `co_firstlineno`, `Py_SIZE(co)`, and bytecode instructions.

* This is now the entire set of criteria used in `code_richcompare`, except for `_PyCode_ConstantKey` (which would incorporate the types of `co_consts` rather than just their values).

* pythongh-83076: 3.8x speed improvement in (Async)Mock instantiation (python#100252)

* pythongh-99482: remove `jython` compatibility parts from stdlib and tests (python#99484)

* bpo-40447: accept all path-like objects in compileall.compile_file (python#19883)

Signed-off-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Filipe Laíns <lains@riseup.net>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

* pythonGH-100425: Improve accuracy of builtin sum() for float inputs (pythonGH-100426)

* pythongh-68320, pythongh-88302 - Allow for private `pathlib.Path` subclassing (pythonGH-31691)

Users may wish to define subclasses of `pathlib.Path` to add or modify
existing methods. Before this change, attempting to instantiate a subclass
raised an exception like:

    AttributeError: type object 'PPath' has no attribute '_flavour'

Previously the `_flavour` attribute was assigned as follows:

    PurePath._flavour        = xxx not set!! xxx
    PurePosixPath._flavour   = _PosixFlavour()
    PureWindowsPath._flavour = _WindowsFlavour()

This change replaces it with a `_pathmod` attribute, set as follows:

    PurePath._pathmod        = os.path
    PurePosixPath._pathmod   = posixpath
    PureWindowsPath._pathmod = ntpath

Functionality from `_PosixFlavour` and `_WindowsFlavour` is moved into
`PurePath` as underscored-prefixed classmethods. Flavours are removed.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>

* pythongh-99947: Ensure unreported errors are chained for SystemError during import (pythonGH-99946)

* Add "strict" to dotproduct(). Add docstring. Factor-out common code. (pythonGH-100480)

* pythongh-94808: improve test coverage of number formatting (python#99472)

* pythongh-100454: Start running SSL tests with OpenSSL 3.1.0-beta1 (python#100456)

* pythongh-100268: Add is_integer method to int (python#100439)

This improves the lives of type annotation users of `float` - which type checkers implicitly treat as `int|float` because that is what most code actually wants. Before this change a `.is_integer()` method could not be assumed to exist on things annotated as `: float` due to the method not existing on both types.

* pythongh-77771: Add enterabs example in sched (python#92716)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

* pythonGH-91166: Implement zero copy writes for `SelectorSocketTransport` in asyncio (python#31871)

Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>

* pythonGH-91166: Implement zero copy writes for `SelectorSocketTransport` in asyncio (python#31871)

Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>

* Misc Itertools recipe tweaks (pythonGH-100493)

* pythongh-100357: Convert several functions in `bltinsmodule` to AC (python#100358)

* Remove wrong comment about `repr` in `test_unicode` (python#100495)

* pythongh-99908: Tutorial: Modernize the 'data-record class' example (python#100499)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

* pythongh-100474: Fix handling of dirs named index.html in http.server (pythonGH-100475)



If you had a directory called index.html or index.htm within a directory, it would cause http.server to return a 404 Not Found error instead of the directory listing. This came about due to not checking that the index was a regular file.

I have also added a test case for this situation.

Automerge-Triggered-By: GH:merwok

* pythongh-100287: Fix unittest.mock.seal with AsyncMock (python#100496)

* pythongh-99535: Add test for inheritance of annotations and update documentation (python#99990)

* pythongh-100428: Make float documentation more accurate (python#100437)

Previously, the grammar did not accept `float("10")`.
Also implement mdickinson's suggestion of removing the indirection.

* [Minor PR] Quotes in documentation changed into code blocks (python#99536)

Minor formatting fix in documentation

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

* pythongh-100472: Fix docs claim that compileall parameters could be bytes (python#100473)

* pythongh-100519: simplification to `eff_request_host` in cookiejar.py (python#99588)

`IPV4_RE` includes a `.`, and the `.find(".") == -1` included here is already testing to make sure there's no dot, so this part of the expression is tautological. Instead use more modern `in` syntax to make it clear what the check is doing here. The simplified implementation more clearly matches the wording in RFC 2965.

Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>

* pythongh-99308: Clarify re docs for byte pattern group names (python#99311)

* pythongh-92446: Improve argparse choices docs; revert bad change to lzma docs (python#94627)

Based on the definition of the collections.abc classes, it is more accurate to use "sequence" instead of "container" when describing argparse choices.

A previous attempt at fixing this in python#92450 was mistaken; this PR reverts that change.

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

* Fix name of removed `inspect.Signature.from_builtin` method in 3.11.0a2 changelog (python#100525)

* pythongh-100520: Fix `rst` markup in `configparser`  docstrings (python#100524)

* pythongh-99509: Add `__class_getitem__` to `multiprocessing.queues.Queue` (python#99511)

* pythongh-94603: micro optimize list.pop (pythongh-94604)

* Remove `NoneType` redefinition from `clinic.py` (python#100551)

* pythongh-100553: Improve accuracy of sqlite3.Row iter test (python#100555)

* pythonGH-98831: Modernize a ton of simpler instructions (python#100545)

* load_const and load_fast aren't families for now
* Don't decref unmoved names
* Modernize GET_ANEXT
* Modernize GET_AWAITABLE
* Modernize ASYNC_GEN_WRAP
* Modernize YIELD_VALUE
* Modernize POP_EXCEPT (in more than one way)
* Modernize PREP_RERAISE_STAR
* Modernize LOAD_ASSERTION_ERROR
* Modernize LOAD_BUILD_CLASS
* Modernize STORE_NAME
* Modernize LOAD_NAME
* Modernize LOAD_CLASSDEREF
* Modernize LOAD_DEREF
* Modernize STORE_DEREF
* Modernize COPY_FREE_VARS (mark it as done)
* Modernize LIST_TO_TUPLE
* Modernize LIST_EXTEND
* Modernize SET_UPDATE
* Modernize SETUP_ANNOTATIONS
* Modernize DICT_UPDATE
* Modernize DICT_MERGE
* Modernize MAP_ADD
* Modernize IS_OP
* Modernize CONTAINS_OP
* Modernize CHECK_EXC_MATCH
* Modernize IMPORT_NAME
* Modernize IMPORT_STAR
* Modernize IMPORT_FROM
* Modernize JUMP_FORWARD (mark it as done)
* Modernize JUMP_BACKWARD (mark it as done)

Signed-off-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Filipe Laíns <lains@riseup.net>
Co-authored-by: Jeremy Paige <ucodery@gmail.com>
Co-authored-by: Carlton Gibson <carlton@noumenal.es>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Jon Burdo <jon@jonburdo.com>
Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Thomas Grainger <tagrain@gmail.com>
Co-authored-by: Brad Wolfe <brad.wolfe@gmail.com>
Co-authored-by: Richard Kojedzinszky <rkojedzinszky@users.noreply.github.com>
Co-authored-by: František Nesveda <fnesveda@users.noreply.github.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
Co-authored-by: Bisola Olasehinde <horlasehinde@gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Pieter Eendebak <P.T.eendebak@tudelft.nl>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Dominic Socular <BBH@awsl.rip>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Hai Shi <shihai1992@gmail.com>
Co-authored-by: amaajemyfren <32741226+amaajemyfren@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
Co-authored-by: david-why <david_why@outlook.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: penguin_wwy <940375606@qq.com>
Co-authored-by: Eli Schwartz <eschwartz93@gmail.com>
Co-authored-by: Itamar Ostricher <itamarost@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Co-authored-by: Bill Fisher <william.w.fisher@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Ken Jin <kenjin@python.org>
Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Filipe Laíns <lains@riseup.net>
Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
Co-authored-by: Illia Volochii <illia.volochii@gmail.com>
Co-authored-by: JosephSBoyle <48555120+JosephSBoyle@users.noreply.github.com>
Co-authored-by: James Frost <git@frost.cx>
Co-authored-by: MonadChains <monadchains@gmail.com>
Co-authored-by: Bart Broere <mail@bartbroere.eu>
Co-authored-by: Glyph <code@glyph.im>
Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>
Co-authored-by: Ilya Kulakov <kulakov.ilya@gmail.com>
Co-authored-by: Guy Yagev <yourlefthandman8@gmail.com>
Co-authored-by: Jakub Kuczys <me@jacken.men>
@mzipay

This comment was marked as abuse.

@barneygale
Copy link
Contributor Author

Hi mzipay. Your code has a couple of problems:

  • ('extslot') does not define a tuple, and generally you don't need to worry about __slots__ unless performance is a concern
  • Your initialiser assigns to _extslot with an underscore prefix, but the rest of your code doesn't use the prefix
  • Your initialiser doesn't call super().

Here's a fixed example:

>>> from pathlib import Path
>>> 
>>> 
>>> class ExtPath(Path):
...     def __init__(self, *args):
...         self.extslot = 'this will disappear'
...         super().__init__(*args)
... 
>>> 
>>> extpath = ExtPath('Spam/eggs.txt')
>>> extpath.extslot
'this will disappear'
>>> extpath.parent.extslot
'this will disappear'

Hope that helps. If you have any specific criticisms or questions you have about pathlib's design I'll do my best to address them. But bear in mind you'll catch more flies with honey than vinegar ;)

@zmievsa
Copy link
Contributor

zmievsa commented Apr 6, 2023

@mzipay
Copy link

mzipay commented Apr 6, 2023

@barneygale Apologies, (1) I was trying to simplify my sample code and didn't copy/paste it properly. (2) No "vinegar" intended, just trying to be succinct. (In fact I intentionally noted that "I love the idea of pathlib. I use it as often as I possibly can.")

If you have any specific criticisms or questions you have about pathlib's design I'll do my best to address them.

Yes, I can be more specific: pathlib's __new__/__init__ behavior is unexpected and still "breaks" subclassing. My example code very clearly demonstrates that, and I'll include it in its entirety below.
(I don't necessarily think the Path subclassing needs to be fixed. It's be great if it did actually get fixed, but at least a prominent disclaimer of "not intended to be subclassed" at the top of the module's doc page would be appropriate.)

First, though:

generally you don't need to worry about slots unless performance is a concern

In my real-world use case, I am dealing with tens of thousands of Paths, so performance is a concern for me.
But even if it weren't, it doesn't change that the pathlib behavior is unexpected. Any state that is managed in a Path subclass's __new__ or __init__ gets lost, as you will see below.
To be more clear, I'll even remove my use of __slots__ for the time being - the example will still fail.

Now back to the proposed fix (which doesn't work). Actually run this on 3.12.0a6:

from pathlib import Path

class ExtPath(Path):

    def __init__(self, *args):
        super().__init__(*args)
        self.value = 'this will disappear'

if __name__ == '__main__':
    extpath = ExtPath('Spam/eggs.txt')
    assert extpath.value == 'this will disappear'
    assert type(extpath.parent) is ExtPath
    assert extpath.parent.value == 'this will disappear'

You'll get TypeError: object.__init__() takes exactly one argument (the instance to initialize). This is because pathlib's __new__ impls are not consistent w/r/t __init__. (Neither Path nor PurePath even define __init__, so it's object.__init__ that would receive the proposed "fix" - which is why it doesn't work.)

Even if the super call is corrected, it still produces the unexpected result from my initial posting (granted I should've just posted the full example in the first place):

from pathlib import Path

class ExtPath(Path):

    def __init__(self, *args):
        super().__init__()  # object.__init__
        self.value = 'this will disappear'

if __name__ == '__main__':
    extpath = ExtPath('Spam/eggs.txt')
    assert extpath.value == 'this will disappear'
    assert type(extpath.parent) is ExtPath
    assert extpath.parent.value == 'this will disappear'

Running this results in an unexpected AttributeError:

$ python extpath.py 
Traceback (most recent call last):
  File "/tmp/extpath.py", line 13, in <module>
    assert extpath.parent.value == 'this will disappear'
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ExtPath' object has no attribute 'value'

Why does this happen?

Take a close look at 3.12.0a6's pathlib.py, lines 302-317. Here you will find the PurePath._from_parts and PurePath._from_parsed_parts methods.

These methods are used, from what I gather inspecting the code, to offer performance improvements by avoiding the need to re-parse the path segments. But they do so by bypassing __new__ and __init__ altogether!

Note the use of object.__new__(cls) in these methods to create the objects. As a result, neither PurePath.__new__ nor Path.__new__ are invoked (same for __init__s, even if they were defined).

This means that any property or method of (a subclass of) Path that returns a new Path (like .parent in my example) will return what seems to be an object of the correct type (note my second assert), but any state that that subclass manages, either in its own __new__ or __init__, is lost; hence the AttributeError.

@mzipay
Copy link

mzipay commented Apr 6, 2023

Seems this just got fixed in another issue (#85281) and merged a few days ago. Built 3.12.0a7, subclassing Path now appears to have expected behavior.

@barneygale
Copy link
Contributor Author

Yep that's been a longstanding pathlib issue; it was difficult to address without regressing performance, and so there was a lot of preparatory work involved first. Glad to hear you have it working now!

barneygale added a commit to barneygale/cpython that referenced this pull request May 2, 2023
…calls to `os.path.normcase()`

Use `re.IGNORECASE` to implement case-insensitive matching. This
restores behaviour from before python#31691.
barneygale added a commit that referenced this pull request May 2, 2023
…to `os.path.normcase()` (GH-104105)

Use `re.IGNORECASE` to implement case-insensitive matching. This
restores behaviour from before GH-31691.
@Qonfused Qonfused mentioned this pull request Jun 28, 2023
82 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Direct sub-classing of pathlib.Path