string.printable.isprintable() returns False #67206

Closed
planet36 mannequin opened this issue Dec 9, 2014 · 8 comments

Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir topic-unicode

Comments

@planet36
Mannequin

planet36 mannequin commented Dec 9, 2014

BPO 23017
Nosy @birkenfeld, @vstinner, @ezio-melotti, @stevendaprano, @bitdancer, @4kir4, @iritkatriel
Files
  • bug-string-ascii.py: Test case shows that string.printable has control characters
  • 0001-Fix-string.printable-respect-POSIX-spec.patch
  • docs-string.printable.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-12-09.03:52:01.009>
    labels = ['type-bug', '3.9', '3.10', '3.11', 'library', 'expert-unicode', 'docs']
    title = 'string.printable.isprintable() returns False'
    updated_at = <Date 2021-11-29.16:17:13.755>
    user = 'https://bugs.python.org/planet36'

    bugs.python.org fields:

    activity = <Date 2021-11-29.16:17:13.755>
    actor = 'iritkatriel'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Library (Lib)', 'Unicode']
    creation = <Date 2014-12-09.03:52:01.009>
    creator = 'planet36'
    dependencies = []
    files = ['37391', '37398', '37441']
    hgrepos = []
    issue_num = 23017
    keywords = ['patch']
    message_count = 5.0
    messages = ['232343', '232376', '232382', '232613', '407290']
    nosy_count = 10.0
    nosy_names = ['georg.brandl', 'vstinner', 'ezio.melotti', 'steven.daprano', 'r.david.murray', 'docs@python', 'akira', 'planet36', 'bru', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23017'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @planet36
    Mannequin Author

    planet36 mannequin commented Dec 9, 2014

    string.printable includes all whitespace characters. However, the only whitespace character that is printable is the space (0x20).

    By definition, the only ASCII characters considered printable are:
    alphanumeric characters
    punctuation characters
    the space character (not all whitespace characters)
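For illustration, a minimal sketch of the discrepancy being reported, using only the standard `string` module. The name `non_printable` is introduced here for the example and is not taken from the attached test case.

```python
import string

# Characters of string.printable for which str.isprintable() is False.
non_printable = [c for c in string.printable if not c.isprintable()]

# The whole constant fails the check because of these characters.
print(string.printable.isprintable())  # False
print(non_printable)                   # ['\t', '\n', '\r', '\x0b', '\x0c']
```

Only the five non-space whitespace characters are responsible; the space itself passes `str.isprintable()`.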

    Source:
    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03

    7.2 POSIX Locale

    Conforming systems shall provide a POSIX locale, also known as the C locale.

    7.3.1 LC_CTYPE

    space
    Define characters to be classified as white-space characters.

    In the POSIX locale, exactly <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> shall be included.
    

    cntrl
    Define characters to be classified as control characters.

    In the POSIX locale, no characters in classes alpha or print shall be included.
    

    graph
    Define characters to be classified as printable characters, not including the <space>.

    In the POSIX locale, all characters in classes alpha, digit, and punct shall be included; no characters in class cntrl shall be included.
    

    print
    Define characters to be classified as printable characters, including the <space>.

    In the POSIX locale, all characters in class graph shall be included; no characters in class cntrl shall be included.
    

    LC_CTYPE Category in the POSIX Locale

    # "print" is by default "alnum", "punct", and the <space>
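As a rough sketch, the POSIX-locale `print` class quoted above can be rebuilt in Python from the `string` constants. The names `posix_print`, `ascii_print_range`, and `extra` are hypothetical and used only for illustration, and the sketch assumes the ASCII range.

```python
import string

# Hypothetical reconstruction of the POSIX-locale "print" class for ASCII:
# alnum + punct + the single space character.
posix_print = set(
    string.ascii_letters + string.digits + string.punctuation + " "
)

# Under that assumption it covers exactly the code points 0x20..0x7E.
ascii_print_range = {chr(i) for i in range(0x20, 0x7F)}
print(posix_print == ascii_print_range)

# Characters that string.printable contains beyond that set.
extra = sorted(set(string.printable) - posix_print)
print(extra)
```

The leftover characters are the whitespace characters other than the space, which is the difference the report is about.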

    @planet36 planet36 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 9, 2014
    @bru
    Mannequin

    bru mannequin commented Dec 9, 2014

    Here is a simple fix for the issue, plus a test.
    It does not break any unit test but this raises a backwards-compatibility problem. Therefore I wouldn't advise using it for Python 3.4 but only 3.5+.

    @bitdancer
    Member

    This is a bit of a conundrum. Our (string module) definition of printable is very clear, and it includes the other whitespace characters.

    We could document that this does not match the posix definition of printable. It also does not match the RFC 5822 definition of printable (for example), which does *not* include whitespace characters (not even space), but the posix definition is a more likely source of confusion.

    isprintable is a newer function than string.printable, and serves a different purpose. I suppose that when PEP-3138 was written and implemented the disconnect between the two definitions was not noticed.

    For backward compatibility reasons I suspect we are stuck with the discrepancy, but perhaps others will think it worth the pain of changing string.printable. I kind of doubt it, though.

    @bitdancer bitdancer added the docs Documentation in the Doc dir label Dec 9, 2014
    @4kir4
    Mannequin

    4kir4 mannequin commented Dec 13, 2014

    The C standard defines locale-specific *printing characters*, which are
    [ -~] in the "C" locale for implementations that use the 7-bit US-ASCII
    character set; i.e., SP (space, 0x20) is a printing character in C
    (isprint() returns nonzero).

    There is also an isgraph() function that returns zero for the space but
    is otherwise equivalent to isprint().

    The POSIX definition is aligned with the ISO C standard.

    I don't know what RFC 5822 has to do with this issue, but the RFC
    contradicts itself: in one place it has "printable US-ASCII
    characters except SP", which implies that SP *is* printable, but in other
    places it treats isprint == isgraph. The authors probably meant
    characters for which isgraph() is nonzero when they wrote "printable
    US-ASCII" (which is incorrect according to the C standard).

    Tests from bpo-9770 show the relation between C character classes and
    string constants [1]:

    set(string.printable) == set(C['graph']) | set(C['space'])

    where C['space'] is '\t\n\v\f\r ' (the standard C whitespace).
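The relation above can be checked with a short sketch. The `C` mapping from the linked tests is not reproduced here; `graph` and `space` below are stand-ins built on the assumption of the "C" locale with an ASCII character set.

```python
import string

# Stand-ins for C['graph'] and C['space'] in the "C" locale (assumed ASCII).
graph = {chr(i) for i in range(0x21, 0x7F)}  # visible characters, no space
space = set("\t\n\v\f\r ")                   # standard C whitespace

# string.printable is the union of the two classes.
print(set(string.printable) == graph | space)

# string.whitespace corresponds to the C "space" class.
print(set(string.whitespace) == space)
```

Set union is written with `|` in Python; `+` is not defined for sets.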

    It is a documented behavior [2]:

    This is a combination of digits, ascii_letters, punctuation,
    and whitespace

    where *whitespace* is C['space'].

    In Python 2, *printable* is locale-dependent and it coincides with the
    corresponding Python 3 definition in "C" locale with ASCII charset.

    Unlike other string constants, *printable* differs from C['print'] on
    both Python 2 and 3 because it includes whitespace characters other than
    space.

    str.isprintable [3] obeys C['print'] (in ASCII range) and considers SP
    to be printable.
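A small sketch of that claim, restricted to the ASCII range. The names `ascii_isprintable` and `c_print` are illustrative, and `c_print` assumes the [ -~] range given earlier for the "C" locale.

```python
# ASCII characters accepted by str.isprintable().
ascii_isprintable = {chr(i) for i in range(128) if chr(i).isprintable()}

# Assumed C "print" class in the "C" locale with ASCII: 0x20..0x7E.
c_print = {chr(i) for i in range(0x20, 0x7F)}

print(ascii_isprintable == c_print)

# The space counts as printable, the tab does not.
print(" ".isprintable(), "\t".isprintable())
```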

    ---

    It might be too late to change string.printable to correspond to C
    isprint() (for ASCII characters).

    I've uploaded a documentation patch that mentions that string.printable
    and str.isprintable differ.

    [1] http://bugs.python.org/review/9770/diff/12212/Lib/test/test_curses_ascii.py
    [2] https://hg.python.org/cpython/file/3.4/Doc/library/string.rst#l62
    [3] https://docs.python.org/3.4/library/stdtypes.html#str.isprintable

    @iritkatriel
    Member

    Reproduced on 3.11.

    @iritkatriel iritkatriel added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Nov 29, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @picnixz picnixz removed 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes labels Dec 26, 2024
    @picnixz
    Member

    picnixz commented Dec 26, 2024

    Still reproducible on main.

    @vstinner
    Member

    vstinner commented Jan 9, 2025

    > We could document that this does not match the posix definition of printable. It also does not match the RFC 5822 definition of printable (for example), which does not include whitespace characters (not even space), but the posix definition is a more likely source of confusion.
    > (...)
    > For backward compatibility reasons I suspect we are stuck with the discrepancy, but perhaps others will think it worth the pain of changing string.printable. I kind of doubt it, though.

    I agree that it would be a bad idea to change the value of string.printable now. It's too late. The best we can do is document that string.printable.isprintable() returns False and that this is deliberate.

    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 15, 2025
    …the POSIX sense (pythonGH-128820)
    
    (cherry picked from commit d906bde)
    
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    picnixz added a commit that referenced this issue Jan 15, 2025
    … the POSIX sense (GH-128820) (#128868)
    
    gh-67206: Document that `string.printable` is not printable in the POSIX sense (GH-128820)
    (cherry picked from commit d906bde)
    
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    picnixz added a commit that referenced this issue Jan 15, 2025
    … the POSIX sense (GH-128820) (#128867)
    
    gh-67206: Document that `string.printable` is not printable in the POSIX sense (GH-128820)
    (cherry picked from commit d906bde)
    
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    @picnixz
    Member

    picnixz commented Jan 15, 2025

    We documented the behaviour (and backported those notes to 3.12 and 3.13), hence I will close this issue as completed.

    @picnixz picnixz closed this as completed Jan 15, 2025
    @picnixz picnixz removed the type-bug An unexpected behavior, bug, or error label Jan 15, 2025