Non-printable garbage in some fields #9

Closed
un-def opened this issue Dec 18, 2023 · 1 comment

un-def commented Dec 18, 2023

For example, this track: https://open.spotify.com/track/3NUKBSs5ZrPlxE3KsY5ySG (“We're There (feat. Chunky)” by Zed Bias).

image

The same two garbage symbols with or without Pango markup escape.

On the surface, it seems like a bug in the blocklet:

  • playerctl displays the title without any garbage:
    $ playerctl -p spotify metadata title
    We're There (feat. Chunky)
    
  • Strangely enough, i3blocks-mpris running in the console also works fine (somehow):
    $ python -m i3blocks_mpris -p spotify -f '{title}' --no-markup-escape
    We're There (feat. Chunky)
    
  • The Spotify web page displays the title correctly:
    image

But if we dig a bit deeper, we'll find the root of the problem:

  • The Spotify web page:

    <title>We&#x27;��re There (feat. Chunky) - song and lyrics by Zed Bias, Chunky | Spotify</title>
    {
      // embedded application/ld+json document; most fields are omitted for brevity
      "@context":"http://schema.googleapis.com/",
      "@type":"MusicRecording",
      "@id":"https://open.spotify.com/track/3NUKBSs5ZrPlxE3KsY5ySG",
      "url":"https://open.spotify.com/track/3NUKBSs5ZrPlxE3KsY5ySG",
      "name":"We'\u0080\u0099re There (feat. Chunky)",
      "datePublished":"2013-11-25"
    }
  • playerctl | hexdump:

    57 65 27 c2 80 c2 99 72 65 20 54 68 65 72 65 20 28 66 65 61 74 2e 20 43 68 75 6e 6b 79 29 0a
    
  • i3blocks-mpris | hexdump:

    $ python -m i3blocks_mpris -p spotify -f '{title}' --no-markup-escape | stdbuf -o0 hexdump -ve '/1 "%02x "'
    57 65 27 c2 80 c2 99 72 65 20 54 68 65 72 65 20 28 66 65 61 74 2e 20 43 68 75 6e 6b 79 29 0a
    

Here is a byte-by-byte comparison of hex representations and chars (I marked “invisible” chars as ×):

57 65 27 c2 80 c2 99 72 65 20 54 68 65 72 65 20 28 66 65 61 74 2e 20 43 68 75 6e 6b 79 29 0a
W  e  '  ×  ×  ×  ×  r  e     T  h  e  r  e     (  f  e  a  t  .     C  h  u  n  k  y  )

There are definitely some non-printable codepoints: �� / \u0080\u0099 / c2 80 c2 99!
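
The same check can be scripted. The following is only a minimal sketch: it decodes the hexdump above and lists every control character it finds. `HEX_DUMP` and `suspicious_codepoints` are names made up for this illustration and are not part of i3blocks-mpris.

```python
import unicodedata

# Hex dump copied from the playerctl / i3blocks-mpris output above.
HEX_DUMP = (
    "57 65 27 c2 80 c2 99 72 65 20 54 68 65 72 65 20 28 66 65 61 "
    "74 2e 20 43 68 75 6e 6b 79 29 0a"
)


def suspicious_codepoints(text):
    """Return (index, 'U+XXXX', category) for each control char except newline."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.category(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch != "\n"
    ]


title = bytes.fromhex(HEX_DUMP).decode("utf-8")
print(suspicious_codepoints(title))
# → [(3, 'U+0080', 'Cc'), (4, 'U+0099', 'Cc')]
```

Both offenders are C1 control characters (general category `Cc`), which most terminals do not render at all. That would be consistent with the console output looking clean.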

Okay, we need to go a bit deeper.

Let's assume that the title originally contained not the regular ASCII U+0027 ' APOSTROPHE but the fancy pseudo-typographic one, a.k.a. U+2019 ’ RIGHT SINGLE QUOTATION MARK (see https://en.wikipedia.org/wiki/Apostrophe#Unicode), and try to mess with encodings:

import binascii
import unicodedata

rsqm = "’"
unicodedata.name(rsqm)   # → 'RIGHT SINGLE QUOTATION MARK'
binascii.hexlify(rsqm.encode(), ' ').decode()  # → 'e2 80 99'

Bingo! The same 0x80 0x99.

Let's check again: https://www.fileformat.info/info/unicode/char/2019/index.htm (“Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019)”)

Encodings Encoded
UTF-8 (hex) 0xE2 0x80 0x99 (e28099)
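
To connect those bytes to the `c2 80 c2 99` in the hexdump, here is a sketch of one plausible chain. It assumes that some upstream step decoded the UTF-8 bytes with a single-byte encoding that maps each byte to the codepoint of the same value (ISO-8859-1). That is an assumption, and it does not explain how the lead byte 0xE2 ended up as a plain ASCII apostrophe.

```python
rsqm_bytes = "\u2019".encode("utf-8")      # b'\xe2\x80\x99'

# Assumption: the bytes were decoded as ISO-8859-1 (Latin-1), where every
# byte becomes the codepoint with the same numeric value.
misdecoded = rsqm_bytes.decode("latin-1")  # 'â' followed by U+0080 and U+0099

# The last two characters are the stray codepoints from the JSON document.
stray = misdecoded[1:]
assert stray == "\u0080\u0099"

# Re-encoded as UTF-8, they become the bytes seen in the hexdump.
print(stray.encode("utf-8").hex(" "))      # → c2 80 c2 99
```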

A question on Stack Overflow on the same topic: https://stackoverflow.com/questions/2477452/%C3%A2%E2%82%AC-showing-on-page-instead-of

BTW,

rsqm = "’"
unicodedata.name(rsqm)   # → 'RIGHT SINGLE QUOTATION MARK'
rsqm.encode('utf-8').decode('windows-1252')  # → 'â€™'

Yep, exactly the same character sequence as in the aforementioned SO question.

un-def changed the title from “[WIP] Non-printable garbage in some fields” to “Non-printable garbage in some fields” Dec 18, 2023
un-def added a commit that referenced this issue Dec 19, 2023

un-def commented Dec 19, 2023

Fixed via 77f59c2

image
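
For readers who hit the same problem elsewhere, one possible way to sanitize such a string is to drop control characters before display. This is only a sketch and not necessarily what the referenced commit does; `strip_control_chars` is a hypothetical helper, not an i3blocks-mpris API.

```python
import unicodedata


def strip_control_chars(text):
    """Drop Unicode control characters (category Cc), keeping tab and newline."""
    return "".join(
        ch for ch in text
        if ch in "\t\n" or unicodedata.category(ch) != "Cc"
    )


# Title as delivered in the metadata, with the two stray C1 controls.
broken_title = "We'\u0080\u0099re There (feat. Chunky)"
print(strip_control_chars(broken_title))  # → We're There (feat. Chunky)
```

Stripping loses nothing here because the ASCII apostrophe is already present. For text where the mojibake replaced the character entirely, a repair step would be needed instead.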

un-def closed this as completed Dec 19, 2023