Windows + UTF8 diacritical character output problem #5

clangen · 2016-05-14T05:27:40Z

Hi there! Awesome work on the win32a variant of PDCurses, I'm really enjoying working with it.

However, I seem to have a problem when calling wprintw with UTF8 strings that contain certain diacritical marks. In this particular case, I've found the acute accent ´, aka 0xb4 to cause strange behavior. Specifically, the output is terminated at this character, and the next line is bunched up on the previous line. Sorry if that's a crappy description, here's an example:

Expected output:

A Hard Day´s Night
Abbey Road
Beatles For Sale
...

But here's what it actually looks like:

A Hard DayAbbey Road
Beatles For Sale
...

Is there any known solution or workaround for this problem, besides the obvious "use a regular apostrophe instead"?

Thanks!


Bill-Gray · 2016-05-14T16:30:57Z

Hmmm... here's a minimal example that does produce that acute accent:

#include <curses.h>

int main( const int argc, const char **argv)
{
    initscr();
    cbreak( );
    noecho( );
    clear( );
    refresh( );

    printw( "A Hard Day\xc2\xb4s Night\n");
    printw( "Abbey Road\n");

    refresh();
    getch();

    refresh();

    endwin();
    return( 0);
}

You'll notice that the acute accent in the printw() call has been UTF-8 encoded, resulting in it becoming two bytes instead of one:

https://en.wikipedia.org/wiki/UTF-8

I tried just doing it as "A Hard Day\xb4s Night\n" (which isn't a valid UTF-8 string) and got exactly the behavior you describe. I've not checked all that closely, but I'd wager that the code marches along through the string, finds invalid UTF-8, and stops.

If you _do_ have trouble even with a for-real UTF-8 string, I'd give the above mini-program a try and see what it does.

-- Bill
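
To illustrate why the accent becomes two bytes, here is a minimal sketch of a UTF-8 encoder in plain C. The function name utf8_encode is hypothetical and is not part of PDCurses; it only shows the standard bit layout, under which U+00B4 comes out as the bytes 0xC2 0xB4 used in the printw() call above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PDCurses function): encode one Unicode code
   point as UTF-8.  Writes up to 4 bytes to out and returns the number of
   bytes written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx + 2 continuation */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 continuation */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of Unicode range */
}
```

Calling utf8_encode(0xB4, buf) fills buf with 0xC2 followed by 0xB4 and returns 2. The lone byte 0xB4 is only the second half of that sequence.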


clangen · 2016-05-15T04:55:04Z

Shoot, you're absolutely right -- I had a problem with my UTF8 decoding and it was missing that leading byte. Argh! Apologies for the waste of time, and thanks for looking into this so promptly!
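
A missing leading byte like this can be caught before the string ever reaches wprintw. The following is a rough sketch of a validity check in plain C; utf8_first_invalid is a hypothetical name, not a PDCurses function, and this is not necessarily how PDCurses itself scans strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PDCurses function): return the byte offset of
   the first malformed UTF-8 sequence in the NUL-terminated string s, or -1
   if the whole string is well-formed. */
long utf8_first_invalid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    assert(s != NULL);

    while (p[i] != '\0') {
        unsigned char c = p[i];
        unsigned long cp, min;
        size_t n;                          /* continuation bytes expected */

        if (c < 0x80) {                    /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return (long)i;                /* stray continuation or bad lead byte */
        }

        /* Each following byte must look like 10xxxxxx.  A NUL fails this
           test, so the loop never reads past the end of the string. */
        for (size_t k = 1; k <= n; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)i;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;

        i += n + 1;
    }
    return -1;
}
```

With this sketch, "A Hard Day\xb4s Night\n" reports offset 10, the position of the lone 0xB4 byte, while the correctly encoded "A Hard Day\xc2\xb4s Night\n" reports -1.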

clangen closed this as completed May 15, 2016

okbob mentioned this issue Jan 6, 2023

infinity cycle in screen initialization, when pdcurses app is used as pager #256

Closed
