Windows + UTF8 diacritical character output problem #5

Closed
clangen opened this issue May 14, 2016 · 2 comments

Comments

@clangen
Contributor

clangen commented May 14, 2016

Hi there! Awesome work on the win32a variant of PDCurses, I'm really enjoying working with it.

However, I seem to have a problem when calling wprintw with UTF-8 strings that contain certain diacritical marks. In this particular case, I've found that the acute accent ´ (0xb4) causes strange behavior: the output is terminated at this character, and the next line is bunched up on the previous line. Sorry if that's a crappy description; here's an example:

Expected output:

A Hard Day´s Night
Abbey Road
Beatles For Sale
...

But here's what it actually looks like:

A Hard DayAbbey Road
Beatles For Sale
...

Is there any known solution or work around for this problem? Besides the obvious "use a regular apostrophe instead?"

Thanks!

@Bill-Gray
Owner

Hmmm... here's a minimal example that does produce that acute accent:

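#include <curses.h>

int main( void)
{
    initscr( );          /* enter curses mode */
    cbreak( );           /* deliver keys without waiting for Enter */
    noecho( );           /* don't echo typed keys */
    clear( );
    refresh( );

    /* U+00B4 (acute accent) in UTF-8 is the two bytes 0xC2 0xB4 */
    printw( "A Hard Day\xc2\xb4s Night\n");
    printw( "Abbey Road\n");

    refresh( );
    getch( );            /* wait for a key before exiting */

    endwin( );
    return( 0);
}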

You'll notice that the acute accent in the printw() call has been UTF-8 encoded, resulting in it becoming two bytes instead of one:

https://en.wikipedia.org/wiki/UTF-8
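
To make the two-byte form concrete, here is a minimal sketch of how a code point is turned into UTF-8 bytes. The helper name utf8_encode is hypothetical and is not part of PDCurses or curses; it only illustrates the standard bit layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PDCurses function): encode one Unicode code
   point as UTF-8 into out[], which must have room for 4 bytes.
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a UTF-16 surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xB4, buf) writes 0xC2 0xB4 and returns 2. The second byte happens to equal the code point value 0xB4, which makes it easy to mistake the lone byte for the whole character.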

I tried just doing it as "A Hard Day\xb4s Night\n" (which isn't a valid UTF-8 string) and got exactly the behavior you describe. I've not checked all that closely, but I'd wager that the code marches along through the string, finds invalid UTF-8, and stops.
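
One way to catch this kind of input before it reaches printw() or wprintw() is to validate the string first. The sketch below is an assumption about how a caller might do that; utf8_first_invalid is a hypothetical helper, and it is not the decoder PDCurses uses internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PDCurses): return the byte offset of the
   first invalid UTF-8 sequence in the NUL-terminated string s, or -1 if the
   whole string is well-formed. */
long utf8_first_invalid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    assert(s != NULL);

    while (p[i] != 0) {
        unsigned char c = p[i];
        unsigned long cp, min;    /* decoded value and smallest legal value */
        size_t extra, k;          /* continuation bytes expected */

        if (c < 0x80) {           /* plain ASCII */
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else
            return (long)i;       /* stray continuation byte, or 0xF8..0xFF */

        for (k = 1; k <= extra; k++) {
            /* A NUL here fails the test too, so we never read past the end. */
            if ((p[i + k] & 0xC0) != 0x80)
                return (long)i;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;       /* overlong, out of range, or surrogate */

        i += extra + 1;
    }
    return -1;
}
```

With this check, "A Hard Day\xc2\xb4s Night\n" is reported as valid (-1), while "A Hard Day\xb4s Night\n" is flagged at offset 10, the position of the lone 0xB4 continuation byte.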

If you _do_ have trouble even with a for-real UTF-8 string, I'd give the above mini-program a try and see what it does.

-- Bill

https://github.com/Bill-Gray/PDCurses/issues/5

@clangen
Contributor Author

clangen commented May 15, 2016

Shoot, you're absolutely right -- I had a problem with my UTF8 decoding and it was missing that leading byte. Argh! Apologies for the waste of time, and thanks for looking into this so promptly!
