UTF-8 regexp warnings #69

Closed
ysb33r opened this issue Dec 8, 2014 · 12 comments

@ysb33r
Member

ysb33r commented Dec 8, 2014

I am seeing these warnings when running from asciidoctor-gradle-plugin:

.../asciidoctorj-pdf-1.5.0-alpha.6.jar!/gems/pdf-core-0.2.5/lib/pdf/core/pdf_object.rb:55 warning: regexp match /.../n against to UTF-8 string
@mojavelinux
Member

This is actually known. Personally, I consider this to be an errant warning in Prawn, but there's nothing I've been able to do about it so far. Perhaps we need to gather more info (determine if there is anything to worry about) and pursue upstream in Prawn.

@miko

miko commented Dec 10, 2014

Hello, I am also getting this warning, but what really worries me is that only ASCII characters get passed in. For this adoc file I get my national letters stripped out of my name:

= Hello World
Michał Kołodziejczyk <miko@example.com>
1.0.0, 2014-12-10

Hello world!

@ysb33r
Member Author

ysb33r commented Dec 10, 2014

Indeed it shows up as
[Screenshot: screen shot 2014-12-10 at 12 18 48]
(The M is not gone, I just did not paste it into the test)

However, in the case of my surname, Cronjé, the e-acute is not dropped.

@ysb33r
Member Author

ysb33r commented Dec 10, 2014

Are we back to the same issue as #33 / #36?

@mojavelinux
Member

It's slightly different. The source file is already marked with the UTF-8 encoding header. The problem is that the regular expression carries the ASCII-8BIT encoding modifier, /n (see https://github.com/prawnpdf/pdf-core/blob/master/lib/pdf/core/pdf_object.rb#L61), and frankly I don't understand why that is necessary. Either way, this is an upstream issue. (Keep in mind we are on an older version of pdf-core at the moment, so we'd need to upgrade if it gets fixed or changed.)

> only ASCII characters get passed in

It doesn't matter. Both Asciidoctor and Prawn force all strings to UTF-8 encoding. That's a further reason to consider this warning completely errant: the regular expression in question is not violating any encoding assumptions.
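
To make the encoding mismatch concrete, here is a minimal Ruby sketch. The pattern is only assumed to resemble the `/n` pattern in pdf_object.rb, and `escape_literal` is a hypothetical helper, not part of pdf-core or Prawn. It runs the pattern against a binary copy of the string, so the string's encoding label matches what `/n` implies, and then restores the UTF-8 label. Matching this way should avoid the warning, since Ruby raises it only when the string is not tagged as binary.

```ruby
# Illustrative pattern only: assumed to resemble the /n pattern in pdf_object.rb.
ESCAPE_PATTERN = /[\\\n\r\t\b\f()]/n

# Hypothetical helper (not pdf-core API): escape PDF literal-string
# delimiters and control characters using a binary copy of the input.
def escape_literal(str)
  original_encoding = str.encoding
  # String#b returns a copy tagged ASCII-8BIT, the encoding that /n implies.
  escaped = str.b.gsub(ESCAPE_PATTERN) { |m| "\\#{m}" }
  # Only ASCII backslashes were inserted, so UTF-8 input is still valid
  # UTF-8 and the original label can be restored.
  escaped.force_encoding(original_encoding)
end

puts escape_literal("Micha\u0142 (test)")
```

The pattern only matches ASCII bytes, and UTF-8 multibyte sequences never contain ASCII bytes, so the byte-level match gives the same result as a character-level one.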

@mojavelinux
Member

> For this adoc file I get my national letters stripped out of my name:

That's a separate issue. Asciidoctor PDF uses custom fonts. When I created the font files that are bundled in the gem, I did not include all international characters in the font (something I've been meaning to fix). This is a valid test case that demonstrates this problem. Could you file a separate issue?

@miko

miko commented Dec 11, 2014

Done as #72, thanks for the explanation.

@mojavelinux
Member

Great! Thanks @miko!

@mojavelinux mojavelinux self-assigned this Jun 18, 2015
@mojavelinux mojavelinux added this to the v1.5.0 milestone Jun 18, 2015
@mojavelinux mojavelinux added bug and removed upstream labels Jun 18, 2015
@mojavelinux
Member

Turns out this was a bug in Asciidoctor PDF. When setting the value of a PDF metadata field, the content must be encoded if it includes glyphs that fall outside the WINANSI code set. We now check for this and encode the value properly.
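
As a rough illustration of that kind of check (not the code from the actual fix), the sketch below prepares a value for a PDF metadata field. `encode_info_value` is a hypothetical name, and the ASCII-only test is an assumption standing in for a real WinAnsi check: it is stricter, but it never lets an unsupported glyph through unencoded. PDF text strings may be written as UTF-16BE when prefixed with the byte order mark FE FF.

```ruby
# Hypothetical helper, not the code from Asciidoctor PDF: prepare a value
# for a PDF metadata (Info dictionary) field.
def encode_info_value(value)
  # Assumption: treat anything beyond plain ASCII as needing encoding.
  # This is stricter than a true WinAnsi membership check.
  return value if value.ascii_only?

  # PDF text strings may be UTF-16BE when prefixed with the BOM FE FF.
  bom = [0xFE, 0xFF].pack('C2')
  bom + value.encode(Encoding::UTF_16BE).b
end

encoded = encode_info_value("Micha\u0142 Ko\u0142odziejczyk")
puts encoded.unpack1('H*')
```

A plain ASCII value such as `"Hello World"` is returned unchanged, so existing documents would not be affected by a check like this.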

@mojavelinux
Member

This was resolved by 0310c3e.

@orloffm

orloffm commented Aug 21, 2015

I am still getting

pdf-core-0.6.0/lib/pdf/core/pdf_object.rb:61: warning: regexp match /.../n against to UTF-8 string

when compiling this file:

TEST
====

== Проверка

Notepad2 says the file is "UTF8 Signature". The Asciidoctor PDF version is asciidoctor-pdf-1.5.0.alpha.9, and I am on Windows.
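
"UTF8 Signature" indicates that the file starts with the UTF-8 byte order mark (bytes EF BB BF). Whether or not that is related to the warning, it can be checked with a small Ruby sketch. Both helper names are hypothetical and operate on the raw bytes of the file.

```ruby
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack('C3')

# Hypothetical helper: true when the raw bytes begin with the UTF-8 BOM.
def utf8_bom?(raw)
  raw.b.start_with?(UTF8_BOM)
end

# Hypothetical helper: return the content without a leading BOM.
def strip_utf8_bom(raw)
  utf8_bom?(raw) ? raw.b[3..-1].force_encoding(Encoding::UTF_8) : raw
end

sample = UTF8_BOM + "TEST\n====\n".b
puts utf8_bom?(sample)
puts strip_utf8_bom(sample)
```

When reading from disk, Ruby can also drop the signature itself if the file is opened with the mode `'r:BOM|UTF-8'`.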

@mojavelinux
Member

@orloffm I've filed a separate issue to track this. Please see #308. It doesn't seem to break the operation of the xref, so I'd say that the warning is harmless. Regardless, it will be good to get it cleared up.

5 participants