From 83145ae017f47dc7e224502e4b8aa50badbfd774 Mon Sep 17 00:00:00 2001
From: Moritz Lenz
Date: Sun, 2 Feb 2014 10:54:20 +0100
Subject: [PATCH] [doc] clarify that CHECK coderefs return octets

I would have intuitively expected that the fallback to decode() should
return a character string. Since this has cost me several hours of
debugging, I find it appropriate to be more explicit in the docs.
---
 Encode.pm | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/Encode.pm b/Encode.pm
index 0c58043..e9779e7 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -801,13 +801,24 @@ If you're not interested in this, then bitwise-OR it with the bitmask.
 =head2 coderef for CHECK
 
 As of C 2.12, C can also be a code reference which takes the
-ordinal value of the unmapped character as an argument and returns a string
-that represents the fallback character. For instance:
+ordinal value of the unmapped character as an argument and returns
+octets that represent the fallback character. For instance:
 
   $ascii = encode("ascii", $utf8, sub{ sprintf "", shift });
 
 Acts like C but U+I is used instead of C<\x{I}>.
 
+Even the fallback for C must return octets, which are
+then decoded with the character encoding that C accepts. So for
+example if you wish to decode octets as UTF-8, and use ISO-8859-15 as
+a fallback for bytes that are not valid UTF-8, you could write
+
+    $str = decode 'UTF-8', $octets, sub {
+        my $tmp = chr shift;
+        from_to $tmp, 'ISO-8859-15', 'UTF-8';
+        return $tmp;
+    };
+
 =head1 Defining Encodings
 
 To define a new encoding, use:
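For anyone who wants to check the encode() half of the documented behaviour locally, here is a minimal sketch. It assumes a perl whose Encode is 2.12 or later (the version the POD names for coderef CHECK support); the sample string and variable names are illustrative only. It exercises only encode(), where the coderef receives the ordinal of a character that has no ASCII mapping and its return value is spliced into the octet output. The decode() fallback that this patch is mainly about is not covered here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "caf\x{e9}" is "café"; U+00E9 has no mapping in ASCII, so encode()
# calls the coderef with its ordinal (0xE9) and inserts the returned
# string, here plain ASCII octets, in place of that character.
my $chars = "caf\x{e9}";
my $ascii = encode('ascii', $chars, sub { sprintf '<U+%04X>', shift });

# With a coderef CHECK, encode() leaves the source string untouched.
print "$ascii\n";    # prints "caf<U+00E9>"
```

Because the callback only builds an ASCII string with sprintf, its result is valid as octets and as characters at the same time, which is why the encode() example reads naturally; the distinction the patch draws only starts to matter when the fallback produces bytes above 0x7F.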