Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad encoding on default picocli.CommandLine#getOut and picocli.CommandLine#getErr #1320

Closed
charphi opened this issue Feb 2, 2021 · 6 comments · Fixed by #1321
Closed

Bad encoding on default picocli.CommandLine#getOut and picocli.CommandLine#getErr #1320

charphi opened this issue Feb 2, 2021 · 6 comments · Fixed by #1321
Labels
theme: shell An issue or change related to interactive (JLine) applications type: bug 🐛 type: enhancement ✨
Milestone

Comments

@charphi
Copy link
Contributor

charphi commented Feb 2, 2021

Depending on your console settings, the default PrintWriter of picocli.CommandLine#getOut may use a bad encoding.

Picocli code:

public PrintWriter getOut() {
    if (this.out == null) {
        this.setOut(new PrintWriter(System.out, true));
    }

    return this.out;
}

This code creates a PrintWriter using Charset.defaultCharset() which can be different from the current console encoding.
A possible solution would be to use Charset.forName(System.getProperty("sun.stdout.encoding")).
See https://github.com/keerath/openjdk-8-source/blob/5f6e9d42a9f6b6736100c9c6f43f5f5ea1570cfb/jdk/src/share/classes/java/lang/System.java#L1189

Here is a code to reproduce (Windows 10, command prompt, encoding cp437):

///usr/bin/env jbang "$0" "$@" ; exit $?
//REPOS jitpack
//DEPS com.github.remkop.picocli:picocli:4.6.1
//DEPS com.github.remkop.picocli:picocli-codegen:4.6.1

import picocli.CommandLine;

import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.util.concurrent.Callable;

@CommandLine.Command(name = "EncodingIssue")
class EncodingIssue implements Callable<Void> {

    @CommandLine.Spec
    private CommandLine.Model.CommandSpec spec;

    public static void main(String... args) {
        System.exit(new CommandLine(new EncodingIssue()).execute(args));
    }

    @Override
    public Void call() throws Exception {
        String problem = "[abcµ]" + System.lineSeparator();
        // ko
        spec.commandLine().getOut().append("1" + spec.commandLine().getColorScheme().errorText(problem)).flush();
        spec.commandLine().getOut().append("2" + problem).flush();
        new OutputStreamWriter(System.out, Charset.defaultCharset()).append("3" + problem).flush();
        // ok
        System.out.append("4" + problem).flush();
        new OutputStreamWriter(System.out, getStdoutEncoding()).append("5" + problem).flush();
        return null;
    }

    static Charset getStdoutEncoding() {
        String charsetName = System.getProperty("sun.stdout.encoding");
        if (charsetName != null) return Charset.forName(charsetName);
        return Charset.defaultCharset();
    }
}
@charphi
Copy link
Contributor Author

charphi commented Feb 2, 2021

Screenshot of the problem:

EncodingIssue

@remkop remkop added type: bug 🐛 type: enhancement ✨ theme: shell An issue or change related to interactive (JLine) applications labels Feb 2, 2021
@remkop remkop added this to the 4.7 milestone Feb 2, 2021
@remkop
Copy link
Owner

remkop commented Feb 2, 2021

Interesting, I was not aware of this.
Thank you for raising this!

Will you be able to provide a pull request for this?

@remkop
Copy link
Owner

remkop commented Feb 2, 2021

In the example screenshot, what is the encoding used by your console/terminal? (Update: understood about cp437.)
There is still a ? question mark displayed in front of the µ glyph in lines 4 and 5. Why do you think this is?

I assume that your source code (that contains the µ character) is in unicode, and this is different from the encoding used by your console/terminal. What would you see if your source code also uses the same encoding used by your console/terminal? (you may need to specify the encoding to javac in order to compile.)

@charphi
Copy link
Contributor Author

charphi commented Feb 3, 2021

My code doesn't contain special characters but deal with .csv files which might contain some.

For the question mark, I don't know 😕 It might come from the font used by the terminal.

@charphi
Copy link
Contributor Author

charphi commented Feb 3, 2021

I've done some research and the question mark is the result of the use of an unmappable-character.

When an unmappable-character is encountered, the default behavior of PrintWriter is to replace the problematic character with a replacement value.

Here is some code to test:

String problem = "[abcµ]";
CharsetEncoder cp437 = Charset.forName("cp437").newEncoder();
System.out.println(cp437.canEncode(problem)); // returns false

cp437.onUnmappableCharacter(CodingErrorAction.REPLACE);
cp437.encode(CharBuffer.wrap(problem)); // doesn't throw exception

cp437.onUnmappableCharacter(CodingErrorAction.REPORT);
cp437.encode(CharBuffer.wrap(problem)); // throws exception

@remkop
Copy link
Owner

remkop commented Feb 4, 2021

Thanks for the PR!
Merged and documented in the release notes.

@remkop remkop modified the milestones: 4.7, 4.6.2 Feb 23, 2021
remkop added a commit that referenced this issue Mar 31, 2021
MarkoMackic pushed a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
MarkoMackic pushed a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
MarkoMackic added a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
MarkoMackic added a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
…of `sun.stdout.encoding` and `sun.stderr.encoding`"

This reverts commit f82fe75.
MarkoMackic added a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
MarkoMackic added a commit to MarkoMackic/picocli that referenced this issue Oct 17, 2021
…of `sun.stdout.encoding` and `sun.stderr.encoding`"

This reverts commit f82fe75.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme: shell An issue or change related to interactive (JLine) applications type: bug 🐛 type: enhancement ✨
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants