Charset support #277

Closed
ISQ-GTT opened this issue Oct 24, 2016 · 7 comments

@ISQ-GTT
Contributor

ISQ-GTT commented Oct 24, 2016

We would like to use sshj to communicate with various machines that are configured to use ISO 8859-1.
With sshj, received byte streams that are encoded in ISO 8859-1 are decoded as UTF-8 by default, which causes encoding problems with umlauts and similar characters.
We have no way to change the encoding on the server side, nor to extend the sshj code to support that charset.

All encoding problems seem to come down to the readString() and putString() methods of the net.schmizz.sshj.common.Buffer class, where UTF-8 is hard-coded.

It would be great if sshj supported arbitrary charsets, as other SSH libraries (e.g. JSch) and SSH tools (e.g. PuTTY) do.
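To illustrate the request, here is a minimal sketch of charset-aware string framing. It assumes the standard SSH wire format for strings (a big-endian uint32 length followed by the raw bytes). The class `CharsetAwareStrings` and its two methods are hypothetical helpers written for this example; they are not sshj's `Buffer` API and not the fix that was eventually merged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; NOT part of sshj.
public class CharsetAwareStrings {

    // Frame a string as uint32 length + payload, encoding with the given charset.
    static byte[] putString(String s, Charset cs) {
        byte[] payload = s.getBytes(cs);
        return ByteBuffer.allocate(4 + payload.length) // ByteBuffer is big-endian by default
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Read one length-prefixed string, decoding with the given charset.
    static String readString(ByteBuffer in, Charset cs) {
        int len = in.getInt();
        byte[] payload = new byte[len];
        in.get(payload);
        return new String(payload, cs);
    }

    public static void main(String[] args) {
        String name = "TestFile\u00C4\u00D6\u00DC.txt"; // TestFileÄÖÜ.txt
        byte[] framed = putString(name, StandardCharsets.ISO_8859_1);

        // Same charset on both sides: the name survives.
        System.out.println(readString(ByteBuffer.wrap(framed), StandardCharsets.ISO_8859_1).equals(name));
        // Hard-coded UTF-8 on the reading side: the name does not survive.
        System.out.println(readString(ByteBuffer.wrap(framed), StandardCharsets.UTF_8).equals(name));
    }
}
```

The point of the sketch is only that the charset becomes a parameter instead of a constant; where such a parameter would be configured in sshj is left open here.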

@hierynomus
Owner

Do you have an example, and a more specific place where this is happening? File copies, shell, execs...

@ISQ-GTT
Contributor Author

ISQ-GTT commented Oct 24, 2016

It applies to execs, shell operations and file copies.

For example:

  • I created a remote file "TestFileÄÖÜ.txt" using SFTP, SCP, Exec and Shell each.
    In WinSCP it gets displayed as "TestFileÄÖÜ.txt". In PuTTY it gets displayed in 2 lines like "TestFileÃ\nÃÃ.txt".
  • When listing files in sshj using "ls" or SFTPClient.ls(), a file "TestFileÄÖÜ.txt" that was remotely created using PuTTY or WinSCP gets displayed as "TestFile???.txt".
    Calling getBytes() returns "84, 101, 115, 116, 70, 105, 108, 101, -17, -65, -67, -17, -65, -67, -17, -65, -67, 46, 116, 120, 116". Each '?' is encoded as "-17, -65, -67".
    On the other hand, all files created with sshj will be displayed correctly.
  • Existing files can't be found. Executing "cat TestFileÄÖÜ.txt" results in "cat: 0652-050 Cannot open TestFileÄÖÜ.txt." and only works if the file was created using sshj, e.g. when "TestFileÄÖÜ.txt" exists. Same for SFTP/SCP downloads
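The byte values quoted above can be reproduced outside sshj. The sketch below assumes the remote filename arrives as ISO 8859-1 bytes and is then decoded as UTF-8, which is what the report describes; the class and method names are made up for this example. The sequence -17, -65, -67 is the UTF-8 encoding of U+FFFD, the replacement character that Java's decoder substitutes for each malformed byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only; the names here are not part of sshj.
public class Utf8MisdecodeDemo {

    // Decode raw bytes the way a hard-coded UTF-8 reader would.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: this is what the server sends for a "native" file name.
        byte[] wire = "TestFile\u00C4\u00D6\u00DC.txt".getBytes(StandardCharsets.ISO_8859_1);

        String shown = decodeAsUtf8(wire);
        System.out.println(Arrays.toString(shown.getBytes(StandardCharsets.UTF_8)));
        // [84, 101, 115, 116, 70, 105, 108, 101, -17, -65, -67, -17, -65, -67, -17, -65, -67, 46, 116, 120, 116]
    }
}
```

Because all three umlauts collapse to the same replacement character, the original bytes cannot be recovered from the decoded string, which would explain why such files cannot be opened by name afterwards.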

@hierynomus hierynomus added the bug label Oct 25, 2016
@hierynomus
Owner

Can you write a (unit) test to easily simulate this behaviour? I'd be more than happy to fix this.

@ISQ-GTT
Contributor Author

ISQ-GTT commented Dec 1, 2016

When a file is created with sshj, the UTF-8 encoded filename is decoded on the remote side with the local charset, resulting in an odd filename.
Displaying that filename on the sshj side reverses the process, so it looks normal again to the sshj client.
Basically, sshj-created files behave as expected as long as they are only accessed through sshj.
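This round trip can be sketched in a few lines of Java. The example assumes ISO 8859-1 as the remote charset because it is always available in the JDK; `RoundTripDemo` and its methods are illustrative names, not sshj code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only; the names here are not part of sshj.
public class RoundTripDemo {

    // Assumption: the remote side interprets filename bytes with this charset.
    static final Charset REMOTE = StandardCharsets.ISO_8859_1;

    // What the remote machine shows for a name that was sent as UTF-8.
    static String asSeenRemotely(String name) {
        return new String(name.getBytes(StandardCharsets.UTF_8), REMOTE);
    }

    // What a UTF-8 client shows when it reads that remote name back.
    static String asSeenBySshj(String remoteName) {
        return new String(remoteName.getBytes(REMOTE), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "TestFile\u00C4\u00D6\u00DC.txt"; // TestFileÄÖÜ.txt
        String remote = asSeenRemotely(name);

        System.out.println(remote.equals(name));               // false: odd name on the server
        System.out.println(asSeenBySshj(remote).equals(name)); // true: looks normal again in sshj
    }
}
```

The reversal works here because every byte maps to some character in ISO 8859-1, so no information is lost on the way; the opposite direction (native bytes decoded as UTF-8) is lossy.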

To reproduce the behaviour, you need to create a file like "TestFileÄÖÜ.txt" using PuTTY or WinSCP, or create it directly on that machine, so that the filename is encoded with the local charset. (In my case it was actually ISO 8859-15 rather than ISO 8859-1, by the way.)
Those "native" files containing special characters will be decoded as UTF-8 on the sshj side and end up with odd filenames again.

As far as I can tell, any test needs to rely on those "native" files, created either directly or with some tool. Please tell me if I missed something so I can take care of some tests.

Maybe I'll manage to fix this, but unfortunately I won't be able to run tests until the end of the month.

@ISQ-GTT
Contributor Author

ISQ-GTT commented Jan 19, 2017

I'm currently working on a fix.

At some points the RFCs don't specify which charset to use for a string. Some of them will/could use the remote charset, some of them probably won't. That should be up to the ssh server implementation.

I'll only be able to test this with the (fixed) charset of the machines I'm currently connecting to, so some testing will remain to be done.

@hierynomus
Owner

hierynomus commented Jan 19, 2017 via email

@hierynomus
Owner

Fixed with #305
