Skip to content
This repository has been archived by the owner on Aug 7, 2023. It is now read-only.

Encoding handling #235

Closed
weirdan opened this issue Jan 27, 2017 · 4 comments
Closed

Encoding handling #235

weirdan opened this issue Jan 27, 2017 · 4 comments
Labels

Comments

@weirdan
Copy link
Contributor

weirdan commented Jan 27, 2017

Turns out encoding handling is still broken for single-byte encodings (see comment on #211). That's because I assumed in my latest PRs that file content is passed as binary, however it's not the case.

linter-phpcs gets file contents as string (with textEditor.getText() call). Strings in javascript have no inherent encoding, instead they are sequences of characters. Later on, this string is passed to atom-linter.exec, which is actually imported from sb-exec.

sb-exec starts the process and eventually passes the string (it's still a string at that point) to subprocess' stdin.write() method (here), but does not specify any encoding. stdin itself is stream.Writable. Documentation for stream.Writable.write() says you need to specify encoding if you pass a string as chunk argument, however it's not enforced.

Tracing the calls deeper I found that default encoding used when no encoding was specified for a string is utf8 (here). However it really should be considered an implementation detail, as it's not documented. In fact, documentation implies encoding argument for stream.Writable.write() method is mandatory when passing string as chunk argument.

The end result is that regardless of the actual file encoding PHPCS always gets UTF-8 on stdin.

I propose to keep UTF-8 as the only encoding used when communicating with PHPCS, as it allows to drop any conversion mumbo-jumbo, including that manually created iconv-lite-to-libiconv JSON mapping. This has a slight chance of breaking custom PHPCS sniffs that expect a single-byte encoding and do not properly consider --encoding= parameter. But then it's probably their fault anyway. To make sure we do not rely on implicit default encoding I propose to convert the text to explicit UTF-8 encoded Buffer (with Buffer.from(fileText, 'utf8')) and pass it down to sb-exec to be used as stdin for the subprocess.

@weirdan
Copy link
Contributor Author

weirdan commented Jan 27, 2017

@Arcanemagus , I'll do a PR if you agree with this approach.

@weirdan
Copy link
Contributor Author

weirdan commented Jan 28, 2017

Oh, it doesn't work the way I hoped either. sb-exec.exec() outright rejects buffers 😒 here

Basically, sb-exec is broken, as it takes away a choice of encoding from you and forces input for the subprocess to be in default encoding (UTF-8, by coincidence).

weirdan added a commit to weirdan/linter-phpcs that referenced this issue Jan 28, 2017
PHPCS always gets UTF-8 as its input (see AtomLinter#235)
@Arcanemagus
Copy link
Member

Arcanemagus commented Jan 30, 2017

Filed a bug on sb-exec about needing to specify the encoding.

Thanks for tracing this mess down!

Arcanemagus pushed a commit that referenced this issue Jan 30, 2017
* Use a single fixed encoding

PHPCS always gets UTF-8 as its input (see #235)

* Added single-byte encoding tests
@Arcanemagus
Copy link
Member

Hopefully fixed by #236 in v1.5.8.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants