Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Reject paths containing non-ASCII characters #3062

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions lib/_http_client.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,13 +41,16 @@ function ClientRequest(options, cb) {
if (self.agent && self.agent.protocol)
expectedProtocol = self.agent.protocol;

if (options.path && / /.test(options.path)) {
if (options.path && ! /^[\x00-\x08\x0E-\x1F\x21-\x7F]*$/.test(options.path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it wouldn't be better to use a negated character set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The highest Unicode code point is U+10FFFF, but a higher one could be introduced in the future. That, along with the complications of non-BMP code points in Javascript, make writing a correct future-proof regex with a negated class tricky. In any case, both this regex and one using a negated character should be O(n), so I see no need for a change.

// The actual regex is more like /[^A-Za-z0-9\-._~!$&'()*+,;=/:@]/
// with an additional rule for ignoring percentage-escaped characters
// but that's a) hard to capture in a regular expression that performs
// well, and b) possibly too restrictive for real-world usage. That's
// why it only scans for spaces because those are guaranteed to create
// an invalid request.
// well, and b) possibly too restrictive for real-world usage.
//
// This regex rejects paths with whitespace (U+0009, U+000A, U+000B,
// U+000C, U+000D and U+0020) as those can have special meaning in the HTTP
// protocol, as well as paths with non-ASCII characters, (to deprecate
// previous buggy behaviour).
throw new TypeError('Request path contains unescaped characters.');
} else if (protocol !== expectedProtocol) {
throw new Error('Protocol "' + protocol + '" not supported. ' +
Expand Down
19 changes: 15 additions & 4 deletions test/parallel/test-http-client-unescaped-path.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,18 @@ var common = require('../common');
var assert = require('assert');
var http = require('http');

assert.throws(function() {
// Path with spaces in it should throw.
http.get({ path: 'bad path' }, assert.fail);
}, /contains unescaped characters/);
for (var path of ['bad path', 'line\nfeed', 'carriage\rreturn', 'tab\t']) {
assert.throws(
function() { http.get({ path: path }, assert.fail); },
/contains unescaped characters/,
'Path "' + path + '" with whitespace should throw'
);
}

for (var path of ['caf\u00e9', '\u0649\u0644\u0649', '💩' /* U+1F4A9 */ ]) {
assert.throws(
function() { http.get({ path: path }, assert.fail); },
/contains unescaped characters/,
'Path "' + path + '" with non-ASCII characters should throw'
);
}