Skip to content
This repository has been archived by the owner on Jan 29, 2020. It is now read-only.

Fix #357 - invalid characters in path/query #372

Merged
merged 3 commits into from
Oct 10, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 27 additions & 2 deletions src/Uri.php
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,7 @@

use Psr\Http\Message\UriInterface;

use function array_key_exists;
use function array_keys;
use function count;
use function explode;
use function get_class;
use function gettype;
Expand All @@ -23,10 +21,12 @@
use function is_string;
use function ltrim;
use function parse_url;
use function preg_match;
use function preg_replace;
use function preg_replace_callback;
use function rawurlencode;
use function sprintf;
use function str_split;
use function strpos;
use function strtolower;
use function substr;
Expand Down Expand Up @@ -560,6 +560,8 @@ private function filterScheme(string $scheme) : string
*/
private function filterUserInfoPart(string $part) : string
{
$part = $this->filterInvalidUtf8($part);

// Note the addition of `%` to initial charset; this allows `|` portion
// to match and thus prevent double-encoding.
return preg_replace_callback(
Expand All @@ -574,6 +576,8 @@ private function filterUserInfoPart(string $part) : string
*/
private function filterPath(string $path) : string
{
$path = $this->filterInvalidUtf8($path);

$path = preg_replace_callback(
'/(?:[^' . self::CHAR_UNRESERVED . ')(:@&=\+\$,\/;%]+|%(?![A-Fa-f0-9]{2}))/u',
[$this, 'urlEncodeChar'],
Expand All @@ -594,6 +598,25 @@ private function filterPath(string $path) : string
return '/' . ltrim($path, '/');
}

/**
* Encode invalid UTF-8 characters in given string. All other characters are unchanged.
*/
private function filterInvalidUtf8(string $string) : string
{
if (preg_match('//u', $string)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is doing, exactly.

I've tried it locally with a variety of strings from your query provider below, and every single one of them results in a positive match, which means the logic below never gets hit. Maybe you're not testing invalid UTF-8 characters anywhere?

Copy link
Member Author

@michalbundyra michalbundyra Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come?

See: https://3v4l.org/TntQf

var_dump(preg_match('//u', "\x21\x92"));
var_dump(preg_match('//u', "\x21"));
var_dump(preg_match('//u', "\x92"));

result:

bool(false)
int(1)
bool(false)

Somehow tests I've added work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned out I was using single quotes instead of double quotes when I was doing my tests here. Once I changed the quoting style, I was able to observe the same behaviour finally.

Might be good to drop a note in indicating what this is doing, though, so future contributors/maintainers know the purpose.

return $string;
}

$letters = str_split($string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting here, because I had to look it up - str_split() splits into bytes instead of characters when it encounters a multibyte string.

foreach ($letters as $i => $letter) {
if (! preg_match('//u', $letter)) {
$letters[$i] = $this->urlEncodeChar([$letter]);
}
}

return implode('', $letters);
}

/**
* Filter a query string to ensure it is propertly encoded.
*
Expand Down Expand Up @@ -654,6 +677,8 @@ private function filterFragment(string $fragment) : string
*/
private function filterQueryOrFragment(string $value) : string
{
$value = $this->filterInvalidUtf8($value);

return preg_replace_callback(
'/(?:[^' . self::CHAR_UNRESERVED . self::CHAR_SUB_DELIMS . '%:@\/\?]+|%(?![A-Fa-f0-9]{2}))/u',
[$this, 'urlEncodeChar'],
Expand Down
6 changes: 5 additions & 1 deletion test/UriTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ public function userInfoProvider()
'at' => ['user@example.com', 'cred@foo', 'user%40example.com:cred%40foo'],
'percent' => ['%25', '%25', '%25:%25'],
'invalid-enc' => ['%ZZ', '%GG', '%25ZZ:%25GG'],
'invalid-utf' => ["\x21\x92", '!?', '!%92:!%3F'],
];
// @codingStandardsIgnoreEnd
}
Expand Down Expand Up @@ -616,7 +617,9 @@ public function utf8PathsDataProvider()
{
return [
['http://example.com/тестовый_путь/', '/тестовый_путь/'],
['http://example.com/ουτοπία/', '/ουτοπία/']
['http://example.com/ουτοπία/', '/ουτοπία/'],
["http://example.com/\x21\x92", '/%21%92'],
['http://example.com/!?', '/%21'],
];
}

Expand All @@ -635,6 +638,7 @@ public function utf8QueryStringsDataProvider()
return [
['http://example.com/?q=тестовый_путь', 'q=тестовый_путь'],
['http://example.com/?q=ουτοπία', 'q=ουτοπία'],
["http://example.com/?q=\x21\x92", 'q=!%92'],
];
}

Expand Down