This repository has been archived by the owner on Jan 29, 2020. It is now read-only.

Fix #357 - invalid characters in path/query #372

Merged
merged 3 commits into from
Oct 10, 2019


michalbundyra
Member

@michalbundyra michalbundyra commented Oct 9, 2019

Fixes #357
Fixes #218

@michalbundyra michalbundyra added this to the 2.1.5 milestone Oct 9, 2019
@michalbundyra
Member Author

@weierophinney I was looking for the right solution to this issue, and I think I have finally found one.
In my opinion we need to do something with invalid characters, in this case encode them. We can't just throw an exception, because these can come from user input.
So if an invalid UTF-8 character is detected in the string, I check the string letter by letter and encode each invalid character. This solution runs a regexp on one character at a time in a loop, but I couldn't find anything better. Any suggestions are appreciated.

/cc @YellowPepper @krowinski @freyr @oroszlanyzsolt
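
A minimal sketch of the approach described above, for readers following along. The function name `encodeInvalidUtf8Bytes` and the choice of `rawurlencode()` for the encoding step are assumptions made for illustration; the code merged in this PR may differ.

```php
<?php

/**
 * Hypothetical helper: percent-encode every byte that is not valid UTF-8 on its own.
 */
function encodeInvalidUtf8Bytes(string $string): string
{
    // With the /u modifier, preg_match() returns false for malformed UTF-8
    // and 1 for a well-formed subject (the empty pattern always matches).
    if (preg_match('//u', $string)) {
        return $string;
    }

    $encoded = '';
    // str_split() yields single bytes, so each byte is checked in isolation.
    foreach (str_split($string) as $byte) {
        $encoded .= preg_match('//u', $byte) ? $byte : rawurlencode($byte);
    }

    return $encoded;
}

var_dump(encodeInvalidUtf8Bytes("\x21\x92")); // string(4) "!%92"
```

One consequence of checking byte by byte: once the string as a whole is invalid, the bytes of otherwise valid multibyte characters are encoded too, because a single byte of a multibyte sequence is not valid UTF-8 by itself. For example, `"\xC3\xA9\x92"` becomes `%C3%A9%92` in this sketch.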

return $string;
}

$letters = str_split($string);
Member

Noting here, because I had to look it up - str_split() splits into bytes instead of characters when it encounters a multibyte string.
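
A small illustration of that difference, assuming a UTF-8 encoded input. The variable names are made up for the example, and `preg_split()` with the `u` modifier is used here only as one common way to split by character.

```php
<?php

$string = "a\xC3\xA9"; // "aé" encoded as UTF-8: 3 bytes, 2 characters

// str_split() works on bytes, so the two-byte "é" is cut in half.
$bytes = str_split($string);

// Splitting on an empty pattern with /u respects UTF-8 character boundaries.
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($bytes));      // int(3)
var_dump(count($characters)); // int(2)
```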

*/
private function filterInvalidUtf8(string $string) : string
{
if (preg_match('//u', $string)) {
Member

I don't understand what this is doing, exactly.

I've tried it locally with a variety of strings from your query provider below, and every single one of them results in a positive match, which means the logic below never gets hit. Maybe you're not testing invalid UTF-8 characters anywhere?

Member Author

@michalbundyra michalbundyra Oct 10, 2019

How come?

See: https://3v4l.org/TntQf

var_dump(preg_match('//u', "\x21\x92"));
var_dump(preg_match('//u', "\x21"));
var_dump(preg_match('//u', "\x92"));

result:

bool(false)
int(1)
bool(false)

Somehow the tests I've added work.
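
To make the purpose of the `preg_match('//u', ...)` check explicit, here is a short sketch that wraps it in a helper. The name `isValidUtf8` is hypothetical and not part of the PR; the sketch relies on `preg_match()` returning `false` and setting `PREG_BAD_UTF8_ERROR` when the subject is malformed UTF-8.

```php
<?php

/**
 * Hypothetical helper: true when $string is well-formed UTF-8.
 */
function isValidUtf8(string $string): bool
{
    // The empty pattern matches any valid subject, so the only way to get
    // something other than 1 here is a failure such as malformed UTF-8.
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUtf8("\x21")); // bool(true)
var_dump(isValidUtf8("\x92")); // bool(false)

// The failure reason is available through preg_last_error().
preg_match('//u', "\x92");
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```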

Member

It turned out I was using single quotes instead of double quotes when running my tests here. Once I changed the quoting style, I was finally able to observe the same behaviour.

It might be good to add a comment indicating what this is doing, though, so future contributors/maintainers know the purpose.
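
For anyone hitting the same quoting pitfall, a quick sketch of why the quoting style matters. The variable names are illustrative only.

```php
<?php

$single = '\x92'; // four literal characters: backslash, x, 9, 2
$double = "\x92"; // one byte with the value 0x92

var_dump(strlen($single));            // int(4)
var_dump(strlen($double));            // int(1)

// Only the double-quoted string contains an invalid UTF-8 byte.
var_dump(preg_match('//u', $single)); // int(1)
var_dump(preg_match('//u', $double)); // bool(false)
```

PHP does not interpret `\x` escape sequences inside single-quoted strings, so the single-quoted version is plain ASCII and always passes the check.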

weierophinney added a commit that referenced this pull request Oct 10, 2019
@weierophinney weierophinney merged commit f86eaa9 into zendframework:master Oct 10, 2019
weierophinney added a commit that referenced this pull request Oct 10, 2019
@michalbundyra michalbundyra deleted the hotfix/357 branch October 10, 2019 17:53
@krowinski

tx for fix!

3 participants