[apps] Added url percent decoding to url query string keys and values. #2015

jschultz410 · 2021-05-20T17:48:15Z

Apps like srt-live-transmit take their source and destination parameters as urls. Urls require several common characters (e.g. - space) to be percent encoded. This PR adds a simple ASCII / Latin-1 percent decoding function to uriparser.cpp that I applied to the url query string keys and values, so that the url encoding doesn't persist in SRT fields such as streamid.

I don't apply the decoding to the whole url, just the query keys + vals, and I didn't write a matching encoder for re-building a uri out of the decoded components. I'm not even sure this is really the right spot to be doing this. Maybe it should instead just be done when passing the url parameters in to build an actual SRT connection rather than inside the parser's data itself?

Anyway, this PR partially addresses issues #1173 and #1871 and might be useful in a more full fledged approach to handling url encoding in the future.

jschultz410 · 2021-05-21T13:41:09Z

I looked at your unit test that failed. Your test and/or the thing it is testing is likely flaky because the code I changed should have absolutely zero impact on whether or not your thread synchronization and signaling primitives are working as expected.

The failed unit test was:

175: Test command: /home/runner/work/srt/srt/_build/test-srt "--gtest_filter=SyncEvent.WaitForTwoNotifyOne"

maxsharabayko · 2021-05-21T14:10:02Z

I looked at your unit test that failed. Your test and/or the thing it is testing is likely flaky because the code I changed should have absolutely zero impact (...)

Yes, please ignore those. They work at the edge of success :)

BTW I had some unfinished unit tests for the URI parser here.
Might be helpful if you could reuse them to test your change.

As I understand, the value part of say streamid=my%20stream is expected to be interpreted as my stream, right?

jschultz410 · 2021-05-21T14:59:20Z

As I understand, the value part of say streamid=my%20stream is expected to be interpreted as my stream, right?

Right. If the streamid's value contains characters that have to be encoded in a url, like a space, then when we actually get down into the SRT library we want the streamid that gets passed into the socket and to the other side to be decoded into utf-8 encoding instead, just like in your example.

I kind of did a hack and just did the decoding as part of the parsing of the uri, which isn't really great as I mentioned, because it doesn't support well things like rebuilding the uri back out of all of its components (i.e. - I didn't re-encode in the reverse direction). It also isn't really clear to me if the components of the uri should be internally stored and made accessible as decoded strings or left as percent encoded.

What is abundantly clear to me, however, is that when the uri parameters get passed into the SRT library to init the socket, etc., at that point those strings definitely should be percent decoded into ASCII / utf-8.

One other weakness here is that I didn't try to tackle percent encoded multi-byte unicode / utf-8 characters. Because we are using string's everywhere, which I believe only are single byte character strings, the necessary changes to truly support unicode characters would likely be much more pervasive in the types throughout the code.

I'll try to take a look at your uriparser unit tests when I can too.

ethouris · 2021-05-25T07:16:15Z

I think this version looks better:

string uri_decode(const string& in, bool printonly)
{
    char buf[4];

    std::ostringstream out;

    for (size_t i = 0; i < in.size(); ++i)
    {
        char c = in[i];
        if (c == '%' && i + 2 < in.size())
        {
            buf[0] = in[i+1];
            buf[1] = in[i+2];
            buf[2] = 0;
            unsigned hexval = 0;
            int st = sscanf(buf, "%x", &hexval);
            if (st == 1 && (!printonly || isprint(hexval)))
            {
                i += 2; // +1 will be added in the loop ++i
                out << char(hexval);
                continue;
            }
        }

        out << c;
    }

    return out.str();
}

jschultz410 · 2021-05-25T14:14:04Z

I think this version looks better:

Using an ostringstream and going character by character is probably better than doing searches, substr's, and string self additions. I considered doing that as well.

However, doing the sscanf without the checks that the characters are hex digits is more misleading when the input is bad. For example "%2z" will be improperly interpreted by sscanf as hexval == 2 with the bogus 'z' character being silently ignored and skipped and a wrong hexval being extracted and decoded.

This code also continues trying to decode even after a bad encoding sequence has been crossed, which might not be advisable on bad input (e.g. - "%%20").

I'm not sure about the printonly boolean saying un-printables shouldn't be decoded into the returned string and should be left as encoded sequences instead. (Originally, I was going to recommend restructuring the control flow before I realized you wanted to fall back to the main code path when decoding failed. Maybe a comment pointing that out would be good.)

The extra check I had for hexval == 0 can be dropped since std::string can contain nul characters these days.

ethouris · 2021-05-25T14:27:51Z

However, doing the sscanf without the checks that the characters are hex digits is more misleading when the input is bad. For example "%2z" will be improperly interpreted by sscanf as hexval == 2 with the bogus 'z' character being silently ignored and skipped and a wrong hexval being extracted and decoded.

Right. The condition should be fixed to:

        if (c == '%' && i + 2 < in.size() && isxdigit(in[i+2]))

No need to check the first one. If the first character isn't a hex digit, sscanf will not extract anything and will return 0.

This code also continues trying to decode even after a bad encoding sequence has been crossed, which might not be advisable on bad input.

Not sure exactly what the standard says about syntax errors here. This procedure checks if the % character is followed by two characters that can be deciphered as hex. If not, it just passes on. %%2a will then result in %*.

I'm not sure about the printonly boolean saying un-printables shouldn't be decoded into the returned string and should be left as encoded sequences instead. (Originally, I was going to recommend restructuring the control flow before I realized you wanted to fall back to the main code path when decoding failed. Maybe a comment pointing that out would be good.)

This was because I wasn't sure actually if the non-printable characters should be allowed or not. For printing in the logs or stdout, surely not. But for encoding as a string to be set, sure.

The extra check I had for hexval == 0 can be dropped since std::string can contain nul characters these days.

[apps] Added url percent decoding to url query string keys and values.

48d10f6

This comment has been minimized.

Sign in to view

maxsharabayko added [apps] Area: Test applications related improvements Type: Enhancement Indicates new feature requests labels May 21, 2021

maxsharabayko added this to the v1.4.4 milestone May 21, 2021

maxsharabayko modified the milestones: v1.4.4, Backlog, v1.4.5 Aug 12, 2021

maxsharabayko modified the milestones: v1.5.0, Next Release Jun 1, 2022

maxsharabayko modified the milestones: v1.5.1, Next release Sep 12, 2022

maxsharabayko modified the milestones: v1.5.2, v1.6.0 Dec 12, 2022

ethouris mentioned this pull request Apr 3, 2023

[EPIC] Base for the next release 1.6.0 #2709

Open

6 tasks

maxsharabayko mentioned this pull request Jun 9, 2023

[APP] passphrase and streamid characters restriction if passed as an URI query #2749

Open

maxsharabayko merged commit 4e8d036 into Haivision:master Sep 18, 2023

This was referenced Sep 18, 2023

[BUG] uriparser should decode pct-encoded #1173

Closed

embedding streamid in url leads to usage issue #1871

Closed

maxsharabayko mentioned this pull request Oct 7, 2024

Make it possible to set nul character in StreamID using the URI #3039

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[apps] Added url percent decoding to url query string keys and values. #2015

[apps] Added url percent decoding to url query string keys and values. #2015

jschultz410 commented May 20, 2021

This comment has been minimized.

jschultz410 commented May 21, 2021 •

edited

Loading

maxsharabayko commented May 21, 2021

jschultz410 commented May 21, 2021 •

edited

Loading

ethouris commented May 25, 2021 •

edited

Loading

jschultz410 commented May 25, 2021 •

edited

Loading

ethouris commented May 25, 2021 •

edited

Loading

[apps] Added url percent decoding to url query string keys and values. #2015

[apps] Added url percent decoding to url query string keys and values. #2015

Conversation

jschultz410 commented May 20, 2021

This comment has been minimized.

jschultz410 commented May 21, 2021 • edited Loading

maxsharabayko commented May 21, 2021

jschultz410 commented May 21, 2021 • edited Loading

ethouris commented May 25, 2021 • edited Loading

jschultz410 commented May 25, 2021 • edited Loading

ethouris commented May 25, 2021 • edited Loading

jschultz410 commented May 21, 2021 •

edited

Loading

jschultz410 commented May 21, 2021 •

edited

Loading

ethouris commented May 25, 2021 •

edited

Loading

jschultz410 commented May 25, 2021 •

edited

Loading

ethouris commented May 25, 2021 •

edited

Loading