Skip to content

Commit

Permalink
HTTPCLIENT-1960: URIBuilder incorrect handling of multiple leading sl…
Browse files Browse the repository at this point in the history
…ashes in path component
  • Loading branch information
ok2c committed Jan 12, 2019
1 parent 024f692 commit 8c04c6a
Show file tree
Hide file tree
Showing 4 changed files with 35 additions and 12 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -171,15 +171,6 @@ private static String normalizePath(final String path, final boolean relative) {
if (TextUtils.isBlank(s)) {
return "";
}
int n = 0;
for (; n < s.length(); n++) {
if (s.charAt(n) != '/') {
break;
}
}
if (n > 1) {
s = s.substring(n - 1);
}
if (!relative && !s.startsWith("/")) {
s = "/" + s;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -147,8 +147,20 @@ public static URI rewriteURI(
if (dropFragment) {
uribuilder.setFragment(null);
}
if (TextUtils.isEmpty(uribuilder.getPath())) {
final String path = uribuilder.getPath();
if (TextUtils.isEmpty(path)) {
uribuilder.setPath("/");
} else {
final StringBuilder buf = new StringBuilder(path.length());
boolean foundSlash = false;
for (int i = 0; i < path.length(); i++) {
final char ch = path.charAt(i);
if (ch != '/' || !foundSlash) {
buf.append(ch);
}
foundSlash = ch == '/';
}
uribuilder.setPath(buf.toString());

This comment has been minimized.

Copy link
@jalpa-pujara

jalpa-pujara Mar 22, 2019

Calling setPath here updates encodedPath to null and url encoded characters are removed from the path when uribuilder.build() is called in next line.

This comment has been minimized.

Copy link
@gjesse

gjesse Jun 13, 2019

This change is breaking some things for us, because the above call to final String path = uribuilder.getPath(); returns the decoded path, which is is then being worked with here. Then the formerly set encoded path is wiped out, so when the final uri is built, requests that previously included encoded colons now in 4.5.6 are unencoded. It looks like the build method does re-encode the path but based on the PATHSAFE field, which ultimately does not encode colons in the path, but the upshot is a URL that comes into this method with encoded colons leaves with decoded colons, which is a big behavior change :) Would appreciate any comments or insight here.

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 14, 2019

Author Member

@gjesse Colons in the path component do not need to be encoded. Please refer to RFC 2396, section 3.3. Path Component.

path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","

This comment has been minimized.

Copy link
@gjesse

gjesse Jun 14, 2019

That's true, but lots of other libraries like retrofit2 encode them. This change here takes what was previously encoded and decodes it, which breaks our hmac auth.

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 14, 2019

Author Member

@gjesse It up to those libraries. It not illegal to encode non-reserved characters. But what is certainly wrong and broken is the server side code that fails to handle non encoded characters that are perfectly legal for path components.

This comment has been minimized.

Copy link
@gjesse

gjesse via email Jun 14, 2019

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 14, 2019

Author Member

@gjesse
RFC 2161, 3.2.3 URI Comparison

   Characters other than those in the "reserved" and "unsafe" sets (see
   RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

This comment has been minimized.

Copy link
@gjesse

gjesse via email Jun 14, 2019

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 15, 2019

Author Member

@gjesse That’s all well and good but obviously not everybody can blindly take low quality input. HttpClient must parse the original URI supplied by the user and split it into an authority component and a path component and also ensure the path component is valid.

This comment has been minimized.

Copy link
@gjesse

gjesse via email Jun 15, 2019

This comment has been minimized.

Copy link
@garydgregory

garydgregory Jun 15, 2019

Member

Two items:

  • you can turn off this normalization
  • we could change the default to off

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 15, 2019

Author Member

we could change the default to off

@garydgregory No, we cannot. There are fringe cases such as http://host//crap that make path normalization necessary.

This comment has been minimized.

Copy link
@gjesse

gjesse Jun 17, 2019

ok so we're clear, previously using apache http client, if i requested a url like http://host/foo%3Abar, that is would would be sent to the server. now it's http://host/foo:bar, due to an apparently unrelated change to avoid multiple leading slashes.

This comment has been minimized.

Copy link
@amanduggal

amanduggal Jun 17, 2019

so @ok2c you accidentally made a breaking change and now you're arguing that you would rather lose users of this library instead of fix the issue and allow people to gracefully update their code?

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 17, 2019

Author Member

@gjesse One last time.

  1. Both URIs are equivalent as per HTTP specification and should be referring to the same resource.
  2. Request URIs may be rewritten in transit. This is precisely the reason why the DIGEST authentication scheme embeds the original request URI in the authentication response header.
  3. I suspect in your particular case the problem can be solved by moving signature calculation to a request interceptor.
  4. There are multiple ways of avoiding HttpClient having to parse and normalize the original URI. Not to mention an option to simply disable request URI normalization if so desired.
  5. If you are not willing to do any of the above you are welcome to migrate to any other HTTP library you deem fit. I have been maintaining Apache HttpClient for nearly 20 years. Those kind of treats do not impress me.

This comment has been minimized.

Copy link
@gjesse

gjesse Jun 17, 2019

it's not a threat, it's just letting you know that it has put me in a difficult situation, and your change to the default behavior has cost me and lots of other folks quite a bit of effort. However now that i realize it can be disabled in 4.5.8 i will work around it that way. Glad to know you've been maintaining this for so long, thanks for your work.

This comment has been minimized.

Copy link
@michael-o

michael-o Jun 17, 2019

Member

I seriously do not understand the issue here. We, committers, have tightened the parsers for proper input, and yes your faulty input is failing. Same is doing Mark Thomas in Tomcat release by release for request params and header values. This is perfectly fine to do so. URL parsing is serious business and some fail.

This comment has been minimized.

Copy link
@mrhota

mrhota Jun 17, 2019

@ok2c Unless I'm missing something, I think the spec is irrelevant here. That section is called URI Comparison, and everyone will agree that the URIs are equivalent. But we aren't comparing URIs; we're signing a string. A string whose construction this commit changes and breaks. I think that's the only important detail here.

HMAC signatures based on string representations of equivalent URIs can differ.

Additionally, if the URIs are equivalent, why don't you leave encodedPath alone then (possibly apart from path normalization, of course)?

Surely it's possible to perform the path normalization and also retain the encoded-path-as-originally-provided which setPath here sets to null?

This comment has been minimized.

Copy link
@michael-o

michael-o Jun 17, 2019

Member

HMAC signatures based on string representations of equivalent URIs can differ.

If so, this approach is flawed.

This comment has been minimized.

Copy link
@mrhota

mrhota Jun 17, 2019

@michael-o That's not helpful.

Here are my questions again:

If the URIs are equivalent, why don't you leave encodedPath alone?

Surely it's possible to perform the path normalization and also retain the encoded-path-as-originally-provided which setPath here sets to null?

This comment has been minimized.

Copy link
@michael-o

michael-o Jun 17, 2019

Member

@ok2c That is a good question. If both are valid, why rewrite at all?

This comment has been minimized.

Copy link
@ok2c

ok2c Jun 17, 2019

Author Member

@michael-o I looked at the source code of URIUtils#rewriteURI. Yes, we could keep the original encoding if the original URI does not have consecutive / (empty path segments).

This comment has been minimized.

Copy link
@michael-o

michael-o Jun 17, 2019

Member

That's sound like a good compromise. This would make it stable, if it is not broken (i.e, valid) leave as-is.

This comment has been minimized.

Copy link
@jglick

jglick Dec 18, 2019

Spent a couple hours bisecting changes to find the cause of a test failure (jenkinsci/jenkins#4398) and landed on this. The peculiar case is that we are trying to use HtmlUnit to simulate URL-encoding attacks and verify that the server defends against them. To do that, it is necessary for the URI path to be sent to the server as written.

}
return uribuilder.build();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@
import org.apache.http.Consts;
import org.apache.http.NameValuePair;
import org.apache.http.message.BasicNameValuePair;
import org.hamcrest.CoreMatchers;
import org.junit.Assert;
import org.junit.Test;

Expand Down Expand Up @@ -328,4 +329,23 @@ public void testRelativePathWithAuthority() throws Exception {
Assert.assertEquals(new URI("http://somehost/./mypath"), uri);
}

@Test
public void testMultipleLeadingPathSlashes() throws Exception {
final URI uri = new URIBuilder()
.setScheme("ftp")
.setHost("somehost")
.setPath("//blah//blah")
.build();
Assert.assertThat(uri, CoreMatchers.equalTo(URI.create("ftp://somehost//blah//blah")));
}

@Test
public void testPathNoLeadingSlash() throws Exception {
final URI uri = new URIBuilder()
.setScheme("ftp")
.setPath("blah")
.build();
Assert.assertThat(uri, CoreMatchers.equalTo(URI.create("ftp:/blah")));
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -51,8 +51,8 @@ public void testRewrite() throws Exception {
URI.create("http://thishost/stuff"), null).toString());
Assert.assertEquals("/", URIUtils.rewriteURI(
URI.create("http://thishost//"), null).toString());
Assert.assertEquals("/stuff///morestuff", URIUtils.rewriteURI(
URI.create("http://thishost//stuff///morestuff"), null).toString());
Assert.assertEquals("/stuff/morestuff", URIUtils.rewriteURI(
URI.create("http://thishost//stuff/morestuff"), null).toString());
Assert.assertEquals("http://thathost/stuff", URIUtils.rewriteURI(
URI.create("http://thishost/stuff#crap"), target, true).toString());
Assert.assertEquals("http://thathost/stuff#crap", URIUtils.rewriteURI(
Expand Down

0 comments on commit 8c04c6a

Please sign in to comment.