Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(url): fix host extra trailing slash issue #1173

Closed

Conversation

zoilorys
Copy link

@zoilorys zoilorys commented Jan 26, 2024

Change Summary

Currently Url.build and MultiHostUrl.build methods both have a bug, where there is an extra slash always added after host before path, regardless of whether any of them already have trailing/leading slash. This PR adds checks to make sure that there will only single slash in the resulting URL. Additionally, added similar checks to query and fragment appends, so jsut in case any of the contain leading ?/#.

Related issue number

I haven't found any related issue in this repo, but there were related issues in pydantic/pydantic repo.

Checklist

  • [+] Unit tests for the changes exist
  • [+] Documentation reflects the changes where applicable
  • [+] Pydantic tests pass with this pydantic-core (except for expected changes)
  • [+] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

@zoilorys
Copy link
Author

please review

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Merging #1173 (6ef41b7) into main (758bc51) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1173      +/-   ##
==========================================
- Coverage   90.40%   90.39%   -0.02%     
==========================================
  Files         106      106              
  Lines       16515    16535      +20     
  Branches       36       36              
==========================================
+ Hits        14930    14946      +16     
- Misses       1578     1582       +4     
  Partials        7        7              
Files Coverage Δ
src/url.rs 97.34% <100.00%> (-0.98%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 758bc51...6ef41b7. Read the comment docs.

Copy link

codspeed-hq bot commented Jan 26, 2024

CodSpeed Performance Report

Merging #1173 will degrade performances by 23.15%

Comparing zoilorys:fix/url-build-host-trailing-slash-issue (6ef41b7) with main (758bc51)

Summary

❌ 2 regressions
✅ 144 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main zoilorys:fix/url-build-host-trailing-slash-issue Change
test_core_str 31.3 µs 38.8 µs -19.29%
test_variable_tuple 31.7 µs 41.3 µs -23.15%

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It looks like the main thread discussing all these cases is pydantic/pydantic#7186

I agree there's definitely a normalization bug there with path, so let's get that fixed 👍

Regarding the more extensive changes you've made to query and fragment, perhaps the more correct solution is to get servo/rust-url#835 implemented upstream and then we can move to a proper method to build these URLs rather than fiddling around with normalising into a string here. Maybe best to leave those off this PR for now; if you've got interest in tackling the upstream issue it sounds like there's a lot of demand for it!

Comment on lines +180 to +182
if !url.ends_with('/') {
url.push('/');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should expect that url never ends with /, as that's technically not a valid hostname? I wonder if there's a case to raise an error if url_host contains /?

Comment on lines +183 to +187
if path.starts_with('/') {
url.push_str(path.trim_start_matches('/'));
} else {
url.push_str(path);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trimming the path is too aggressive, because ///foo as a path is technically meaningful and

Suggested change
if path.starts_with('/') {
url.push_str(path.trim_start_matches('/'));
} else {
url.push_str(path);
}
if !path.starts_with('/') {
url.push('/');
}

Comment on lines +190 to +192
if !query.starts_with('?') {
url.push('?');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a change in functionality which is slightly different, so perhaps needs to be split out into a separate PR. Similar to my comment about hostname, there should probably be some validation here.

Comment on lines 417 to +436
if let Some(path) = path {
url.push('/');
url.push_str(path);
if !url.ends_with('/') {
url.push('/');
}
if path.starts_with('/') {
url.push_str(path.trim_start_matches('/'));
} else {
url.push_str(path);
}
}
if let Some(query) = query {
url.push('?');
if !query.starts_with('?') {
url.push('?');
}
url.push_str(query);
}
if let Some(fragment) = fragment {
url.push('#');
if !fragment.starts_with('#') {
url.push('#');
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above, I guess.

Comment on lines +54 to +60
# 3) with host trailing slash/no path leading slash in the input
url = Url.build(scheme='https', host='example.com/', path='foo/bar', query='baz=qux', fragment='quux')
assert_example_url(url)

# 4) with host trailing slash/with path leading slash in the input
url = Url.build(scheme='https', host='example.com/', path='/foo/bar', query='baz=qux', fragment='quux')
assert_example_url(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would be tempted to argue that these cases should never be valid, because the host isn't valid? Maybe that needs to wait until V3 given it's technically breaking? Or is it kinda broken anyway already because of the unconditional / addition?

@zoilorys
Copy link
Author

zoilorys commented Feb 5, 2024

@davidhewitt , this is in regards to all of your comments - you're right. I guess I got into "fix every case" mood here :)
While ///path is technically valid, it created issues in our and other people's setups, so I think there is definitely something to be done about it, but it does needs to be fixed in rust-url.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants