Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix trailing slash dropped since v1.9.1 #1023

Closed
wants to merge 3 commits into from

Conversation

youtux
Copy link
Contributor

@youtux youtux commented Jun 9, 2024

What do these changes do?

Add back the ability to join with an empty string "", resulting in the / being appended to the URL. This was broken in v1.9.1, and since then it's not possible to create a URL with a trailing backslash (without doing string manipulation on the last segment).

Are there changes in behavior for the user?

Yes, fix a regression introuduced in v1.9.1, where you can't join "" expecting to have a trailing "/" appended to the URL.

Related issue number

Fixes #984.

Fixes #926.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes

@commonism
Copy link
Contributor

When it was changed, I considered this a inconvenient change, time went by and I started using it to drop trailing slashes

import yarl
yarl.__version__
# '1.9.4'
a = yarl.URL("schema://host/path/")
a
# URL('schema://host/path/')
a / ""
# URL('schema://host/path')
if (nf := (yarl.URL(c).with_fragment(None) / "")) in

What is the recommended way to drop a trailing slash after this change?

@youtux
Copy link
Contributor Author

youtux commented Jun 10, 2024

I don't think it was ever meant to be doing that, to me it's just buggy behaviour.

If you need something like that, I guess you'll need to do something like:

if a.path.endswith("/"):
     a = a.with_path(u.path[:-1])

@commonism
Copy link
Contributor

As mentioned, current behavior matches pathlib, but pathlib always removes trailing /, on the other hand internally "" was used to append a "/".

I'm fine with whatever gets documented as behavior.

New regressions:

>>> URL("scheme://host/path").joinpath("bar", "", "baz")
URL('scheme://host/path/bar//baz')

>>> URL("scheme://host/path").joinpath("", "", "")
URL('scheme://host/path///')

@youtux
Copy link
Contributor Author

youtux commented Jun 11, 2024

@webknjaz any chance you can review this?

Copy link
Contributor

@commonism commonism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked down the origin.
The detection to add a trailing / in _make_child did not identify the empty string as last part as indicator.

@@ -718,9 +718,7 @@ def _make_child(self, segments, encoded=False):
# keep the trailing slash if the last segment ends with /
parsed = [""] if segments and segments[-1][-1:] == "/" else []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is origin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parsed = [""] if segments and segments[-1][-1:] == "/" else []
parsed = [""] if segments and (segments[-1][-1:] == "/" or segments[-1] == "") else []

),
"http://example.com/path/",
id="trailing-slash-empty-string",
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    pytest.param(
        "",
        (
            "path",
            "",
            "",
        ),
        "http://example.com/path/",
        id="trailing-double-slash-empty-string",
    ),
    pytest.param(
        "",
        ("path", "", "", "path"),
        "http://example.com/path/path",
        id="embedded-double-slash-empty-string",
    ),
    pytest.param(
        "",
        ("path/", ""),
        "http://example.com/path/",
        id="path-slash-follows-embedded-slash-empty-string",
    ),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t agree with the first 2 parametrizations. IMO they should result in double slashes.
otherwise how can you build a url with double slashes in the path?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalized path do not have double slashes?

And from a technical point of view,
("path/", "") is the same as ("path", "", "")
So the third would be bad as well.

The requested behavior does not match the test_joinpath_relative[empty-element] - the test you broke.

pytest.param(URL("a"), ("b", "", "c"), ("a", "b", "c"), id="empty-element"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these would be a normalised paths?
Even if so, the lib should allow to generate non-normalised paths, since many web servers allow to differentiate for paths with and without slashes at the end for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to RFC 3986, the path is allowed to have empty strings as segment of the path, as defined in the grammar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youtux how to you think about:
commonism@56db9d3

it passes all current tests and satisfies the requirements set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that all looks correct to me, with no opinion on whether the last example should have 3 or 4 slashes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last example is meant to have 3 slashes - 4 would be a special case.
I've refactored things a little to match the wording in rfc3986 and added more tests to document expected behavior.
URL.join is inconsistent to URL.joinpath now, it uses urlib.parse.join, stripping empty segments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a consistent URL.join & rfc3986 port normalization
https://github.com/commonism/yarl/compare/youtux-fix-join-trailing-slash...commonism:yarl:rfc3986-port-normalization?expand=1

How do you want this to proceed @youtux?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should open a new pull request since it's more complete.
Also, I'd split other changes you made in different PRs (like the query params changes)

if not seg:
continue
if seg[0] == "/":
if seg and seg[0] == "/":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

Suggested change
if seg and seg[0] == "/":
if not seg:
continue
if seg and seg[0] == "/":

@commonism
Copy link
Contributor

Your versions fails the unit tests.
https://github.com/aio-libs/yarl/actions/runs/9434956133/job/25990802776?pr=1023

  FAILED ../../../project/tests/test_url.py::test_joinpath_relative[empty-element] - AssertionError: assert ('a', 'b', '', 'c') == ('a', 'b', 'c')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants