Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About scope[path] and scope[root_path] #424

Open
abersheeran opened this issue Dec 7, 2023 · 11 comments
Open

About scope[path] and scope[root_path] #424

abersheeran opened this issue Dec 7, 2023 · 11 comments

Comments

@abersheeran
Copy link
Contributor

Related links:

In the first link, @andrewgodwin stated the differences between root_path and path in WSGI. After Starlette actively followed up, this led to the problem in the second link.

Yes, we can make some changes to make it work again, but the question is what is the existing community implementation?

I searched for the code via GitHub and can view the results via the third link. I noticed that in the first page of results, everyone except hypercorn and starlette treat path as the equivalent of PATH_INFO in WSGI. They get the full path by concatenating root_path and path.

So I think we should modify the specification document so that path is exactly the same as PATH_INFO. Otherwise, this will have a strong impact on the existing implementation of the community.

A polite mention of the author of code in the search results, I hope this doesn't bother you. @Kludex @pgjones @kennethreitz @Goldziher @simonw @RobertoPrevato @long2ice If you have time, could you express some thoughts on this?

@abersheeran
Copy link
Contributor Author

abersheeran commented Dec 7, 2023

"PATH_INFO": scope["path"].encode("utf8").decode("latin1"),

asgiref itself seems to consider path and PATH_INFO to be completely equivalent. But if it wants to comply with ASGI's current specification, it should be modified as in this PR - remove SCRIPT_NAME from PATH_INFO. abersheeran/a2wsgi#40

https://asgi.readthedocs.io/en/latest/specs/www.html#wsgi-compatibility

@Goldziher
Copy link

Agree 💯, they are considered equal

@andrewgodwin
Copy link
Member

I put this in the WSGI compat docs very deliberately:

  • SCRIPT_NAME is root_path
  • PATH_INFO can be derived by stripping root_path from path

The WSGI adapter in asgiref does not support hosting things under a prefix/non-root-path; that was mostly because I told myself at the time I'd go back and add it later on. As you can see, I still haven't...

@abersheeran
Copy link
Contributor Author

I put this in the WSGI compat docs very deliberately:

  • SCRIPT_NAME is root_path
  • PATH_INFO can be derived by stripping root_path from path

I mean, it seems that most of the existing ASGI-related procedures do not comply with this standard. Do we still need to insist on this?

@andrewgodwin
Copy link
Member

Standards are obtained by consensus, not by dictat, so if there's strong consensus on another way of doing it from all the main servers we can change it.

@abersheeran
Copy link
Contributor Author

Okay, let's wait for the other people's opinions. As far as the code I looked up, most people still use exactly the same approach as WSGI.

@RobertoPrevato
Copy link

RobertoPrevato commented Dec 10, 2023

Hi @abersheeran no problem with mentioning me, thank you for bringing this up.
I can confirm how my web framework is using root_path today: it is used to obtain the full URL of a web request, including any path fragment a proxy server might use, like "/example-api" in the following example:

  • https://api.foo.example/example-api/ --> going to back-end exposed at https://example.com/

I started using root_path especially to obtain a redirect URL to the app itself when using OpenID Connect integration. This is mentioned here Neoteroi/BlackSheep#372.

Personally, I find natural to think about the "path" as the path from application's perspective, therefore not including the root path, but I don´t have a problem in updating my code to follow the specification.

@tiangolo
Copy link
Contributor

Hello all!

Just to make sure I'm on the same page, I'm gonna share my understanding of the current state, differences, and the current question.

Mounting

I see that the discussion/difference is in the end mainly about WSGI's PATH_INFO vs ASGI's path. Where PATH_INFO includes only the last part of the path after any prefix and path has everything, including any path prefix.

I see the discussion/problem and questions are only relevant to "mounting" ASGI apps, where "mounting" refers to putting the ASGI app behind a path prefix in a situation where the app's code doesn't know about this prefix.

For example, the app defines a route for /users/ in its code, but it is actually behind some frontend proxy like Nginx, under a path prefix of /api/v1/, so, the client URL path is actually /api/v1/users/ even though the app's code doesn't define anything with /api/v1/.

WSGI

In WSGI, PATH_INFO refers to the last part of the path when an app is "mounted" (behind a path prefix that the app is not aware of, e.g. behind a frontend proxy like Nginx in a path /api/v1/), so, if the app is behind Nginx under /api/v1/, and the client requests /api/v1/users/, the WSGI app would get in PATH_INFO only the last part: /users/.

When the WSGI app wants to check which of its own route/path handlers should handle that request, it checks the value of PATH_INFO and matches it as-is with a route handler for that value, in this case, for /users/. So, when finding a route to handle a request, it doesn't need to use SCRIPT_NAME.

The WSGI app would get the path prefix in SCRIPT_NAME, in this example, the value would be /api/v1/. So, if the app wanted to provide the url_for some specific route that the client should use (including the prefix), it would find the route handler for that path, and then concatenate SCRIPT_NAME and PATH_INFO.

ASGI

In ASGI, path contains the full path from the URL the client used, so, in the example above, it would include the prefix of /api/v1/, and the value would be /api/v1/users/.

The ASGI app would receive the path prefix at path_root, containing the same value that SCRIPT_NAME would have contained, in this example, /api/v1/.

When the ASGI app needs to find the route/path handler for a request, it would remove the prefix value in the root_path from the path, so, removing /api/v1/ from /api/v1/users/, to get /users/ and then find the route/path handler in its code for /users/.

When the ASGI app needs to find the url_for a route/path, it would, again, remove the prefix in root_path to find the route/path handler, and then it would concatenate the root_path with the newly found sub-path from the router.

Relevance

Now, this is mainly relevant for things that are mounted behind a proxy, or behind/mounted in another app.

For example, say there's a Starlette app at /, and it mounts a BlackShip app at /api/v2/. Then a client requests for /api/v2/users/. In this case, Starlette would take this request, and transfer it to BlackSheep.

In the current (unmodified) state of the spec, Starlette should send a scope including a root_path with a value of /api/v2/ and a path with a value of /api/v2/users/. And then BlackSheep would use those two values to find its own route/path handler for the path /users/.

The same for a2wsgi (which it already does), e.g. mounting a Flask app inside a FastAPI app, under /api/v1/.

Moving Forward

Even though it could mean a couple of changes and "work" for all of us, I would consider it better to follow the spec as it is, without changing it, and fix this (small) detail in the tools as a "bug fix".

I fear otherwise it could become a scenario of "in version X of SuperASGITool it used the ASGI spec 2.3 as of 2023, but in version Y of SuperASGITool it started following the updated version of the ASGI spec 2.4 updated in 2024".

Something very similar happened with OpenAPI and JSON Schema

OpenAPI followed an old version of JSON Schema that didn't include examples and defined its own examples field with some structure. Then JSON Schema defined its own examples field and OpenAPI was still for a long while using its own examples field, so many things would be partially compatible with one or the other.

Now the latest OpenAPI follows the latest JSON Schema, and things are much simpler, but still, there are some places that accept the old examples field from OpenAPI, with a slightly different format than JSON Schema. So, in a single OpenAPI document, there might be places with a field called examples in one format, and in other specific places other examples fields with another slightly different format.

All that is not terrible, and the difference is very small, but it ends up being somewhat confusing, in particular to users, and requires long text (like this one) explaining all the state of things, what works and what doesn't, when, how, etc.

More details about all the examples stuff with OpenAPI, JSON Schema, FastAPI, and Pydantic here: https://fastapi.tiangolo.com/tutorial/schema-extra-example/#technical-details

...that's the type of long explanations I would hope we can save our users from. 😅

Starlette was recently using the old behavior, but it was fixed to follow the spec by @Kludex and me in encode/starlette#2352 and encode/starlette#2400.

Again, the places that need changing are only those that might be related to "mounting" apps under a path prefix (when that's supported).

@abersheeran
Copy link
Contributor Author

abersheeran commented Jan 11, 2024

Yes, it is good to follow the rules. However, among the existing codes in the community, only a few frameworks or servers comply with this specification. Personally I think it's better to use exactly the same approach as WSGI. (Although I have made a2wsgi compliant spec)

WSGI has been used for more than ten years, and it is really confusing to be similar and different from WSGI in such small details. And this change does not seem to have any good impact. Instead, it makes the framework make more judgments when processing PATH.

We should let as few people as possible modify the code, rather than specify a specification that most people don't follow and then say "Hey, your code is wrong" when their users have problems.

@andrewgodwin
Copy link
Member

This was one of the few explicit divergences I made from WSGI - along with things like improving the string/bytes separation - and it was because I believed that having a "true" HTTP path was the most helpful thing to have in the path attribute rather than some subset path that you can't actually build URLs on without knowing the mounting root anyway.

In ASGI, direct path manipulation with relative paths works perfectly fine; in WSGI, it does not, and you can easily shoot yourself in the foot. Ideally, in both specifications you are going to work out the third missing piece of info, but I regarded it as more important to have the path right out of the gate for simpler ASGI applications (assuming frameworks will have the relevant 3 lines of code to do URL matching, as they tend to).

ASGI itself is now seven years old, so an appeal to how old and fixed something is may not be entirely the right approach!

@carltongibson
Copy link
Member

carltongibson commented Apr 5, 2024

FWIW this behaviour was brought into line in the last Daphne and Channels releases too. Django also is compliant.

carltongibson added a commit to carltongibson/django that referenced this issue Apr 5, 2024
Following the ASGI HTTP Connection Scope docs[0], the provided `path`
is already the correct value that Django requires.

In combination with `root_path`, from which `script_name` is derived,
the `path_info` variable is set. It's then redundant to
re-calculate `path` from `script_name` and `path_info`.

See also, a clarifying discussion on the ASGIref repo[1].

[0]: https://asgi.readthedocs.io/en/latest/specs/www.html#http-connection-scope
[1]: django/asgiref#424
gabn88 pushed a commit to gabn88/django that referenced this issue Jun 4, 2024
Following the ASGI HTTP Connection Scope docs[0], the provided `path`
is already the correct value that Django requires.

In combination with `root_path`, from which `script_name` is derived,
the `path_info` variable is set. It's then redundant to
re-calculate `path` from `script_name` and `path_info`.

See also, a clarifying discussion on the ASGIref repo[1].

[0]: https://asgi.readthedocs.io/en/latest/specs/www.html#http-connection-scope
[1]: django/asgiref#424
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants