Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP: Ensure REQUEST_URI immutability. #1162

Merged
merged 4 commits into from
May 9, 2024
Merged

Conversation

hongzhidao
Copy link
Contributor

@hongzhidao hongzhidao commented Feb 28, 2024

  1. Another implementation of proxy request creation.</de.
  2. Made the $request_uri constant.
  3. Passed constant REQUEST_URI to applications.

Need to separate the PR after tests pass.

Here's a manual test.

# conf.json

{
    "listeners": {
        "*:8080": {
            "pass": "routes/a"
        },
        "*:8081": {
            "pass": "routes/b"
        }
    },
    "routes": {
        "a": [
            {
                "action": {
                    "rewrite": "/foo.php",
                    "proxy": "http://127.0.0.1:8081"
                }
            }
        ],
        "b": [
            {
                "action": {
                    "rewrite": "/index.php",
                    "pass": "applications/app"
                }
            }
        ]
    },
    "applications": {
        "app": {
            "type": "php",
            "root": "/www",
            "index": "index.php"
        }
    }
}
# /www/index.php

<?php

phpinfo();
> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php

@hongzhidao
Copy link
Contributor Author

Hi @andrey-zelenkov,
Since this is a change, we need to add more tests to it.

  1. add rewrite on "proxy" which is similar to "php application rewrite". It doesn't depend on the PR.
  2. change the tests for $request_uri if possible.
  3. change the tests for "applications" if possible.
    Thanks.

@ac000
Copy link
Member

ac000 commented Mar 5, 2024

Hmm, I thought the idea was that REQUEST_URI doesn't get munged?

> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php

In which case I would have expected REQUEST_URI would be set to / in the above instance.

@hongzhidao
Copy link
Contributor Author

Hmm, I thought the idea was that REQUEST_URI doesn't get munged?

> curl http://127.1:8080
$_SERVER['REQUEST_URI'] => /foo.php

In which case I would have expected REQUEST_URI would be set to / in the above instance.

Here are a few namings:

  1. REQEUST_URI is for applications, it must be unchanged.
  2. $request_uri corresponding internal r->target, both of them are unchanged.

So your understanding is correct.
For the above configuration, there are two requests. The first one is the client request and is accepted by 8080.

"action": {
                    "rewrite": "/foo.php",
                    "proxy": "http://127.0.0.1:8081"
                }

The request_uri is unchanged. Then Units generates the second request with a rewritten uri to pass to 8081.
Now the second request is /foo.php, and it will be accepted by 8081.

{
                "action": {
                    "rewrite": "/index.php",
                    "pass": "applications/app"
                }
            }

r->path = nxt_mp_alloc(r->mem_pool, sizeof(nxt_str_t));
if (nxt_slow_path(r->path == NULL)) {
return NXT_ERROR;
}

*r->path = rp.path;

r->quoted_target = rp.quoted_target;
r->uri_changed = 1;

if (nxt_slow_path(r->log_route)) {
nxt_log(task, NXT_LOG_NOTICE, "URI rewritten to \"%V\"", &r->target);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to fix this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friends, just asking , any chance to get in in next release?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should definitely make 1.33, I'm just waiting for this commit to be appropriately split up before reviewing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it will be in the next release.
We'll do it like this:

  1. Andrei adds more tests for the proxy that has been in the Tests: added $request_uri tests with proxy #1211.
  2. I'll create a PR split from HTTP: Ensure REQUEST_URI immutability. #1162.
  3. Continue on the HTTP: Ensure REQUEST_URI immutability. #1162 and related tests.

@hongzhidao
Copy link
Contributor Author

Hi,
I created a new PR #1214 prepared for this patch, and it will simplify this PR a lot.
I'll rework this PR after #1214 and #1211 are shipped.

@hongzhidao hongzhidao changed the title HTTP: improved request uri. HTTP: Ensure REQUEST_URI immutability. Apr 15, 2024
@javorszky
Copy link
Contributor

javorszky commented Apr 26, 2024

@andrey-zelenkov @hongzhidao: I want to double check the failing tests: https://github.com/nginx/unit/actions/runs/8718395388/job/23915553886?pr=1162#step:41:1060

I tested manually, and given the configuration in the test and the request, the log line ends up being

::1 /bar /blah%2fblah?a=b
192.168.65.1 /foo /blah%2fblah?a=b

This is correct given the request URI. I think those tests should be changed given the outcome of the code they're testing has been changed in this PR.

I'm working on a PR to this branch at hongzhidao#1, but wanted to run this by you as you're most familiar with the tests 🙂

I'll keep working on that PR as I fix the tests.

@andrey-zelenkov
Copy link
Contributor

Thank you for pointing this out! I discussed it with @hongzhidao and we agreed to include test fixes in his PR. This will demonstrate how these changes affect tests and facilitate GitHub Actions checks.

Copy link
Contributor Author

@hongzhidao hongzhidao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @javorszky,

tests: Change request_uri tests for changed behaviour Unverified

Any idea why this commit showed Unverified on GH?

@javorszky
Copy link
Contributor

javorszky commented Apr 30, 2024

I think it's because you brought in my change, the change is of a different sha as my signature is for, so it fails. The original commit is verified if that helps.

Specifically the differences between hongzhidao@ae4081f (Verified) and 6d6c68c (Unverified) include:

  • 6d6c has more code changes
  • 6d6c has a different commit message (title has test: prefix and empty commit message body)
  • 6d6c has two authors

Each of these on their own would change the commit's digest that the original signature is for 🙂

Copy link
Member

@ac000 ac000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From 1ff64a7dc183a2f73940582cab61f125e59e341a Mon Sep 17 00:00:00 2001
From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:30:24 +0800
Subject: [PATCH] http: Use consistent target in nxt_h1p_peer_header_send()

No functional changes.
---
 src/nxt_h1proto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
index c3a656793..54fe77762 100644
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -2306,7 +2306,7 @@ nxt_h1p_peer_header_send(nxt_task_t *task, nxt_http_peer_t *peer)
 
     p = nxt_cpymem(p, r->method->start, r->method->length);
     *p++ = ' ';
-    p = nxt_cpymem(p, r->target.start, r->target.length);
+    p = nxt_cpymem(p, target.start, target.length);
     p = nxt_cpymem(p, " HTTP/1.1\r\n", 11);
     p = nxt_cpymem(p, "Connection: close\r\n", 19);

Hi @hongzhidao

Are you sure this has no functional change?

target may be set to r->target in nxt_h1p_peer_request_target() in which case it would be no functional change. However it may also be set to something different.

I agree with the change in of itself, as that's what we use in the previous size calculation., but it looks like it may actually fix a potential issue.

Copy link
Member

@ac000 ac000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From f8d106920ebda1671d56d6978cfc1cc9da4b5625 Mon Sep 17 00:00:00 2001
From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:45:18 +0800
Subject: [PATCH] http: Ensure REQUEST_URI immutability

Previously, the REQUEST_URI within Unit could be modified,
for example, during uri rewritting. We plan to make $request_uri
immutable and pass constant REQUEST_URI to applications.
Based on the new requirement, we remove `r->target` rewritting
in the rewrite module.

Closes: https://github.com/nginx/unit/issues/#916

Some small typos: s/rewritting/rewriting/g and s/#916/916/ in the closes tag...

We plan to make $request_uri immutable and pass constant REQUEST_URI to applications.

This commit does that right? as it makes it sounds as if there is further work to do...

@hongzhidao
Copy link
Contributor Author

Are you sure this has no functional change?

Yes, I think so. Before we change the rule, I mean REQUEST_URI is changeable, the target and r->target have the same value.

target = path + args;
r->target = path + args;

When URL rewriting happens, both path and r->target are changed.

@hongzhidao
Copy link
Contributor Author

From f8d106920ebda1671d56d6978cfc1cc9da4b5625 Mon Sep 17 00:00:00 2001
From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:45:18 +0800
Subject: [PATCH] http: Ensure REQUEST_URI immutability

Previously, the REQUEST_URI within Unit could be modified,
for example, during uri rewritting. We plan to make $request_uri
immutable and pass constant REQUEST_URI to applications.
Based on the new requirement, we remove `r->target` rewritting
in the rewrite module.

Closes: https://github.com/nginx/unit/issues/#916

Some small typos: s/rewritting/rewriting/g and s/#916/916/ in the closes tag...

We plan to make $request_uri immutable and pass constant REQUEST_URI to applications.

This commit does that right? as it makes it sounds as if there is further work to do...

Good point, it is a decision already made, not something that will happen in the future.

@hongzhidao
Copy link
Contributor Author

I think it's because you brought in my change, the change is of a different sha as my signature is for, so it fails. The original commit is verified if that helps.

Specifically the differences between hongzhidao@ae4081f (Verified) and 6d6c68c (Unverified) include:

  • 6d6c has more code changes
  • 6d6c has a different commit message (title has test: prefix and empty commit message body)
  • 6d6c has two authors

Each of these on their own would change the commit's digest that the original signature is for 🙂

I did it based on your patch and committed it with --author="...", is this way correct?

@ac000
Copy link
Member

ac000 commented Apr 30, 2024

Yes, I think so. Before we change the rule, I mean REQUEST_URI is
changeable, the target and r->target
have the same value.

target = path + args;
r->target = path + args;

When URL rewriting happens, both path and r->target are changed.

I was more thinking about

nxt_h1p_peer_header_send()
    nxt_h1p_peer_request_target()
        nxt_encode_complex_uri()

Could r->target & target be different after the call to
nxt_encode_complex_uri()?

@javorszky
Copy link
Contributor

I did it based on your patch and committed it with --author="...", is this way correct?

Yes, but the verification is expected to fail. Depends what's more important to the project. In all honesty I don't care to get my patch included with me as an author, as long as the tests are fixed, this PR merged, and a new release with the request uri changes published.

@ac000
Copy link
Member

ac000 commented Apr 30, 2024

We currently don't really worry about signing commits, so I'd say don't worry about it.

@hongzhidao
Copy link
Contributor Author

Yes, I think so. Before we change the rule, I mean REQUEST_URI is
changeable, the target and r->target
have the same value.

target = path + args;                                                                                     
r->target = path + args;                                                                                  

When URL rewriting happens, both path and r->target are changed.

I was more thinking about

nxt_h1p_peer_header_send()
    nxt_h1p_peer_request_target()
        nxt_encode_complex_uri()

Could r->target & target be different after the call to nxt_encode_complex_uri()?

Hmm, they should be the same.
Take /blah%2Fblah as an example.
The r->target is /blah%2Fblah, and the r->quoted_target is set after the uri parsed.
Then in the peer header send, we did something like this.

    if (r->quoted_target) {
        encode = nxt_encode_complex_uri(NULL, r->path->start,
                                        r->path->length);
    } else {
        encode = 0;
    }

@ac000
Copy link
Member

ac000 commented Apr 30, 2024

Just so I fully understand this.

static nxt_int_t
nxt_h1p_peer_request_target(nxt_http_request_t *r, nxt_str_t *target)
{
    u_char  *p;
    size_t  size, encode;

    if (!r->uri_changed) {
        *target = r->target;
        return NXT_OK;
    }

    if (!r->quoted_target && r->args->length == 0) {
        *target = *r->path;
        return NXT_OK;

So you're saying that at this point ->target and ->path are the
same?

Carrying on

...

    target->start = nxt_mp_nget(r->mem_pool, size);
    if (target->start == NULL) {
        return NXT_ERROR;
    }

    if (r->quoted_target) {
        p = (u_char *) nxt_encode_complex_uri(target->start, r->path->start,
                                              r->path->length);

    } else {
        p = nxt_cpymem(target->start, r->path->start, r->path->length);
    }

    if (r->args->length > 0) {
        *p++ = '?';
        p = nxt_cpymem(p, r->args->start, r->args->length);
    }

    target->length = p - target->start;

...

After all that, target would be the same as r->target?

@hongzhidao
Copy link
Contributor Author

So you're saying that at this point ->target and ->path are the
same?

Yep, and it might be not changed, in this case, we don't need to allocate a new memory.

After all that, target would be the same as r->target?

Correct.

@ac000
Copy link
Member

ac000 commented Apr 30, 2024

After all that, target would be the same as r->target?

Correct.

That then begs the question, why even call
nxt_h1p_peer_request_target() ? If it's going to essentially return
r->target...

@hongzhidao
Copy link
Contributor Author

hongzhidao commented Apr 30, 2024

I'm afraid of forgetting something here. Let me explain it again, and let's assume we are talking about the new change that makes REQUEST_URI constant.

After all that, target would be the same as r->target?

Without uri rewriting, they are the same.
Before making REQUEST_URI constant, take the first commit as an example, target from the calculation in nxt_h1p_peer_request_target() has the same value with `r->target.

Then we decide to make REQUEST_URI constant, and internally, I keep r->target unchanged, since the variable $request_uri corresponds to r->target.
But for the proxy module, it needs to pass changeable request uri (path + arguments) to the backend.
Note that path is changeable with URI rewriting.

After all that, target would be the same as r->target?

If we are talking about the new change of REQUEST_URI rule, I'd say not.

@ac000
Copy link
Member

ac000 commented Apr 30, 2024

OK, I think we're getting there...

To clarify, I'm talking specifically about the first change

From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:30:24 +0800
Subject: [PATCH] http: Use consistent target in nxt_h1p_peer_header_send()

No functional changes.

So at the point of this change, I.e before the second patch,
target & r->target would be the same.

After the second patch http: Ensure REQUEST_URI immutability they
may be different.

Assuming I've got that right, I think I'd like a little more detail in
the first commit message saying that this change is required for the
second commit, after which target & r->target may be different.

@hongzhidao
Copy link
Contributor Author

OK, I think we're getting there...

To clarify, I'm talking specifically about the first change

From: Zhidao HONG <z.hong@f5.com>
Date: Tue, 30 Apr 2024 14:30:24 +0800
Subject: [PATCH] http: Use consistent target in nxt_h1p_peer_header_send()

No functional changes.

So at the point of this change, I.e before the second patch, target & r->target would be the same.

After the second patch http: Ensure REQUEST_URI immutability they may be different.

Assuming I've got that right, I think I'd like a little more detail in the first commit message saying that this change is required for the second commit, after which target & r->target may be different.

Makes sense, thanks.

@ac000
Copy link
Member

ac000 commented May 7, 2024

Hi @hongzhidao

You can add my Reviewed-by: to the two http: patches (not the tests as I haven't really looked at them), you should also add your s-o-b, so if you can add

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
Signed-off-by: Zhidao HONG <z.hong@f5.com>

(for the second commit they should come immedaitely after the Closes: tag... you know how it goes...)

@hongzhidao
Copy link
Contributor Author

hongzhidao commented May 8, 2024

Hi @ac000 @andrey-zelenkov,
It only failed one test, but it didn't seem to have anything to do with the patches.

[test (ruby-3.2, ubuntu-latest)](https://github.com/nginx/unit/actions/runs/8996125783/job/24712141194#logs)
failed 11 minutes ago in 2m 55s
Search logs
    
        assert 'success' in client.conf(
            '"/ruby/status_int/config.ru"',
            'applications/status_int/script',
        )
    
        assert 'success' in client.conf(
            '"/ruby/status_int"',
            'applications/status_int/working_directory',
        )
    
>       assert client.get()['status'] == 200, 'status int'

test/test_ruby_isolation.py:43: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/unit/http.py:165: in get
    return self.http('GET', **kwargs)
test/unit/http.py:103: in http
    resp = self.recvall(sock, **recvall_kwargs).decode(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <unit.applications.lang.ruby.ApplicationRuby object at 0x7f0200cac1c0>
sock = <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 57976), raddr=('127.0.0.1', 8080)>
kwargs = {}, timeout_default = 60, timeout = 60, buff_size = 4096, data = b''
rlist = []

    def recvall(self, sock, **kwargs):
        timeout_default = 60
    
        timeout = kwargs.get('read_timeout', timeout_default)
        buff_size = kwargs.get('buff_size', 4096)
    
        data = b''
        while True:
            rlist = select.select([sock], [], [], timeout)[0]
            if not rlist:
                # For all current cases if the "read_timeout" was changed
                # than test do not expect to get a response from server.
                if timeout == timeout_default:
>                   pytest.fail("Can't read response from server.")
E                   Failed: Can't read response from server.

test/unit/http.py:189: Failed
=========================== short test summary info ============================
FAILED test/test_ruby_isolation.py::test_ruby_isolation_rootfs - Failed: Can't read response from server.
ERROR test/test_ruby_isolation.py::test_ruby_isolation_rootfs - SystemExit: Could not unmount filesystems in tmpdir ({temporary_dir}).
========= 1 failed, 38 passed, 7 skipped, 1 error in 132.30s (0:02:12) =========
Error: Process completed with exit code 1.

@ac000
Copy link
Member

ac000 commented May 8, 2024

Yes, it's a known issue.

@hongzhidao
Copy link
Contributor Author

Ok. Let me know if it's ready to ship.

@ac000
Copy link
Member

ac000 commented May 8, 2024

Yes, it looks OK to me...

hongzhidao and others added 4 commits May 9, 2024 09:51
This change is required for the next commit, after which target
and r->target may be different. Before the next patch, target and
r->target would be the same.

No functional changes.

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
Signed-off-by: Zhidao HONG <z.hong@f5.com>
Previously, the REQUEST_URI within Unit could be modified,
for example, during uri rewriting. We decide to make $request_uri
immutable and pass constant REQUEST_URI to applications.
Based on the new requirement, we remove `r->target` rewriting
in the rewrite module.

Closes: nginx#916
Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
Signed-off-by: Zhidao HONG <z.hong@f5.com>
@hongzhidao hongzhidao merged commit 0000976 into nginx:master May 9, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants