Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled exception on startup (and other activities) for 1.9.0 on Windows #5009

Closed
byjrack opened this issue Jun 21, 2023 · 21 comments
Closed
Assignees
Labels
area/proxy kind/bug Something isn't working regression Functionality was working in a previous release and is now broken
Milestone

Comments

@byjrack
Copy link

byjrack commented Jun 21, 2023

Actual Behavior

Getting unhandled exceptions which I believe is related to electron supporting the system proxy, but no 407 handling implemented. We use a PAC file model and based on the little bit in the exception I am guessing it's found a proxy, but is trying to access a resource that would require authentication.

The exception continues to fire over and over sort of breaking the GUI and creating deadlocks waiting for the msgbox to be closed.

Steps to Reproduce

Upgrade from 1.8.1
Have Automatic Proxy setup specifying a PAC file
Start 1.9.0

Result

image

Expected Behavior

No exceptions

Additional Information

Once RD was up and initialization had completed i could use rdctl to set an address, port, and enabled.

Just an aside... the enabled flag looks to not have a boolean param so you just set that w no param. If you wanted to disable proxy via rdctl do you just call enabled again?

Even after setting a direct proxy that doesn't force auth for most destinations, I was still getting the exceptions. They seem to correlate with attempts to hit example.com which is firing a 407 and the electron side is not handling that. That looks to be tied to the "online" test. Note setting creds in the experimental flag didn't seem to clue the code into handling the 407 via Basic.

Rancher Desktop Version

1.9.0

Rancher Desktop K8s Version

1.25.6

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

19044.2965

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

system proxy provided via PAC file URL and default authentication required

@byjrack byjrack added the kind/bug Something isn't working label Jun 21, 2023
@byjrack
Copy link
Author

byjrack commented Jun 21, 2023

I never had these issues on 1.8.1 or earlier so not sure if the GUI code was tied into the proxy config for WSL that came in 1.9.0.

I know the whole am I connected thing using example.com is up for discussion in #4639

@byjrack
Copy link
Author

byjrack commented Jun 21, 2023

And just refreshed my mac (aarch) to 1.9. Looks like it isn't just restricted to Windows and the new proxy support. Guessing some OS proxy awareness got activated in 1.9 and it is grabbing a proxy as upstream and it's tripping up somewhere in the middle

image

@jandubois jandubois added triage/patch-candidate regression Functionality was working in a previous release and is now broken labels Jun 21, 2023
@jandubois jandubois added this to the 1.10 milestone Jun 21, 2023
@byjrack
Copy link
Author

byjrack commented Jun 21, 2023

And looks like the error is coming from Ps.checkForUpdates at least on Mac, but not seeing a similar log event on Windows. Why this suddenly changed I am still figuring out.

@adamkpickering
Copy link
Member

@byjrack I messaged you on slack but you must not have seen it yet. Is there any chance you could try reproducing this using dev builds before and after 235870f? It's possible that this change has something to do with the problem.

@Nino-K
Copy link
Member

Nino-K commented Jun 23, 2023

Hi @byjrack I set up an environment to test out this scenario, I will explain a bit so that you can point out if I missed anything. I set up a squid proxy server with basic auth, I have then created a basic .pac file and configured my windows machine to detect the .pac setting automatically. However, when I start the rancher desktop, I'm not getting the dialog box you mentioned above.

Furthermore, looking at the error in the dialog box, it looks as though the new proxy-agent bump that we did in 1.9 is no longer recognizing the response from your proxy server. It looks as though the response from your proxy is missing headers or trimmed off (maybe an old HTTP 0.9 response?!). looking at the error:

Error Invalid Header: <HTML><HEAD>

One way that we can rectify this in the short term is to change the http-proxy-agent back to 5.0.1 and investigate what has changed in 6.1.0 that is causing the issue.

Also, are you able to capture the response from your proxy server? Of course, feel free to redact anything sensitive but it allows me to understand the scenario better.

@Nino-K
Copy link
Member

Nino-K commented Jun 23, 2023

In addition to what I mentioned above, I just noticed that we bumped https-proxy-agent and not http-proxy-agent so I would probably need to change my .pac file change the way I was testing this.

@byjrack
Copy link
Author

byjrack commented Jun 23, 2023

Yeah @Nino-K I am working on getting a dev env going so i can do some tests using the commit prior and the one after the 6.1 upgrade. Lots of the build assumes no proxy so it's fun working through all the gates.

Did a cap and is what I expected
56 6.163638 10.20.13.63 10.196.133.20 HTTP 140 CONNECT example.com:443 HTTP/1.1

CONNECT example.com:443 HTTP/1.1
Host: example.com:443
Proxy-Connection: close

59 6.185083 10.196.133.20 10.20.13.63 HTTP 292 HTTP/1.1 407 Proxy Authentication Required (text/html)

<TITLE>HTTP 401 Unauthorized: Access Denied </TITLE> ... HTTP 401: Your credentials could not be authenticated: "Credentials are missing.". You will not be permitted access until your credentials can be verified. ...

My gut is its not the PAC file itself. Maybe 6.1 introduced the ability to parse PAC files though which is why its showing up. The structure of the agents repo significantly changed between 5 and 6.1, but the code seems pretty similar so possibly a transitive dep. I may even just stub something out with https-proxy-agent to test it out tomorrow rather than get a whole dev env going.

The issue is the when it gets the 407 and the 401 embedded in the HTML content that does not seem to be handled correctly. On your test proxy are you forcing authentication? For squid it would be something like what is described in this SO post. What you are hoping to get back from the proxy is a 407 with a Proxy-Authenticate: Basic.

@adamkpickering
Copy link
Member

Hi @byjrack, I'm going to continue our conversation here so others can see what we're doing.

It sounds like it might not be feasible for you to get a dev environment up. If so, I think have another way to test this. You might actually be aware of it... when a PR is merged to main, the package.yml github actions workflow is run on it. The result of this workflow is installable packages. These packages are uploaded to the Summary page of the actions run. Are you able to download and install RD from there?

The actions run from when 4619 was merged (before 4622 was merged) is https://github.com/rancher-sandbox/rancher-desktop/actions/runs/4929519429.

The actions run from when 4622 was merged is https://github.com/rancher-sandbox/rancher-desktop/actions/runs/4929730309.

If the first build works for you, and the second doesn't, we know that 4622 introduced the problem.

@byjrack
Copy link
Author

byjrack commented Jun 23, 2023

Was hoping to get some proxy tweaks approved to avoid the stop/start, but 💯 pre-built tags are definitely easier to deal with than my compliance folks.

I didn't even think to check if your Actions runs for the builds. 😅 Let me clean up my existing install and run through. Stay tuned.

@Nino-K
Copy link
Member

Nino-K commented Jun 23, 2023

On your test proxy are you forcing authentication? For squid it would be something like what is described in this SO post. What you are hoping to get back from the proxy is a 407 with a Proxy-Authenticate: Basic.

Yes, I'm forcing the authentication:

auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwords
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 24 hours
auth_param basic casesensitive off
acl authenticated proxy_auth REQUIRED //<------ this should force auth
http_access allow authenticated
http_access deny all //<------ this should deny anything else
dns_v4_first on
forwarded_for delete
via off

A simple way to test that is accessing any web resource on the browser prompts me for the basic credentials.

@byjrack
Copy link
Author

byjrack commented Jun 23, 2023

Had to manually clean up App Support directories as the event wasn't triggering if things immediately crashed. and this is my mac.

Version: 1.7.0-1273-g9b2ad4a2
No errors, clean hits to the proxy.

2023-06-23 18:48:44 username OBSERVED CONNECT tcp://example.com:443/ 200 - "Technology/Internet" - TCP_TUNNELED - 235 5417 969 tcp example.com 443 / - - - 93.184.216.34 443 - 8001 - 8089 HTTP/1.1

Version: 1.7.0-1276-g235870fb
Unhandled

2023-06-23 18:49:09 - DENIED CONNECT tcp://example.com:443/ 407 - "Technology/Internet" - authentication_failed TCP_DENIED - 1 318 115 tcp example.com 443 / - - - 93.184.216.34 443 - 8001 - 0 HTTP/1.1

1273/5.0.1 seems to use a different mechanism (eg not curlrc, https_proxy, NSURLSession, etc) so it is able to ride a preexisting auth while 1276/6.1.0 seems to panic when it gets the 407. Still trying to figure out why as the agent source doesn't seem like much changed.

@byjrack
Copy link
Author

byjrack commented Jun 23, 2023

Agreed @Nino-K that looks solid. Are you dumping the access logs to a file? Can you see the 407 being sent from squid on CONNECT back to RD when it tries to hit example.com?

@Nino-K
Copy link
Member

Nino-K commented Jun 23, 2023

@byjrack Interestingly looking at the access logs it looks as though RD traffic is always getting denied:

1687546947.448      0 10.0.0.97 TCP_DENIED/407 3948 CONNECT example.com:443 - HIER_NONE/- text/html
1687546952.453      0 10.0.0.97 TCP_DENIED/407 3948 CONNECT example.com:443 - HIER_NONE/- text/html
1687546957.468      0 10.0.0.97 TCP_DENIED/407 3948 CONNECT example.com:443 - HIER_NONE/- text/html
1687546962.484      0 10.0.0.97 TCP_DENIED/407 3948 CONNECT example.com:443 - HIER_NONE/- text/html
1687546967.498      0 10.0.0.97 TCP_DENIED/407 3948 CONNECT example.com:443 - HIER_NONE/- text/html

I expected that since we don’t have any way of providing credentials to Rancher Desktop when an authenticated proxy is configured via .pac for the front end (electron). So, something that we know for sure needs to be addressed is a way of handling proxy credentials when one is configured.
Of course, this requires a bigger discussion since we don’t probably want to store the user credentials in any setting files, etc. The ideal approach would be if RD requests the credentials directly from the system, since so many Windows enterprise users may not even know their credentials.

Based on the above, how did this ever work in 1.8? Since, in that release, we never handled any credentials either. Could this issue now be triggered by a lack of proxy credential management from RD along with some changes in your upstream proxy?

@byjrack
Copy link
Author

byjrack commented Jun 23, 2023

It is a head scratcher agreed.

Never had a problem up to 1.9.0 and been using every major release on both windows and mac for a while as I have been trying to get more options in this space for developers.

Was looking at 5.0.1 vs 6.1 and the exception was in the parse response

image

The header decode function is all new. Before it was basically a status code return, but now it gives you everything or it is supposed to.

Line 65 and how it gets the headers may be a bit bugged in certain situations. I am wondering if our proxy is sending back the text content without "\r\n" and just "\n" so the split isn't functioning as expected and so the header detection process falls apart. I need to look at my capture again and see if i can make some educated guesses. As well as see if I can replicate the issue out of RD.

@byjrack
Copy link
Author

byjrack commented Jun 26, 2023

Yup did a quick mjs to prove things out.

https://github.com/TooTallNate/proxy-agents/blob/main/packages/https-proxy-agent/src/parse-proxy-response.ts#L66 assumes that it only has the headers in the buffer which in my case shows that the content (a simple text 401 page talking about the problem) was also added to the array. As soon as it hits that element it tosses an exception which is not handled by Rancher Desktop.

The buffer logic I am trying to figure out the intent of. I can see that The end of the buffers looks to be found in the byte array and my call is small so it includes the whole transaction in a single shot. It feels like headerParts should have not split the entire buffer, but only the slice stopping at endOfHeaders.

const headerParts = buffered.slice(0,endOfHeaders).toString('ascii').split('\r\n');

In my testing it's solved the unhandled exception. Checking upstream w the author on intent and their thoughts.

@byjrack
Copy link
Author

byjrack commented Jun 26, 2023

And @Nino-K I think to trigger the bug you will need a deny_info page so that some text content is returned along with the 407 in the header. That I think will trigger the bug since the custom error page should get split into the headerParts and trigger the "invalid" exception.

@byjrack
Copy link
Author

byjrack commented Jun 26, 2023

I also retested using 5.0.1 and things work as expected (no exception) as the parse code only grabs the status code from the firstLine logic so all the rest of the buffer (headers and content) are tossed. That gets to why 1.9 has this issue while 1.8 and before does not.

@rudironsonijr
Copy link

Yup did a quick mjs to prove things out.

https://github.com/TooTallNate/proxy-agents/blob/main/packages/https-proxy-agent/src/parse-proxy-response.ts#L66 assumes that it only has the headers in the buffer which in my case shows that the content (a simple text 401 page talking about the problem) was also added to the array. As soon as it hits that element it tosses an exception which is not handled by Rancher Desktop.

The buffer logic I am trying to figure out the intent of. I can see that The end of the buffers looks to be found in the byte array and my call is small so it includes the whole transaction in a single shot. It feels like headerParts should have not split the entire buffer, but only the slice stopping at endOfHeaders.

const headerParts = buffered.slice(0,endOfHeaders).toString('ascii').split('\r\n');

In my testing it's solved the unhandled exception. Checking upstream w the author on intent and their thoughts.

Hi, @byjrack. Is it possible to create a patch release for this issue?

@byjrack
Copy link
Author

byjrack commented Jun 26, 2023

@rudironsonijr I think that is what the Rancher team is planning, but when it will land I will leave to them. You can see the work in flight in #5052.

@byjrack
Copy link
Author

byjrack commented Jun 26, 2023

And looks like it made the cut for 1.9.1 based on that merge so shouldn't be waiting long.

@gaktive
Copy link
Contributor

gaktive commented Jun 26, 2023

@Nino-K to track things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy kind/bug Something isn't working regression Functionality was working in a previous release and is now broken
Projects
None yet
Development

No branches or pull requests

6 participants