http connections are not properly closed without a http.Client Timeout property #1040
Comments
Hi @BogdanEAM, Thanks a lot for providing the thorough details and observations! Based on the details you provided, below is my speculation:
To debug further, we need to confirm the application's reading behaviour, i.e. whether it holds the connection open or not. It would be good if you could provide gcsfuse logs with the --debug_fs and --debug_gcs flags for all three cases, especially the first one: (1) application dead stop, (2) explicit time-out, (3) when we provide more connections. Regards and thanks,
Hi @raj-prince, I'm gathering those log files for you. Thanks,
Hi @BogdanEAM, Sorry for the late response! The concept of a time-out was introduced in gcsfuse via the go-storage-client library (the existing default client). Before that, with jacobsa/gcloud, we were not using the "time-out" property on the http client. During testing we found an issue (a pre-condition error) caused by the newly introduced timeout property, as mentioned in this PR. Since we were not using this property in earlier versions, we removed it when switching to the go-storage-client library as well. We will discuss within the team whether we should keep the time-out property or not and update this thread accordingly. Regards,
Thanks for the info. I checked, and it looks like the JVM is holding locks on all the jar files that are loaded. This is how the JVM works, not something specific to our application. Another solution that I can think of is to update the gcsfuse code to close the connection when a file has been fully retrieved from GCS rather than when the file handle is released. Thank you,
Hey @BogdanEAM, Thanks for your reply. We are investigating the possible options here and will get back to you once we take a decision.
@BogdanEAM, Thanks for your patience. We have added support for an http client timeout in GCSFuse v0.42.4; the timeout can be controlled with the new flag.
Thank you very much.
Describe the bug
We are starting a Java application from a gcsfuse mount folder.
The app has 139 lib jars and many more prop files and classes.
With gcsfuse version v0.41.12 it starts fine. However, with versions v0.42.2 and v0.42.3 it does not start.
The difference between these versions that is causing the issue is the removal of the "Timeout" property from the http.Client created in the NewStorageHandle method in "internal\storage\storage_handle.go".
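For illustration, here is a minimal sketch of the two client configurations being compared (this is not the actual gcsfuse code; the 800 ms value is the previous default mentioned under Workaround 2 below):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// v0.41.12-style client: an overall request deadline is enforced, so a
	// response whose body is never fully drained is eventually cancelled and
	// its connection freed.
	withTimeout := &http.Client{Timeout: 800 * time.Millisecond}

	// v0.42.2 / v0.42.3-style client as described above: no Timeout, so
	// nothing ever forces an abandoned response body (and its TCP
	// connection) to be released.
	withoutTimeout := &http.Client{}

	fmt.Println("with timeout:", withTimeout.Timeout, "without:", withoutTimeout.Timeout)
}
```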
Runtime flags
used for all versions:
-o noexec --dir-mode 700 --stackdriver-export-interval 60s --type-cache-ttl 20h -o ro -o nosuid -o nodev --file-mode 400 --enable-storage-client-library=true --implicit-dirs --enable-nonexistent-type-cache --stat-cache-ttl 20h --stat-cache-capacity 40960
Behaviour
The startup of the app always hits a dead stop when gcsfuse is reading a certain (always the same) file from the lib folder. With the --debug_gcs flag we can see that it is always the same file.
By running netstat in parallel with the application startup, we can see that the number of "ESTABLISHED" connections for the respective gcsfuse process grows until it reaches 101 connections, which is the default value for "max-conns-per-host".
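For reference, this matches how net/http limits connections per host: once MaxConnsPerHost connections are checked out, new requests block until one is freed, and a connection is only freed after its response body is fully read and closed. Below is a minimal sketch, under the assumption that gcsfuse's --max-conns-per-host flag maps to this transport setting; the endpoint is a placeholder:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			// Mirrors the behaviour described for --max-conns-per-host:
			// once this many connections to a host are checked out,
			// further requests block until one is returned to the pool.
			MaxConnsPerHost: 101,
		},
	}

	resp, err := client.Get("https://storage.googleapis.com") // placeholder endpoint
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// A connection only becomes reusable once the body is read to EOF and
	// closed; bodies left open keep counting against MaxConnsPerHost, which
	// is consistent with ESTABLISHED connections piling up in netstat.
	io.Copy(io.Discard, resp.Body)
}
```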
Workaround 1
If we add the flag "--max-conns-per-host 300", the application starts fine and the number of connections reported by netstat caps at around 115.
Workaround 2
I have rebuilt the code for v0.42.3, re-adding the "Timeout" property in the NewStorageHandle method in "internal\storage\storage_handle.go" with the previous default value of 800 milliseconds.
netstat in this case shows that connections are not piling up; their number increases and decreases while the application starts and never exceeds 15 connections.
If I set the Timeout to 10 seconds, the maximum number of connections observed was 35 and again the app starts fine.
The application starts fine.
Running netstat in parallel as described above shows that connections are not piling up; their number increases and decreases while the application starts and never exceeds 20 connections.
To collect more debug logs
By adding some "log.Println" calls in the Close() method in "vendor\cloud.google.com\go\storage\http_client.go", we can see that "r.body" has a different type depending on whether the Timeout property is set or not (see the sketch after this section):
Another thing we observed is that after starting the application with "--max-conns-per-host 300" or with the Timeout property set, and then stopping the application, we see some --debug_gcs specific logs like this:
Notice that the durations are in minutes and that they roughly match how long the app was running.
This happens regardless of whether the Timeout is present or not, but as I said, with it set the application starts.
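For anyone who wants to reproduce the type check without patching the vendored client, here is a standalone sketch. The r.body field and the Close() method mentioned above belong to the vendored storage client and are not reproduced here; the snippet only demonstrates the general effect that, as far as I can tell, setting http.Client.Timeout makes net/http wrap the response body so the request can be cancelled, which is why the logged type differs. The endpoint is a placeholder:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	clients := []*http.Client{
		{},                          // no Timeout, as in v0.42.2 / v0.42.3
		{Timeout: 10 * time.Second}, // Timeout restored, as in Workaround 2
	}
	for _, c := range clients {
		resp, err := c.Get("https://storage.googleapis.com") // placeholder endpoint
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		// %T prints the concrete type of the response body; with Timeout set
		// the body is wrapped by net/http, which matches the different
		// r.body types seen in the log.Println output described above.
		fmt.Printf("Timeout=%v, body type=%T\n", c.Timeout, resp.Body)
		resp.Body.Close()
	}
}
```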
System (please complete the following information):
NAME="CentOS Linux"
VERSION="7 (Core)"
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009