Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-5124 configurable http timeouts #5125

Merged
merged 3 commits into from
Oct 21, 2024

Conversation

hmottestad
Copy link
Contributor

@hmottestad hmottestad commented Sep 13, 2024

GitHub issue resolved: #5124

Briefly describe the changes proposed in this PR:

  • introduced timeout for the underlying Apache HttpClient
  • made them configurable
  • made different timeouts for when using with a SPARQL SERVICE call

PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@hmottestad
Copy link
Contributor Author

hmottestad commented Sep 18, 2024

	/**
 	 * Default HTTP connection timeout in milliseconds for general use. Set to 1 hour.
 	 */
 	public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 60 * 1000; // 1 hour

 	/**
 	 * Default HTTP connection request timeout in milliseconds for general use. Set to 10 days.
 	 */
 	public static final int DEFAULT_CONNECTION_REQUEST_TIMEOUT = 10 * 24 * 60 * 60 * 1000; // 10 days

 	/**
 	 * Default HTTP socket timeout in milliseconds for general use. Set to 10 days.
 	 */
 	public static final int DEFAULT_SOCKET_TIMEOUT = 10 * 24 * 60 * 60 * 1000; // 10 days

 	// Default timeout values for SPARQL SERVICE calls

 	/**
 	 * Default HTTP connection timeout in milliseconds for SPARQL SERVICE calls. Set to 10 minutes.
 	 */
 	public static final int DEFAULT_SPARQL_CONNECTION_TIMEOUT = 10 * 60 * 1000; // 10 minutes

 	/**
 	 * Default HTTP connection request timeout in milliseconds for SPARQL SERVICE calls. Set to 6 hours.
 	 */
 	public static final int DEFAULT_SPARQL_CONNECTION_REQUEST_TIMEOUT = 6 * 60 * 60 * 1000; // 6 hours

 	/**
 	 * Default HTTP socket timeout in milliseconds for SPARQL SERVICE calls. Set to 6 hours.
 	 */
 	public static final int DEFAULT_SPARQL_SOCKET_TIMEOUT = 6 * 60 * 60 * 1000; // 6 hours

@hmottestad
Copy link
Contributor Author

  • connection timeout is the amount of time it will wait for the TCP connection to be established
  • socket timeout is the amount of time it will wait for data on the socket, network level
  • connection request timeout is the amount of time it will wait to get a connection from the pool

@hmottestad
Copy link
Contributor Author

The socket timeout should be sufficiently long that a complex query won't timeout while waiting for a response. When queries are sent using the SERVICE call then we assume that there is a remote server that we don't really have control over, so we don't want to wait as long as we otherwise would on a query to our own workbench/server.

The connection timeout should be long enough not to be disruptive for background tasks or batch request that we would prefer to wait for the server to respond rather than error out. We don't want it to be too long though, in case the server actually isn't responsive. For SERVICE calls we want to fail a bit faster, but still give the server time to respond if it is overloaded.

Finally the connection request timeout is how long we are willing to wait for a connection from the pool. Since we own the connection pool we should be in control of how many connections we have and should be patient if the pool is empty. Here it's important not to break things for our users. Since the timeout was infinite before, then we should be fairly generous now. Since SERVICE calls are performed as part of a SPARQL query we should consider it more acceptable to timeout sooner.

All 6 timeouts are configurable through system properties:

	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.connectionTimeout} for specifying the HTTP
 	 * connection timeout in milliseconds for general use. Default is 1 hour.
 	 *
 	 * <p>
 	 * The connection timeout determines the maximum time the client will wait to establish a TCP connection to the
 	 * server. A default of 1 hour is set to allow for potential network delays without causing unnecessary timeouts.
 	 * </p>
 	 */
 	public static final String CONNECTION_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.connectionTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.connectionRequestTimeout} for specifying the
 	 * HTTP connection request timeout in milliseconds for general use. Default is 10 days.
 	 *
 	 * <p>
 	 * The connection request timeout defines how long the client will wait for a connection from the connection pool. A
 	 * longer timeout is acceptable here since operations like large file uploads may need to wait for an available
 	 * connection.
 	 * </p>
 	 */
 	public static final String CONNECTION_REQUEST_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.connectionRequestTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.http.socketTimeout} for specifying the HTTP socket
 	 * timeout in milliseconds for general use. Default is 10 days.
 	 *
 	 * <p>
 	 * The socket timeout controls the maximum period of inactivity between data packets during data transfer. A longer
 	 * timeout is appropriate for large data transfers, ensuring that operations are not interrupted prematurely.
 	 * </p>
 	 */
 	public static final String SOCKET_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.http.socketTimeout";

 	// System property constants for SPARQL SERVICE timeouts

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.connectionTimeout} for specifying the
 	 * HTTP connection timeout in milliseconds when used in SPARQL SERVICE calls. Default is 10 minutes.
 	 *
 	 * <p>
 	 * A shorter connection timeout is set for SPARQL SERVICE calls to quickly detect unresponsive endpoints in
 	 * federated queries, improving overall query performance by avoiding long waits for unreachable servers.
 	 * </p>
 	 */
 	public static final String SPARQL_CONNECTION_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.connectionTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.connectionRequestTimeout} for specifying
 	 * the HTTP connection request timeout in milliseconds when used in SPARQL SERVICE calls. Default is 6 hours.
 	 *
 	 * <p>
 	 * This timeout controls how long the client waits for a connection from the pool when making SPARQL SERVICE calls.
 	 * A shorter timeout than general use ensures that queries fail fast if resources are constrained, maintaining
 	 * responsiveness.
 	 * </p>
 	 */
 	public static final String SPARQL_CONNECTION_REQUEST_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.connectionRequestTimeout";

 	/**
 	 * Configurable system property {@code org.eclipse.rdf4j.client.sparql.http.socketTimeout} for specifying the HTTP
 	 * socket timeout in milliseconds when used in SPARQL SERVICE calls. Default is 6 hours.
 	 *
 	 * <p>
 	 * The socket timeout for SPARQL SERVICE calls is set to a shorter duration to detect unresponsive servers during
 	 * data transfer, ensuring that the client does not wait indefinitely for data that may never arrive.
 	 * </p>
 	 */
 	public static final String SPARQL_SOCKET_TIMEOUT_PROPERTY = "org.eclipse.rdf4j.client.sparql.http.socketTimeout";

@hmottestad hmottestad force-pushed the GH-5124-configurable-http-timeouts branch from 16db967 to 20416e2 Compare September 18, 2024 08:53
@hmottestad hmottestad marked this pull request as ready for review September 18, 2024 08:54
Copy link
Contributor

@aschwarte10 aschwarte10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing the PR for configurable timeouts. Some feedback from my side

  • I would suggest to decrease the default timeouts, specifically the connect timeout (see inside comments)
  • it should be understood that the instance is shared via the repository manager to all managed SPARQL or HTTP repositories

Note that in our application we have made the timeouts also configurable, essentially by supplying a custom HttpClientBuilder to the client session manager

Maybe you find some helpful insights here.

.setRetryHandler(retryHandlerStale)
.setServiceUnavailableRetryStrategy(serviceUnavailableRetryHandler)
.useSystemProperties()
.setDefaultRequestConfig(RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD).build())
.setDefaultRequestConfig(requestConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are also using a configurable value for the max number of connections

.setMaxConnPerRoute(maxConnections)
.setMaxConnTotal(maxConnections)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good idea. What are your defaults?

I think that part of the issue is simply that the default number of connections is too low, maybe even 1, which is quite terrible for the way that RDF4J typically does joins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have currently 25 connections as default value (for quite some time). Note that we haven't done systematic benchmarks on the impact of adjusting the number of connections.

/**
* Default HTTP connection timeout in milliseconds for general use. Set to 1 hour.
*/
public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 60 * 1000; // 1 hour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the connection timeout should be typically rather small (in the range of seconds): this is the timeout for establishing the first connection. In our application we have set the default to 5 seconds

Maybe you can explain the different timeouts also better in the javadoc?

https://www.baeldung.com/httpclient-timeout#properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an explanation on the timeout variables, but not on the default constants.

I'm essentially afraid of breaking existing code, since the default timeouts for the Apache HttpClient is infinite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having the default be 60 seconds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having the default be 60 seconds?

I think for 99% of practical cases this is very long: this timeout is only for establishing the first connection (which is usually really fast or does not work at all). As a user you would want to know rather fast if the connection cannot be established (i.e. if the packets go into the nirvana). See also my comment w.r.t query timeout below

60s is definitely better than 1 hour though

/**
* Default HTTP socket timeout in milliseconds for general use. Set to 10 days.
*/
public static final int DEFAULT_SOCKET_TIMEOUT = 10 * 24 * 60 * 60 * 1000; // 10 days
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to define this value in the range of expected query timeouts, 10 days sounds unreasonable.

What the value does:

the time waiting for data – after establishing the connection; maximum time of inactivity between two data packets

so, in essence how long the connection remains open waiting between two packets in an established transfer. We have configured default values of 60 seconds for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm worried about here is what happens if you have some computationally heavy operation on the server side, SHACL validation or maybe inference. If the users actions trigger this then I'm curious what happens if the timeout is low.

I know we have some sort of ping thread that makes sure that one or maybe both parties are still alive during a transaction, but if that thread is sharing the same http client pool then it might be a bit brittle.

It's essentially the same situation as for the other timeouts. The default in the Apache HttpClient is infinite, and I don't want to break any existing code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the main use-case is to distinguish interactive from long-running queries. I am not sure whether this is possible, but can the read timeout be co-related with the query timeout on the query that is using the specific connection?

@hmottestad
Copy link
Contributor Author

hmottestad commented Sep 28, 2024

I'm a bit wary of setting the timeouts low since the defaults in the Apache HttpClient are, very unfortunately, infinite. And that's what our users have been living with until now. So 10 days timeout is essentially me hoping that there is no unforeseen use case that today runs for several days. I agree that 10 days is unreasonably long for a timeout.

Maybe we can do a phased rollout of the timeouts. Have them very lax now, then tighten them for RDF4J 5.2 and then again for 5.3?

I haven't really looked into other ways of making this configurable since the user that reported the issue is using the Workbench and needs something that can be configured on the command line.

@hmottestad
Copy link
Contributor Author

And thank you for the review @aschwarte10 :)

@aschwarte10
Copy link
Contributor

Maybe some additional thougts of direction:

From our experience we also use "low" timeouts on the HTTP level, as the query timeout configured on RDF4J API level currently only becomes active once the first result item is transferred (i.e. the result is wrapped in a timelimit iteration). If something goes wrong on the HTTP level and there are no timeouts, the user will not get any feedback and the query processing actually just continues.

A second thought regarding configurability: typically you would want to distinguish interactive from long-running analytical queries. The knowledge is only known by the user, who can on the RDF4J side set an appropriate query timeout on the query (or more globally). Maybe it is worth taking this into consideration, i.e. is it possible to co-relate the HTTP level timeouts with the query timeouts set on the RDF4J level - not sure though if the HTTP timeouts can really be adjusted per query execution.

An alternate way: maybe in the workbench you start with explicitly configuring longer default timeouts, while the default timeouts in the implementation code are reasonably small. Not sure if there is a common entry point for starting up the workbench, when looking at the container there clearly is, for plain-processes I do not know how it is usually started.

Might also be worth to get another opinion on timeouts. I guess the most important is as always to have some defined timeout to avoid endless processing without any feedback.

…squashed commit)

Squashed commits:
[9aa87b594c] GH-5124 introduce number of connections and also reduce timeouts
@hmottestad hmottestad enabled auto-merge October 21, 2024 20:27
@hmottestad hmottestad merged commit 5d25e15 into develop Oct 21, 2024
9 checks passed
@hmottestad hmottestad deleted the GH-5124-configurable-http-timeouts branch October 21, 2024 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants