WISH: Increase limit of maximum number of open connections (currently 125+3) #28

Open
HenrikBengtsson opened this issue Jul 9, 2016 · 7 comments


HenrikBengtsson commented Jul 9, 2016

Background

As documented in help("connections", package="base"), the maximum number of connections one can have open in R (in addition to the three that are always reserved) is 125:

"A maximum of 128 connections can be allocated (not necessarily open) at any one time. Three of these are pre-allocated (see stdout). The OS will impose limits on the numbers of connections of various types, but these are usually larger than 125."

Here is an example showing what happens when we try to open too many connections:

> cons <- list()
> for (ii in 1:126) { cons[[ii]] <- textConnection("foo") }
Error in textConnection("foo") : all connections are in use
> nrow(showConnections())
[1] 125
> head(showConnections())
  description class            mode text   isopen   can read can write
3 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
4 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
5 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
6 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
7 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
8 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
> tail(showConnections())
    description class            mode text   isopen   can read can write
122 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
123 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
124 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
125 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
126 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     
127 "\"foo\""   "textConnection" "r"  "text" "opened" "yes"    "no"     

Issue

There are several use cases where one might hit the upper limit on the number of open connections in R. A common one is SNOW compute clusters. SNOW clusters, as implemented by the parallel package (a core R package), use one connection per SNOW worker. These days, more users have access to large clusters or machines with many cores, making it more likely that someone will try to use a cluster with > 125 nodes.

> library("parallel")
> cl <- makeCluster(126L)
Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  : 
  all connections are in use
> nrow(showConnections())
[1] 125

The problem with the low NCONNECTIONS limit in relation to SNOW clusters has been discussed by others in the past, e.g.

Troubleshooting

The total limit of 128 connections is hardcoded into the R source code as constant / macro NCONNECTIONS in src/main/connections.c;

#define NCONNECTIONS 128 /* snow needs one per slave node */

which is used to preallocate a set of Rconnection:s of this size;

static Rconnection Connections[NCONNECTIONS];

The NCONNECTIONS limit was increased from 50 to 128 in R 2.4.0 (released October 2006), which appears to have been done for the same reason as explained here.
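
To make the consequence of a preallocated, fixed-size table concrete, here is a minimal standalone sketch. It is not R's actual code: conn_t, register_conn(), conn_index() and release_slot() are hypothetical names, and the three slots that R reserves at startup are not modelled. It only illustrates why allocation fails once every slot is taken, and why looking up a connection that is not in the table has to scan all NCONNECTIONS slots.

```c
#include <assert.h>
#include <stddef.h>

#define NCONNECTIONS 128  /* same value as the macro quoted above */

/* Hypothetical stand-in for R's connection object. */
typedef struct { int id; } conn_t;

/* Fixed-size table; a NULL slot is free. */
static conn_t *table[NCONNECTIONS];

/* Return the first free slot, or -1 when every slot is taken
   (the situation R reports as "all connections are in use"). */
int next_free_slot(void)
{
    for (int i = 0; i < NCONNECTIONS; i++)
        if (table[i] == NULL) return i;
    return -1;
}

/* Store con in the first free slot; return its index, or -1 if full. */
int register_conn(conn_t *con)
{
    assert(con != NULL);
    int i = next_free_slot();
    if (i >= 0) table[i] = con;
    return i;
}

/* Linear search for con; a miss scans all NCONNECTIONS slots. */
int conn_index(const conn_t *con)
{
    for (int i = 0; i < NCONNECTIONS; i++)
        if (table[i] == con) return i;
    return -1;
}

/* Mark slot i as free so that it can be reused. */
void release_slot(int i)
{
    assert(i >= 0 && i < NCONNECTIONS);
    table[i] = NULL;
}
```

In this sketch the 129th call to register_conn() returns -1 no matter how much memory is available; only release_slot() makes room again.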

Wish

  • Increase the NCONNECTIONS limit to, say, 1024.
    • I've verified that it works with NCONNECTIONS=16384 on Linux (see comment below). Similar checks may have to be done on macOS and Windows as well.
    • This would only require a simple update of the above constant / macro.
    • The disadvantage of increasing the limit is that it will also increase the linear-search time of internal int ConnIndex(Rconnection con) for non-existing connections. Using a linked list would avoid this particular problem (see below).
  • Make the error message informative about the actual limit, e.g. all 128 connections are in use.
  • An alternative, and possibly better, approach would be to re-implement Connections as a linked list, which (including its memory usage) could grow and shrink as needed. This could even remove having a limit at all. This would require redesign of the code and increase the risk of introducing bugs. (This idea was proposed by @mtmorgan).
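
The linked-list idea in the last bullet could look roughly like the following sketch. It is only an illustration under stated assumptions, not a proposal for R's internals: conn_node, conn_list, conn_list_add() and conn_list_remove() are hypothetical names, and a connection is reduced to an integer id. The point is that storage grows and shrinks with the number of live connections, so there is no compile-time limit, and a failed lookup only walks the live nodes rather than a full fixed-size table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One node per live connection (hypothetical layout). */
typedef struct conn_node {
    int id;                    /* hypothetical connection identifier */
    struct conn_node *next;
} conn_node;

typedef struct {
    conn_node *head;           /* most recently added connection */
    size_t count;              /* number of live connections */
    int next_id;               /* id handed to the next connection */
} conn_list;

/* Add a connection; returns its id, or -1 if memory is exhausted. */
int conn_list_add(conn_list *list)
{
    assert(list != NULL);
    conn_node *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->id = list->next_id++;
    node->next = list->head;
    list->head = node;
    list->count++;
    return node->id;
}

/* Remove the connection with the given id; returns 1 if found, else 0. */
int conn_list_remove(conn_list *list, int id)
{
    assert(list != NULL);
    for (conn_node **p = &list->head; *p != NULL; p = &(*p)->next) {
        if ((*p)->id == id) {
            conn_node *dead = *p;
            *p = dead->next;   /* unlink, then release the memory */
            free(dead);
            list->count--;
            return 1;
        }
    }
    return 0;
}
```

A list initialised with conn_list list = {0}; can hold as many connections as memory and OS limits allow. Lookup is still linear, but in the number of live connections rather than in the table capacity.
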
@HenrikBengtsson HenrikBengtsson changed the title WISH: Increase limit of maximum number of open connection (currently 125+3) WISH: Increase limit of maximum number of open connections (currently 125+3) Jul 9, 2016

HenrikBengtsson commented Jul 10, 2016

UPDATE: I've tested with #define NCONNECTIONS 16384 on 64-bit Linux (Ubuntu 16.04) and successfully:

  • opened 16381 user-defined textConnection():s.
  • opened 900 SNOW workers, i.e. cl <- makeCluster(900L). Didn't have the memory to try with a larger number (the workers run in separate R sessions each consuming ~200 MiB RAM).

@HenrikBengtsson

A first quick mod that would allow us to modify NCONNECTIONS (via compiler flag -DNCONNECTIONS=1024) when building R from source would be:

svn diff src/main/connections.c 
Index: src/main/connections.c
===================================================================
--- src/main/connections.c	(revision 71879)
+++ src/main/connections.c	(working copy)
@@ -120,7 +120,9 @@
 # include <Startup.h>
 #endif
 
-#define NCONNECTIONS 128 /* snow needs one per slave node */
+#ifndef NCONNECTIONS
+# define NCONNECTIONS 128 /* snow needs one per slave node */
+#endif
 #define NSINKS 21
 
 static Rconnection Connections[NCONNECTIONS];
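
The same guard pattern can be tried in isolation with a small standalone file. This is a sketch, not part of the R sources: slots and table_capacity() are hypothetical names, and the lower bound of 4 in the compile-time check is an assumption made here (three reserved slots plus one usable).

```c
#include <assert.h>

/* Use a value supplied on the compiler command line (-DNCONNECTIONS=...)
   if there is one; otherwise fall back to the default. */
#ifndef NCONNECTIONS
# define NCONNECTIONS 128
#endif

/* Reject nonsensical overrides at compile time (C11 static_assert).
   The bound of 4 is an assumption of this sketch. */
static_assert(NCONNECTIONS >= 4, "NCONNECTIONS must be at least 4");

/* Table sized by the macro, as in the patch above. */
void *slots[NCONNECTIONS];

/* Number of slots that were compiled into the table. */
int table_capacity(void)
{
    return (int)(sizeof slots / sizeof slots[0]);
}
```

Compiled as is, table_capacity() reports 128; compiled with -DNCONNECTIONS=1024, it reports 1024 without any change to the source file.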

@romanzenka

I ran into the exact same issue in a different context: we are writing software that very rapidly polls small subsets of a large number of files, and it is beneficial for us to keep the file connections open so that we do not incur the file opening/closing penalty.

jmaspons added a commit to jmaspons/MLTools that referenced this issue Sep 17, 2020
…raster operations

When using many cores to parallelise training replicates + ncores-1 used by default in raster package, R session can overpass the maximum number of allowed connections (HenrikBengtsson/Wishlist-for-R#28)

Probably fix a bug where filename is not passed to summarize_pred.Raster in summary.process_NN function
@HenrikBengtsson

The R-devel thread 'Is it a good choice to increase the NCONNECTION value?', started on 2021-08-23 (https://stat.ethz.ch/pipermail/r-devel/2021-August/081033.html), has a good discussion of this, in which R Core expresses a willingness to bump this up.

@HenrikBengtsson

Simple Bash instructions to tweak NCONNECTIONS in the source code:

$ NCONNECTIONS=1024
$ sed -i -E "s/^(#define NCONNECTIONS) [[:digit:]]+/\1 $NCONNECTIONS/" src/main/connections.c

To update help("connections") accordingly, do:

$ sed -i -E "s/[[:digit:]]+ (connections)/$NCONNECTIONS \1/" src/library/base/man/connections.Rd

These sed expressions work not only on the default value of 128, but also on any integer you've previously changed it to.


HenrikBengtsson commented May 30, 2023

Added to R-devel (to become R 4.4.0) on 2023-05-30:

\item New startup option \option{--max-connections} to set the
      maximum number of connections for the session.  Defaults to 128 as
      before: allowed values up to 4096 (but resource limits may in practice
      restrict to smaller values).

Source: wch/r-source@7efae40, and later also wch/r-source@d896f86 and wch/r-source@70827de.

A simple test run:

$ R
> parallelly::availableConnections()
[1] 128
$ R --max-connections=512
> parallelly::availableConnections()
[1] 512

and

$ Rscript -e "parallelly::availableConnections()"
[1] 128

$ Rscript --max-connections=512 -e "parallelly::availableConnections()"
[1] 512
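
For readers curious about how a startup option of this kind can be handled, here is a minimal sketch in C. It is not taken from the R commits referenced above: parse_max_connections() and the two macro names are hypothetical, and treating the default of 128 as the lower bound is an assumption of this sketch. Only the default (128) and the upper bound (4096) come from the NEWS entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_MAX_CONNECTIONS 128   /* default stated in the NEWS entry */
#define UPPER_MAX_CONNECTIONS   4096  /* upper bound stated in the NEWS entry */

/* Scan argv for "--max-connections=N". Returns N when it is a whole number
   in [DEFAULT_MAX_CONNECTIONS, UPPER_MAX_CONNECTIONS]; otherwise the default.
   If the option appears several times, the last valid occurrence wins.
   Using the default as the lower bound is an assumption of this sketch. */
int parse_max_connections(int argc, char **argv)
{
    static const char prefix[] = "--max-connections=";
    const size_t plen = sizeof prefix - 1;
    int limit = DEFAULT_MAX_CONNECTIONS;

    assert(argc >= 0 && argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (strncmp(argv[i], prefix, plen) != 0) continue;
        char *end = NULL;
        errno = 0;
        long n = strtol(argv[i] + plen, &end, 10);
        /* Accept only a fully parsed, in-range number. */
        if (errno == 0 && end != argv[i] + plen && *end == '\0' &&
            n >= DEFAULT_MAX_CONNECTIONS && n <= UPPER_MAX_CONNECTIONS)
            limit = (int)n;
    }
    return limit;
}
```

With a limit known only at startup, the connection table would have to be allocated at run time instead of being a fixed-size static array.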

@Breezezcl


Great! How can we change the default in Rprofile and use the larger number in the RStudio environment?
