Try to connect to both IPv4 and IPv6 addresses #430

polmum · 2016-05-13T13:41:50Z

If a host has both IPv4 and IPv6 addresses, try to connect to the preferred one first and the other one after a delay. Keep the first connection to succeed and discard the other one.

This pull request is a possible solution to the issue described in https://github.com/robbiehanson/XMPPFramework/issues/718. It is based on the idea of the Happy Eyeballs algorithm, although it is not a complete implementation of it, as it does not cache connection successes and failures.

I tried to modify as few lines of code as possible, although I had to extract a few methods to be able to reuse them for the connection to both addresses.

…ferred one first and the other one after a delay, and keep the first one to succeed.

chrisballinger · 2016-05-13T20:04:44Z

Wow, great job! Would you mind adding some small unit tests? There are some example tests with a simple server socket / client socket that you can repurpose to test ipv4-only server sockets / ipv6-only server sockets against the new client behavior.

chrisballinger · 2016-05-14T00:53:06Z

Hmm my last comment disappeared. We need to also translate IPv4 to IPv6 when using raw addresses according to Use System APIs to Synthesize IPv6 Addresses:

If your app needs to connect to an IPv4-only server without a DNS hostname, use getaddrinfo to resolve the IPv4 address literal. If the current network interface doesn’t support IPv4, but supports IPv6, NAT64, and DNS64, performing this task will result in a synthesized IPv6 address.

Listing 10-1 shows how to resolve an IPv4 literal using getaddrinfo. Assuming you have an IPv4 address stored in memory as four bytes (such as {192, 0, 2, 1}), this example code converts it to a string (such as "192.0.2.1"), uses getaddrinfo to synthesize an IPv6 address (such as a struct sockaddr_in6 containing the IPv6 address "64:ff9b::192.0.2.1") and tries to connect to that IPv6 address.

#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <err.h>

    uint8_t ipv4[4] = {192, 0, 2, 1};
    struct addrinfo hints, *res, *res0;
    int error, s;
    const char *cause = NULL;

    char ipv4_str_buf[INET_ADDRSTRLEN] = { 0 };
    const char *ipv4_str = inet_ntop(AF_INET, &ipv4, ipv4_str_buf, sizeof(ipv4_str_buf));

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_DEFAULT;
    error = getaddrinfo(ipv4_str, "http", &hints, &res0);
    if (error) {
        errx(1, "%s", gai_strerror(error));
        /*NOTREACHED*/
    }
    s = -1;
    for (res = res0; res; res = res->ai_next) {
        s = socket(res->ai_family, res->ai_socktype,
                   res->ai_protocol);
        if (s < 0) {
            cause = "socket";
            continue;
        }

        if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {
            cause = "connect";
            close(s);
            s = -1;
            continue;
        }

        break;  /* okay we got one */
    }
    if (s < 0) {
        err(1, "%s", cause);
        /*NOTREACHED*/
    }
    freeaddrinfo(res0);

sdstrowes · 2016-05-14T00:55:16Z

Couple of quick comments!

Does this fix the default from v4 to v6? rfc3484 should guide the address selection.
I think you've set the failover pace to 500ms. You can probably shoot for something lower, perhaps 100ms-200ms.
To meet that delay, this should only have to handle the situation where a connect() is attempted and packets are blackholed somewhere, which is rare. In the case that v4 or v6 fails fast at the device, do these changes still help protocol selection? Without v4 connectivity, the current default is to retry a v4 address forever even if a v6 address is available.

polmum · 2016-05-14T19:11:20Z

@chrisballinger I'm working on adding a few unit tests to test ipv4-only server sockets / ipv6-only server sockets.
About the raw IPv4 case, I'm not really sure if we should be doing this. Unless I misunderstood something, the link you provided to Apple's Documentation talks about resolving an IPv4 literal, not a raw address. And that is being handled in the lookupHost: method of GCDAsyncSocket.m.
If someone is using the connectToAddress: method and passing a raw address, he might be expecting the socket to use that address only, and not one with a different IP protocol.
This is my understanding of this issue, but I could obviously be mistaken. In either case, perhaps this should be handled as a different issue and a different pull request?

polmum · 2016-05-14T19:18:42Z

@sdstrowes

I have not changed the preferred protocol (it's still defaulting to IPv4). This is a simple change to do, but I'd rather have more feedback before doing it. Perhaps a new issue should be opened to address specifically this?
I used 500 ms just to be on the safe side, but we could use something lower. Reading about some Happy Eyeballs algorithm implementation, Chrome and Mozilla go for 300 ms, so I think I'll settle for that.
I'll need to check this to be sure, but I think these changes do work when the connection is failing fast.

…reduced the alternateAddressDelay.

sdstrowes · 2016-05-15T00:28:10Z

@polmum: Cool, thanks. Opened a specific issue on the default preferred: #432

chrisballinger · 2016-06-18T19:15:48Z

@polmum So the last hesitation I have is, what happens when network lag causes the (correct) protocol chosen first to timeout after 300ms, and then it falls back to the incorrect protocol and fails.

For instance, on NAT64 networks like how Apple recommends, getaddrinfo can return both an IPv4 and a synthesized IPv4->IPv6 address. Although the IPv4 address is unreachable, for whatever reason both addresses are returned. We can prefer connection to IPv6, but if network lag causes the IPv6 connection to take longer than 300ms, we will fall back to IPv4 and fail.

chrisballinger · 2016-06-20T20:57:44Z

Source/GCD/GCDAsyncSocket.m

+    if (alternateAddress)
+    {
+        NSTimeInterval alternateAddressDelay = 0.3;
+        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(alternateAddressDelay * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{


@polmum This should probably be dispatched on the socketQueue right?

chrisballinger · 2016-06-20T21:38:37Z

Okay I've taken a second look and my initial assumptions were wrong and it does connect to both protocols and picks the winner.

How important is the connection success/failure cache? For most applications I imagine it would be of negligible difference, but it should probably be implemented at some point.

chrisballinger · 2016-06-20T22:11:02Z

I made alternateAddressDelay configurable in the ipv6 branch, use socketQueue for dispatch_after, and merges another fix regarding incorrect IPv6 ports: https://github.com/robbiehanson/CocoaAsyncSocket/tree/ipv6

Will probably make a release with this soon assuming no other issues come up.

awmwong · 2016-06-24T18:55:07Z

@chrisballinger @polmum
Hey guys, thanks for all the great work thus far. Please let me know if this isn't the right place for this and I'll move it accordingly.

We've recently shipped an update to our app with the ipv6 branch (which includes this pull among other changes), and we've been receiving crash reports with the following:

Crashed: GCDAsyncSocket
EXC_GUARD 0x00000001a0e33460

0  libsystem_kernel.dylib         0x181fa234c close + 8
1  App                            0x1003de76c -[GCDAsyncSocket closeSocket:] (GCDAsyncSocket.m:2684)
2  App                            0x1003de2f0 -[GCDAsyncSocket connectSocket:address:stateIndex:] (GCDAsyncSocket.m:2627)
3  libdispatch.dylib              0x181e6d4bc _dispatch_call_block_and_release + 24
4  libdispatch.dylib              0x181e6d47c _dispatch_client_callout + 16
5  libdispatch.dylib              0x181e78d08 _dispatch_after_timer_callback + 92
6  libdispatch.dylib              0x181e6d47c _dispatch_client_callout + 16
7  libdispatch.dylib              0x181e84090 _dispatch_source_latch_and_call + 2556
8  libdispatch.dylib              0x181e6f970 _dispatch_source_invoke + 808
9  libdispatch.dylib              0x181e79694 _dispatch_queue_drain + 1332
10 libdispatch.dylib              0x181e70f80 _dispatch_queue_invoke + 464
11 libdispatch.dylib              0x181e7b390 _dispatch_root_queue_drain + 728
12 libdispatch.dylib              0x181e7b0b0 _dispatch_worker_thread3 + 112
13 libsystem_pthread.dylib        0x182085470 _pthread_wqthread + 1092
14 libsystem_pthread.dylib        0x182085020 start_wqthread + 4

Digging around into EXC_GUARD seems to reveal that we're trying to close a socket descriptor that got marked guarded by the system somehow. I think this is happening when we connect on the winner socket and are trying to close out the other socket?
Would appreciate any guidance to help resolve this issue for everyone.

jpickering · 2016-06-24T20:23:36Z

@awmwong @chrisballinger @polmum We found the closeSocket: trying to close the same socket twice, because

dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(alternateAddressDelay * NSEC_PER_SEC)), socketQueue, ^{
      [self connectSocket:alternateSocketFD address:alternateAddress stateIndex:aStateIndex];
    });

fired after the the alternateSocketFD had already been closed.

We added a check to make sure the socket was either socket6FD or socket4FD before we closed it. An attempt to close the socket the second time will skip the close() because it was set to SOCKET_NULL the first time.

- (void)closeSocket:(int)socketFD
{
  if (socketFD != SOCKET_NULL)
  {
    if(socketFD == socket6FD || socketFD == socket4FD){  <--------------------------
      close(socketFD);

      if (socketFD == socket4FD)
      {
        socket4FD = SOCKET_NULL;
      }
      else if (socketFD == socket6FD)
      {
        socket6FD = SOCKET_NULL;
      }
    }
  }
}

chrisballinger · 2016-06-24T20:36:21Z

@awmwong @jpickering Thanks for the feedback! Good thing we caught this before shipping it out to thousands of other apps.

chrisballinger · 2016-06-24T20:40:11Z

Pushed the fix to ipv6 branch.

chrisballinger · 2016-07-06T23:37:43Z

Thanks everyone! I plan to make a new release soon that includes this PR.

If a host has both IPv4 and IPv6 addresses, try to connect to the pre…

03d921c

…ferred one first and the other one after a delay, and keep the first one to succeed.

chrisballinger mentioned this pull request May 14, 2016

IPv4 as a default transport fails without IPv4 #431

Closed

Added Unit Tests, fixed a bug that was making one of the tests fail, …

7877b0a

…reduced the alternateAddressDelay.

sdstrowes mentioned this pull request May 15, 2016

Default preferred protocol should be IPv6. #432

Closed

chrisballinger reviewed Jun 20, 2016
View reviewed changes

chrisballinger mentioned this pull request Jul 5, 2016

Added failover. #445

Closed

chrisballinger merged commit 7877b0a into robbiehanson:master Jul 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to connect to both IPv4 and IPv6 addresses #430

Try to connect to both IPv4 and IPv6 addresses #430

polmum commented May 13, 2016

chrisballinger commented May 13, 2016

chrisballinger commented May 14, 2016

sdstrowes commented May 14, 2016

polmum commented May 14, 2016

polmum commented May 14, 2016

sdstrowes commented May 15, 2016

chrisballinger commented Jun 18, 2016

chrisballinger Jun 20, 2016

chrisballinger commented Jun 20, 2016

chrisballinger commented Jun 20, 2016

awmwong commented Jun 24, 2016

jpickering commented Jun 24, 2016

chrisballinger commented Jun 24, 2016

chrisballinger commented Jun 24, 2016

chrisballinger commented Jul 6, 2016

Try to connect to both IPv4 and IPv6 addresses #430

Try to connect to both IPv4 and IPv6 addresses #430

Conversation

polmum commented May 13, 2016

chrisballinger commented May 13, 2016

chrisballinger commented May 14, 2016

sdstrowes commented May 14, 2016

polmum commented May 14, 2016

polmum commented May 14, 2016

sdstrowes commented May 15, 2016

chrisballinger commented Jun 18, 2016

chrisballinger Jun 20, 2016

Choose a reason for hiding this comment

chrisballinger commented Jun 20, 2016

chrisballinger commented Jun 20, 2016

awmwong commented Jun 24, 2016

jpickering commented Jun 24, 2016

chrisballinger commented Jun 24, 2016

chrisballinger commented Jun 24, 2016

chrisballinger commented Jul 6, 2016