Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to connect to both IPv4 and IPv6 addresses #430

Merged
merged 2 commits into from
Jul 6, 2016

Conversation

polmum
Copy link

@polmum polmum commented May 13, 2016

If a host has both IPv4 and IPv6 addresses, try to connect to the preferred one first and the other one after a delay. Keep the first connection to succeed and discard the other one.

This pull request is a possible solution to the issue described in https://github.com/robbiehanson/XMPPFramework/issues/718. It is based on the idea of the Happy Eyeballs algorithm, although it is not a complete implementation of it, as it does not cache connection successes and failures.

I tried to modify as few lines of code as possible, although I had to extract a few methods to be able to reuse them for the connection to both addresses.

…ferred one first and the other one after a delay, and keep the first one to succeed.
@chrisballinger
Copy link
Collaborator

Wow, great job! Would you mind adding some small unit tests? There are some example tests with a simple server socket / client socket that you can repurpose to test ipv4-only server sockets / ipv6-only server sockets against the new client behavior.

@chrisballinger
Copy link
Collaborator

Hmm my last comment disappeared. We need to also translate IPv4 to IPv6 when using raw addresses according to Use System APIs to Synthesize IPv6 Addresses:

If your app needs to connect to an IPv4-only server without a DNS hostname, use getaddrinfo to resolve the IPv4 address literal. If the current network interface doesn’t support IPv4, but supports IPv6, NAT64, and DNS64, performing this task will result in a synthesized IPv6 address.

Listing 10-1 shows how to resolve an IPv4 literal using getaddrinfo. Assuming you have an IPv4 address stored in memory as four bytes (such as {192, 0, 2, 1}), this example code converts it to a string (such as "192.0.2.1"), uses getaddrinfo to synthesize an IPv6 address (such as a struct sockaddr_in6 containing the IPv6 address "64:ff9b::192.0.2.1") and tries to connect to that IPv6 address.

#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <err.h>

    uint8_t ipv4[4] = {192, 0, 2, 1};
    struct addrinfo hints, *res, *res0;
    int error, s;
    const char *cause = NULL;

    char ipv4_str_buf[INET_ADDRSTRLEN] = { 0 };
    const char *ipv4_str = inet_ntop(AF_INET, &ipv4, ipv4_str_buf, sizeof(ipv4_str_buf));

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = PF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_DEFAULT;
    error = getaddrinfo(ipv4_str, "http", &hints, &res0);
    if (error) {
        errx(1, "%s", gai_strerror(error));
        /*NOTREACHED*/
    }
    s = -1;
    for (res = res0; res; res = res->ai_next) {
        s = socket(res->ai_family, res->ai_socktype,
                   res->ai_protocol);
        if (s < 0) {
            cause = "socket";
            continue;
        }

        if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {
            cause = "connect";
            close(s);
            s = -1;
            continue;
        }

        break;  /* okay we got one */
    }
    if (s < 0) {
        err(1, "%s", cause);
        /*NOTREACHED*/
    }
    freeaddrinfo(res0);

@sdstrowes
Copy link

Couple of quick comments!

  • Does this fix the default from v4 to v6? rfc3484 should guide the address selection.
  • I think you've set the failover pace to 500ms. You can probably shoot for something lower, perhaps 100ms-200ms.
  • To meet that delay, this should only have to handle the situation where a connect() is attempted and packets are blackholed somewhere, which is rare. In the case that v4 or v6 fails fast at the device, do these changes still help protocol selection? Without v4 connectivity, the current default is to retry a v4 address forever even if a v6 address is available.

@polmum
Copy link
Author

polmum commented May 14, 2016

@chrisballinger I'm working on adding a few unit tests to test ipv4-only server sockets / ipv6-only server sockets.
About the raw IPv4 case, I'm not really sure if we should be doing this. Unless I misunderstood something, the link you provided to Apple's Documentation talks about resolving an IPv4 literal, not a raw address. And that is being handled in the lookupHost: method of GCDAsyncSocket.m.
If someone is using the connectToAddress: method and passing a raw address, he might be expecting the socket to use that address only, and not one with a different IP protocol.
This is my understanding of this issue, but I could obviously be mistaken. In either case, perhaps this should be handled as a different issue and a different pull request?

@polmum
Copy link
Author

polmum commented May 14, 2016

@sdstrowes

  • I have not changed the preferred protocol (it's still defaulting to IPv4). This is a simple change to do, but I'd rather have more feedback before doing it. Perhaps a new issue should be opened to address specifically this?
  • I used 500 ms just to be on the safe side, but we could use something lower. Reading about some Happy Eyeballs algorithm implementation, Chrome and Mozilla go for 300 ms, so I think I'll settle for that.
  • I'll need to check this to be sure, but I think these changes do work when the connection is failing fast.

@sdstrowes
Copy link

@polmum: Cool, thanks. Opened a specific issue on the default preferred: #432

@chrisballinger
Copy link
Collaborator

@polmum So the last hesitation I have is, what happens when network lag causes the (correct) protocol chosen first to timeout after 300ms, and then it falls back to the incorrect protocol and fails.

For instance, on NAT64 networks like how Apple recommends, getaddrinfo can return both an IPv4 and a synthesized IPv4->IPv6 address. Although the IPv4 address is unreachable, for whatever reason both addresses are returned. We can prefer connection to IPv6, but if network lag causes the IPv6 connection to take longer than 300ms, we will fall back to IPv4 and fail.

if (alternateAddress)
{
NSTimeInterval alternateAddressDelay = 0.3;
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(alternateAddressDelay * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@polmum This should probably be dispatched on the socketQueue right?

@chrisballinger
Copy link
Collaborator

Okay I've taken a second look and my initial assumptions were wrong and it does connect to both protocols and picks the winner.

How important is the connection success/failure cache? For most applications I imagine it would be of negligible difference, but it should probably be implemented at some point.

@chrisballinger
Copy link
Collaborator

I made alternateAddressDelay configurable in the ipv6 branch, use socketQueue for dispatch_after, and merges another fix regarding incorrect IPv6 ports: https://github.com/robbiehanson/CocoaAsyncSocket/tree/ipv6

Will probably make a release with this soon assuming no other issues come up.

@awmwong
Copy link

awmwong commented Jun 24, 2016

@chrisballinger @polmum
Hey guys, thanks for all the great work thus far. Please let me know if this isn't the right place for this and I'll move it accordingly.

We've recently shipped an update to our app with the ipv6 branch (which includes this pull among other changes), and we've been receiving crash reports with the following:

Crashed: GCDAsyncSocket
EXC_GUARD 0x00000001a0e33460

0  libsystem_kernel.dylib         0x181fa234c close + 8
1  App                            0x1003de76c -[GCDAsyncSocket closeSocket:] (GCDAsyncSocket.m:2684)
2  App                            0x1003de2f0 -[GCDAsyncSocket connectSocket:address:stateIndex:] (GCDAsyncSocket.m:2627)
3  libdispatch.dylib              0x181e6d4bc _dispatch_call_block_and_release + 24
4  libdispatch.dylib              0x181e6d47c _dispatch_client_callout + 16
5  libdispatch.dylib              0x181e78d08 _dispatch_after_timer_callback + 92
6  libdispatch.dylib              0x181e6d47c _dispatch_client_callout + 16
7  libdispatch.dylib              0x181e84090 _dispatch_source_latch_and_call + 2556
8  libdispatch.dylib              0x181e6f970 _dispatch_source_invoke + 808
9  libdispatch.dylib              0x181e79694 _dispatch_queue_drain + 1332
10 libdispatch.dylib              0x181e70f80 _dispatch_queue_invoke + 464
11 libdispatch.dylib              0x181e7b390 _dispatch_root_queue_drain + 728
12 libdispatch.dylib              0x181e7b0b0 _dispatch_worker_thread3 + 112
13 libsystem_pthread.dylib        0x182085470 _pthread_wqthread + 1092
14 libsystem_pthread.dylib        0x182085020 start_wqthread + 4

Digging around into EXC_GUARD seems to reveal that we're trying to close a socket descriptor that got marked guarded by the system somehow. I think this is happening when we connect on the winner socket and are trying to close out the other socket?
Would appreciate any guidance to help resolve this issue for everyone.

@jpickering
Copy link

@awmwong @chrisballinger @polmum We found the closeSocket: trying to close the same socket twice, because

dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(alternateAddressDelay * NSEC_PER_SEC)), socketQueue, ^{
      [self connectSocket:alternateSocketFD address:alternateAddress stateIndex:aStateIndex];
    });

fired after the the alternateSocketFD had already been closed.

We added a check to make sure the socket was either socket6FD or socket4FD before we closed it. An attempt to close the socket the second time will skip the close() because it was set to SOCKET_NULL the first time.

- (void)closeSocket:(int)socketFD
{
  if (socketFD != SOCKET_NULL)
  {
    if(socketFD == socket6FD || socketFD == socket4FD){  <--------------------------
      close(socketFD);

      if (socketFD == socket4FD)
      {
        socket4FD = SOCKET_NULL;
      }
      else if (socketFD == socket6FD)
      {
        socket6FD = SOCKET_NULL;
      }
    }
  }
}

@chrisballinger
Copy link
Collaborator

@awmwong @jpickering Thanks for the feedback! Good thing we caught this before shipping it out to thousands of other apps.

@chrisballinger
Copy link
Collaborator

Pushed the fix to ipv6 branch.

@chrisballinger chrisballinger mentioned this pull request Jul 5, 2016
@chrisballinger chrisballinger merged commit 7877b0a into robbiehanson:master Jul 6, 2016
@chrisballinger
Copy link
Collaborator

Thanks everyone! I plan to make a new release soon that includes this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants