"local and remote addresses" test fails on macOS 11 #88

Closed
armanbilge opened this issue Nov 14, 2022 · 12 comments · Fixed by #91
Labels
bug Something isn't working

Comments

@armanbilge
Owner

This has become a recurring issue. I first observed it in #85 (comment).

So far I've only seen this test failure on macOS, and possibly only macOS 11, but I am less certain of that.

The issue seems to be that getsockname on a client socket sometimes returns 0:0:0:0:0:0:7F00:1. (Is this a valid address for localhost?)

I believe it should be returning 127.0.0.1, aka 0:0:0:0:0:FFFF:7F00:1.

I've pushed a commit that adds debug logging for the output of getsockname in 8afb7be. When I run it, I frequently (but not always) get a test failure like this:

sbt:root> testsNative/testOnly epollcat.TcpSuite
[info] Starting process '/Users/armanbilge/code/epollcat/tests/native/target/scala-2.13/tests-test-out' on port '63434'.
List(28, 30, -9, -52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 127, 0, 0, 1, 0, 0, 0, 0)
List(28, 30, -9, -52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 127, 0, 0, 1, 0, 0, 0, 0)
List(28, 30, -9, -51, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 0, 1, 0, 0, 0, 0)
[info] epollcat.TcpSuite:
[info] ==> X epollcat.TcpSuite.local and remote addresses 0.01s munit.ComparisonFailException: /Users/armanbilge/code/epollcat/tests/shared/src/test/scala/epollcat/TcpSuite.scala:133
[info] 132:                assertEquals(clientRemote, serverLocal)
[info] 133:                assertEquals(serverRemote, clientLocal)
[info] 134:              }
[info] values are not the same
[info] => Obtained
[info] /127.0.0.1:63437
[info] => Diff (- obtained, + expected)
[info] -/127.0.0.1:63437
[info] +/0:0:0:0:0:0:7F00:1:63437
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] 	epollcat.TcpSuite
[error] (testsNative / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1 s, completed Nov 13, 2022, 6:09:29 PM
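Each debug line above is a raw macOS `sockaddr_in6` printed as signed bytes. Assuming the standard BSD layout (1-byte `sin6_len`, 1-byte `sin6_family` where 30 is `AF_INET6` on macOS, 2-byte port in network byte order, 4-byte `sin6_flowinfo`, 16-byte `sin6_addr`, 4-byte `sin6_scope_id`), the following Scala sketch decodes one line into its port and address. `SockaddrDump`, `decode`, and `format` are hypothetical names for illustration, not epollcat code.

```scala
object SockaddrDump {
  // Decodes a macOS sockaddr_in6 that was dumped as signed bytes.
  // Assumed layout: sin6_len(1) sin6_family(1) sin6_port(2) sin6_flowinfo(4)
  // sin6_addr(16) sin6_scope_id(4) = 28 bytes.
  def decode(dump: List[Int]): (Int, Array[Byte]) = {
    require(dump.length == 28, s"expected 28 bytes, got ${dump.length}")
    val b = dump.map(_ & 0xff)    // undo sign extension: -1 becomes 255
    require(b(1) == 30, "sin6_family is not AF_INET6 (30 on macOS)")
    val port = (b(2) << 8) | b(3) // sin6_port is big-endian (network order)
    val addr = b.slice(8, 24).map(_.toByte).toArray
    (port, addr)
  }

  // Formats 16 address bytes as eight uncompressed hex groups, e.g. 0:0:0:0:0:FFFF:7F00:1.
  def format(addr: Array[Byte]): String =
    addr.grouped(2)
      .map(g => (((g(0) & 0xff) << 8) | (g(1) & 0xff)).toHexString.toUpperCase)
      .mkString(":")
}
```

Decoding the three lines of the failing run gives ports 63436, 63436, and 63437. The first two addresses format as `0:0:0:0:0:FFFF:7F00:1` (the two -1 bytes are 0xFF), while the third formats as `0:0:0:0:0:0:7F00:1`, which is the `+` side of the diff.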
@armanbilge armanbilge added the bug Something isn't working label Nov 14, 2022
@armanbilge
Owner Author

@LeeTibbert if you have a chance to take a look at this as well, I would be much obliged. I am a bit at a loss as to what to do next, besides disabling this test in CI.

@LeeTibbert
Collaborator

@armanbilge I suggest you disable, but not delete, the test whilst I study this. Sorry that it is wasting your time.

Don't you just love intermittent failures? Unfortunately, sometimes they are the most informative.

0:0:0:0:0:0:7F00:1 is a valid address, but one which the epollcat software should never see. It is an
IPv4 address expressed as an IPv6 address, but not in the IPv4-mapped form. You & I are expecting
the latter (0:0:0:0:0:FFFF:7F00:1) form. There is a name for the former form, but it is so seldom
seen that I have forgotten it.

I suspect some form of round-robining, possibly in how clients are connecting to the server socket,
or in some getaddrinfo() call.

Anyway, as you surmise, a fix is probably days, not hours, away.

@armanbilge
Owner Author

0:0:0:0:0:0:7F00:1 is a valid address, but one which the epollcat software should never see.

Thank you, this is a small comfort :)

@LeeTibbert
Collaborator

By "should never see" I mean that it is used internally in InetAddress*.

By my reading, it is the second assert which is failing:

                assertEquals(serverRemote, clientLocal)

If that is true, I need to look at clientLocal and the InetAddress equals override.

My plan is to clone the epollcat repository onto my M1 Mac and see if I can
replicate the failure over 20 or 30 runs.

@armanbilge
Owner Author

armanbilge commented Nov 16, 2022

I've now identified that this is a regression in SN 0.4.8. If I take my debug commit 8afb7be and revert the SN update, I consistently get this as output:

sbt:root> testsNative/testOnly epollcat.TcpSuite
[info] Starting process '/Users/armanbilge/code/epollcat/tests/native/target/scala-2.13/tests-test-out' on port '65059'.
List(28, 30, -2, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
List(28, 30, -2, 37, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
List(28, 30, -2, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
[info] epollcat.TcpSuite:
[info]   + local and remote addresses 0.01s
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1

@LeeTibbert
Collaborator

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 is the IPv6 loopback address "::1"
which may or may not be reasonable, depending upon how the client made the connection (IPv4 or IPv6).

Glad this bug is not a University final exam question.

@LeeTibbert
Collaborator

I am finding that I can avoid a full reboot by running "sbt> testsNative/clean" and then re-running my
test (only the local and remote addresses test). In (native) TcpSuite (i.e. "localhost") it fails the first time and succeeds thereafter.
Tcp6Suite (::1) seems to always work. All this on macOS 12.6(?). I have been mainly using Scala 3.2.1 but
fell back to Scala 2.13.10 to see what happens there.

More by the light of day; my brain is totally confuzzled.

@LeeTibbert
Collaborator

One of the many changes in InetAddress* between SN 0.4.7 and 0.4.8 is that
the resolution of "localhost" may have changed from (IPv6) "::1" (all zeros, then 1) to
IPv4 (127.0.0.1). This change was to make SN act the same as the JVM by using the default "false"
for "java.net.preferIPv6Addresses" (SN Issue #2849).

I am still tracing, trying to figure out what is going on here and what should be going on here.

@LeeTibbert
Collaborator

A quick note as I end my sprint.

The reported bug also occurs on macOS 12.6 on M1 hardware. It happens intermittently, almost always on
the first run after the executable starts up.

It looks like the fix in SN 0.4.8 to follow Scala JVM behavior, preferring "localhost" as the IPv4-mapped IPv6
address "::FFFF:127.0.0.1" rather than the IPv6 address "0:0:0:0:0:0:0:1" (better known as "::1"), revealed
a "quirk" in macOS getsockname().

It appears that macOS getsockname() can and does, especially on the first call after an executable has
started, return an IPv4-compatible IPv6 address (i.e. bytes 10 & 11 of the address, counting from 0, are 0 and not FF).
The rest of the time it appears to consistently return an IPv4-mapped IPv6 address. The IPv4-compatible address is not
wrong, just unexpected & unusual.
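The distinction comes down to bytes 10 and 11 of the 16-byte address. Here is a minimal Scala sketch of that check, assuming the RFC 4291 section 2.5.5 definitions; `Ipv6Kind` and `classify` are hypothetical names. As a design choice, it treats `::` and `::1` as neither form, since those are the unspecified and loopback addresses rather than embedded IPv4 addresses.

```scala
object Ipv6Kind {
  sealed trait Kind
  case object Ipv4Mapped extends Kind     // ::FFFF:a.b.c.d (RFC 4291, section 2.5.5.2)
  case object Ipv4Compatible extends Kind // ::a.b.c.d, deprecated (RFC 4291, section 2.5.5.1)
  case object Other extends Kind

  // Classifies a 16-byte IPv6 address by whether, and how, it embeds an IPv4 address.
  def classify(addr: Array[Byte]): Kind = {
    require(addr.length == 16, "IPv6 addresses are 16 bytes")
    val first80BitsZero = addr.take(10).forall(_ == 0)
    // :: (unspecified) and ::1 (loopback) share the all-zero prefix but are not embedded IPv4.
    val unspecifiedOrLoopback =
      addr.take(15).forall(_ == 0) && (addr(15) == 0 || addr(15) == 1)
    if (first80BitsZero && addr(10) == 0xff.toByte && addr(11) == 0xff.toByte) Ipv4Mapped
    else if (first80BitsZero && addr(10) == 0 && addr(11) == 0 && !unspecifiedOrLoopback) Ipv4Compatible
    else Other
  }
}
```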

I have not tried this on Linux or other operating systems. I also need to check
whether SN javalib uses getsockname().

I have a provisional fix to epollcat SocketHelpers#toInet6SocketAddress which I hope to
submit next sprint.
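The provisional fix itself is not shown in this thread. As a hedged sketch of one possible correction, assuming it rewrites an IPv4-compatible address into the IPv4-mapped form before building the socket address, consider the following. `QuirkFix`, `normalize`, and `toSocketAddress` are hypothetical and are not epollcat's `SocketHelpers#toInet6SocketAddress`.

```scala
import java.net.{InetAddress, InetSocketAddress}

object QuirkFix {
  // Hypothetical helper: rewrites a deprecated IPv4-compatible address (::a.b.c.d)
  // into the IPv4-mapped form (::FFFF:a.b.c.d) by setting bytes 10 and 11 to 0xFF.
  // Other addresses, including :: and ::1, are returned unchanged.
  def normalize(addr: Array[Byte]): Array[Byte] = {
    require(addr.length == 16, "IPv6 addresses are 16 bytes")
    val leading96BitsZero = addr.take(12).forall(_ == 0)
    val unspecifiedOrLoopback =
      leading96BitsZero && addr.slice(12, 15).forall(_ == 0) && (addr(15) == 0 || addr(15) == 1)
    if (leading96BitsZero && !unspecifiedOrLoopback) {
      val fixed = addr.clone() // do not mutate the caller's buffer
      fixed(10) = 0xff.toByte
      fixed(11) = 0xff.toByte
      fixed
    } else addr
  }

  def toSocketAddress(addr: Array[Byte], port: Int): InetSocketAddress =
    new InetSocketAddress(InetAddress.getByAddress(normalize(addr)), port)
}
```

On the JVM, `InetAddress.getByAddress` converts the IPv4-mapped bytes into an `Inet4Address`, so the compatible address from the failing run comes out as `127.0.0.1`, the same address the server side reported.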

Even though this appears to be a macOS quirk, my current thinking is to enable the
fix across the board and be done with it.

I had been suspecting accept(), especially since slightly different code paths are taken on
Linux and macOS, but accept() proved to work as desired & expected.

@armanbilge
Owner Author

armanbilge commented Nov 18, 2022

Thanks for all your investigating!

It appears that macOS getsockname() can and does, especially on the first call after an executable has
started, return an IPv4-compatible IPv6 address (i.e. bytes 10 & 11 are 0 and not FF). The rest of the time
it appears to consistently return an IPv4-mapped IPv6 address. The IPv4-compatible address is not
wrong, just unexpected & unusual.

I see. A "quirk" seems appropriate indeed. I look forward to seeing your fix :)

Could 0:0:0:0:0:0:7F00:1 be a valid IPv6 address? My concern is whether it is possible to distinguish when it is a plain IPv6 address from when it is an IPv4-compatible IPv6 address.

I am also curious if or how other libraries work around this quirk, or even the JDK itself.

@LeeTibbert
Collaborator

LeeTibbert commented Nov 18, 2022

0:0:0:0:0:0:7F00:1 is a valid IPv6 address and is as valid as any other, in that it is
well described in the RFC. It is an IPv4 compatible IPv6 address (not "mapped").

There is a rat's nest of applicable and obsoleted RFCs and drafts (which may
not be final or "standard"), as is the common practice. https://www.rfc-editor.org/rfc/rfc4291.html
is one I found easily; I am not sure if it is the latest. Not to be an RFC lawyer, but to return
to original sources, Section 2.5.5 describes:


2.5.5.  IPv6 Addresses with Embedded IPv4 Addresses

   Two types of IPv6 addresses are defined that carry an IPv4 address in
   the low-order 32 bits of the address.  These are the "IPv4-Compatible
   IPv6 address" and the "IPv4-mapped IPv6 address".

2.5.5.1.  IPv4-Compatible IPv6 Address

   The "IPv4-Compatible IPv6 address" was defined to assist in the IPv6
   transition.  The format of the "IPv4-Compatible IPv6 address" is as
   follows:

   |                80 bits               | 16 |      32 bits        |
   +--------------------------------------+--------------------------+
   |0000..............................0000|0000|    IPv4 address     |
   +--------------------------------------+----+---------------------+

   Note: The IPv4 address used in the "IPv4-Compatible IPv6 address"
   must be a globally-unique IPv4 unicast address.

   The "IPv4-Compatible IPv6 address" is now deprecated because the
   current IPv6 transition mechanisms no longer use these addresses.
   New or updated implementations are not required to support this
   address type.

Section 2.5.5.2 goes on to describe the "IPv4-Mapped IPv6 Address" (::FFFF:127.0.0.1)
address we know, love, & expect.

These days, the IPv4-compatible form ::0000:127.0.0.1 is almost always considered a mistake.
The Java 17 documentation does not even mention how it is handled; it covers only the mapped form:

Special IPv6 address

    IPv4-mapped address: Of the form ::ffff:w.x.y.z, this IPv6 address is used to represent an IPv4 address. It allows the native program to use the same address data structure and also the same socket when communicating with both IPv4 and IPv6 nodes.

    In InetAddress and Inet6Address, it is used for internal representation; it has no functional role. Java will never return an IPv4-mapped address. These classes can take an IPv4-mapped address as input, both in byte array and text representation. However, it will be converted into an IPv4 address.
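The JDK behavior described in that documentation can be checked directly. The following Scala sketch, run on the JVM, builds both RFC 4291 embedded forms of 127.0.0.1 and passes them to `InetAddress.getByAddress(byte[])`; `JdkEmbeddedIpv4` and `embed` are hypothetical names. Per the quoted documentation, the IPv4-mapped bytes come back as an `Inet4Address`, while the IPv4-compatible bytes stay an `Inet6Address`.

```scala
import java.net.{Inet4Address, Inet6Address, InetAddress}

object JdkEmbeddedIpv4 {
  private val ipv4Loopback = Array[Byte](127, 0, 0, 1)

  // RFC 4291 section 2.5.5 layout: 80 zero bits, 16 bits (0000 or FFFF), 32-bit IPv4 address.
  def embed(ipv4: Array[Byte], ipv4Mapped: Boolean): Array[Byte] = {
    val out = new Array[Byte](16) // zero-filled
    if (ipv4Mapped) { out(10) = 0xff.toByte; out(11) = 0xff.toByte }
    System.arraycopy(ipv4, 0, out, 12, 4)
    out
  }

  // The JDK converts IPv4-mapped bytes into an Inet4Address...
  val mapped: InetAddress = InetAddress.getByAddress(embed(ipv4Loopback, ipv4Mapped = true))
  // ...but keeps IPv4-compatible bytes as an Inet6Address.
  val compatible: InetAddress = InetAddress.getByAddress(embed(ipv4Loopback, ipv4Mapped = false))

  def mappedBecameIpv4: Boolean = mapped.isInstanceOf[Inet4Address]
  def compatibleStayedIpv6: Boolean = compatible match {
    case a6: Inet6Address => a6.isIPv4CompatibleAddress
    case _                => false
  }
}
```

The JDK formats IPv6 hex groups in lowercase (`0:0:0:0:0:0:7f00:1`), whereas the Scala Native test output above shows uppercase.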

As I think you demonstrated earlier, one can create IPv4-compatible addresses explicitly, and they are not
converted to the "mapped" form:

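import java.net._

// IPv4-compatible IPv6 address for 127.0.0.1: 12 zero bytes, then 127.0.0.1.
val a = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 0, 1).map(_.toByte)
// No host name, scope_id 0; the result is already an Inet6Address, so no cast is needed.
val ia6 = Inet6Address.getByAddress(null, a, 0)

val f = ia6.isIPv4CompatibleAddress()
println(s"f: $f")
println(ia6.toString())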

If we want to maintain that behavior (and I think we do; I think your last entry
implies that), I can put the quirk correction close to the quirk source (getsockname()) and
leave the general handling undisturbed: "least strong intervention".

What do you think (beyond, "I am glad I never became a network engineer. And I thought async was
bad!")?

@armanbilge
Owner Author

0:0:0:0:0:0:7F00:1 is a valid IPv6 address and is as valid as any other, in that it is
well described in the RFC. It is an IPv4 compatible IPv6 address (not "mapped").

Ok, thanks, so if I understand correctly the single unique interpretation of 0:0:0:0:0:0:7F00:1 is as an IPv4 compatible IPv6 address: there is no other interpretation. That's a small relief.

Thanks for quoting the RFC. That was helpful. I think 😅

Your "quick quirk correction" sounds good to me!
