Investigate flaky test test-dns #2468
Comments
It seems that this test always fails on the platforms where it fails, and always passes where it passes. Perhaps this is related to the DNS configuration on the machines. |
It should perhaps be noted that the flaky test is |
On the CentOS and Ubuntu failures, it seems likely that the issue (or maybe just an issue?) is that they are not configured for IPv6? |
Could be. On DO (DigitalOcean) you have to be in the right datacenter to get IPv6, and these might not be. |
For the two CentOS 5 setups that fail with this test, I strongly suspect they would be fixed by adding this to /etc/hosts:
I don't think that failure is a bug in Node. I think Node is correctly getting the name resolution response from the operating system.
|
@Trott added, care to submit a few runs https://ci.nodejs.org/job/node-test-commit-linux/ to try it out? |
Looks like that did the trick. The two CentOS 5 setups are now passing the test.
All looks good on CentOS 5. |
By the way, any guidance would be welcome on how I might be able to troubleshoot stuff like this on the CI server without playing quite so many shenanigans. (For example, I submitted a test job with code in it that dumped /etc/hosts so I could see what was in it.) With 4.0 about to drop, I imagine I should wait until after the dust settles from that and ask again. But just to get the question/request out there while I'm thinking of it... |
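For anyone who wants to check the same thing locally rather than through a CI job, here is a minimal sketch of a script that reports what the loopback addresses map to in a hosts file. The function names `namesForAddress` and `dumpLoopbackEntries` are made up for illustration, and the default path `/etc/hosts` assumes a Unix-like machine.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper: return every hostname that a hosts-file text
// assigns to the given address. Comments (# ...) and blank lines are ignored.
function namesForAddress(hostsText, address) {
  const names = [];
  for (const rawLine of hostsText.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') continue;
    const [addr, ...hostnames] = line.split(/\s+/);
    if (addr === address) names.push(...hostnames);
  }
  return names;
}

// Hypothetical helper: read a hosts file (assumed to live at /etc/hosts)
// and report the names mapped to the IPv4 and IPv6 loopback addresses.
function dumpLoopbackEntries(path = '/etc/hosts') {
  const text = fs.readFileSync(path, 'utf8');
  return {
    ipv4: namesForAddress(text, '127.0.0.1'),
    ipv6: namesForAddress(text, '::1'),
  };
}
```

Calling `console.log(dumpLoopbackEntries())` on a machine shows at a glance whether `::1` has any entry at all, which is the kind of difference suspected on the CentOS 5 hosts.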
Looks like the failure on
That causes the lookup to return |
The FreeBSD boxes are giving an error on this test file because the test for IPv6 lookups with hints uses the
Sure enough, the test fails with I'll open a PR to skip the one relevant test on FreeBSD and include a comment explaining that if/when the bug is fixed, then the code skipping the test can be removed. |
@Trott FWIW, we're not using 7.2-RELEASE (currently on an RC of 10, moving to 10.2 in a few days) -- support for V4MAPPED was removed in more recent versions. We have been filtering out this flag, if passed, for a while: https://github.com/nodejs/node/blob/master/lib/net.js#L954 (relevant commit) 9bc2e26 |
@jbergstroem OK, so if I'm understanding correctly, we shouldn't add code to the test to skip FreeBSD. Node tries to filter out the |
It seems that the code to screen out the flag on FreeBSD is in a private function that is only called by |
Just had a look as well -- seems about right. Do we add a similar check to |
I don't know. Is it better to honor the flags the user sends and if the OS blows up, then throw an error so the user knows what went wrong? Or do we try to smooth the path, but then risk inserting magic that frustrates the user? "I'm specifying the flag! Why is it ignoring it?!?!" |
In this case it's probably better to just let it bubble up. DNS error messages are already a bit of a mess. |
Good enough for me. So skipping the test on FreeBSD it is. |
Cool, feel free to R=me. |
Looks like
|
@Trott they already had:
but I've added |
Oops, wrong machine. You said ubuntu1404, and that didn't have a |
|
So, the That leaves the two Windows builds and
That seems weird to me, as 127.0.0.1 should not resolve to the fully qualified domain name. Or at least, it seems to me that it shouldn't. Can we just remove that line somehow? /cc @rvagg UPDATE: And regardless of whether it should resolve to the fqdn, the fact is that it doesn't on that box. I imagine that might be because the fqdn may not actually be set. Just a guess though. That line causes |
FreeBSD does not support the V4MAPPED flag so expect an error. This is a partial fix for nodejs#2468. It only fixes it on FreeBSD. Failures on other platforms are due to other reasons and need to be fixed separately. PR-URL: nodejs#2724 Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com> Reviewed-By: Johan Bergström <bugs@bergstroem.nu> Fixes: nodejs#2468
FreeBSD issue fixed in f8152df. We're down to just |
I'm not particularly happy with having to modify test machines to make this work, because it indicates that we may need more robust tests. When someone runs our test suite on their own computer and it fails, are we happy to tell them that their distro shipped a screwy /etc/hosts? Obviously we're dealing with a multiplicity of opinions about exactly what 127.0.0.* and ::1 should map to, so maybe we should take that into account in our tests. |
Perhaps if |
In other news, it looks like this test is no longer a problem on https://ci.nodejs.org/job/node-test-commit-windows/578/ |
So here's another option for We could (if we think there's any value in it) still run the hostname through |
@jasnell and I ran into this while trying to run the tests on the awful nodeconf.eu wifi. It's possible that on a connection without full connectivity these tests time out in c-ares, while they work when there is no internet at all. |
Operating systems can and do return invalid hostnames if that's what they have (for example) in /etc/hosts. Test passes if no error is thrown and the hostname string is not empty. Fixes: nodejs#2468
At this point, the only PR remaining to be merged to close out this bug is #2802. It's a pretty straightforward "split up this monolith of too many tests that cumulatively take too long and cause the file to timeout in CI on Windows" thing, so I'm hoping someone will give it the ol' |
For whatever reason, the CI win2012 machine was timing out on the internet test-dns file. Split out ipv4 and ipv6 specific tests to separate files so tests do not time out. (Each file is given a 60 second timeout on CI. Tests within a file are run in sequence.) PR-URL: #2802 Fixes: #2468 Reviewed-By: Roman Reiss <me@silverwind.io>
Examples of failures:
win2008r2
win2012r2
centos5-32
centos5-64
fedora22
armv7-ubuntu1404
freebsd101-32
freebsd101-64