dns: fix resolving STRICT_DNS cluster endpoint FQDN with trailing dot '.' #13644

Merged
merged 6 commits into from
Oct 21, 2020

Conversation

lmarszal
Contributor

Commit Message: dns: fix resolving STRICT_DNS cluster endpoint FQDN with trailing dot '.'
Additional Description: bump c-ares dependency to a version having c-ares/c-ares@ba59781 issue resolved
Risk Level: Low
Testing: Manual test - the repro steps from #13587 now result in the expected behaviour
Docs Changes: feature of adding trailing dot '.' to FQDN was never documented; not sure if applicable
Release Notes: N/A
Platform Specific Features: N/A

Fixes #13587
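
For readers unfamiliar with why a trailing dot matters: under conventional stub-resolver rules (the `search` and `ndots` options of resolv.conf), a name ending in '.' is treated as already fully qualified and is tried as-is, without search-domain expansion. The following is only a rough sketch of those rules, not c-ares or Envoy code; the function name `candidate_names` and its parameters are made up for illustration.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a conventional stub resolver would try, in order.

    Illustrative only: this mirrors the usual resolv.conf semantics and is
    not taken from c-ares or Envoy.
    """
    if name.endswith("."):
        # Trailing dot: the name is absolute, so the search list is skipped.
        return [name]

    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search domains.
        return [name] + expanded
    # Too few dots: try the search domains first, then the name as-is.
    return expanded + [name]


if __name__ == "__main__":
    for host in ("example.com.", "example.com", "db"):
        print(host, "->", candidate_names(host, ["svc.cluster.local"]))
```

With a Kubernetes-style search list, this is why operators sometimes write endpoint addresses with a trailing dot: it avoids the extra lookups against each search domain.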

@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Oct 20, 2020
@repokitteh-read-only

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt).

🐱

Caused by: #13644 was opened by lmarszal.

see: more, trace.

@lmarszal
Contributor Author

CC @htuch

Signed-off-by: Łukasz Marszał <lmarszal@gmail.com>
Signed-off-by: Łukasz Marszał <lmarszal@gmail.com>
Signed-off-by: Łukasz Marszał <lmarszal@gmail.com>
Member

@htuch htuch left a comment

Thanks for the fix, couple of short questions.

bazel/repository_locations.bzl (outdated, resolved)
-"//bazel:windows_x86_64": "cp -L $EXT_BUILD_ROOT/external/com_github_c_ares_c_ares/nameser.h $INSTALLDIR/include/nameser.h",
-"//conditions:default": "",
+"//bazel:windows_x86_64": "cp -L $EXT_BUILD_ROOT/external/com_github_c_ares_c_ares/src/lib/nameser.h $INSTALLDIR/include/nameser.h && cp -L $EXT_BUILD_ROOT/external/com_github_c_ares_c_ares/src/lib/ares_dns.h $INSTALLDIR/include/ares_dns.h",
+"//conditions:default": "cp -L $EXT_BUILD_ROOT/external/com_github_c_ares_c_ares/src/lib/ares_dns.h $INSTALLDIR/include/ares_dns.h",
Member

Why is this now needed in non-Windows?

Contributor Author

There was a change in the directory structure of c-ares (c-ares/c-ares@0bf721c). We use macros from ares_dns.h in dns_impl_test.cc, and this header is no longer installed by make install, so I had to add a "hack" that copies it into the include directory to keep it visible to us.
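
For context on why a test would want such a header: helpers of this kind read and write fields of the fixed 12-byte DNS message header defined in RFC 1035, which a fake DNS server needs when it crafts replies. The sketch below is not the ares_dns.h macros themselves, only the same idea expressed with Python's `struct` module; the function names `pack_dns_header` and `unpack_qid` are hypothetical.

```python
import struct

QR_RESPONSE = 0x8000  # top bit of the flags word marks a response
RCODE_REFUSED = 5     # RFC 1035 response code "Refused"


def pack_dns_header(qid, is_response, rcode, qdcount=1, ancount=0):
    """Build a 12-byte DNS header (RFC 1035, section 4.1.1).

    Illustrative helper: only the QR bit and RCODE are set in the flags
    word; NSCOUNT and ARCOUNT are left at zero.
    """
    flags = (QR_RESPONSE if is_response else 0) | (rcode & 0xF)
    # Six unsigned 16-bit fields in network byte order:
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    return struct.pack("!HHHHHH", qid, flags, qdcount, ancount, 0, 0)


def unpack_qid(message):
    """Read the query ID from the first two bytes of a DNS message."""
    return struct.unpack("!H", message[:2])[0]


if __name__ == "__main__":
    header = pack_dns_header(0x1234, is_response=True, rcode=RCODE_REFUSED)
    print(len(header), hex(unpack_qid(header)))
```

A test server would typically copy the query ID from the incoming question into its reply in this way, so the resolver can match the response to the outstanding request.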

@lmarszal
Contributor Author

I ran into some concurrency (tsan and asan) problems in the upgraded c-ares:

WARNING: ThreadSanitizer: heap-use-after-free (pid=15)
  Read of size 1 at 0x7b04000018cf by main thread:
    #0 as_is_first /b/f/w/external/com_github_c_ares_c_ares/src/lib/ares_getaddrinfo.c:765:7 (dns_impl_test+0x2e658eb)
    #1 next_dns_lookup /b/f/w/external/com_github_c_ares_c_ares/src/lib/ares_getaddrinfo.c:693:11 (dns_impl_test+0x2e648b2)
    #2 next_lookup /b/f/w/external/com_github_c_ares_c_ares/src/lib/ares_getaddrinfo.c:513:15 (dns_impl_test+0x2e64714)
    #3 next_lookup /b/f/w/external/com_github_c_ares_c_ares/src/lib/ares_getaddrinfo.c:527:11 (dns_impl_test+0x2e647df)
    #4 ares_getaddrinfo /b/f/w/external/com_github_c_ares_c_ares/src/lib/ares_getaddrinfo.c:681:3 (dns_impl_test+0x2e63c22)
    #5 Envoy::Network::DnsResolverImpl::PendingResolution::getAddrInfo(int) /proc/self/cwd/source/common/network/dns_impl.cc:283:3 (dns_impl_test+0x2e457e4)
    #6 Envoy::Network::DnsResolverImpl::resolve(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Network::DnsLookupFamily, std::__1::function<void (Envoy::Network::DnsResolver::ResolutionStatus, std::__1::list<Envoy::Network::DnsResponse, std::__1::allocator<Envoy::Network::DnsResponse> >&&)>) /proc/self/cwd/source/common/network/dns_impl.cc:252:25 (dns_impl_test+0x2e46543)
    #7 Envoy::Network::DnsImplTest::resolveWithExpectations(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Network::DnsLookupFamily, Envoy::Network::DnsResolver::ResolutionStatus, std::__1::list<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::list<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1l> > >) /proc/self/cwd/test/common/network/dns_impl_test.cc:483:23 (dns_impl_test+0x1a7d0b1)
    #8 Envoy::Network::DnsImplTest_DestroyChannelOnRefused_Test::TestBody() /proc/self/cwd/test/common/network/dns_impl_test.cc:681:3 (dns_impl_test+0x1a63b27)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2580:10 (dns_impl_test+0x52da1cc)
    #10 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2616:14 (dns_impl_test+0x52babbe)
    #11 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2655:5 (dns_impl_test+0x529cbc1)
    #12 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2832:11 (dns_impl_test+0x529da99)
    #13 testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2986:28 (dns_impl_test+0x529e70e)
    #14 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5697:44 (dns_impl_test+0x52af03c)
    #15 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2580:10 (dns_impl_test+0x52e174c)
    #16 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2616:14 (dns_impl_test+0x52be9ee)
    #17 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5280:10 (dns_impl_test+0x52ae8cb)
    #18 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2485:46 (dns_impl_test+0x2d06d27)
    #19 Envoy::TestRunner::RunTests(int, char**) /proc/self/cwd/test/test_runner.cc:155:10 (dns_impl_test+0x2d05d45)
    #20 main /proc/self/cwd/test/main.cc:34:10 (dns_impl_test+0x2d0319d)

I think I know where the problem is - I will update this PR after resolving this problem upstream.

@htuch
Member

htuch commented Oct 20, 2020

Interesting, let us know if you need any pointers on Envoy's DNS implementation. I just merged #13582, which makes the mentioned last_updated --> release_date rename, so you might want to merge master to pick that up. Thanks.

Signed-off-by: Łukasz Marszał <lmarszal@gmail.com>
Signed-off-by: Łukasz Marszał <lmarszal@gmail.com>
@lmarszal
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13644 (comment) was created by @lmarszal.

see: more, trace.

@lmarszal
Contributor Author

@htuch It's all green.

Member

@htuch htuch left a comment

@lmarszal wow, thanks for the upstream fix! 👍

@htuch
Member

htuch commented Oct 21, 2020

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Oct 21, 2020
@htuch htuch merged commit 58f6dfa into envoyproxy:master Oct 21, 2020
@lmarszal lmarszal deleted the 13587-fqdn-trailing-dot branch October 22, 2020 08:30
@daveey

daveey commented Oct 24, 2020

Thanks for fixing! We hit this problem when upgrading envoy from 1.14 -> 1.16. Will this change make its way into a release any time soon?

@ramaraochavali
Contributor

@lizan @mattklein123 @PiotrSikora Is it possible to backport this to 1.16 release?

@ramaraochavali
Contributor

Discussed offline with @lizan. Since this change bumps a dependency and therefore carries an element of risk, backporting it is not recommended.

Successfully merging this pull request may close these issues.

Error when resolving STRICT_DNS cluster endpoint FQDN with trailing dot '.'
4 participants