Intermittent "Record reader index out of sync." error on SRV record Resolve. #51

Closed
fbeauche opened this issue Feb 7, 2020 · 25 comments · Fixed by #62

@fbeauche

fbeauche commented Feb 7, 2020

We're using MongoDB.Driver, which uses DnsClient to resolve the SRV record of our MongoDB Atlas cluster.

The code runs in a Kubernetes cluster in AKS (Azure Kubernetes Service). On a newly created cluster, we get lots of connection errors with this stack trace:

A timeout occured after 30000ms selecting a server using CompositeServerSelector 
{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. 
Client view of cluster state is { ClusterId : \"1\", ConnectionMode : \"ReplicaSet\", Type : \"ReplicaSet\", State : \"Disconnected\", Servers : [], DnsMonitorException : \"DnsClient.DnsResponseException: Unhandled exception ---> 
System.InvalidOperationException: Record reader index out of sync.
   at DnsClient.DnsRecordFactory.GetRecord(ResourceRecordInfo info)
   at DnsClient.DnsMessageHandler.GetResponseMessage(ArraySegment`1 responseData)
   at DnsClient.DnsUdpMessageHandler.Query(IPEndPoint server, DnsRequestMessage request, TimeSpan timeout)
   at DnsClient.LookupClient.ResolveQuery(IReadOnlyCollection`1 servers, DnsMessageHandler handler, DnsRequestMessage request, Boolean useCache, LookupClientAudit continueAudit)
   --- End of inner exception stack trace ---
   at DnsClient.LookupClient.ResolveQuery(IReadOnlyCollection`1 servers, DnsMessageHandler handler, DnsRequestMessage request, Boolean useCache, LookupClientAudit continueAudit)
   at DnsClient.LookupClient.QueryInternal(IReadOnlyCollection`1 servers, DnsQuestion question, Boolean useCache)
   at DnsClient.LookupClient.Query(String query, QueryType queryType, QueryClass queryClass)
   at MongoDB.Driver.Core.Misc.DnsClientWrapper.ResolveSrvRecords(String service, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Clusters.DnsMonitor.Monitor()\" 

I've made a small test program that uses DnsClient directly, and it seems the SRV lookup fails from time to time.

Any clues on what's going on?

@MichaCo
Owner

MichaCo commented Feb 8, 2020

That error indicates invalid data sent/received from the DNS Server, most likely fewer bytes arrived than expected.
With corrupted data, the client cannot really do much to fix it.

This could be network related, e.g. if the system is under load or something. It is really hard to tell though what exactly causes this -.-
If you have any additional information or ways to reproduce it, let me know

@TraGicCode

Closing. Turned out to be a DNS Server in DHCP that didn't get removed!

@TraGicCode

@MichaCo Can you please close this?

@fbeauche
Author

@TraGicCode are you sure this is the same as your issue #50?

If you have any additional information or ways to reproduce it, let me know
For now I've worked around the problem by not using the mongodb+srv connection string, but it's not ideal. I will try to get more information later.

One thing I can tell is that I had no problem with nslookup or dig (I even tried a Python library that does SRV lookups).

@MichaCo MichaCo closed this as completed Feb 11, 2020
@JasonJhuboo

I've been running into the same issue the last couple of days and can re-create this consistently in a number of different AKS clusters.

To recreate - run the following code within AKS:

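    using System;
    using DnsClient;
    using DnsClient.Protocol;

    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // The parameterless constructor uses the system-configured name servers.
                LookupClient client = new LookupClient();
                IDnsQueryResponse result = client.Query("_mongodb._tcp.clusterm10-u06qs.mongodb.net", QueryType.SRV);
                Console.WriteLine($"Error: {result.HasError}");

                // Print every record from the answer, authority and additional sections.
                foreach (DnsResourceRecord dnsResourceRecord in result.AllRecords)
                    Console.WriteLine(dnsResourceRecord.ToString());
            }
            catch (Exception e)
            {
                // Inside AKS this is where the DnsResponseException shows up.
                Console.WriteLine(e);
            }

            Console.ReadLine();
        }
    }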

I was using .NET Core 3.1 (Alpine Docker container) and trying to query MongoDB for an SRV record.
If the above host address is dead (because I will be taking my own test database down soon), just create a free MongoDB cluster using MongoDB Atlas and replace the host in the program.

I find that it works fine on Windows 10, fine in linux outside of AKS, but inside AKS I get:

DnsClient.DnsResponseException: Unhandled exception
 ---> System.InvalidOperationException: Record reader index out of sync.
   at DnsClient.DnsRecordFactory.GetRecord(ResourceRecordInfo info) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\DnsRecordFactory.cs:line 181
   at DnsClient.DnsMessageHandler.GetResponseMessage(ArraySegment`1 responseData) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\DnsMessageHandler.cs:line 114
   at DnsClient.DnsUdpMessageHandler.Query(IPEndPoint server, DnsRequestMessage request, TimeSpan timeout) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\DnsUdpMessageHandler.cs:line 66
   at DnsClient.LookupClient.ResolveQuery(IReadOnlyCollection`1 servers, DnsQuerySettings settings, DnsMessageHandler handler, DnsRequestMessage request, LookupClientAudit continueAudit) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\LookupClient.cs:line 724
   --- End of inner exception stack trace ---
   at DnsClient.LookupClient.ResolveQuery(IReadOnlyCollection`1 servers, DnsQuerySettings settings, DnsMessageHandler handler, DnsRequestMessage request, LookupClientAudit continueAudit) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\LookupClient.cs:line 835
   at DnsClient.LookupClient.QueryInternal(DnsQuestion question, DnsQuerySettings settings, IReadOnlyCollection`1 useServers) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\LookupClient.cs:line 650
   at DnsClient.LookupClient.Query(String query, QueryType queryType, QueryClass queryClass, DnsQueryOptions queryOptions) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Src\LookupClient.cs:line 401
   at DnsClientTest.Program.Main(String[] args) in C:\Users\jason\Source\Repos\DnsClientTest\DnsClientTest\Program.cs:line 15

The file paths look a bit funny because I downloaded the latest source for use inside my test application.

The C# MongoDb driver relies on the DnsClient package, and is broken when attempting to query using SRV records. I can get round this by not using an SRV connection string, but using the SRV connection string is far preferable for maintainability reasons.

@MichaCo @TraGicCode @fbeauche
I don't believe that this issue is resolved - please can this issue be reopened?

@MichaCo
Owner

MichaCo commented Feb 14, 2020

Very interesting,

@JasonJhuboo First of all, thanks for trying to re-produce the issue.
That at least narrows it down a bit to AKS related issues with DNS resolution.

I tried to find out more and found that there are actually a bunch of DNS issues tracked on the AKS GitHub repo.
https://aka.ms/aks/io-throttle-issue for example. A couple other issues are referenced in there.

From what I found so far, it could be an issue with core-dns and/or alpine images.
At least there is a known issue maybe related to this: Azure/AKS#667

Not sure yet if I can do anything to improve the situation/stability, I'll have a closer look over the weekend I guess.

@MichaCo MichaCo reopened this Feb 14, 2020
@JasonJhuboo

JasonJhuboo commented Feb 14, 2020

@MichaCo Thank you for taking the time to look into this. I also should have added that according to their change log, Microsoft updated their CoreDNS very recently to v1.6.6 (the timing of which seems a bit suspicious) https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2020-02-03

Hope that helps!

@fbeauche
Author

fbeauche commented Feb 16, 2020

@JasonJhuboo not sure CoreDNS 1.6.6 is related. I have an old cluster where I don't have the issue and a newer one where I do, and both run CoreDNS 1.6.6.

If it helps, I've added a Console.WriteLine (encoded in Base64 so it can be pasted here) of the buffer received in the DnsUdpMessageHandler class (around line 56, if I remember correctly).

Here is an example with the error :

UsWBgAABAAMAAAAECF9tb25nb2RiBF90Y3ANcXNsLWRldi1wYXJ2dQVhenVyZQdtb25nb2RiA25ldAAAIQABCF9tb25nb2RiBF90Y3ANcXNsLWRldi1wYXJ2dQVhenVyZQdtb25nb2RiA25ldAAAIQABAAAAHgAzAAAAAGmJGXFzbC1kZXYtc2hhcmQtMDAtMDAtcGFydnUFYXp1cmUHbW9uZ29kYgNuZXQACF9tb25nb2RiBF90Y3ANcXNsLWRldi1wYXJ2dQVhenVyZQdtb25nb2RiA25ldAAAIQABAAAAHgAzAAAAAGmJGXFzbC1kZXYtc2hhcmQtMDAtMDEtcGFydnUFYXp1cmUHbW9uZ29kYgNuZXQACF9tb25nb2RiBF90Y3ANcXNsLWRldi1wYXJ2dQVhenVyZQdtb25nb2RiA25ldAAAIQABAAAAHgAzAAAAAGmJGXFzbC1kZXYtc2hhcmQtMDAtMDItcGFydnUFYXp1cmUHbW9uZ29kYgNuZXQAAAApEAAAAAAAABT/nAAQnAgTYxmUVUGBo1503OWO6Blxc2wtZGV2LXNoYXJkLTAwLTAxLXBhcnZ1BWF6dXJlB21vbmdvZGIDbmV0AAABAAEAAAAeAAQoVtVbGXFzbC1kZXYtc2hhcmQtMDAtMDAtcGFydnUFYXp1cmUHbW9uZ29kYgNuZXQAAAEAAQAAAB4ABChFZfIZcXNsLWRldi1zaGFyZC0wMC0wMi1wYXJ2dQVhenVyZQdtb25nb2RiA25ldAAAAQABAAAAHgAENOVxjwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
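For readers who want to inspect the dump without the library, here is a minimal sketch in Python (not part of the original report) that decodes only the fixed 12-byte DNS header. The constant `HEADER_B64` is simply the first 16 characters of the Base64 text above; the variable names are illustrative.

```python
import base64
import struct

# First 16 Base64 characters of the dump above; they decode to exactly
# the 12-byte DNS message header (ID, flags and the four section counts).
HEADER_B64 = "UsWBgAABAAMAAAAE"

header = base64.b64decode(HEADER_B64)

# Six big-endian unsigned 16-bit fields, in wire order.
msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", header)

print(f"id=0x{msg_id:04x} flags=0x{flags:04x}")
print(f"questions={qdcount} answers={ancount} authority={nscount} additional={arcount}")
# prints:
# id=0x52c5 flags=0x8180
# questions=1 answers=3 authority=0 additional=4
```

The three answers match the three SRV targets, and an additional count of 4 appears consistent with one OPT record plus three A records. The long run of `A` characters at the end of the dump is just zero padding from the receive buffer.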

@emilwangaa

We are also running into this issue with an application trying to connect to MongoDB Atlas from an AKS cluster running in the EU West datacenter. I know that it worked on February 19th, when we did some load tests against the app, so a good guess is that Microsoft rolled out something that is causing the error.

One thing I did test was to spin up a pod in the cluster with Mongo shell installed and here I didn't experience any issues connecting to the db.

For info: the failing application is running the aspnet 3.1-buster-slim Docker image, which is based on Debian, and uses MongoDB.Driver v2.10.2.

@deyanp

deyanp commented Mar 4, 2020

We are experiencing the same problem when integration tests run in Azure DevOps pipelines try to connect to an M10 MongoDB Atlas instance... Strangely enough, it started happening in the last few weeks; we had no such problems previously.

@MichaCo
Owner

MichaCo commented Mar 5, 2020

Hi @deyanp @emilwangaa @fbeauche
I'm trying to a) improve the capability to get trace information from this library in production (see #60), and b) I've added another retry mechanism for this kind of parser error in case the response has invalid data.

Not sure if those changes are the best idea or if they will solve the issues you guys are seeing.
Could you maybe give the latest beta version from MyGet a try, maybe also attach to the log output, and let me know how that works? ;)

Changes are currently in this PR #58

@MichaCo
Owner

MichaCo commented Mar 7, 2020

@fbeauche turns out this is (most likely) the same bug as #55 and will be fixed with 1.3.0.
Using your example data I was able to reproduce the issue and confirm that it's fixed in the next release.

The OPT record in your example payload contains 20 bytes of data, and version 1.2.0 of DnsClient doesn't read those bytes, which results in a pretty unfortunate bug.

@MichaCo MichaCo self-assigned this Mar 7, 2020
@MichaCo MichaCo added this to the 1.3.0 milestone Mar 7, 2020
@ghost

ghost commented Mar 9, 2020

This seems to fix it in the meantime:

 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: coredns-custom
   namespace: kube-system
 data:
   google-dns.server: |
     foo-bar.azure.mongodb.net:53 {
         errors
         cache 300
         forward . 8.8.8.8
     }


It essentially forwards all DNS requests for foo-bar.azure.mongodb.net to Google's DNS server and increases the TTL from 30s to 300.

@MichaCo
Owner

MichaCo commented Mar 11, 2020

Not sure about that workaround @JoeSainsburys, but for anyone having this issue, a beta version of 1.3.0, which should fix this, is now available on NuGet.org, too.

I would really appreciate if someone could try this out and verify if the issue is fixed or not.
Thanks!

@emilwangaa

@MichaCo I've just tested with the beta and it works perfectly in our app. Thanks for taking the time to look into it and fix it 💪

@MichaCo MichaCo linked a pull request Mar 11, 2020 that will close this issue
@DmitryLukyanov

@MichaCo, am I right that the OPT/SRV record problem (in particular as described in this comment) appears because AKS can now return OPT records in the DNS lookup response for SRV?

@MichaCo
Owner

MichaCo commented Mar 11, 2020

The DNS subsystem in AKS probably always returned OPT records, but it started sending them with a body. The bug was that the library didn't read the body and didn't advance the reader, which caused parser issues.

In general, it is totally valid for the OPT record to have a data body, and it was clearly a bug in 1.2.0 of this library not reading it.
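To illustrate why an unread body breaks everything that follows, here is a minimal sketch in Python. It is not DnsClient's actual code: `read_record` is a hypothetical helper, both records use the root name (a single zero byte) to avoid name decompression, and the 16 option data bytes are zero placeholders rather than the real payload. Only the OPT field values (class 4096, RDLENGTH 20, option code 0xFF9C) mirror the dump posted earlier.

```python
import struct

def read_record(buf, offset):
    """Read one resource record with a root owner name; return (record, next_offset)."""
    # Hypothetical simplification: only the root name (0x00) is supported.
    if buf[offset] != 0:
        raise ValueError("reader is out of sync: expected a root name")
    offset += 1
    # TYPE (2 bytes), CLASS (2), TTL (4), RDLENGTH (2), all big-endian.
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", buf, offset)
    offset += 10
    rdata = buf[offset:offset + rdlength]
    # The crucial step: always advance past RDLENGTH bytes, even when the
    # RDATA content is not understood, so the next record starts in sync.
    return {"type": rtype, "class": rclass, "ttl": ttl, "rdata": rdata}, offset + rdlength

# OPT record (type 41) with a 20-byte body: option code, option length, 16 data bytes.
opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0, 20) + struct.pack("!HH", 0xFF9C, 16) + bytes(16)
# A record (type 1) that follows it in the additional section.
a_rec = b"\x00" + struct.pack("!HHIH", 1, 1, 30, 4) + bytes([10, 0, 0, 1])
buf = opt + a_rec

first, offset = read_record(buf, 0)
second, offset = read_record(buf, offset)
print(first["type"], len(first["rdata"]), second["type"], offset == len(buf))
# prints: 41 20 1 True
```

A reader that stops after the fixed fields of the OPT record would resume at offset 11, where the byte is 0xFF (the start of the option code) instead of the next record's name, which is the kind of desynchronization the error message reports.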

Some background about OPT records:
There are some experimental features in DNS which use the key/value body of OPT to transport random stuff... Maybe something started using the OPT records to transport some additional information. The type code of the data used in the example above isn't registered, so it is clearly experimental - meaning, I have no idea what the data is about or where it comes from.

You can find all known codes here:
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-11

The code from the example above is in that 65001-65534 group.
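As a rough sketch of the key/value layout mentioned above, the following Python snippet walks the option list of an OPT body and checks a code against the 65001-65534 range. The helper names are hypothetical, and the 16 data bytes are zero placeholders; only the option code 0xFF9C and length 16 are taken from the example payload.

```python
import struct

def parse_edns_options(rdata):
    """Split an OPT record body into (option_code, option_data) pairs."""
    options = []
    offset = 0
    while offset + 4 <= len(rdata):
        # Each option starts with a 2-byte code and a 2-byte length, big-endian.
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        options.append((code, rdata[offset:offset + length]))
        offset += length
    return options

def is_local_or_experimental(code):
    """True when the option code falls in the 65001-65534 group."""
    return 65001 <= code <= 65534

# 20-byte body shaped like the one in the example: code 0xFF9C, length 16, placeholder data.
rdata = struct.pack("!HH", 0xFF9C, 16) + bytes(16)
options = parse_edns_options(rdata)
code, data = options[0]
print(code, len(data), is_local_or_experimental(code))
# prints: 65436 16 True
```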

@JasonJhuboo

JasonJhuboo commented Mar 12, 2020

@MichaCo I've run the same test as before (this time using your new beta version), on a Windows 10 machine, on a Linux (Centos 7) machine outside of AKS and on a container based on Alpine (mcr.microsoft.com/dotnet/core/sdk:3.1.100-alpine3.10) within AKS. I now get a consistent response when querying for the SRV record, and I don't get any errors.

To echo others, thanks for your time putting in a fix for this, and I'm looking forward to the next release of DnsClient, as my problem was with MongoDB and it looks like their package just needs a version of DnsClient greater than 1.2.0, so I won't even have to wait for them to update :)

@devjoes

devjoes commented Mar 12, 2020

@MichaCo sorry for the late response. I tried your fix and it didn't appear to work (hence the workaround), but it was late and I was in a hurry, so it's possible that I just applied it incorrectly. Glad it seems to fix it though. I'll try again soon. Thanks for resolving it.

@MichaCo
Owner

MichaCo commented Mar 12, 2020

That's great,
thanks @JasonJhuboo and @emilwangaa for taking the time to confirm this.

@julian-vp

Thanks @JoeSainsburys. The custom dns solution works for us!

@MichaCo
Owner

MichaCo commented Mar 17, 2020

The fix is now released on NuGet https://www.nuget.org/packages/DnsClient/1.3.0

@scarreno

scarreno commented Apr 2, 2020

I was getting the same error "Record reader index out of sync." when I tried to connect to mongodb from AKS.

Project: .NET Core 3.1 / MongoDB.Driver 2.10.2

Installing the package DnsClient 1.3.0 fixed the problem.

Thanks a lot!

@scadorel

scadorel commented Apr 2, 2020

We had the same issue this morning too (.NET Framework 4.8, MongoDB.Driver 2.9, Atlas cluster).
Upgrading to 1.3.0 fixed the issue.
Thanks !

@fauxcoding

@scarreno @scadorel

If you upgrade the MongoDB.Driver to 2.10.3 it now references MongoDB.Driver.Core@2.10.0 which in turn references DnsClient@1.3.1
