Changing jcifs resolveOrder makes connection issue due to wrong server IP used #258

Closed
courville opened this issue Dec 6, 2020 · 13 comments

Comments

@courville

I would like to report a strange issue that some users of my Android application encounter in the field (it is not systematic and I have not reproduced it myself).
In the application, the jcifs resolve order is changed to jcifs.resolveOrder="BCAST,DNS" instead of the default.
The following error then happens sometimes:

D ListingFragment: onCreateView SmbListingFragment
D ListingFragment: startListing smb://SERVER/smbShare/
D JcifListingEngine: JcifListingThread: listFiles for: smb://SERVER/smbShare/
E JcifListingEngine: JcifListingThread: SmbException
E JcifListingEngine: jcifs.smb.SmbException: Failed to connect: SERVER<20>/192.168.25.1
E JcifListingEngine:    at jcifs.smb.SmbTransportImpl.ensureConnected(SourceFile:689)
E JcifListingEngine:    at jcifs.smb.SmbTransportPoolImpl.getSmbTransport(SourceFile:217)
E JcifListingEngine:    at jcifs.smb.SmbTransportPoolImpl.getSmbTransport(SourceFile:48)
E JcifListingEngine:    at jcifs.smb.SmbTreeConnection.connectHost(SourceFile:565)
E JcifListingEngine:    at jcifs.smb.SmbTreeConnection.connectHost(SourceFile:489)
E JcifListingEngine:    at jcifs.smb.SmbTreeConnection.connect(SourceFile:465)
E JcifListingEngine:    at jcifs.smb.SmbTreeConnection.connectWrapException(SourceFile:426)
E JcifListingEngine:    at jcifs.smb.SmbFile.ensureTreeConnected(SourceFile:558)
E JcifListingEngine:    at jcifs.smb.SmbEnumerationUtil.doEnum(SourceFile:221)
E JcifListingEngine:    at jcifs.smb.SmbEnumerationUtil.listFiles(SourceFile:279)
E JcifListingEngine:    at jcifs.smb.SmbFile.listFiles(SourceFile:1280)
E JcifListingEngine:    at com.archos.filecorelibrary.jcifs.JcifListingEngine$JcifListingThread.run(SourceFile:109)
E JcifListingEngine: Caused by: jcifs.util.transport.ConnectionTimeoutException: Connection timeout
E JcifListingEngine:    at jcifs.util.transport.Transport.connect(SourceFile:596)
E JcifListingEngine:    at jcifs.smb.SmbTransportImpl.ensureConnected(SourceFile:686)
E JcifListingEngine:    ... 11 more

Note that the connection failure is reported for SERVER<20>/192.168.25.1, yet 192.168.25.1 does not belong to the subnet used by the user: SERVER actually has the address 192.168.1.2.
Two independent users reported this behavior, one with a QNAP NAS and the other with a Windows 7 PC acting as SMB server.
Unfortunately, I am unable to get packet captures of the issue, which I guess will not make it easier for you to figure out.
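To make the "not in the user's subnet" observation concrete, here is a minimal sketch of an IPv4 subnet membership check. The class and method names are hypothetical, and the 192.168.1.0/24 network used in the usage note is an assumption: the report only states that SERVER has the address 192.168.1.2.

```java
/** Sketch of an IPv4 subnet membership check (IPv4 literals only, names are hypothetical). */
final class SubnetCheck {

    /** Parses a dotted-quad IPv4 literal such as "192.168.1.2" into a 32-bit int. */
    static int parseIpv4(String dotted) {
        String[] parts = dotted.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 literal: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns true if address lies inside network/prefixLength. */
    static boolean inSubnet(String address, String network, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + prefixLength);
        }
        // A shift by 32 is a no-op on int in Java, so /0 needs its own case.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (parseIpv4(address) & mask) == (parseIpv4(network) & mask);
    }
}
```

Under the assumed /24 network, SubnetCheck.inSubnet("192.168.1.2", "192.168.1.0", 24) holds while SubnetCheck.inSubnet("192.168.25.1", "192.168.1.0", 24) does not, which is the mismatch described above.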

@courville
Author

Note that using the default jcifs resolveOrder causes problems connecting to WD MyCloud NAS devices. I am thus left without a "universal" solution.

@courville
Author

OK, I have found 2 users in the field who have the issue and are willing to debug. One requires jcifs.resolveOrder="BCAST,DNS", the other requires jcifs.resolveOrder="DNS,BCAST"; using the other option either triggers this issue (wrong IP found) or causes timeouts when browsing shares. This is really strange since I cannot reproduce it.
I will use https://github.com/nova-video-player/smbcli, which they can run on their PCs.
Are there any logs in the jcifs-ng resolver that I should enable or add to help you figure out what is wrong?
I fear that asking for packet captures might be a stretch for these users.

@mbechler
Contributor

jcifs.netbios.NameServiceClientImpl on TRACE level along with jcifs.smb.SmbTransportImpl and jcifs.smb.SmbSessionImpl should be a good start to figure out what is going on there.
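As one possible way to apply this, the sketch below raises those three loggers to TRACE through system properties. It assumes the application logs through the slf4j-simple binding, which reads per-logger levels from properties prefixed with org.slf4j.simpleLogger.log.; other bindings (logback, log4j) are configured differently. The class name is hypothetical.

```java
/** Sketch: TRACE for the three loggers named above, assuming the slf4j-simple binding. */
final class JcifsTraceLogging {

    // slf4j-simple looks up per-logger levels under this system property prefix.
    static final String PREFIX = "org.slf4j.simpleLogger.log.";

    static final String[] TRACE_LOGGERS = {
        "jcifs.netbios.NameServiceClientImpl",
        "jcifs.smb.SmbTransportImpl",
        "jcifs.smb.SmbSessionImpl",
    };

    /** Call this at the very start of the program, before jcifs-ng classes create their loggers. */
    static void enable() {
        for (String logger : TRACE_LOGGERS) {
            System.setProperty(PREFIX + logger, "trace");
        }
    }
}
```

The same properties can be passed on the command line with -D instead, which avoids touching the code of the test program.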

@courville
Author

courville commented Dec 13, 2020

jcifs.netbios.NameServiceClientImpl on TRACE level along with jcifs.smb.SmbTransportImpl and jcifs.smb.SmbSessionImpl should be a good start to figure out what is going on there.

Thanks for the proposal. This morning I made a jar package of my test program, with jcifs.netbios.NameServiceClientImpl set to TRACE, available to the motivated users having the issue. I will add the missing jcifs.smb.SmbSessionImpl right away and hopefully report back soon with log files.

@courville
Author

courville commented Dec 18, 2020

OK I instrumented my application nova to get TRACE for jcifs.smb.SmbTransportImpl and jcifs.netbios.NameServiceClientImpl.

Just to recap, I have two users having systematic problems:

  • first user with a working behavior only with jcifs.resolveOrder="BCAST,DNS"
  • second user with a working behavior only with default jcifs.resolveOrder="LMHOSTS, DNS, BCAST"

For both users, jcifs-ng tries to connect to the wrong IP, while nova finds the right IP/share through its UDP discovery process:

  • for the first user, with jcifs.resolveOrder="LMHOSTS, DNS, BCAST", jcifs-ng uses IP 92.242.132.24 (not in the LAN) instead of 192.168.0.53
  • for the second user, with jcifs.resolveOrder="BCAST,DNS", jcifs-ng uses IP 192.168.25.1 (not in the LAN) instead of 192.168.1.2
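For reference, the two configurations compared above can be written as plain java.util.Properties. This is only a sketch: the helper class is hypothetical, and how the properties reach jcifs-ng depends on the application (they are typically handed to jcifs.config.PropertyConfiguration, which is not shown here so that the snippet has no dependencies).

```java
import java.util.Properties;

/** Sketch: the two resolve orders from the recap expressed as Properties (hypothetical helper). */
final class ResolveOrderConfig {

    static final String KEY = "jcifs.resolveOrder";

    /** Order that works for the first user only. */
    static final String BCAST_FIRST = "BCAST,DNS";

    /** Default order quoted in the recap; works for the second user only. */
    static final String DEFAULT_ORDER = "LMHOSTS,DNS,BCAST";

    /** Builds a fresh Properties object carrying only the resolve order. */
    static Properties withResolveOrder(String order) {
        Properties props = new Properties();
        props.setProperty(KEY, order);
        return props;
    }
}
```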

I got logs from the first user as requested. Very early on, jcifs-ng connects to the wrong address:
[Transport1] DEBUG jcifs.smb.SmbTransportImpl - Connecting in state 1 addr 92.242.132.24
while nova discovers the right IP:
[Thread-26] DEBUG c.a.f.samba.SambaDiscovery - onShareFound WORKGROUP "WDMYCLOUD6TB" smb://192.168.0.53/
and the connection then fails with:
jcifs.smb.SmbException: Failed to connect: WDMYCLOUD6TB/92.242.132.24

Please find the complete logs (not sure it will help at this stage):
smb 2 resolver option not tickedlog.zip

Please do not hesitate to ask for more tests to progress on this issue.

[EDIT] jcifs.smb.SmbSessionImpl logging was missing; I regenerated the APK and asked the affected users for another test run...

[UPDATE] Logs from second user with jcifs.smb.SmbSessionImpl logs this time:

The behavior is 100% reproducible for these users.

mbechler added a commit that referenced this issue Dec 20, 2020
Also adds a bunch of new resolver diagnostics, some refactoring.
@mbechler
Contributor

In the first case (smb 2 resolver option not tickedlog.zip) the "wrong" address really seems to be returned by DNS (no fallback to NetBIOS). Maybe multiple addresses are returned (unfortunately there was no proper logging for that until now); in that case failover to the other returned addresses should be performed, but maybe something is broken there? Otherwise, this looks like something is wrong with the user's DNS setup. NetBIOS will not be tried if DNS returns a result.
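The "first result wins" behavior described here can be illustrated with a small sketch. This is not jcifs-ng's resolver code; the Step interface and ResolverChain class are hypothetical stand-ins that only model the ordering semantics.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Sketch of an ordered resolver chain where the first step with an answer wins. */
final class ResolverChain {

    /** One hypothetical resolution step: host name in, address out, empty if unknown. */
    interface Step extends Function<String, Optional<String>> {}

    /** Tries each step in order and stops at the first one that returns an address. */
    static Optional<String> resolve(String host, List<Step> order) {
        for (Step step : order) {
            Optional<String> hit = step.apply(host);
            if (hit.isPresent()) {
                return hit; // later steps (e.g. the broadcast lookup) are never consulted
            }
        }
        return Optional.empty();
    }
}
```

With a DNS step that answers for every name, a later broadcast step never runs, so the bogus address is what the connection attempt receives.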

In the second case the netbios query returns both addresses (192.168.25.1 and 192.168.1.2). Here, we obviously pick the wrong one and discard the remaining addresses. This issue should be fixed in master by using all returned addresses and performing failover.
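The idea of using all returned addresses with failover can be sketched as follows. This is an illustration of the technique, not the code committed to master; the class, the method names, and the injected connection test are assumptions made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Sketch: try every resolved address in order instead of keeping only the first one. */
final class AddressFailover {

    /** Returns the first candidate accepted by canConnect, or empty if none is. */
    static Optional<String> firstConnectable(List<String> candidates, Predicate<String> canConnect) {
        for (String candidate : candidates) {
            if (canConnect.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** A real probe: can a TCP connection to host:port be opened within timeoutMillis? */
    static Predicate<String> tcpProbe(int port, int timeoutMillis) {
        return host -> {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return true;
            } catch (IOException e) {
                return false; // timeout, refusal or unreachable network: try the next address
            }
        };
    }
}
```

For the second user this would amount to something like AddressFailover.firstConnectable(List.of("192.168.25.1", "192.168.1.2"), AddressFailover.tcpProbe(445, 2000)), where port 445 and the 2 second timeout are illustrative values. Each unreachable address costs one timeout before the next is tried.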

I also added a bunch of new debug output that might help figure out the remaining issue.

@courville
Author

Thank you! I will deploy a test build based on master to the two users to collect feedback and more logs.
If I understand your comment correctly, I should be fine with the BCAST,DNS order on master.

@mbechler
Contributor

Probably yes, but this order is likely to cause connection delays if netbios is not available, so I would not really recommend it for general usage.

@courville
Author

Understood, so let's get to the bottom of this issue and collect logs from the first user (92.242.132.24) with the default resolver order.

@courville
Author

OK, I got the logs with master including your latest changes (I am fortunate to have motivated testers), and here is the result for the user with the strange DNS answers (92.242.132.24):

@mbechler
Contributor

It appears the user's DNS server really is returning that address for all non-existent domains.
Maybe this: https://community.virginmedia.com/t5/Networking-and-WiFi/DNS-hijacking-how-to-disable-opt-out/m-p/4145614
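A quick way to check for this kind of resolver behavior is to look up a name that should not exist and see whether an address comes back anyway. The sketch below assumes that a random label under .com is unregistered; the class and method names are hypothetical, and the lookup is injectable so the logic can be exercised without network access.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

/** Sketch: detect a resolver that answers for names that should not exist. */
final class NxdomainHijackCheck {

    /** Looks a name up through the system resolver; empty when it does not resolve. */
    static Optional<String> systemLookup(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    /** True if a random, presumably unregistered name still resolves to an address. */
    static boolean answersForBogusNames(Function<String, Optional<String>> lookup) {
        // Assumption: nobody has registered a .com domain named after a fresh random UUID.
        String bogus = "nx-" + UUID.randomUUID() + ".com";
        return lookup.apply(bogus).isPresent();
    }
}
```

Calling NxdomainHijackCheck.answersForBogusNames(NxdomainHijackCheck::systemLookup) on the affected network would be expected to return true if the resolver substitutes an address for failed lookups, which is essentially what an nslookup on a made-up name shows.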

@courville
Author

Indeed, poking around with nslookup in the issue discussion thread, we came to the same finding.
I hope that not many ISPs act the same way.
Thanks for pointing this out.
I got confirmation that the latest changes on master did the trick for the first user. I must confess that I do not quite understand why the NetBIOS query returns two addresses.
Thanks again for your help and support.

@mbechler
Contributor

That seems to happen when the server has multiple network interfaces; I was able to reproduce it with a Samba server. We could possibly still try to be smarter about the order in which we try the addresses, but I'm not sure that it is worth the effort.
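One way "smarter ordering" could look is to try addresses on the client's own subnet first. This is purely a sketch of that idea, not something jcifs-ng is stated to do; the class name, the IPv4-only parsing, and the caller-supplied local network are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: reorder candidate addresses so that those on the local subnet are tried first. */
final class AddressOrdering {

    /** Parses a dotted-quad IPv4 literal into a 32-bit int. */
    static int parseIpv4(String dotted) {
        String[] parts = dotted.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 literal: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns true if address lies inside network/prefixLength (prefixLength in 0..32). */
    static boolean inSubnet(String address, String network, int prefixLength) {
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (parseIpv4(address) & mask) == (parseIpv4(network) & mask);
    }

    /** Stable reorder: local-subnet addresses first, original order kept within each group. */
    static List<String> preferLocal(List<String> candidates, String localNetwork, int prefixLength) {
        List<String> sorted = new ArrayList<>(candidates);
        // Boolean.FALSE sorts before Boolean.TRUE, so "not local" == false comes first.
        sorted.sort(Comparator.comparing((String a) -> !inSubnet(a, localNetwork, prefixLength)));
        return sorted;
    }
}
```

The trade-off is the one raised above: with failover already in place, reordering only saves a connection timeout in the multi-homed case, and it requires knowing the client's own network, which is not always unambiguous.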
