Intermittent Connection timeout on WinRM connections #220

Open
wmunyan opened this issue Oct 24, 2017 · 1 comment
@wmunyan

wmunyan commented Oct 24, 2017

Hello,
I am seeing some very strange behavior in my program, which needs to hold a (somewhat) long-running WinRM connection to a remote Windows box. More often than not, the program runs partway and then fails with the following:

Exception in thread "main" com.xebialabs.overthere.cifs.winrm.WinRmRuntimeIOException: Error when sending request to https://MY-SERVER-NAME:5986/wsman
Request:
<?xml version="1.0" encoding="UTF-8"?>

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <a:To xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">https://MY-SERVER-NAME:5986/wsman</a:To>
    <a:ReplyTo xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">
      <a:Address mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:Address>
    </a:ReplyTo>
    <w:MaxEnvelopeSize xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">307200</w:MaxEnvelopeSize>
    <a:MessageID xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">uuid:D9CE72E3-E2EE-483D-9BEB-94BF4583FF08</a:MessageID>
    <w:Locale xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
    <p:DataLocale xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
    <w:OperationTimeout xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">PT3600.000S</w:OperationTimeout>
    <a:Action xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/Receive</a:Action>
    <w:SelectorSet xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">
      <w:Selector Name="ShellId">77098E84-D91C-4E9F-B26C-36997F9F1D7C</w:Selector>
    </w:SelectorSet>
    <w:ResourceURI xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
  </env:Header>
  <env:Body>
    <rsp:Receive xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
      <rsp:DesiredStream CommandId="580F404C-D703-4A48-861C-96041F8E19CB">stdout stderr</rsp:DesiredStream>
    </rsp:Receive>
  </env:Body>
</env:Envelope>

Response:
[EMPTY]
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:435)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.sendRequest(WinRmClient.java:345)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.receiveOutput(WinRmClient.java:182)
        at com.xebialabs.overthere.cifs.winrm.CifsWinRmConnection$2.run(CifsWinRmConnection.java:162)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to MY-SERVER-NAME:5986 [MY-SERVER-NAME/MY-SERVER-IP] failed: Connection timed out: connect
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:414)
        ... 3 more
Caused by: java.net.ConnectException: Connection timed out: connect
        at java.net.DualStackPlainSocketImpl.connect0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
        ... 12 more
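Reading the trace from the bottom up: the innermost `java.net.ConnectException` is raised while `MainClientExec.establishRoute` opens a new TCP connection to port 5986, and the `Receive` request is only sent once that connection exists, which is consistent with the `[EMPTY]` response. A minimal, JDK-only sketch of classifying a failure by its cause chain, so calling code can tell a failed TCP connect apart from a Java-level timeout; `FailureClassifier` and `Kind` are hypothetical names, not part of Overthere.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper (not part of Overthere): classifies a failure by its cause chain. */
final class FailureClassifier {

    enum Kind { CONNECT_FAILED, TIMED_OUT, OTHER }

    private FailureClassifier() {}

    static Kind classify(Throwable failure) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            // Matches java.net.ConnectException and its subclasses, such as
            // HttpClient 4.x's HttpHostConnectException: no TCP connection was made.
            if (t instanceof ConnectException) {
                return Kind.CONNECT_FAILED;
            }
            // A Java-level timeout expired: either an explicit connect timeout
            // or a read timeout (SO_TIMEOUT) on an open socket.
            if (t instanceof SocketTimeoutException) {
                return Kind.TIMED_OUT;
            }
        }
        return Kind.OTHER;
    }
}
```

Applied to the exception above, this would return `CONNECT_FAILED`.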

I have tried numerous timeout settings, including:

connectionTimeoutMillis=0
socketTimeoutMillis=0
winrmTimeout=PT3600.000S

System.setProperty("jcifs.smb.client.connTimeout", "1200000");
System.setProperty("jcifs.smb.client.responseTimeout", "1200000");
System.setProperty("jcifs.smb.client.soTimeout", "1200000");
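Two observations on these settings, assuming Overthere passes `connectionTimeoutMillis` through to HttpClient's connect timeout. First, the `jcifs.smb.client.*` properties configure the JCIFS SMB client, which does not appear in the failing stack trace; the failure is inside Apache HttpClient's HTTPS connect. Second, the JDK treats a connect timeout of 0 as infinite (`Socket.connect(SocketAddress, int)`), so the OS TCP stack decides when to give up, and the native `DualStackPlainSocketImpl.connect0` frame reporting "Connection timed out: connect" looks like exactly that: the OS saying the TCP handshake got no answer. Raising Java-side timeouts is therefore unlikely to help. A hypothetical, JDK-only probe (`ConnectProbe` is not part of Overthere) that times a plain TCP connect with an explicit limit:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical diagnostic (not part of Overthere): opens and closes a plain TCP
 * connection to host:port with an explicit timeout and returns how long it took.
 */
final class ConnectProbe {

    private ConnectProbe() {}

    /**
     * @param timeoutMillis 0 means no Java-level limit, leaving it to the OS TCP stack;
     *                      a positive value makes connect() throw SocketTimeoutException first
     * @return elapsed connect time in milliseconds
     * @throws IOException e.g. ConnectException if the OS gives up or the port is closed
     */
    static long connectMillis(String host, int port, int timeoutMillis) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Calling something like `ConnectProbe.connectMillis("MY-SERVER-NAME", 5986, 30_000)` in a loop from the client machine while the job runs, and logging failures, could show whether connects to port 5986 stall independently of WinRM.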

I have also tried configuring WinRM on the target with very large timeouts. Here is the current configuration:

C:\Windows\system32>winrm get winrm/config
Config
    MaxEnvelopeSizekb = 500
    MaxTimeoutms = 600000
    MaxBatchItems = 32000
    MaxProviderRequests = 4294967295
    Client
        NetworkDelayms = 10000
        URLPrefix = wsman
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Digest = false [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = true
            CredSSP = false
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        TrustedHosts = * [Source="GPO"]
    Service
        RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
        MaxConcurrentOperations = 4294967295
        MaxConcurrentOperationsPerUser = 1500
        EnumerationTimeoutms = 240000
        MaxConnections = 300
        MaxPacketRetrievalTimeSeconds = 240
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = false
            CredSSP = false
            CbtHardeningLevel = Relaxed
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        IPv4Filter = *
        IPv6Filter = *
        EnableCompatibilityHttpListener = false
        EnableCompatibilityHttpsListener = false
        CertificateThumbprint
        AllowRemoteAccess = true
    Winrs
        AllowRemoteShellAccess = true
        IdleTimeout = 7200000
        MaxConcurrentUsers = 10
        MaxShellRunTime = 2147483647
        MaxProcessesPerShell = 50
        MaxMemoryPerShellMB = 1024
        MaxShellsPerUser = 30
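The `Receive` request logged above carries `<w:OperationTimeout>PT3600.000S</w:OperationTimeout>`, i.e. 3,600 s from `winrmTimeout`, while the service reports `MaxTimeoutms = 600000` (600 s), which Microsoft documents as the longest timeout allowed for any request other than Pull. How the service treats a request that exceeds it is worth checking, although a mismatch there would not explain a failure to open the TCP connection. A small sketch (hypothetical `TimeoutCheck`, JDK only) that parses the ISO-8601 duration with `java.time.Duration` and lists the server limits it exceeds:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Overthere or WinRM): compares the client's
 * OperationTimeout, an ISO-8601 duration such as "PT3600.000S", with millisecond
 * limits copied from the output of `winrm get winrm/config`.
 */
final class TimeoutCheck {

    private TimeoutCheck() {}

    /** Returns the names of the server limits that are shorter than the client timeout. */
    static List<String> limitsExceeded(String operationTimeout, Map<String, Long> serverLimitsMillis) {
        // Duration.parse accepts fractional seconds, so "PT3600.000S" becomes 3,600,000 ms.
        long clientMillis = Duration.parse(operationTimeout).toMillis();
        List<String> exceeded = new ArrayList<>();
        for (Map.Entry<String, Long> limit : serverLimitsMillis.entrySet()) {
            if (clientMillis > limit.getValue()) {
                exceeded.add(limit.getKey());
            }
        }
        return exceeded;
    }
}
```

With the values from this issue, `MaxTimeoutms` (600000) is exceeded and `IdleTimeout` (7200000, i.e. 2 h) is not.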

Again, the timeout occurs at random. Sometimes the process makes it all the way through, executing about 400 individual commands on the system. Some runs will execute more than 400 commands, and the connection may need to stay active for hours to collect the (sometimes massive) amount of information it needs. Because the exception is random, I am having a hard time determining the root cause. Any thoughts or ideas would be most welcome. Thanks in advance for any help!

Cheers,
-Bill M.
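For a run of several hundred commands over hours, one option (not something Overthere is known to provide) is to retry at the application level when the cause chain contains a `ConnectException`. As far as I know, HttpClient 4.x's `DefaultHttpRequestRetryHandler` deliberately treats `ConnectException` as non-retriable, so it will not retry this failure on its own. A hypothetical, JDK-only sketch with exponential backoff; only wrap actions that are safe to repeat, since re-running a remote command that had already started may not be idempotent.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/**
 * Hypothetical retry wrapper (not part of Overthere): re-runs an action when its
 * cause chain contains a ConnectException, doubling the wait after each failure.
 */
final class ConnectRetry {

    private ConnectRetry() {}

    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on the last attempt or on errors that are not connect failures.
                if (attempt >= maxAttempts || !hasConnectException(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }

    /** Looks up to 20 levels down the cause chain for a ConnectException. */
    private static boolean hasConnectException(Throwable t) {
        for (int depth = 0; t != null && depth < 20; depth++, t = t.getCause()) {
            if (t instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }
}
```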

Update: I am seeing the following in my logs when the connections are failing:

Connection released: [id: 373][route: {s}->https://MY-SERVER:5986][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]

I suspect it is significant that the kept-alive count, the route allocation, and the total allocation are all 0, but I don't know what it means...
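That message comes from HttpClient's `PoolingHttpClientConnectionManager`. `0 of 2` and `0 of 20` match HttpClient 4.x's default pool limits (2 connections per route, 20 in total), and `total kept alive: 0` right after a release suggests the released connection was closed rather than returned to the pool. The next WinRM request then has to open a fresh TCP/TLS connection, which is the step that times out in the trace. A hypothetical, JDK-only sketch for pulling those counters out of debug logs so pool drops can be correlated with failures:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical log scanner (not part of Overthere or HttpClient): extracts the pool
 * statistics that PoolingHttpClientConnectionManager appends to its debug messages.
 */
final class PoolStats {

    private static final Pattern STATS = Pattern.compile(
            "total kept alive: (\\d+); route allocated: (\\d+) of (\\d+); total allocated: (\\d+) of (\\d+)");

    final int keptAlive;
    final int routeAllocated;
    final int routeMax;
    final int totalAllocated;
    final int totalMax;

    private PoolStats(int[] v) {
        keptAlive = v[0];
        routeAllocated = v[1];
        routeMax = v[2];
        totalAllocated = v[3];
        totalMax = v[4];
    }

    /** Parses one log line, or returns null if it carries no pool statistics. */
    static PoolStats parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            return null;
        }
        int[] values = new int[5];
        for (int i = 0; i < values.length; i++) {
            values[i] = Integer.parseInt(m.group(i + 1));
        }
        return new PoolStats(values);
    }

    /** True when no idle pooled connection is left, so the next request must open a new one. */
    boolean noIdleConnectionLeft() {
        return keptAlive == 0;
    }
}
```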

@wmunyan

wmunyan commented Aug 7, 2018

Is anyone looking at this issue? It is still occurring, still intermittent, and still baffling.
Cheers,
-Bill M.
