When a timeout value is set on a parallel-ssh client, it appears that this timeout is used as the maximum amount of time to wait for `scp_send` operations to complete to all nodes. When sending a large file to multiple hosts I have started seeing this error:
('Error opening remote file %s for writing on host %s - %s', 'big_file', u'host', Timeout())
I construct the ParallelSSHClient object like this:
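Roughly like this (hostnames and the other arguments are simplified here; the `timeout=10` is the important part):

```python
from pssh.clients import ParallelSSHClient

hosts = ['host0', 'host1', 'host2', 'host3']
# timeout=10 is the relevant setting - other constructor args omitted here
client = ParallelSSHClient(hosts, timeout=10)
```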
The timeouts always occur 10 seconds after starting the `scp_send` operation and joining on it.
I have been using version 2.3.2 but have verified that the code in this area hasn't changed in the latest version. I would like to try it on the latest parallel-ssh version, but as the environment I am working in uses Python 2 (much to my disdain) and I have no control over this, I am unable to. I'm hoping this report helps with future improvements.
I did a bit of digging into the code and found the block that raises it in pssh/clients/native/parallel.py:
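The pattern is roughly this (paraphrased from the SCP send path, not the verbatim library source; variable names are approximate):

```python
# Paraphrased sketch of the SCP send path - not the verbatim library source.
# Opening the remote file is wrapped in self._eagain, which bounds the call
# with a gevent Timeout of self.timeout; the file data is then written in a
# loop via self.eagain_write.
try:
    chan = self._eagain(self.session.scp_send64, remote_file, mode,
                        file_size, mtime, atime)
except Exception as ex:
    raise SCPError("Error opening remote file %s for writing on host %s - %s",
                   remote_file, self.host, ex)
with open(local_file, 'rb') as local_fh:
    for data in local_fh:
        self.eagain_write(chan.write, data)
```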
The `self._eagain` call uses `self.timeout` as the timeout for this operation, and in my case it is set to 10 (from my constructor args). I found it unlikely that opening the remote file would regularly take 10 seconds, so I added some debug logging to a local copy of the source and found this:
Opened remote file on host0 in 0.0152170658112
Finished sending file data on host0 in 7.17523002625
Opened remote file on host1 in 7.19069314003
Finished sending file data on host1 in 7.42101097107
Opened remote file on host3 in 14.6122598648
Finished sending file data on host3 in 7.85037207603
Opened remote file on host2 in 22.4626410007
Finished sending file data on host2 in 7.75936889648
It appears as though the opening of each remote file (in the `scp_send64` call) is being queued up behind the actual copying of the files, meaning that this timeout value effectively becomes a "maximum amount of time to wait for the copy to complete on all hosts".
I'm not too familiar with greenlets, but my understanding is that these operations should be performed co-operatively, so I guess the `self.eagain_write(chan.write, data)` call afterwards never yields while performing the write. For my own learning I added a `self.poll()` call between the `scp_send64` and `self.eagain_write(chan.write, data)` calls (see the sketch after the timings below), and this shows the behaviour I would have expected - the copy is allowed to complete, without error, even if it takes more than 10 seconds:
Opened remote file on host1 in 0.0519061088562
Opened remote file on host0 in 0.0521171092987
Opened remote file on host2 in 0.0541279315948
Opened remote file on host3 in 0.0543429851532
Finished sending file data on host1 in 3.092856884
Finished sending file data on host0 in 6.02891302109
Finished sending file data on host2 in 9.43797397614
Finished sending file data on host3 in 12.3629457951
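Roughly what my local change looked like (again paraphrased, not the exact library source):

```python
# Paraphrased sketch of my local change - not the exact library source.
chan = self._eagain(self.session.scp_send64, remote_file, mode,
                    file_size, mtime, atime)
self.poll()  # added: yield to the gevent hub so the other hosts' greenlets
             # get a chance to open their remote files before this copy starts
with open(local_file, 'rb') as local_fh:
    for data in local_fh:
        self.eagain_write(chan.write, data)
```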
To Reproduce
Create a ParallelSSHClient with a small timeout value for multiple (the more the better) hosts:
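For example (hostnames are placeholders):

```python
from pssh.clients import ParallelSSHClient

client = ParallelSSHClient(['host0', 'host1', 'host2', 'host3'], timeout=10)
```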
Try to send a large (500MB+) file to all of these hosts - the important part is that sending the file should take longer than the timeout set on the client:
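Something along these lines (the file name is a placeholder):

```python
from gevent import joinall

# Copy the large local file to the same path on every host and wait for all
# transfers; raise_error=True surfaces the Timeout raised for any host.
greenlets = client.scp_send('big_file', 'big_file')
joinall(greenlets, raise_error=True)
```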
An error will be raised by the `joinall` call:
('Error opening remote file %s for writing on host %s - %s', 'big_file', u'host', Timeout())
Expected behavior
The timeout value set on the client should only apply to the opening of the remote file, not the transfer of the file contents.
Additional info
parallel-ssh version: 2.3.2
Python version: 2.7.5
I'm embarrassed by the old, unsupported environment I'm working with, but I hope this can be addressed in the latest code versions, and I can try to convince someone that we should move to Python 3 :)