Job submission fails without internet connection #270
Comments
Original comment by Jens Broeder (Bitbucket: broeder-j, GitHub: broeder-j): Dear Sasha, I (an AiiDA user) am aware of that problem. The way I solve it: I shut down the daemon when I work 'offline'. 'Low level' solutions I could think of: a) Calculations whose submission failed can be easily resubmitted. Drawback: the user has to take care of the 'submission failed' calculations he does not want resubmitted. b) Calculations stay in the tosubmit state if the resource is unavailable. But I think this is really unwanted, because the daemon will be slowed down, and there shouldn't be many or endless connection attempts. I like the way it is. Best, Jens
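One way to reconcile option (b) with the concern about endless connection attempts is a bounded retry with a growing delay. The following is only a minimal sketch under stated assumptions: `submit_with_backoff`, its parameters, and the `submit` callable are hypothetical names for illustration and are not AiiDA's API or the daemon's actual behaviour.

```python
import time
from typing import Callable


def submit_with_backoff(
    submit: Callable[[], str],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `submit` until it succeeds or `max_attempts` is reached.

    Hypothetical helper: `submit` stands in for whatever opens the
    connection and submits the job; it is assumed to raise OSError
    (or a subclass such as ConnectionError) when the network is down.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except OSError:
            # Give up after the last attempt so the caller can mark the
            # calculation as failed (or put it on hold) explicitly.
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, capped, so retries stay rare.
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")
```

With the defaults this makes at most five attempts, so a short network outage does not fail the job, while a long outage still ends with a clear error instead of retrying forever.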
Original comment by Aliaksandr Yakutovich (Bitbucket: yakutovich, GitHub: yakutovicha): Dear Jens, Thank you for your comments. I do actually agree with the 'low level' ways you proposed to solve the problem, and I personally believe that both of them should be implemented. But I disagree with two things:
Why? Suppose you run a workflow. Your computer submits the jobs in a given order, and then for some reason you lose your internet connection. Then your job fails, and you would need to restart it somehow. To me it is really unclear why failing is better than waiting until the computer is up again.
As far as I know, the daemon tries to connect every 30 seconds (to check the running state). What would change if it first checked whether the computer is online? I do not see a problem here. Best, Sasha
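The "check whether the computer is online first" idea could look roughly like the sketch below. It is an illustration only: `is_reachable` is a hypothetical helper, and the SSH port 22 and the 3-second timeout are assumptions, not values taken from AiiDA.

```python
import socket


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical pre-check: a False result means the job could be left
    in its queue and retried on the next daemon tick instead of being
    marked as failed.
    """
    try:
        # create_connection resolves the host name and connects; DNS
        # failures, refusals and timeouts are all subclasses of OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like this only shows that the port accepts connections at that moment; the connection can still drop during the actual submission, so it complements error handling rather than replacing it.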
If there is no network connection, the error one gets is e.g.
Proposal:
Add command
Potential things to take care of
This depends on the work by @muhrin on the new daemon and should be implemented afterwards.
Another common case to consider: AiiDA should put jobs in 'SUBMISSIONHOLD' if an HPC resource is in maintenance. Currently the calculation gets submitted and will fail, because it is not put into the queue and is soon considered done; it then fails because there is no output file. I currently have no idea how to detect this. Another feature idea might be to give a computer an optional 'available' property (possibly user-specific) tied to time (e.g. submit calculations on this machine only from 21:00 to 05:00 in Jan, Feb, ...).
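The time-based availability idea can be sketched as a simple window check. This is a hypothetical illustration, not an existing AiiDA feature: `in_submission_window` is an invented name, and the example window is read here as an overnight one (21:00 to 05:00), which is an assumption about what the comment intends.

```python
from datetime import time


def in_submission_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` lies in the half-open window [start, end).

    Hypothetical helper. A window whose start is later than its end is
    treated as wrapping past midnight; start == end is an empty window.
    """
    if start <= end:
        # Ordinary same-day window, e.g. 09:00 to 17:00.
        return start <= now < end
    # Overnight window, e.g. 21:00 to 05:00: late evening or early morning.
    return now >= start or now < end


# Usage with the assumed overnight window:
night_start, night_end = time(21, 0), time(5, 0)
can_submit_late = in_submission_window(time(23, 30), night_start, night_end)
can_submit_noon = in_submission_window(time(12, 0), night_start, night_end)
```

Jobs that arrive outside the window would stay on hold and be picked up once the window opens; restricting by month as well would need a date-aware check on top of this.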
I agree that it should go on SUBMISSIONHOLD.
You are right. I thought about that. Does this not reject your calculation right away? If I add some logic this should work. Thanks for pointing it out.
This is now implemented through PR #1903 and will be released with |
Originally reported by: Aliaksandr Yakutovich (Bitbucket: yakutovich, GitHub: yakutovicha)
Dear AiiDA developers,
I would like to draw your attention to the fact that many times (at least in my case) an AiiDA job fails for a simple reason: an absent or unstable internet connection.
Of course the user could use some workaround, but I think it would be great if you could solve this problem at the "low level".
I believe this is a rather important issue to solve, because the user cannot always control the internet connection (for example while a workflow is running). Moreover, this would allow offline job submission, where one could prepare calculations and submit them, and the rest would be done automatically once the internet connection returns. (This can happen if you work on a train, for example.)
Hope this will help to improve AiiDA.
Best,
Sasha