Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch notifier can stall the scheduler #1957

Closed
daveFNbuck opened this issue Dec 6, 2016 · 8 comments
Closed

Batch notifier can stall the scheduler #1957

daveFNbuck opened this issue Dec 6, 2016 · 8 comments
Labels

Comments

@daveFNbuck
Copy link
Contributor

So shortly after getting the new batch notifier merged, my connection to my smtp server died. I had timeout set to 30 seconds and 25 batch e-mails getting sent every hour, leading to the scheduler being stuck timing out e-mail sends for 12.5 minutes each hour.

I tried a patch that would fork and do the sending in the child process to avoid this failure mode, but tornado flipped out when that happened. Any suggestions on how we could do this more safely? Maybe we should use the multithreading or multiprocessing libraries to get something safer?

@Tarrasch
Copy link
Contributor

Tarrasch commented Dec 7, 2016

Oh boy. This is a very serious issue. But I suppose most people will not expect the new feature to be bug-free from day one.


As for solution. Using multithreading or multiprocessing seems like a intuitive solution to me. Ideally, the scheduler page would also show some warning if the email server is dead. I for so long have wanted a luigid health page ...

@dlstadther dlstadther added the bug label Jul 9, 2017
@dlstadther
Copy link
Collaborator

@daveFNbuck Is this still an issue? Or has it been resolved?

@daveFNbuck
Copy link
Contributor Author

Not sure, I'll check tomorrow. I think I have a fix in my fork, but I probably didn't make a pull request if it's not linked here.

@daveFNbuck
Copy link
Contributor Author

Made a PR for my fix that just ignores the error. I didn't make a PR right away because I was hoping for a better solution.

@nikgrok
Copy link

nikgrok commented Feb 1, 2018

Hey guys, Is this still an open bug? We are thinking of enabling this feature, but are concerned about the bug.

@daveFNbuck
Copy link
Contributor Author

#2177 is a workaround that prevents the bug from affecting the scheduler

@Tarrasch
Copy link
Contributor

Tarrasch commented Feb 2, 2018

@daveFNbuck should we close this then?

@daveFNbuck
Copy link
Contributor Author

I guess so, it doesn't look like this will lead to a better solution, and the bug we have now is a different one about losing e-mails which probably can't be effectively mitigated anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants