Failure to connect to mail server causes a timeout, eventual Glassfish crash #5707

Closed
bricas opened this issue Apr 1, 2019 · 8 comments · Fixed by #9939
Labels
Feature: Notifications Type: Bug a defect User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh
Comments

@bricas
Contributor

bricas commented Apr 1, 2019

FYI: Running Dataverse 4.11

I experienced a hard-to-diagnose issue where an action (e.g. publishing a dataset) would fail, but with no visible indication as to why.

I was able to figure out that our local SMTP relay wasn't online. Once that service was restored, the operations completed normally.

A number of tasks send out email notifications. Even if the SMTP server was offline, I would expect these operations to complete successfully -- but show a warning about not being able to send out the notification.

@donsizemore
Contributor

@bricas, at @pdurbin's suggestion I tried to recreate this on a test server, but I was able to publish successfully.

@bricas
Contributor Author

bricas commented Apr 1, 2019

@donsizemore interesting. Restoring the SMTP relay was definitely what corrected my issue. I had initially tried restarting glassfish/postgresql but still got stuck with just a blue spinner.

To explain our situation further, we have a postfix relay on localhost that simply ships messages to our institution's SMTP server.

@pdurbin
Member

pdurbin commented Apr 1, 2019

@bricas thanks for opening this issue. @donsizemore tried stopping postfix ( http://irclog.iq.harvard.edu/dataverse/2019-03-29#i_89642 ) but maybe you can help us dig deeper into this relay scenario. We need to know how to reproduce the issue.

Thanks also for your take on what you think the user experience should be when email is down or unavailable. From http://irclog.iq.harvard.edu/dataverse/2019-03-29#i_89625 it sounds like @pameyer 's take was more that publishing could be prevented until email is back up (and explain this to the user who is trying to publish).

@pdurbin pdurbin changed the title Failure to connect to mail server causes a timeout Failure to connect to mail server causes a timeout, eventual Glassfish crash Apr 4, 2019
@mheppler
Contributor

mheppler commented Jun 18, 2019

Related to "Do not require email server for development environments" (#5328), as well as "No provider for smtp" (#5723).

@pdurbin
Member

pdurbin commented Oct 10, 2022

@bricas hi, have you seen this since? Are you still interested in this issue? Any more details (beyond what you've already provided, thanks) that would help us reproduce this? Thanks.

@pdurbin pdurbin added Feature: Notifications Status: Still Interested? User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh labels Oct 10, 2022
@bricas
Contributor Author

bricas commented Oct 11, 2022

We've not seen this issue since my original report. We also added monitoring so we would be notified if postfix was ever offline.

IMO, it would still be ideal to not fail when SMTP is offline -- but I doubt we'll be affected going forward.

@pdurbin
Member

pdurbin commented Mar 26, 2024

Resolved by this pull request:

@poikilotherm
Contributor

Actually, there is a possibility this might still affect us. Today, when trying out a few more things, I had one occasion where adding a dataset hung while the application was trying to reach the mail server.

If this happens again, we should think about using a jakarta.concurrency.Executor to enforce a timeout, potentially adding the notification to a queue for a later retry. This might be a pattern for other operations like DataCite, too.
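
To make the timeout-plus-retry-queue idea concrete, here is a minimal sketch. It is not Dataverse code: the class `NotificationTimeoutSketch`, the `MailSender` interface, and the method names are all hypothetical. It also uses plain `java.util.concurrent` so it runs standalone. Inside a Jakarta EE container, a container-managed executor (for example `jakarta.enterprise.concurrent.ManagedExecutorService`) would presumably take the place of the `ExecutorService` passed in here.

```java
import java.time.Duration;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound the time spent on a mail notification and
// queue it for a later retry instead of blocking the calling operation.
public class NotificationTimeoutSketch {

    /** Hypothetical stand-in for whatever actually talks to the SMTP server. */
    public interface MailSender {
        void send(String recipient, String body) throws Exception;
    }

    // Notifications that failed or timed out, kept as {recipient, body} pairs.
    private final Queue<String[]> retryQueue = new ConcurrentLinkedQueue<>();
    private final ExecutorService executor;
    private final MailSender sender;
    private final Duration timeout;

    public NotificationTimeoutSketch(ExecutorService executor, MailSender sender, Duration timeout) {
        this.executor = executor;
        this.sender = sender;
        this.timeout = timeout;
    }

    /**
     * Attempts to send within the configured timeout. Returns true on success.
     * On timeout or failure the message is queued for retry and false is
     * returned, so the caller can show a warning and carry on.
     */
    public boolean trySend(String recipient, String body) {
        // The lambda returns a value, so it is submitted as a Callable,
        // which is allowed to throw the sender's checked exception.
        Future<?> attempt = executor.submit(() -> {
            sender.send(recipient, body);
            return null;
        });
        try {
            attempt.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            attempt.cancel(true); // request interruption of the stuck attempt
        } catch (ExecutionException e) {
            // The sender itself failed (e.g. connection refused); fall through to queue.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            attempt.cancel(true);
        }
        retryQueue.add(new String[] {recipient, body});
        return false;
    }

    /** Number of notifications currently waiting for a retry. */
    public int pendingRetries() {
        return retryQueue.size();
    }
}
```

With something along these lines, an operation such as publishing could call `trySend(...)`, treat a `false` result as "notification deferred", and finish normally. A separate scheduled job would then be needed to drain the retry queue, which this sketch leaves out.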
