Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slack client closed, waiting for reconnect #203

Closed
robertmircea opened this issue Jun 21, 2015 · 52 comments
Closed

Slack client closed, waiting for reconnect #203

robertmircea opened this issue Jun 21, 2015 · 52 comments
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented

Comments

@robertmircea
Copy link

I've noticed that Hubot disconnects from Slack after a period of inactivity. I'm seeing this message in log but after that nothing happens and hubot no longer appears connected in Slack channels. At that point, only a manual restart will do.

[Sun Jun 21 2015 09:27:39 GMT+0300 (EEST)] INFO Slack client now connected
[Sun Jun 21 2015 09:29:36 GMT+0300 (EEST)] INFO Slack client closed, waiting for reconnect

How hubot-slack suppose to recover from this this state?

@impleri
Copy link

impleri commented Jul 4, 2015

Perhaps related, but I'm in a similar boat. Log messages:

[Sat Jul 04 2015 13:42:17 GMT+0000 (UTC)] ERROR Last pong is too old: 10.005
[Sat Jul 04 2015 13:42:17 GMT+0000 (UTC)] INFO Reconnecting in 1000ms
[Sat Jul 04 2015 13:42:18 GMT+0000 (UTC)] INFO Attempting reconnect
[Sat Jul 04 2015 13:42:18 GMT+0000 (UTC)] INFO Connecting...
[Sat Jul 04 2015 13:42:38 GMT+0000 (UTC)] INFO Logged in as hubot of TEAM, but not yet connected
[Sat Jul 04 2015 13:42:38 GMT+0000 (UTC)] INFO Slack client closed

@ptierno
Copy link

ptierno commented Jul 13, 2015

Also experiencing the Last pong is too old: error. 👍

@drags
Copy link

drags commented Jul 14, 2015

Seeing this as well:

INFO Slack client closed, waiting for reconnect

Turned on debugging, the last action before the disconnect (by 10m 50s) was emitting a slack-attachment that looked like:

robot.emit 'slack-attachment',
  channel: res.message.user.name
  username: robot.name
  icon_url: 'https://slack.global.ssl.fastly.net/9fa2/img/services/hubot_128.png'
  content:
    fallback: "A thing #{thing}"
    title: my_title
    text: "```<~15 lines>```"
    mrkdwn_in: ['text']
    color: '#111111'

Including the slack-attachment since it seems to frequently be the last thing in the logs

@wylieswanson
Copy link

I also have the same problem;

[Sat Jul 18 2015 04:26:52 GMT+0000 (UTC)] ERROR Last pong is too old: 14.952
[Sat Jul 18 2015 04:26:52 GMT+0000 (UTC)] INFO Reconnecting in 1000ms
[Sat Jul 18 2015 04:26:53 GMT+0000 (UTC)] INFO Attempting reconnect
[Sat Jul 18 2015 04:26:53 GMT+0000 (UTC)] INFO Connecting...
[Sat Jul 18 2015 04:26:54 GMT+0000 (UTC)] INFO Logged in as hubot of team, but not yet connected
[Sat Jul 18 2015 04:27:22 GMT+0000 (UTC)] INFO Slack client closed

I see the code used to exit hubot but was updated at some point to not exit hubot. Since I run hubot with a supervisor, I'd actually prefer it exit -- since it would get restarted. But instead, it just sits until I manually restart with supervisor.

eriklindebratt added a commit to eriklindebratt/hubot-slack that referenced this issue Jul 29, 2015
Since Slack client seems to have troubles reconnecting occasionally (e.g. slackapi#203), this seems like the only way to solve it right now. I'm running Hubot using supervisor and in the case of Slack client dropping its connection it seems to me that having the Hubot process exit and get restarted from supervisor is the way to go.
eriklindebratt added a commit to eriklindebratt/hubot-slack that referenced this issue Jul 29, 2015
Since Slack client seems to have troubles reconnecting occasionally (e.g. slackapi#203), this seems like the only way to solve it right now. I'm running Hubot using supervisor and in the case of Slack client dropping its connection it seems to me that having the Hubot process exit and get restarted from supervisor is the way to go.
eriklindebratt added a commit to eriklindebratt/hubot-slack that referenced this issue Jul 29, 2015
Since Slack client seems to have troubles reconnecting occasionally (e.g. slackapi#203), this seems like the only way to solve it right now. I'm running Hubot using supervisor and in the case of Slack client dropping its connection it seems to me that having the Hubot process exit and get restarted from supervisor is the way to go.
@gyoza
Copy link

gyoza commented Aug 25, 2015

From within the actual desktop client of Slack this evening I noticed it say something along the lines of "reconnecting...." and everything went grey. The client reconnected and we were back in business. Noticed that the bot was offline at that point and wouldn't you know this message appeared in the log..

[Mon Aug 24 2015 21:08:35 GMT-0700 (PDT)] INFO Slack client closed, waiting for reconnect

I think it has something to do with Slack gettin funky with its protocols. Or maybe somebody just restarted some node that does some thing that some other thing needs. The days of high availability, amirite?

@rey
Copy link

rey commented Nov 5, 2015

Also having this issue.

Nov 04 10:58:33 hubot[29321]: [Wed Nov 04 2015 10:58:33 GMT+0000 (GMT)] INFO Connecting...
Nov 04 10:58:33 hubot[29321]: [Wed Nov 04 2015 10:58:33 GMT+0000 (GMT)] INFO Logged in as hubot of team, but not yet connected
Nov 04 10:58:34 hubot[29321]: [Wed Nov 04 2015 10:58:34 GMT+0000 (GMT)] INFO Slack client now connected
Nov 04 10:58:34 hubot[29321]: [Wed Nov 04 2015 10:58:34 GMT+0000 (GMT)] INFO hubot-redis-brain: Using default redis on lo...ost:6379
Nov 04 10:58:35 hubot[29321]: [Wed Nov 04 2015 10:58:35 GMT+0000 (GMT)] INFO hubot-redis-brain: Data for hubot brain retr...om Redis
Nov 05 01:55:55 hubot[29321]: [Thu Nov 05 2015 01:55:55 GMT+0000 (GMT)] INFO Slack client closed, waiting for reconnect

@farski
Copy link

farski commented Nov 6, 2015

Seeing the same thing

[Wed Nov 04 2015 21:52:19 GMT-0500 (EST)] INFO Connecting...
[Wed Nov 04 2015 21:52:19 GMT-0500 (EST)] INFO Logged in as hubot of TKTK, but not yet connected
[Wed Nov 04 2015 21:52:20 GMT-0500 (EST)] INFO Slack client now connected
[Wed Nov 04 2015 21:52:21 GMT-0500 (EST)] INFO hubot-redis-brain: Discovered redis from REDISTOGO_URL environment variable
[Wed Nov 04 2015 21:52:21 GMT-0500 (EST)] INFO hubot-redis-brain: Successfully authenticated to Redis
[Wed Nov 04 2015 21:52:21 GMT-0500 (EST)] INFO hubot-redis-brain: Data for hubot brain retrieved from Redis
[Thu Nov 05 2015 23:37:17 GMT-0500 (EST)] INFO Slack client closed, waiting for reconnect

@ryoqun
Copy link

ryoqun commented Nov 11, 2015

Probably we're affected this issue too. I really wish this bug is fixed in some way. For that matter, I can help.

@shaundon
Copy link

Just to add another voice to the chorus, I have this issue too:

[Wed Nov 11 2015 06:28:15 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect

@zachlatta
Copy link

We're running into this as well.

Nov 11 03:31:10 hubot default:  info main: [Wed Nov 11 2015 11:31:07 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect 

@dvdeurse
Copy link

Same problem here:

[Wed Nov 11 2015 11:20:21 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect

@shaundon
Copy link

We came up with a 'workaround' for this - it appears that Hubot disconnects from Slack when there's been no activity visible to Hubot for a while. So, any messages in channels Hubot is in (doesn't have to be messages directed at Hubot, he sees everything).

So, we just made sure that there's always some activity on Slack by making a channel containing just Hubot and Slackbot, and then asking using Slackbot's reminder functionality to have him post a reminder in the channel every hour.

@rey
Copy link

rey commented Nov 12, 2015

@shaundon giving this a whirl 👍🏼

@gyoza
Copy link

gyoza commented Nov 13, 2015

@shaundon - we were just restarting hubort every hour ha... So it looks like the connector lacks a keep_alive...

@ptierno
Copy link

ptierno commented Nov 13, 2015

@shaundon couldn't use slashbots reminder since my hubot is in a private channel. Trying hubot-cron to see if i can replicate your fix

@aensley
Copy link

aensley commented Nov 13, 2015

I have slackbot send a reminder to the channel, not hubot itself. Works perfectly.

@ptierno
Copy link

ptierno commented Nov 13, 2015

@aensley my issue is that slackbot cannot send a reminder to a private channel

@aensley
Copy link

aensley commented Nov 13, 2015

Ah, I just made it a public channel. The only members are still slackbot and hubot. The extra channel in the list doesn't bother my team that much.

@sparanoid
Copy link

Got the same issue here:

Dec 10 07:27:29 ridill hubot[23847]: [Thu Dec 10 2015 07:27:29 GMT+0000 (UTC)] INFO Slack client now connected
Dec 10 07:27:29 ridill hubot[23847]: [Thu Dec 10 2015 07:27:29 GMT+0000 (UTC)] INFO hubot-redis-brain: Using default redis on localhost:6379
Dec 10 07:27:29 ridill hubot[23847]: [Thu Dec 10 2015 07:27:29 GMT+0000 (UTC)] INFO hubot-redis-brain: Data for hubot brain retrieved from Redis
Dec 11 06:04:37 ridill hubot[23847]: [Fri Dec 11 2015 06:04:37 GMT+0000 (UTC)] ERROR Last pong is too old: 14.981
Dec 11 06:04:37 ridill hubot[23847]: [Fri Dec 11 2015 06:04:37 GMT+0000 (UTC)] INFO Reconnecting in 1000ms
Dec 11 06:04:38 ridill hubot[23847]: [Fri Dec 11 2015 06:04:38 GMT+0000 (UTC)] INFO Attempting reconnect
Dec 11 06:04:38 ridill hubot[23847]: [Fri Dec 11 2015 06:04:38 GMT+0000 (UTC)] INFO Connecting...
Dec 11 06:04:38 ridill hubot[23847]: [Fri Dec 11 2015 06:04:38 GMT+0000 (UTC)] INFO Logged in as hubot of MyTeam, but not yet connected
Dec 11 06:04:40 ridill hubot[23847]: [Fri Dec 11 2015 06:04:40 GMT+0000 (UTC)] INFO Slack client now connected
Dec 11 06:04:57 ridill hubot[23847]: [Fri Dec 11 2015 06:04:57 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect

I'm running it on a server with a systemd service, so my temporary solution is to restart the hubot periodically via cron.

@technicalpickles
Copy link
Contributor

Either exiting on disconnect (like #215) , or constantly pinging the bot user constantly (suggested #203 (comment)) both seem like workarounds to me, rather than fixing the problem. Best I can tell, this is an issue with either slack-client, or how hubot-slack sets up slack-client.

@technicalpickles
Copy link
Contributor

Trying to trace this through:

Based on what people have reported, the second connection isn't actually receiving message. Setting HUBOT_LOG_LEVEL=debug should provide more insight, as that would log received messages too. It'd be logging a ton of things though, so will be pretty noisy.

I'd love to find a way to reliably reproduce this. It'd be pretty manual, but might be able to do something as simple as turn off networking for 10 seconds to trigger the ping timeouts.

@technicalpickles
Copy link
Contributor

Setting HUBOT_LOG_LEVEL=debug

Actually, it looks like there's an environment variable specifically for the slack logger: SLACK_LOG_LEVEL.

@Qkyrie
Copy link

Qkyrie commented Dec 14, 2015

@shaundon how would one remind every hour? I can't seem to find an example and slackbot doesn't understand my "every hour" very well..

as a sidenote, same problem here ofc

@shaundon
Copy link

I think just by setting 24 different reminders to go off daily. I'm not 100% as it was somebody else who set it up - I try not to use Slackbot very much for fear of making Hubot jealous.

@rey
Copy link

rey commented Dec 14, 2015

@Qkyrie @shaundon yep that's what I did
screen shot 2015-12-14 at 14 11 41

@erez-rabih
Copy link

Hi,

Is there a known workaround to this problem? I've looked through this thread and saw multiple leads but not a clear workaround.

@lecoueyl
Copy link

I run a script to check if the message Slack client closed appears in the log and restart hubot if I had to:

#!/bin/bash -e

## Configuration

BASEDIR=$(cd "$(dirname "$0")/.."; pwd)
MONITOR_MESSAGE="Slack client closed"
if [ $# -eq 0 ];then
  LOG=$BASEDIR/log/hubot.log
else
  LOG=$1
fi

## Monitor the file
cd $BASEDIR
tail --follow --lines 0 $LOG | while read LOGLINE;do
  if [[ "${LOGLINE}" == *$MONITOR_MESSAGE* ]]; then
    ./bin/forever.sh restart
  fi
done

@xaka
Copy link

xaka commented Mar 17, 2016

@lecoueyl You don't have to monitor it by yourself. You can set HUBOT_SLACK_EXIT_ON_DISCONNECT env variable to any non empty value and slack adapter is going to exit the process whenever there is disconnect, and forever will spawn it again.

@aensley
Copy link

aensley commented Mar 17, 2016

@lecoueyl What is forever.sh? Is it connected to https://github.com/foreverjs/forever?

@mdonoughe
Copy link

@xaka actually this is unfortunately only mostly true. I have seen a few times recently my hubot gets disconnected and exits, but then when the new instance tries to reconnect it just hangs with an "unknown" connection error and I have to manually restart it.

@technicalpickles
Copy link
Contributor

I think I have a fix for this. At least, it's a fix for this occurring when team_migration_starteds happen.

I had contacted Slack about debugging this by triggering a bunch of migrations on a test account and test bot, and @ofross was kind enough to sit with me while I debugged and he hit a button to cause the migration. I was able to reproduce it consistently easily enough, and the hard part was finding where to detect the upcoming migration. I had a workaround by changing hubot-slack, but I was worried the conditional I added it under was being used for reconnect when the ping was too old, and would cause double pings. That lead me to adding explicitly handling the migration events.

I'm pretty sure you can install a version of node-slack-client in your hubot, and it will use that version rather than what hubot-slack specifies:

npm install --save "technicalpickles/node-slack-client#reconnect-on-migration-1.4x"

It's a little complicated to release this, because master has the 2.x series, and there's a lot of changes without a 1.4.x branch I can work against. I filed slackapi/node-slack-sdk#180 to figure that out.

I still need to find a way to recreate the 'ping too old' condition to make sure that is fixed too. I am trying to figure out how to use toxiproxy to do it.

@lecoueyl
Copy link

@aensley yes I run it with forever. My script:

#!/bin/bash
# forever.sh start | stop | restart

## Configuration

BASEDIR=$(cd "$(dirname "$0")/.."; pwd)
NODE_MODULES=$BASEDIR/node_modules
LOG=$BASEDIR/log/hubot.log
PID=$BASEDIR/tmp/pids/hubot.pid
source $BASEDIR/config/hubot.cfg

## Ensure
mkdir -p $(dirname "${LOG}")
mkdir -p $(dirname "${PID}")
cd $BASEDIR

## Forever Commands

function start {
  $NODE_MODULES/forever/bin/forever start \
    -p $BASEDIR \
    --minUptime 10 \
    --spinSleepTime 100 \
    --pidFile $PID \
    --append \
    -l $LOG \
    -c node_modules/coffee-script/bin/coffee node_modules/.bin/hubot \
    --adapter slack

  # Start log monitoring
  pkill --full hubot_log_monitoring.sh
  nohup $BASEDIR/bin/hubot_log_monitoring.sh $LOG &>/dev/null &
}

function stop {
  # Kill current instance of hubot
  if [ -f $PID ];then
    $NODE_MODULES/forever/bin/forever stop `cat $PID`
  fi
}

case $1 in
  start) start;;
  stop) stop;;
  restart) stop;start;;
  --) break;;
sac

@xarem
Copy link

xarem commented Apr 21, 2016

with debug enabled:

bot_1    | hubot stdout | [Thu Apr 21 2016 06:00:15 GMT+0000 (UTC)] INFO Slack client now connected
...
bot_1    | hubot stdout | [Thu Apr 21 2016 07:40:08 GMT+0000 (UTC)] ERROR Last pong is too old: 17.417
bot_1    | [Thu Apr 21 2016 07:40:08 GMT+0000 (UTC)] INFO Reconnecting in 1000ms
bot_1    | hubot stdout | [Thu Apr 21 2016 07:40:08 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect
bot_1    | hubot stdout | [Thu Apr 21 2016 07:40:09 GMT+0000 (UTC)] INFO Attempting reconnect
bot_1    | [Thu Apr 21 2016 07:40:09 GMT+0000 (UTC)] INFO Connecting...
bot_1    | hubot stdout | [Thu Apr 21 2016 07:40:10 GMT+0000 (UTC)] INFO Logged in as hubot of SLACKNAME, but not yet connected
bot_1    | hubot stdout | [Thu Apr 21 2016 07:40:11 GMT+0000 (UTC)] INFO Slack client now connected
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:10 GMT+0000 (UTC)] ERROR Last pong is too old: 14.909
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:11 GMT+0000 (UTC)] INFO Reconnecting in 1000ms
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:12 GMT+0000 (UTC)] INFO Attempting reconnect
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:12 GMT+0000 (UTC)] INFO Connecting...
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:15 GMT+0000 (UTC)] INFO Logged in as hubot of SLACKNAME, but not yet connected
bot_1    | hubot stdout | [Thu Apr 21 2016 09:39:41 GMT+0000 (UTC)] INFO Slack client closed, waiting for reconnect
bot_1    | hubot stdout | [Thu Apr 21 2016 09:41:24 GMT+0000 (UTC)] ERROR Error: connect ETIMEDOUT 54.172.38.248:443
bot_1    |   at Object.exports._errnoException (util.js:890:11)
bot_1    |   at exports._exceptionWithHostPort (util.js:913:20)
bot_1    |   at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1059:14)

@ghost
Copy link

ghost commented Jun 3, 2016

I run hubot with a custom script that keeps the bot online, just like forever.sh, but simpler :)

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
source $DIR/enviroment

until $DIR/hubot --adapter slack; do
    echo "Server 'hubot' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

I set the HUBOT_SLACK_EXIT_ON_DISCONNECT to true, it works.

But the problem is that sometimes it crashes with Error: unknown, and then it's stuck there and i have to manually restart it. Can't seem to find from where the "unknown" error is being printed nor why it happens. All i want is a process.exit 1 in case of being stuck :))

Any ideas ? Thanks

@anupdhml
Copy link

anupdhml commented Jun 4, 2016

@andreimavenhut Try doing export SLACK_LOG_LEVEL="debug" before you run hubot. That's used by the node slack client on which hubot-slack relies and will show you more details on what's going on.

Also, change line https://github.com/slackhq/hubot-slack/blob/master/src/slack.coffee#L54 to @robot.logger.error error. I have seen Error: unknown in the past when working with hubot-slack and this has helped me find out what the error is.

@ghost
Copy link

ghost commented Jun 13, 2016

@anupdhml Tried that, here is the output:

[Sat Jun 11 2016 22:32:20 GMT+0300 (EEST)] DEBUG Unknown message type: reconnect_url
[Sat Jun 11 2016 22:32:20 GMT+0300 (EEST)] DEBUG { type: 'reconnect_url',
  url: 'wss://mpmulti-olt5.slack-msgs.com/websocket/G397PDMrMj13eIEchwClnSNBi-IpIC8hyrD0Z7rTjGmxj4w04u3XDtMDfkMh-BJOWbzmsuyiSXrg72-Fbsr0hwp6nQW0crGEevgsy-cnxy90z27o8qigeaxShjcm0ks6WIr1fhQWA1LP5MQSnciEiZZ_mqtL6xn9NWM8Hade3NQ=' }
[Sat Jun 11 2016 22:33:04 GMT+0300 (EEST)] ERROR Last pong is too old: 14.965
[Sat Jun 11 2016 22:33:04 GMT+0300 (EEST)] INFO Reconnecting in 1000ms
[Sat Jun 11 2016 22:33:04 GMT+0300 (EEST)] INFO Slack client connection was closed, exiting hubot process
Server 'cyberhut' crashed with exit code 1.  Respawning..
[Sat Jun 11 2016 22:33:39 GMT+0300 (EEST)] INFO Connecting...
[Sat Jun 11 2016 22:33:45 GMT+0300 (EEST)] ERROR undefined

The Respawning.. part is from our keep alive system, it restarted the hubot process which gave an undefined error..

@DEGoodmanWilson
Copy link

This PR should resolve most of the issues you are seeing: #309

Many of these disconnects are related to either a series of outages in November (https://status.slack.com/2015-11) or regular team migrations. We do regularly disconnect, but we have a number of mechanisms in place to help clients gracefully and quickly reconnect (or avoid the need to reconnect altogether), many of which are now handled in the latest version of node-slack-sdk.

Could any of you give this PR a spin and see if it helps with your reconnect issues?

@DEGoodmanWilson DEGoodmanWilson added bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented Priority—High labels Jun 27, 2016
@marceloslacerda
Copy link

What do I put in the package.json to test the latest version?

@DEGoodmanWilson
Copy link

@marceloslacerda I believe the syntax would be:
"hubot-slack": "slackhq/hubot-slack#v3-sdk-updates"

benjie added a commit to timecounts/botdylan that referenced this issue Jul 6, 2016
@marceloslacerda
Copy link

After the upgrade the problem stopped.

@benjie
Copy link

benjie commented Jul 21, 2016

Yep; we've been connected fine for 2 weeks now :)

@DEGoodmanWilson
Copy link

Outstanding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented
Projects
None yet
Development

No branches or pull requests