read operation to server failed on database local #123
updated w/ more log info: the org.elasticsearch.action.NoShardAvailableActionException is also new. Any help would be greatly appreciated.
[2013-09-13 04:06:41,811][INFO ][node ] [Spiral] version[0.90.3], pid[11057], build[5c38d60/2013-08-06T13:18:31Z]
[2013-09-13 04:07:50,074][ERROR][river.mongodb ] [Spiral] [mongodb][vendors_river] Mongo gave an exception |
How big is the "oplog.rs" collection? |
the oplog.rs collection is 9553 (I believe): set1:PRIMARY> db.printReplicationInfo() |
@dblado I was actually asking for the number of documents in "oplog.rs" (sorry for the confusion). The way the river monitors this collection (a tailable cursor on a capped collection) is the approach recommended by Mongo [1]. I have the feeling something is not correctly set up in your MongoDB instance. [1] - http://docs.mongodb.org/manual/tutorial/create-tailable-cursor/ |
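For readers unfamiliar with the pattern: a tailable cursor reads to the end of a capped collection, then keeps returning documents appended after the last read instead of closing. A minimal Python sketch of that loop, with a plain list standing in for the capped oplog (all names here are illustrative, not the river's actual code):

```python
import time

def tail(oplog, poll_interval=0.1, max_polls=3):
    """Simulate tailing a capped collection: read everything available,
    then keep polling for documents appended after the last position."""
    pos = 0
    polls = 0
    while polls < max_polls:
        if pos < len(oplog):
            yield oplog[pos]
            pos += 1
        else:
            # a real tailable cursor (with awaitData) blocks server-side
            # instead of polling client-side like this
            polls += 1
            time.sleep(poll_interval)

oplog = [{"ts": 1, "op": "i"}, {"ts": 2, "op": "u"}]
seen = list(tail(oplog, poll_interval=0, max_polls=1))
```

The river's slurper thread is, conceptually, this loop running against local.oplog.rs.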
no worries -- db.oplog.rs.count() shows 18878152 records On Fri, Sep 13, 2013 at 9:16 AM, Richard Louapre
Hi, there are actually hardcoded values for "connect timeout" and "socket timeout" in the current implementation... That seems to be the issue, as it takes more than 15 seconds in your scenario.
I will make a change so these parameters can be specified in the river settings. |
Yea, but I don't want the indexing through the river to be taking so darn long -- On Fri, Sep 13, 2013 at 9:37 AM, Richard Louapre
the queries that the river sends to mongo are taking 20+ minutes to execute. the load on both m1.large instances are near 0 since i'm the only one using it at the moment. there might be a few other hits to the db and es from the app but not much |
Can you do an .explain() on the query to see if it can be optimized or indexes can be created? |
not sure how to do the explain... I thought I could do:
set1:PRIMARY> db.oplog.rs.find({ $query: { $and: [ { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, { ts: { $gt: Timestamp 1378589134000|45 } } ] }, $orderby: { $natural: 1 } }).explain()
I just copied one of the offending queries from the mongo logfile |
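Side note on the `Timestamp 1378589134000|45` in that pasted query: it is mongod's log rendering of a BSON timestamp, which packs 32 bits of epoch seconds and a 32-bit ordinal increment into one 64-bit value (the log prints the seconds part as milliseconds; the shell syntax is `Timestamp(1378589134, 45)`). A quick illustration of the packing:

```python
def pack_bson_timestamp(seconds, increment):
    # BSON timestamps store epoch seconds in the high 32 bits
    # and an ordinal increment in the low 32 bits
    return (seconds << 32) | increment

def unpack_bson_timestamp(value):
    return value >> 32, value & 0xFFFFFFFF

packed = pack_bson_timestamp(1378589134, 45)
print(unpack_bson_timestamp(packed))  # (1378589134, 45)
```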
ya, send me a jar file w/ the timeout hardcode increased (or customizable) and I'll give it a try -- maybe once the river receives the response to this specific query my problem will go away. Although I did set an initial timestamp on the river to force to ignore old oplog entries |
Here is the snapshot [1]. Please give it a try. No need to change the river settings. Just stop ES replace the jar files and restart. [1] - https://dl.dropboxusercontent.com/u/64847502/elasticsearch-river-mongodb-1.7.1-SNAPSHOT.zip |
still seeing the read timed out exception: [2013-09-13 17:31:45,243][INFO ][node ] [Jimmy Woo] version[0.90.3], pid[13379], build[5c38d60/2013-08-06T13:18:31Z] [2013-09-13 17:32:53,383][ERROR][river.mongodb ] [Jimmy Woo] [mongodb][vendors_river] Mongo gave an exception |
this explain seems to be doing something: it's been running a couple of minutes already, but I guess that's normal because explain() still runs the query and just outputs additional data? |
I don't believe you have installed the SNAPSHOT version correctly. The log file should show: Did you replace elasticsearch-river-mongodb-1.7.0.jar with elasticsearch-river-mongodb-1.7.1-SNAPSHOT.jar in $ES_HOME/plugins/river-mongodb? |
my bad -- I didn't delete 1.7.0 just copied in the SNAPSHOT -- running again On Fri, Sep 13, 2013 at 11:09 AM, Richard Louapre
in the startup of ES I'm seeing: [2013-09-13 18:14:26,106][TRACE][river.mongodb ] mongoOptionsSettings: {initial_timestamp={script_type=js, script=var date = new Date(); date.setSeconds(date.getSeconds() + 5); new java.lang.Long(date.getTime());}, secondary_read_preference=true} but the query to mongo doesn't seem to respect the initial timestamp: |
here is the explain: set1:PRIMARY> db.oplog.rs.find({ "$and" : [ { "$or" : [ { "ns" : "vendop.vendor"} , { "ns" : "vendop.$cmd"}]} , { "ts" : { "$gt" : { "$ts" : 1378589134 , "$inc" : 45}}}]}).explain() {
} |
wow, just had LOTS of activity in the ES log: |
things appear to be coming in live now -- so the last_ts in the river was GMT: Sat, 07 Sep 2013 21:25:34 GMT. According to my log files my issue has actually been happening for a while I just never noticed until yesterday. I'm guessing river queries were taking longer and longer each time because it was trying to play catchup?? |
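The GMT date quoted here is just the seconds part of the timestamp in the slow queries (Timestamp 1378589134000|45) converted from the Unix epoch, which confirms the river had been replaying from a week-old position. A quick check:

```python
from datetime import datetime, timezone

# seconds part of the Timestamp 1378589134000|45 seen in the slow queries
last_ts_seconds = 1378589134

dt = datetime.fromtimestamp(last_ts_seconds, tz=timezone.utc)
print(dt.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sat, 07 Sep 2013 21:25:34 GMT
```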
The initial sync can definitely take a long time but hopefully it should be fast once all data are in sync. Regarding the initial timestamp can you please provide your river settings so I can take a look? |
here is my river setting: { |
as soon as it caught up, the river was near live, but it's slowing down -- response time from mongo is 400000+ ms |
Did you try to register the river multiple times? Did you set "initial_timestamp" in one of your last versions of the river settings? "initial_timestamp" is used only if _last_ts does not exist, so if you want to use this parameter, make sure the previous river settings have been deleted first:
|
Can you tell which query is slowing down? |
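The precedence described here (a persisted _last_ts always beating initial_timestamp) can be sketched as follows; the function name and shape are illustrative, not the river's actual code:

```python
def resolve_start_timestamp(stored_last_ts, initial_timestamp):
    """_last_ts, persisted by a previously registered river, always wins;
    initial_timestamp only applies to a freshly created river."""
    if stored_last_ts is not None:
        return stored_last_ts
    return initial_timestamp

# A leftover _last_ts from an old river registration shadows the new setting:
print(resolve_start_timestamp(1378589134, 1379278843))  # 1378589134
```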
ahh ok, will remember to delete the river and not use multiple versions. Seems all the queries from the river are still slow:
Fri Sep 13 18:20:51.854 [conn744] query local.oplog.rs query: { $query: {
looks like those are all for the same timestamp On Fri, Sep 13, 2013 at 12:15 PM, Richard Louapre
It looks like there might be an undocumented QUERYOPTION_OPLOGREPLAY option [1] to make queries on oplog.rs more efficient. I will test it and let you know. Thanks, |
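For reference, oplogReplay is one of the OP_QUERY flag bits in the MongoDB wire protocol, set alongside the tailable/await bits a driver already uses for oplog tailing. Composing the bitmask looks like this (flag values are from the wire protocol spec; the variable names are just for illustration):

```python
# OP_QUERY flag bits from the MongoDB wire protocol
TAILABLE_CURSOR   = 1 << 1  # 2: cursor stays open at the end of a capped collection
SLAVE_OK          = 1 << 2  # 4: allow reads from secondaries
OPLOG_REPLAY      = 1 << 3  # 8: let the server skip-scan oplog.rs on the ts filter
NO_CURSOR_TIMEOUT = 1 << 4  # 16
AWAIT_DATA        = 1 << 5  # 32: block briefly at end-of-data instead of
                            #     returning empty batches immediately

flags = TAILABLE_CURSOR | SLAVE_OK | OPLOG_REPLAY | AWAIT_DATA
print(flags)  # 46
```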
Can you please try this version [1]? QUERYOPTION_OPLOGREPLAY has been implemented. [1] - https://dl.dropboxusercontent.com/u/64847502/elasticsearch-river-mongodb-1.7.1-SNAPSHOT.zip |
everything seems to have stabilized w/ the most recent change to use QUERYOPTION_OPLOGREPLAY. Will continue monitoring and let you know if I have other issues. Thanks so much for the quick turnaround. |
@dblado could you please provide the response time of the query using QUERYOPTION_OPLOGREPLAY? |
What is odd is that the oplog is now empty...even when I make changes to the db...so nothing is being indexed?? |
oh nevermind, I was looking at localhost :) here are log entries: Sun Sep 15 21:13:42.290 [conn3] getmore local.oplog.rs query: { $query: { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ], ts: { $gt: Timestamp 1379278843000|1 } }, $orderby: { $natural: 1 } } cursorid:4325292602388 ntoreturn:0 keyUpdates:0 locks(micros) r:114 nreturned:0 reslen:20 5010ms |
each query is still taking about 5 seconds -- considering that there are no updated records in the db in the last few hours that seems a bit high to me |
Could you please provide the response time for these 3 queries (sorry I don't have a MongoDB instance with a huge oplog.rs):
|
Had to modify them to: db.oplog.rs.find({ $query: { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ], ts: { $gt: Timestamp(1379278843,1) } }, $orderby: { $natural: 1 } }) db.oplog.rs.find({ $query: { ts: { $gt: Timestamp(1379278843,1) }, $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, $orderby: { $natural: 1 } }) db.oplog.rs.find({ $query: { ts: { $gt: Timestamp(1379278843,1) } }, $orderby: { $natural: 1 } }) waiting for the results |
results: Mon Sep 16 16:50:40.649 [conn15] query local.oplog.rs query: { $query: { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ], ts: { $gt: Timestamp 1379278843000|1 } }, $orderby: { $natural: 1.0 } } ntoreturn:0 ntoskip:0 nscanned:18878319 keyUpdates:0 numYields: 919 locks(micros) r:613458050 nreturned:0 reslen:20 308179ms Mon Sep 16 16:52:50.190 [conn17] query local.oplog.rs query: { $query: { ts: { $gt: Timestamp 1379278843000|1 } }, $orderby: { $natural: 1.0 } } ntoreturn:0 ntoskip:0 nscanned:18878319 keyUpdates:0 numYields: 1062 locks(micros) r:701058589 nreturned:0 reslen:20 351803ms Mon Sep 16 16:52:50.190 [conn16] query local.oplog.rs query: { $query: { ts: { $gt: Timestamp 1379278843000|1 }, $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, $orderby: { $natural: 1.0 } } ntoreturn:0 ntoskip:0 nscanned:18878319 keyUpdates:0 numYields: 1164 locks(micros) r:770496332 nreturned:0 reslen:20 386413ms |
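To put those numbers in perspective: all three filter variants scanned the entire 18.8M-document oplog (nscanned equals the collection count), so the 5-to-6-minute times fall straight out of the scan rate:

```python
# figures from the first logged query above
nscanned = 18_878_319   # documents scanned
elapsed_ms = 308_179    # total query time in ms

docs_per_second = nscanned / (elapsed_ms / 1000)
print(round(docs_per_second))
```

At roughly 60k documents per second, any query that cannot seek into the oplog (which has no secondary indexes) is doomed to take minutes; that is exactly what QUERYOPTION_OPLOGREPLAY is meant to avoid.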
the river is producing these queries right now @ 5seconds per query: Mon Sep 16 17:03:09.425 [conn8] getmore local.oplog.rs query: { $query: { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ], ts: { $gt: Timestamp 1379278843000|1 } }, $orderby: { $natural: 1 } } cursorid:39500982229712 ntoreturn:0 keyUpdates:0 locks(micros) r:108 nreturned:0 reslen:20 5010ms |
5 seconds still seems like a lot to me. I have changed the order of the fields in the oplog.rs query filter: ts comes first. Can you please give this version [1] a try? [1] - https://dl.dropboxusercontent.com/u/64847502/elasticsearch-river-mongodb-1.7.1-SNAPSHOT.zip |
yea, 5 seconds is still a lot of time. The new jar didn't change it:
Wed Sep 18 21:11:05.118 [conn68] getmore local.oplog.rs query: { $query: { ts: { $gt: Timestamp 1379529386000|1 }, $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, $orderby: { $natural: 1 } } cursorid:1102542858630512437 ntoreturn:0 keyUpdates:0 locks(micros) r:188 nreturned:0 reslen:20 5011ms |
@dblado I have just created a new post in the MongoDB user group [1]. [1] - https://groups.google.com/forum/#!topic/mongodb-user/E7BSv624nBg |
@dblado just a quick note according to Asya Kamsky the log should be interpreted as: r:195 -> it took 195 microseconds, the rest of the time was spent waiting. |
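A small parser makes that distinction visible: pulling the lock-read microseconds and the total duration out of a getmore log line shows almost all of the 5 seconds is await time rather than work (the regex is a rough sketch tuned to the 2.4 log format quoted above):

```python
import re

# getmore line from the Sep 18 log above (middle elided)
line = ('Wed Sep 18 21:11:05.118 [conn68] getmore local.oplog.rs ... '
        'locks(micros) r:188 nreturned:0 reslen:20 5011ms')

read_micros = int(re.search(r"r:(\d+)", line).group(1))
total_ms = int(re.search(r"(\d+)ms$", line).group(1))

print(read_micros, total_ms)  # 188 5011: ~188 microseconds reading, ~5s waiting
```

The ~5000ms figure is the awaitData blocking window on an idle tailable cursor, not query execution time, which is why it shows up even with no new oplog entries.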
- Implement suggested changes - https://groups.google.com/forum/#!topic/mongodb-user/E7BSv624nBg
I've been seeing the following exception in elasticsearch today:
[2013-09-12 22:36:49,823][ERROR][river.mongodb ] [M] [mongodb][vendors_river] Mongo gave an exception
com.mongodb.MongoException$Network: Read operation to server ip-10-181-140-155/10.181.140.155:27017 failed on database local
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:253)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:216)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:288)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:273)
at com.mongodb.DBCursor._check(DBCursor.java:368)
at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:1211)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:142)
at com.mongodb.DBPort.call(DBPort.java:92)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:244)
... 8 more
I did add two indexes to my mongo collection last night. In the mongo log I see the following query executing (the client is the elasticsearch river):
}
one of these queries takes a long time to execute:
Thu Sep 12 22:37:21.277 [conn87] query local.oplog.rs query: { $query: { $and: [ { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, { ts: { $gt: Timestamp 1378589134000|45 } } ] }, $orderby: { $natural: 1 } } cursorid:22909236201096610 ntoreturn:0 ntoskip:0 nscanned:18878069 keyUpdates:0 numYields: 3873 locks(micros) r:1835294560 nreturned:92 reslen:15638 932067ms
Thu Sep 12 22:37:21.277 [conn90] query local.oplog.rs query: { $query: { $and: [ { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, { ts: { $gt: Timestamp 1378589134000|45 } } ] }, $orderby: { $natural: 1 } } cursorid:22909235441195695 ntoreturn:0 ntoskip:0 nscanned:18878069 keyUpdates:0 numYields: 3099 locks(micros) r:1480829880 nreturned:92 reslen:15638 751916ms
Thu Sep 12 22:37:21.277 [conn91] query local.oplog.rs query: { $query: { $and: [ { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, { ts: { $gt: Timestamp 1378589134000|45 } } ] }, $orderby: { $natural: 1 } } cursorid:22909237130526342 ntoreturn:0 ntoskip:0 nscanned:18878069 keyUpdates:0 numYields: 2894 locks(micros) r:1362870376 nreturned:92 reslen:15638 691878ms
Thu Sep 12 22:37:21.278 [conn85] query local.oplog.rs query: { $query: { $and: [ { $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, { ts: { $gt: Timestamp 1378589134000|45 } } ] }, $orderby: { $natural: 1 } } cursorid:22909235755883542 ntoreturn:0 ntoskip:0 nscanned:18878069 keyUpdates:0 numYields: 4315 locks(micros) r:2072460163 nreturned:92 reslen:15638 1052126ms
I'm on es 0.90.3, river-mongodb 1.7.0, mongodb 2.4.5.
Any ideas what is causing the queries from river-mongodb to mongo to take so long to run?