IndexOutOfBoundsException when iterating over async result cursor #42

Open
gmethvin opened this issue May 15, 2015 · 7 comments

Comments

@gmethvin

I get the following error when iterating over a result cursor:

java.lang.IndexOutOfBoundsException: 996
    at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43) ~[scala-library-2.11.6.jar:na]
    at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:48) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.RethinkIterator.next(DefaultCursor.scala:29) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.Iterator$class.foreach(Iterator.scala:750) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.RethinkIterator.foreach(DefaultCursor.scala:9) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.DefaultCursor.foreach(DefaultCursor.scala:94) ~[core_2.11-0.4.7.jar:0.4.7]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) ~[scala-library-2.11.6.jar:na]
    at com.rethinkscala.net.DefaultCursor.map(DefaultCursor.scala:94) ~[core_2.11-0.4.7.jar:0.4.7]

My code looks something like:

emailsTable.filter(f => (f \ "userId") === userId).map(f => (f \ "messageId").string).run.map { ids: Seq[String] =>
  val messageIds = ids.map(MessageId(_)).toSet
  // ...
}

It seems like basically anything that iterates over the result cursor (map, toSet, etc.) has the potential to cause this error for me. I'm also using the async connection, so do we know that https://github.com/kclay/rethink-scala/blob/master/core/src/main/scala/com/rethinkscala/net/DefaultCursor.scala#L26 will complete before indexing into the chunks array? It's hard to tell from the code.

@gmethvin
Author

I'm noticing this on basically any large table I have, and I've found that sometimes adding log statements will prevent this from happening. So my guess is there's some kind of race condition.

@kclay
Owner

kclay commented Jun 1, 2015

Still looking into this one.

@gmethvin
Author

gmethvin commented Jun 1, 2015

Also, am I correct that the result cursor blocks to get more results from the database, even with the async API? It'd be ideal to have an API that returns an Enumerator instead.

@kclay
Owner

kclay commented Jun 1, 2015

In some cases RethinkDB will chunk up your results. Let's say you have a table with 1k rows, but each row is, say, 5 KB. Then RethinkDB would return all 1k rows at once.

Now let's say you have the same number of rows, but each row is 15 KB. RethinkDB would then chunk up the results: the first fetch returns 100 rows, the next fetch another 100 rows, and so on. This is where the custom Iterator in the driver comes into play: it tries to fetch the remaining rows, since RethinkDB didn't return all the results on the initial fetch. From asking in the RethinkDB IRC channel, this is the expected behavior in these cases, and the official drivers handle it. So there is a bug in how the async driver works (you are the second person to hit this same issue, and both were with async). In blocking mode this error doesn't seem to happen, so it may well be a race condition.
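
For illustration, the chunked fetching described above could be sketched roughly like this. It is not the driver's actual DefaultCursor code: `ChunkedIterator`, `fetchNext`, and the demo object are hypothetical names, and the sketch assumes the fetch returns a `Future` that yields `None` once the server has no more rows. The point is only that the iterator waits for the pending chunk to arrive before it indexes into its buffer.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical sketch, not rethink-scala's DefaultCursor.
// fetchNext() requests the next chunk; None means the result set is exhausted.
final class ChunkedIterator[T](fetchNext: () => Future[Option[Seq[T]]],
                               timeout: FiniteDuration = 10.seconds)
    extends Iterator[T] {
  private val buffer = ArrayBuffer.empty[T]
  private var index = 0
  private var exhausted = false

  override def hasNext: Boolean = {
    // Keep fetching until a row is available or the server reports the end.
    // The buffer is only read after the awaited chunk has been appended.
    while (index >= buffer.length && !exhausted) {
      Await.result(fetchNext(), timeout) match {
        case Some(chunk) => buffer ++= chunk
        case None        => exhausted = true
      }
    }
    index < buffer.length
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("cursor exhausted")
    val row = buffer(index)
    index += 1
    row
  }
}

// Usage with an in-memory stand-in for the server's chunks.
object ChunkedIteratorDemo {
  def run(): List[Int] = {
    val chunks = Iterator(Seq(1, 2, 3), Seq(4, 5))
    val cursor = new ChunkedIterator[Int](() =>
      Future.successful(if (chunks.hasNext) Some(chunks.next()) else None))
    cursor.toList
  }
}
```

This version still blocks the calling thread in `hasNext`, so it only addresses the out-of-bounds access, not the wish for a fully asynchronous API.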

As for the use of an Enumerator: could you take that request, along with any other Play requests, and create a "Play Support" issue for it? Include all the features you would like to have in the driver for Play, as well as some use cases. I know you asked for play-json support, but I do have some issues with trying to support Group/Ungroup and Datetime serialization, since RethinkDB wraps these objects in a nested object with some metadata.

@gmethvin
Author

gmethvin commented Jun 1, 2015

@kclay Using iteratees isn't really about "Play support" to me. The point is that I'm able to enumerate the data asynchronously. It doesn't matter that much to me if you use Play's iteratee library or some other iteratee or reactive stream implementation.
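
To make the idea of asynchronous enumeration concrete, here is a minimal sketch that uses only the standard library's `Future`. It is neither Play's iteratee API nor a Reactive Streams implementation, and nothing like it is claimed to exist in the driver: `AsyncChunks.foldRows` and `fetchNext` are hypothetical names, with `fetchNext` assumed to return the next chunk of rows, or `None` when the result set is exhausted.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncChunks {
  // Hypothetical helper: folds over every row, chunk by chunk.
  // No thread is blocked while a chunk is in flight; the next step
  // runs as a callback once the previous fetch completes.
  def foldRows[T, A](fetchNext: () => Future[Option[Seq[T]]])(zero: A)(f: (A, T) => A)(
      implicit ec: ExecutionContext): Future[A] =
    fetchNext().flatMap {
      case Some(chunk) => foldRows(fetchNext)(chunk.foldLeft(zero)(f))(f)
      case None        => Future.successful(zero)
    }
}

// Usage with an in-memory stand-in for the server's chunks.
object AsyncChunksDemo {
  import ExecutionContext.Implicits.global

  def run(): Set[String] = {
    val chunks = Iterator(Seq("a", "b"), Seq("c"))
    val ids: Future[Set[String]] =
      AsyncChunks.foldRows[String, Set[String]](() =>
        Future.successful(if (chunks.hasNext) Some(chunks.next()) else None))(
        Set.empty[String])(_ + _)
    // Await appears only in the demo, to observe the result.
    Await.result(ids, 5.seconds)
  }
}
```

A fold was chosen here because it covers the `toSet` case from the original report; an enumerator or reactive stream would expose the same chunk-at-a-time behavior with backpressure on top.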

@kclay
Owner

kclay commented Jun 8, 2015

@gmethvin can you provide a test case for this? I'm having a hard time reproducing it. I know the bug is there, but I can't create a case that shows it.

@gmethvin
Author

gmethvin commented Jul 6, 2015

@kclay I haven't had time to write a complete test case for this. It only happens for me when there are on the order of 10,000 elements in the result set, and then only occasionally. I'm not sure exactly what triggers the bug.
