Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detecting "end" of row Streaming? #37

Closed
ottes opened this issue Nov 25, 2013 · 14 comments
Closed

Detecting "end" of row Streaming? #37

ottes opened this issue Nov 25, 2013 · 14 comments

Comments

@ottes
Copy link

ottes commented Nov 25, 2013

How can i know, that no more rows will come?

var con = new Cassandra.Client({.....}),
con.connect(function () {
 var query = "SELECT * FROM statistics LIMIT 10000000";
 con.streamRows(query, [], function (err, row) {
      console.log(row.get('fieldXY'));
 });
});

I don't want to register a timeout (like "lastReceivedRow") or something, i just want to know, when theres nothing more to stream. Other database-system-drivers would call the callback with null,null. But an event would also be okay.
I dived into code and found out, that the frameParser of the correct connection object would emit some event like this, but even when i think of using this, it smells kind of dirty.

Any suggestions? Or is this some kind of "by design"?

@jorgebay
Copy link
Owner

Its indeed a missing feature, I was planning to add it this week...
Something like:

client.streamRows(query, params, consistency, rowCallback, callback);

//with the definition of the callbacks
client.streamRows(query, params, consistency, function (err, row) {
  //gets called per row - AS IS
}, function (err) {
  //gets executed when there are no more rows to stream
});

@ottes
Copy link
Author

ottes commented Nov 25, 2013

Yeah, looks nice like this..... alternatively callback with a null-row and null-error, like MongoDb (Cursor.forEach()) does it. FoundationDB does it like this too (LazyIterator)

@bitcloud
Copy link
Contributor

I think the null-row and null-error approach is a bit better because in the other approach you would have some kind of redundant error handling.

@jorgebay
Copy link
Owner

mmm... as you said, I agree that the method signature I'm proposing is no good...

I see 2 possibilities:
A: async.each like:

client.streamRows(query, params, consistency, function (n, row) {
 //n is the index of the row
 //row object
}, function (err, totalRows) {
 //the error is only yielded here
 //the consumer gets the amount of rows without the need of an internal counter
 //for example: if (!totalRows) res.send(404, 'These arent the droids you are looking for');
});

B: cursor.each (mongodb) like:

client.streamRows(query, params, consistency, function (err, row) {
  //if !err && !row is OEF
});

I see a some use cases

  • Case 1: there are n rows, the response needs to end at the end.
  • Case 2: no rows and there is the need to return a HTTP 404 status code.
  • Case 3: there is an error (ie: could not achieved consistency).

I like possibility A the best.

What do you think?

@bitcloud
Copy link
Contributor

I would favor B to have a similar function signature as the other db's (to be kind of db agnostic with the callback functions).
As well because I'm not quite sure if returning an index and totalRows isn't to specific to those use cases that pipe the data directly to the client.

We directly stream data thru different filters that would reduce the resultset. And we are collecting data from different queries that we push all into the same stream. So we won't need those numbers because we have to check ourselves anyway.

A nice interface would be to have the same function for single data callback and streaming as well.
So you could return an eventEmitter and emit a data event for every data blob if there is an listener attached while buffering the data if there is no listener attached and returning the data in the callback that is called when the query is done.

riak-js is using an event emitter to return streamed data as far as i know.

Perhaps I could write some sample code to explain it in a little more detail if it is unclear. With this approach it should be very easy to push the data in a stream.

@bitcloud
Copy link
Contributor

An example usage could be

function test = function(query) {
  stream = cassandra.getRows(query, function(err, data) {
    // query finished
    // result in 'data'
  });

  stream.on('data', function(data) { 
    //streamed result in 'data' 
  });
}

@ottes
Copy link
Author

ottes commented Nov 25, 2013

I personally prefer the async version and would be ok with the other-db-like one. Dont like the EventEmitter thing that much, but it would work for me too

@jorgebay
Copy link
Owner

(being data a row object right?)
mmm... Nice!
Very similar to node-mysql's query().stream() functionality.

@jorgebay
Copy link
Owner

Maybe the 2 methods:

//async.each type
client.eachRow(query, [params], [consistency], rowCallback, endCallback);

//node-mysql stream like
var stream = client.stream(query, [params], [consistency], endCallback);
//a readable streams2 

What do you think?
2xWork = 2xWin?

@bitcloud
Copy link
Contributor

Not quite sure when I have enough time to work on that, but will look into it this week.

@jorgebay
Copy link
Owner

I think it would be best to first add the callback for EOF (no more rows) to the existent streamRows() method, to close this issue and make the method "useful".

I will create a new issue for the "Streaming Api Changes" for the client.stream (and rename streamRows to eachRow)

@bitcloud
Copy link
Contributor

Made some small changes for the additional null callback:
https://github.com/bitcloud/node-cassandra-cql/compare/stream-null-callback
If you think this is the right approach on that I could continue working on that and send you a pull request

@jorgebay
Copy link
Owner

I thought we agreed on the extra callback for the streamRows method, similar to async.each (that in the future will change name to something like eachRow), and then implement the client.stream() (#39). Sorry for the misunderstanding.

The code you wrote is awesome, but the discussion is on a functional level.

I would definitely accept any pull request from you if it's aligned with the functionality.

This was referenced Nov 26, 2013
@jorgebay
Copy link
Owner

Implemented by @bitcloud.
Merged and released on npm under version 0.3.2, check package.json for contributors :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants