Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bucket.getFiles should document (at the top) the on() way of iterating. #673

Closed
jgeewax opened this issue Jun 17, 2015 · 12 comments
Closed
Assignees
Labels
api: storage Issues related to the Cloud Storage API.

Comments

@jgeewax
Copy link
Contributor

jgeewax commented Jun 17, 2015

Right now, in http://googlecloudplatform.github.io/gcloud-node/#/docs/v0.15.0/storage/bucket?method=getFiles

bucket.getFiles(function(err, files, nextQuery, apiResponse) {
  if (nextQuery) {
    // nextQuery will be non-null if there are more results.
    bucket.getFiles(nextQuery, function(err, files, nextQ, apiResponse) {});
  }
});

which is ... absurdly confusing -- I just want to iterate through all the files.

Can we change the examples to document the way to iterate through all files, and abort the traversal early if necessary?

bucket.getFiles().on('data', function(file) {
  console.log(file);
  if (some condition) {
    this.end();
  }
});
@jgeewax jgeewax added enhancement api: storage Issues related to the Cloud Storage API. labels Jun 17, 2015
@jgeewax
Copy link
Contributor Author

jgeewax commented Jun 17, 2015

/cc @stephenplusplus : Kicking to you. I'm watching someone struggle with this right now.... and the recursive traversal stuff is confusing the hell out of them.

@stephenplusplus
Copy link
Contributor

Separate and name the function to make it easier. We can document how to use streams, but it really isn't the solution. Streams are best when you indend to consume and pass-through all of the data to a connecting stream. It would be best to have a "recurse for me, then give me all of the results in my callback that only gets called once" option and avoid the concept of terminating early. The user always has pageSize and maxResults to avoid getting ridiculous size result sets.

@stephenplusplus
Copy link
Contributor

I'll put a better proposal together tomorrow to demo what I'm thinking.

@jgeewax
Copy link
Contributor Author

jgeewax commented Jun 17, 2015

OK -- just the whole recursive calling next query... is confusing. People want to say "call this method again on the next page"...

@stephenplusplus
Copy link
Contributor

just the whole recursive calling next query... is confusing.

Yeah, it is. But it's the most unsurprising implementation. I don't think it would be less confusing to tell the user "your callback is going to be called an unknown amount of times, and each time the result set will be of a different length." I would rather use the callback once after we have all of the results consumed, and not even let the user know the nextQuery/API-determined pagination exists.

As far as the streams go, I added support for the premature ending of a stream as part of the search PR: stephenplusplus@abca6f1 -- after that lands, I'll start to use stream-router throughout the rest of the library.

The only thing about using streams to solve this problem is, it's kind of a weird use of them. The .on('data') API is nicer to deal with, but it's not typically where you see streams being used. A better use case for streams could be something like getting a bunch of data, filtering it down to a set you care about, parsing the metadata out of them and displaying the results in a terminal (bucket.getFiles().pipe(filterFilesOverTenMb).pipe(parseMetadata).pipe(process.stdout)). Streams usually aren't about getting a large set of results, then closing the pipe once you've found the result you're looking for.

Still though, supporting pre-mature closing is probably a safe plan, since the user has ultimate say over how they want to do things.

I still haven't thought of a great API to improve how nextQuery paginating works currently (other than removing it completely), but it's on my mind 😵

@jgeewax
Copy link
Contributor Author

jgeewax commented Jun 18, 2015

I don't think it would be less confusing to tell the user "your callback is going to be called an unknown amount of times, and each time the result set will be of a different length."

But it's ... already being called an unknown number of times with the recursive way, right?


I would rather use the callback once after we have all of the results consumed, and not event let the user know the nextQuery/API-determined pagination exists.

Hmm... I really appreciated the idea of being able to say "when I've gotten all the items in this chunk, go go again"... this.nextPage() would have solved that problem....

To put it in comparison to Python, we have an Iterator that basically lets you do:

for item in bucket.list_objects():
  print item
  if item.name == 'whatever':
    break

Which is paginating stuff behind the scenes. This is (I think) the .on('data', ...) syntax which... I'm OK with .

@stephenplusplus
Copy link
Contributor

But it's ... already being called an unknown number of times with the recursive way, right?

It depends what we're talking about. Going off of "People want to say "call this method again on the next page"...", I thought you meant you wanted getFiles to recurse itself, and execute that callback with each result set returned from the API.

As far as "nextQuery()" vs "nextQuery" being an object, I fought my case in that other issue. I think it's better to give an object to the user, which I think is where we landed. That, plus documenting how to use the streams to early-exit. There will be a big sweep of all the dual methods soon after Search lands. I have to go through and implement the new stream-router, which will be a great time to catch the docs up.

This is (I think) the .on('data', ...) syntax which... I'm OK with .

Yeah, it will work this way.

@jgeewax
Copy link
Contributor Author

jgeewax commented Jun 18, 2015

I thought you meant you wanted getFiles to recurse itself

I think I did mean that, but if it's not acceptable... then being able to say that "we'll run this function for each page until you tell us to stop (or we've gone through all the items)" is fine too...

Bottom line though is... the whole nextQuery thing, and having to re-establish the callback again, or define it elsewhere and pass it in that calls the next query with a reference to itself and all that stuff is... not the way we should be telling users "here's how you page through all the data". It's hugely confusing...

@stephenplusplus
Copy link
Contributor

I'd like to find a higher level solution for the nextQuery process as well. We can change nextQuery the object into a continueProcessing() function. Now I remember (I think) that is where we left off in our last discussion.

@jgeewax
Copy link
Contributor Author

jgeewax commented Jun 18, 2015

Or something like nextQuery.runAgain(); or something of that nature...?

@stephenplusplus
Copy link
Contributor

Something like that might work. I'll be thinking on it 💭

@stephenplusplus
Copy link
Contributor

Just an update: merged #648 so I can get this going.

sofisl pushed a commit that referenced this issue Oct 11, 2022
* fix(docs): describe fallback rest option

Use gapic-generator-typescript v2.15.1.

PiperOrigin-RevId: 456946341

Source-Link: googleapis/googleapis@88fd18d

Source-Link: googleapis/googleapis-gen@accfa37
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiYWNjZmEzNzFmNjY3NDM5MzEzMzM1YzY0MDQyYjA2M2MxYzUzMTAyZSJ9

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
sofisl pushed a commit that referenced this issue Oct 11, 2022
🤖 I have created a release *beep* *boop*
---


## [5.0.1](googleapis/nodejs-language@v5.0.0...v5.0.1) (2022-06-30)


### Bug Fixes

* **docs:** describe fallback rest option ([#673](googleapis/nodejs-language#673)) ([6767804](googleapis/nodejs-language@6767804))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
sofisl pushed a commit that referenced this issue Oct 13, 2022
* fix(docs): describe fallback rest option

Use gapic-generator-typescript v2.15.1.

PiperOrigin-RevId: 456946341

Source-Link: googleapis/googleapis@88fd18d

Source-Link: googleapis/googleapis-gen@accfa37
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiYWNjZmEzNzFmNjY3NDM5MzEzMzM1YzY0MDQyYjA2M2MxYzUzMTAyZSJ9

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
sofisl pushed a commit that referenced this issue Oct 13, 2022
🤖 I have created a release *beep* *boop*
---


## [5.0.1](googleapis/nodejs-language@v5.0.0...v5.0.1) (2022-06-30)


### Bug Fixes

* **docs:** describe fallback rest option ([#673](googleapis/nodejs-language#673)) ([6767804](googleapis/nodejs-language@6767804))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
sofisl pushed a commit that referenced this issue Nov 10, 2022
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
sofisl pushed a commit that referenced this issue Nov 10, 2022
sofisl pushed a commit that referenced this issue Nov 11, 2022
samples: pull in latest typeless bot, clean up some comments

Source-Link: https://github.com/googleapis/synthtool/commit/0a68e568b6911b60bb6fd452eba4848b176031d8
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-nodejs:latest@sha256:5b05f26103855c3a15433141389c478d1d3fe088fb5d4e3217c4793f6b3f245e
sofisl pushed a commit that referenced this issue Nov 17, 2022
* chore: fix release_level to reflect current status

* 🦉 Updates from OwlBot

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Benjamin E. Coe <bencoe@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API.
Projects
None yet
Development

No branches or pull requests

2 participants