Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a row iterator #24

Merged
merged 29 commits into from
Feb 7, 2018
Merged

Add a row iterator #24

merged 29 commits into from
Feb 7, 2018

Conversation

withinboredom
Copy link
Contributor

Allows efficiently iterating over a result set.

xyu
xyu previously requested changes Feb 2, 2018
Copy link
Contributor

@xyu xyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not packaged this to test on a sandbox yet but if I'm reading this correctly think how we are discarding array items is reversed.

Overall this is a great idea! Let's get it tested and merged so we can start using it. :)

*/
public function next() {
$this->location ++;
array_pop( $this->buffer );
Copy link
Contributor

@xyu xyu Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be array_shift() so that we grab and discard the first element in the buffer? Otherwise if we get a resultset of 6 rows in two chunks like so:

---
1
2
3
---
4
5
6
---

Iterating through the set would end up producing (I think):

1
1
1
4
4
4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f818232

}

try {
$this->buffer = array_merge( $this->buffer, $this->stream->fetch( 100 ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is array_merge() needed here seeing as we already know $this->buffer is empty?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I guess maybe we can fetch even less rows at a time given that we are going to iterate over the results to reduce memory consumption even more. But 100 is not very many so maybe not worth it to try and tune it any further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 703371f.

*
* @param string $queryStr
*
* @return \ThriftSQL\ThriftStream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor but seeing as this is already namespaced within \ThriftSQL why name the iterator \ThriftSQL\ThriftStream and not something more obvious like simply \ThriftSQL\Iterator or \ThriftSQL\ResultIterator? After all it's a straight up implementation of \Iterator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to \ThriftSQL\Utils\Iterator in 519e794

README.md Outdated

// Try out a Hive query
$hiveIterator = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$hiveTables = $hive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be $hiveIterator instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 47d3eac

* @since 5.0.0
* @throws Exception
*/
public function rewind() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some sort of first run flag and throw a warning if the same query is run twice? What's a good way to alert the dev that doing something like the following is actually going to run the query twice which is

  1. Slow
  2. May produce different results in each loop
$res = $thriftsql->getIterator('SELECT ...');

foreach ( $res as $row ) {
  // Do something with $row
}
...
foreach ( $res as $row ) {
  // Do something else with $row
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a feature to disable rewind from happening more then once which prevents queries from being rerun unless \ThriftSQL\ThriftStream::allowRerun(true) is called in 71fc266.

@xyu
Copy link
Contributor

xyu commented Feb 2, 2018

Rebased / merged with latest tagged release to get packagist.org updates working again.

@xyu
Copy link
Contributor

xyu commented Feb 2, 2018

FYI, might be a good idea to merge with #25 first if you are trying to build the phar to test.

xyu added 13 commits February 5, 2018 21:30
We don't need to use array merge because we only fill the buffer once it's
empty already.

We add a class const to set how many rows to keep in the in-memory buffer of
the iterator.
Change private var names to match thrift types that they contain so that it's clearer what they are and we avoid having things that say:

```
$this->query->query( $this->queryStr )
```

Which can be confusing. Also add in strong typing for constructor.
Make the name more obvious by renaming it from:

```
\ThriftSQL\ThriftStream()
```

to:

```
\ThriftSQL\Utils\Iterator()
```
Use an abstract class instead of just an interface so that we don't have to
repeate the `getIterator()` method.
@xyu xyu self-assigned this Feb 5, 2018
@xyu xyu dismissed their stale review February 6, 2018 15:43

Made changes suggested in review.

@xyu
Copy link
Contributor

xyu commented Feb 6, 2018

@withinboredom can you take a look at the latest diff when you have a chance?

xyu added 11 commits February 6, 2018 16:08
Update docs to explain different methods to use ThriftSQL and specify the iterator object as the recommended method to fetch results.
- Don't eat any exceptions, let's bubble them up so that the caller / runner
  can handle them instead of pretending that the query simply returned no
  results or partial results.
- Validity after fetching rows is simply a matter of is there are records in
  the buffer so just return that.
@xyu xyu merged commit 89ea07e into master Feb 7, 2018
@xyu xyu deleted the add/row-iterator branch February 7, 2018 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants