Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add doc for issue #8420 #14215

Merged
merged 11 commits into from
May 31, 2017
75 changes: 63 additions & 12 deletions docs/guide/db-query-builder.md
Original file line number Diff line number Diff line change
Expand Up @@ -652,10 +652,9 @@ value which will be used as the index value for the current row.

### Batch Query <span id="batch-query"></span>

When working with large amounts of data, methods such as [[yii\db\Query::all()]] are not suitable
because they require loading all data into the memory. To keep the memory requirement low, Yii
provides the so-called batch query support. A batch query makes use of the data cursor and fetches
data in batches.
When working with large amounts of data, methods such as [[yii\db\Query::all()]] are not suitable because they require loading the whole query result into the client's memory. To solve this issue Yii provides batch query support. The server holds the query result, and the client uses a cursor to iterate over the result set one batch at a time.

> Warning: There are known limitations and workarounds for the MySQL implementation of batch queries. See below.

Batch query can be used like the following:

Expand All @@ -670,23 +669,23 @@ foreach ($query->batch() as $users) {
// $users is an array of 100 or fewer rows from the user table
}

// or if you want to iterate the row one by one
// or to iterate the row one by one
foreach ($query->each() as $user) {
// $user represents one row of data from the user table
// data is being fetched from the server in batches of 100,
// but $user represents one row of data from the user table
}
```

The method [[yii\db\Query::batch()]] and [[yii\db\Query::each()]] return an [[yii\db\BatchQueryResult]] object
which implements the `Iterator` interface and thus can be used in the `foreach` construct.
During the first iteration, a SQL query is made to the database. Data are then fetched in batches
The method [[yii\db\Query::batch()]] and [[yii\db\Query::each()]] return an [[yii\db\BatchQueryResult]] object which implements the `Iterator` interface and thus can be used in the `foreach` construct.
During the first iteration, a SQL query is made to the database. Data is then fetched in batches
in the remaining iterations. By default, the batch size is 100, meaning 100 rows of data are being fetched in each batch.
You can change the batch size by passing the first parameter to the `batch()` or `each()` method.

Compared to the [[yii\db\Query::all()]], the batch query only loads 100 rows of data at a time into the memory.
If you process the data and then discard it right away, the batch query can help reduce memory usage.

If you specify the query result to be indexed by some column via [[yii\db\Query::indexBy()]], the batch query
will still keep the proper index. For example,
If you specify the query result to be indexed by some column via [[yii\db\Query::indexBy()]], the batch query will still keep the proper index.

For example:

```php
$query = (new \yii\db\Query())
Expand All @@ -701,3 +700,55 @@ foreach ($query->each() as $username => $user) {
// ...
}
```

#### Limitations of batch query in MySQL <span id="batch-query-mysql"></span>

MySQL implementation of batch queries relies on the PDO driver library. By default, MySQL queries are [`buffered`](http://php.net/manual/en/mysqlinfo.concepts.buffering.php). This defeats the purpose of using the cursor to get the data, because it doesn't prevent the whole result set from being loaded into the client's memory by the driver.

> Note: When `libmysqlclient` is used (typical of PHP5), PHP's memory limit won't count the memory used for result sets. It may seem that batch queries work correctly, but in reality the whole dataset is loaded into client's memory, and has the potential of using it up.

To disable buffering and reduce client memory requirements PDO connection property `PDO::MYSQL_ATTR_USE_BUFFERED_QUERY` can be set to `false`. However, until the whole dataset has been retrieved, no other query can be made through the same connection. This may prevent `ActiveRecord` from making a query to get the table schema when it needs to. If this is not a problem (the table schema is cached already), it is possible to switch the original connection into unbuffered mode, and then back when the batch query is done.

```php
Yii::$app->db->pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// Do batch query

Yii::$app->db->pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
```

> Note: For the duration of the batch query, the table or parts of it may become locked, delaying or denying write access for other connections. When using unbuffered queries, try to keep the cursor open for as little time as possible.

If the schema is not cached, or it is necessary to run other queries while the batch query is being processed, you can create a separate unbuffered connection to the database:

```php
$unbufferedDb = new \yii\db\Connection([
'dsn' => Yii::$app->db->dsn,
'username' => Yii::$app->db->username,
'password' => Yii::$app->db->password,
'charset' => Yii::$app->db->charset,
]);
$unbufferedDb->open();
$unbufferedDb->pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
```

Queries are created normally. The new connection is used to run batch queries and retrieve results either in batches or one by one:

```php
// getting data in batches of 1000
foreach ($query->batch(1000, $unbufferedDb) as $users) {
// ...
}


// data is fetched from server in batches of 1000, but is iterated one by one
foreach ($query->each(1000, $unbufferedDb) as $user) {
// ...
}
```

When the connection is no longer necessary, it can be closed:

```php
$unbufferedDb->close();
```