Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GoCardlessPro\Core\Paginator does not work if a $params["before"] filter is used #32

Open
oschonrock opened this issue Oct 19, 2017 · 5 comments

Comments

@oschonrock
Copy link

oschonrock commented Oct 19, 2017

Without query filter GoCardlessPro\Core\Paginator works fine

$paginator = $client->events()->all();

foreach ($paginator as $record)
{
  // do sth with record...works fine,
  // ie can iterate over all records and Paginator makes requests as required
}

but with filter it does not

$paginator = $client->events()->all(["params" => ["before" => "SOMEEVENTID"]]);

foreach ($paginator as $record)
{
 // do sth with record...does not work correctly
 //  ie you only get the first "page" ie 50 records by default.
}

I ended up implementing my own "de-pagination" which is actually surprisingly non-trivial if a filter is used (to do with the reverse chronological order slice).

Did I miss sth? Or this is a straight bug / missing implementation?

@oschonrock oschonrock changed the title provided \Paginator does not work if a $params["before"] filter is used GoCardlessPro\Core\Paginator does not work if a $params["before"] filter is used Oct 20, 2017
@timrogers
Copy link

As you've worked out, the function of all() is to automatically paginate through the paginated list returned by the API for you, but if you specify your own before option, then it's not necessarily clear what you're expecting the library to do (i.e. it's not clear what "direction" you want to go in after your page).

You're going beyond the use case which the library is designed to serve, which assumes that you might specify filters, but not set the pagination options yourself.

Have you thought about filtering events using created_at[gt] instead? That would achieve largely the same thing and would let you use the built-in pagination logic.

@oschonrock
Copy link
Author

Hi Tim

Thanks for your response

This is the primary of the 3 issues I filed, the other 2 are just related thoughts which I came across across while implementing my own de-pagination

What I would "expect it to do" is:

  • if I specify no filter then ->list() => 50 most recent, and ->all() ..well "all" (both in reverse chronological order, or whatever the default is)

that all works fine ..

However

  • if I specify [before => someid] (which actually means "after" see naming of ListResponse filters "before" and "after" is counterintuitive #34 ) then list() => should give me the 50 most recent "just after someid" . It doesn't. It gives me the "50 most recent", essentially the same as without the filter, because the filter doesn't kick in (assuming there are more than 50 recrords "after" someid)

The problem gets worse when you use ->all() with [before => someid] which should "logically" (?) give me "all after someid" ..but it doesn't ...it gives me "50 most recent" and then stops (so it is the same as ->list(before=>50) which is the same as ->list() ..making filters all a bit pointless?

As far as I can see all() and list() are broken with any filter (the reason is to do with the paginator and the way it tries to "fetch next page". Please try it to see what I mean)

Using "created_at" doesn't change anything, has the same problems, and has the added disadvantage that it is potentially non-deterministic (ie there could be 2 with same timestamp).

really if list() and all() in your client are not usable with [somefilter => value] then the code should throw at least an Exception('filters do not work with all() and list()'). (where do they work?)

That would be truthful. Or find a way to make it work.

When I implemented my own de-paginator (ie ->all()) i found I needed to handle the "no filter" and "some filter" cases completely different. They need to use inverted logic in 2 or 3 places. That's why you Paginator fails, because it does not understand this.

@timrogers
Copy link

I've just been having a play around myself to investigate a bit further the issues you're raising. I'm starting with listing the customers in my account. Here's a copy of my scratchpad:

<?php
require 'vendor/autoload.php';

$gocardless = new \GoCardlessPro\Client(['access_token' => 'redacted', 'environment' => \GoCardlessPro\Environment::LIVE]);

// As expected, I see outputted the emails of my most recent 50 customers, since
// `list()` does not automatically paginate.
$customers = $gocardless->customers()->list();

foreach ($customers->records as $customer) {
    echo $customer->email . "\n";
}

// I can add a `limit` and some filtering to that, and all works as expected
$customers = $gocardless->customers()->list(["params" => ["created_at[lt]" => "2017-05-01T09:30:00Z", "limit" => 5]]);

foreach ($customers->records as $customer) {
    echo $customer->email . "\n";
}

// If I use a `before` with `list()` to get records earlier in the list (i.e. more
// recent records, since we use reverse-chronological ordering), I get the
expected results (i.e. I see the emails of the five customers created most
// recently after CU000728W3MBQ0)
$customers = $gocardless->customers()->list(["params" => ["limit" => 5, "before" => "CU000728W3MBQ0"]]);

foreach ($customers->records as $customer) {
    echo $customer->email . "\n";
}

// Now, let's try auto-paginating with `all()`. I see all of my customers emails
// outputted.
$customers = $gocardless->customers()->all();

foreach ($customers as $customer) {
    echo $customer->email . "\n";
}

// With a `limit` specified, I get the same output, but it takes a bit longer
// to generate.
$customers = $gocardless->customers()->all(["params" => ["limit" => 5]]);

foreach ($customers as $customer) {
    echo $customer->email . "\n";
}

// Let's start experimenting with filters. If I set a `created_at[gt]` filter, the
// automatic pagination still works, and I see all of the matching customers.
$customers = $gocardless->customers()->all(["params" => ["created_at[gt]" => "2015-11-03T09:30:00Z"]]);

foreach ($customers as $customer) {
    echo $customer->email . "\n";
}

// If I add a `before`, then it doesn't paginate automatically, rather it just does
// the first page. I think this makes some kind of sense since if you're using
// `before`, it suggests you're trying to control the pagination and are going
// beyond what the library supports. (You're essentially trying to use `before`
// as a filter, rather than relying on the library's built-in pagination.)
$customers = $gocardless->customers()->all(["params" => ["created_at[gt]" => "2015-11-03T09:30:00Z", "before" => "CU00071R7A40EG"]]);

foreach ($customers as $customer) {
    echo $customer->email . "\n";
}

// If I specify an `after`, it goes a bit mad because `Paginator` resets this as
// part of its operation - see the `initial_response()` method.
$customers = $gocardless->customers()->all(["params" => ["created_at[gt]" => "2015-11-03T09:30:00Z", "after" => "CU00071R7A40EG"]]);

foreach ($customers as $customer) {
    echo $customer->email . "\n";
}

The conclusions I've come to are as follows - do let me know if anything is wrong or doesn't make sense:

  • Filtering (as distinct from pagination with before and after) works with both list() and all(), returning the expected results and, in the case of all(), maintaining the filters while paginating
  • I'm not seeing the same behaviour as you when specifying before or after with list() - I see the results I expect, with the correct records returned and my setting respected
  • You can't successfully specify a before or after when using all() because it interferes with the operation of the pagination object. When I specify a before, it only shows the first page. When I specify an after, it just ignores it. We should either (a) throw an exception when you try to specify these or (b) fix the pagination so it can support them.

You're right that using the created_at filtering isn't quite a replacement for using before and after because, as you say, it is non-deterministic. How are you handling things on your end? Do you keep a record of the last event you processed for an account, and then use that?

@timrogers
Copy link

In terms of improving the situation, I'd be in favour of making the library paginate in the opposite direction if you specify a before option to all() - I'm going to make a ticket with a view to doing that.

@oschonrock
Copy link
Author

oschonrock commented Oct 31, 2017

Thanks for looking into this. I think we are beginning to understand each other.

I have also made some stripped down tests:

$events = $client->events()->list(['params' => ['limit' => 20]]);

echo "\nlist(limit=20)\n";
foreach ($events->records as $event)
{
  echo $event->id . ' ' . $event->created_at . "\n";
}

echo "\nlist(limit=5, before=EV006F34D0YD8K)\n";
$events = $client->events()->list(['params' => ['limit' => 5, 'before' => 'EV006F34D0YD8K']]);

foreach ($events->records as $event)
{
  echo $event->id . ' ' . $event->created_at . "\n";
}

echo "\nall(before=EV006F34D0YD8K) .. while hacking Paginator to grab pages of 5 record at a time\n";
$events = $client->events()->all(['params' => ['before' => 'EV006F34D0YD8K']]);

foreach ($events as $event)
{
  echo $event->id . ' ' . $event->created_at . "\n";

which gives the following output:

list(limit=20)
EV006GVVJR0QSA 2017-10-30T16:47:16.694Z
EV006GKHM55580 2017-10-30T16:01:25.326Z
EV006GHP65W5MC 2017-10-30T13:19:32.330Z
EV006GHP63MRKA 2017-10-30T13:19:32.304Z
EV006GD8PS83TB 2017-10-30T11:08:44.521Z
EV006G8VDP19EW 2017-10-29T17:25:03.272Z
EV006G49S3NNPK 2017-10-27T15:16:18.222Z
EV006G41S9V4NH 2017-10-27T15:01:17.203Z
EV006FXT9F66R0 2017-10-27T10:08:27.379Z
EV006FW9C2XAH6 2017-10-27T08:24:37.606Z
EV006FC1S0Q06W 2017-10-26T19:32:10.915Z
EV006F6YB9DDCS 2017-10-26T15:18:55.517Z
EV006F34DX5QGQ 2017-10-26T11:42:10.504Z
EV006F34DVAR41 2017-10-26T11:42:10.490Z
EV006F34DR20QZ 2017-10-26T11:42:10.477Z
EV006F34DNMRRP 2017-10-26T11:42:10.463Z
EV006F34DJWW5M 2017-10-26T11:42:10.450Z
EV006F34D0YD8K 2017-10-26T11:42:10.334Z
EV006EW7QWYPJQ 2017-10-25T21:26:20.934Z
EV006EVM2RH9K8 2017-10-25T15:20:07.726Z

list(limit=5, before=EV006F34D0YD8K)
EV006F34DX5QGQ 2017-10-26T11:42:10.504Z
EV006F34DVAR41 2017-10-26T11:42:10.490Z
EV006F34DR20QZ 2017-10-26T11:42:10.477Z
EV006F34DNMRRP 2017-10-26T11:42:10.463Z
EV006F34DJWW5M 2017-10-26T11:42:10.450Z

all(before=EV006F34D0YD8K) .. while hacking Paginator to grab pages of 5 record at a time
EV006F34DX5QGQ 2017-10-26T11:42:10.504Z
EV006F34DVAR41 2017-10-26T11:42:10.490Z
EV006F34DR20QZ 2017-10-26T11:42:10.477Z
EV006F34DNMRRP 2017-10-26T11:42:10.463Z
EV006F34DJWW5M 2017-10-26T11:42:10.450Z

While I agree that the "list(limit=5, before=EV006F34D0YD8K)" is arguably correct (ie it's the 5 events just "before" the eventid specifiied -Sorry, I had that wrong in my last comment above)..it does show part of where the problem begins.

When not specifying a filter ->list(limit=5) starts with the most globally recent event matching the (non-existent) criteria. Whereas list(limit=5, before=someid) shows the 5 events starting with event "5 up" from "someid". So when you try to de-paginate (either by calling ->all() or by iteratively calling ->list(limit=5, before=SOMEOTHERID) it doesn't work, because it's non-trivial which SOMEOTHERID to use, whether to use a "before" or "after" filter and whether to append or prepend the second set of records.

For what it's worth, here is my de-pagination method which takes a $last_gc_event_id_processed, which we store each time we batch process events, (or null if we are starting from scratch) and returns all events chronologically "after" that id in "chronologically ascending" order (ie the order we must process). I am not super happy with the code, but it is partially a result of the bending over backwards I had to do, to get this to work.

The key part is the IF statement in the middle of the de-pagination loop, which reverses the logic of whether to use $response->after or $response->before for the second "page" and whether to APPEND or PREPEND that second page to the first page. My code also uses array_reverse in both cases because I want chronologically ascending (for chronologically descending the APPEND/PREPEND logic would need to be reversed). The WHILE loop termination condition is the other interesting part, again requiring a switch on whether $last_gc_event_id_processed was passed or not.

public static function listEvents(GoCardlessOrganisationPaymentMethod $opm, $last_gc_event_id_processed = null)
{
  $client = self::getClient($opm);
  $params = [];
  $limit = 5; // slices of 5, but combined here, because complex and therefore easier for calling code                                                                            
  $params["limit"] = $limit;

  if ($last_gc_event_id_processed !== null)
  {
    $params["before"] = $last_gc_event_id_processed;
  }
  $records = [];
  do
  {
    $response = $client->events()->list(['params' => $params]);
    if ($last_gc_event_id_processed === null)
    {
      // and use after condition when retrieving all (crazy!) ref https://github.com/gocardless/gocardless-pro-php/issues/34                                                          
      $params["after"] = $response->after;
      // prepend the reversed records (crazy!) ref https://github.com/gocardless/gocardless-pro-php/issues/34                                                                         
      $records = array_merge(array_reverse($response->records), $records); // merge in chronologically increasing order                                                               
    }
    else
    {
      // use before condition when last_id given (crazy!)                                                                                                                             
      $params["before"] = $response->before;
      // append the reversed records (crazy!)                                                                                                                                         
      $records = array_merge($records, array_reverse($response->records)); // merge in chronologically increasing order                                                               
    }
  }
  while (count($response->records) == $limit &&                                    // if full set there might be more                                                                 
         (($last_gc_event_id_processed === null && $response->after !== null) ||   // use after condition when retrieving all (crazy!)                                                
          ($last_gc_event_id_processed !== null && $response->before !== null)));  // and use before condition when last_id given (crazy!)                                            

  return $records;
}

Here is me calling that method:

echo "\ncustom depagination listEvents(\$last_gc_event_id_processed = EV006F34D0YD8K) .. results in chrono asc order\n";
$events = GoCardlessPspTransaction::listEvents($opm, 'EV006F34D0YD8K');

foreach ($events as $event)
{
  echo $event->id . ' ' . $event->created_at . "\n";
}

echo "\ncustom depagination listEvents(\$last_gc_event_id_processed = null) .. all results in chrono asc order\n";
$events = GoCardlessPspTransaction::listEvents($opm, null);

foreach ($events as $event)
{
  echo $event->id . ' ' . $event->created_at . "\n";
}

and gives this result:

custom depagination listEvents($last_gc_event_id_processed = EV006F34D0YD8K) c.. result in chrono asc order
EV006F34DJWW5M 2017-10-26T11:42:10.450Z
EV006F34DNMRRP 2017-10-26T11:42:10.463Z
EV006F34DR20QZ 2017-10-26T11:42:10.477Z
EV006F34DVAR41 2017-10-26T11:42:10.490Z
EV006F34DX5QGQ 2017-10-26T11:42:10.504Z
EV006F6YB9DDCS 2017-10-26T15:18:55.517Z
EV006FC1S0Q06W 2017-10-26T19:32:10.915Z
EV006FW9C2XAH6 2017-10-27T08:24:37.606Z
EV006FXT9F66R0 2017-10-27T10:08:27.379Z
EV006G41S9V4NH 2017-10-27T15:01:17.203Z
EV006G49S3NNPK 2017-10-27T15:16:18.222Z
EV006G8VDP19EW 2017-10-29T17:25:03.272Z
EV006GD8PS83TB 2017-10-30T11:08:44.521Z
EV006GHP63MRKA 2017-10-30T13:19:32.304Z
EV006GHP65W5MC 2017-10-30T13:19:32.330Z
EV006GKHM55580 2017-10-30T16:01:25.326Z
EV006GVVJR0QSA 2017-10-30T16:47:16.694Z

custom depagination listEvents($last_gc_event_id_processed = null) .. all results in chrono asc order
EV00001TBS92D9 2014-05-29T11:01:36.794Z
EV00001Z2134NP 2014-05-30T14:50:58.487Z
EV00002ACKJEZJ 2014-06-03T11:01:01.083Z
EV00002C9RMK37 2014-06-03T11:01:29.228Z
EV00002P4WBV58 2014-06-03T11:04:23.968Z
EV00002TXPT62T 2014-06-03T11:06:09.403Z
EV00002WYGMZ7F 2014-06-03T11:06:58.034Z
EV00004Q52NRMY 2014-06-09T11:02:49.998Z
...
[snip very large number of results]
...
EV006G41S9V4NH 2017-10-27T15:01:17.203Z
EV006G49S3NNPK 2017-10-27T15:16:18.222Z
EV006G8VDP19EW 2017-10-29T17:25:03.272Z
EV006GD8PS83TB 2017-10-30T11:08:44.521Z
EV006GHP63MRKA 2017-10-30T13:19:32.304Z
EV006GHP65W5MC 2017-10-30T13:19:32.330Z
EV006GKHM55580 2017-10-30T16:01:25.326Z
EV006GVVJR0QSA 2017-10-30T16:47:16.694Z


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants