Add support for resumption tokens #11

Merged (4 commits) on Apr 4, 2022
Changes from 3 commits
2 changes: 1 addition & 1 deletion README.md
@@ -77,8 +77,8 @@ Filter support:

* `from`: specifies a lower bound for datestamp-based selective harvesting. UTC+0 datetimes must be provided.
* `until`: specifies an upper bound for datestamp-based selective harvesting. UTC+0 datetimes must be provided.
* `resumptionToken`: Includes validation of current verb and filters
* `set`: TBA
* `resumptionToken`: TBA

### List Sets

117 changes: 86 additions & 31 deletions src/Controllers/OaiController.php
@@ -9,7 +9,7 @@
use SilverStripe\Control\HTTPResponse;
use SilverStripe\Core\Environment;
use SilverStripe\Core\Injector\Injector;
use SilverStripe\ORM\DataList;
use SilverStripe\ORM\PaginatedList;
use SilverStripe\SiteConfig\SiteConfig;
use Terraformers\OpenArchive\Documents\Errors\BadVerbDocument;
use Terraformers\OpenArchive\Documents\Errors\CannotDisseminateFormatDocument;
@@ -20,6 +20,7 @@
use Terraformers\OpenArchive\Formatters\OaiDcFormatter;
use Terraformers\OpenArchive\Formatters\OaiRecordFormatter;
use Terraformers\OpenArchive\Helpers\DateTimeHelper;
use Terraformers\OpenArchive\Helpers\ResumptionTokenHelper;
use Terraformers\OpenArchive\Models\OaiRecord;
use Throwable;

@@ -56,9 +57,9 @@ class OaiController extends Controller
'oai_dc' => OaiDcFormatter::class,
];

private static string $supportedProtocol = '2.0';
private static string $supported_protocol = '2.0';

private static string $supportedDeletedRecord = self::DELETED_SUPPORT_PERSISTENT;
private static string $supported_deleted_record = self::DELETED_SUPPORT_PERSISTENT;

/**
* All dates provided by the OAI repository must be ISO8601, and with an additional requirement that only "zulu" is
@@ -67,7 +68,20 @@ class OaiController extends Controller
*
* @see http://www.openarchives.org/OAI/openarchivesprotocol.html#Dates
*/
private static string $supportedGranularity = 'YYYY-MM-DDThh:mm:ssZ';
private static string $supported_granularity = 'YYYY-MM-DDThh:mm:ssZ';

/**
* For verbs that use Resumption Tokens, this is the configuration that controls how many OAI Records we will load
* into a single response
*/
private static string $oai_records_per_page = '100';

/**
* The expiration time (in seconds) of any resumption tokens that are generated. Default is 60 minutes
*
* Set this to null if you want an infinite duration
*/
private static ?int $resumption_token_expiry = 3600;

public function index(HTTPRequest $request): HTTPResponse
{
@@ -122,11 +136,11 @@ protected function Identify(HTTPRequest $request): HTTPResponse
// Base URL defaults to the current URL. Extension point is provided in this method
$xmlDocument->setBaseUrl($this->getBaseUrl($request));
// Protocol Version defaults to 2.0. You can update the configuration if required
$xmlDocument->setProtocolVersion($this->config()->get('supportedProtocol'));
$xmlDocument->setProtocolVersion($this->config()->get('supported_protocol'));
// Deleted Record support defaults to "persistent". You can update the configuration if required
$xmlDocument->setDeletedRecord($this->config()->get('supportedDeletedRecord'));
$xmlDocument->setDeletedRecord($this->config()->get('supported_deleted_record'));
// Date Granularity support defaults to date and time. You can update the configuration if required
$xmlDocument->setGranularity($this->config()->get('supportedGranularity'));
$xmlDocument->setGranularity($this->config()->get('supported_granularity'));
// You should set your env var appropriately for this value
$xmlDocument->setAdminEmail(Environment::getEnv(OaiController::OAI_API_ADMIN_EMAIL));
// Earliest Datestamp defaults to the Jan 1970 (the start of UNIX). Extension point is provided in this method
@@ -185,38 +199,66 @@ protected function ListRecords(HTTPRequest $request): HTTPResponse
// Request URL defaults to the current URL. Extension point is provided in this method
$xmlDocument->setRequestUrl($this->getRequestUrl($request));

// The lower bound for selective harvesting
$from = $request->getVar('from');
// The upper bound for selective harvesting
$until = $request->getVar('until');
// The lower bound for selective harvesting. The original UTC should be preserved for Resumption Tokens and any
// display requirements
$fromUtc = $request->getVar('from');
// Local value which will be used purely for internal filtering
$fromLocal = null;
// The upper bound for selective harvesting. The original UTC should be preserved for Resumption Tokens and any
// display requirements
$untilUtc = $request->getVar('until');
// Local value which will be used purely for internal filtering
$untilLocal = null;
// Specifies the Set for selective harvesting
$set = (int) $request->getVar('set');
$set = $request->getVar('set');
// An encoded string containing pagination requirements for selective harvesting
$resumptionToken = $request->getVar('resumptionToken');
// Default page is always 1, but this can change later if there is a Resumption Token active
$currentPage = 1;

if ($from) {
if ($fromUtc) {
try {
$from = DateTimeHelper::getLocalStringFromUtc($from);
$fromLocal = DateTimeHelper::getLocalStringFromUtc($fromUtc);
} catch (Throwable $e) {
$xmlDocument->addError(OaiDocument::ERROR_BAD_ARGUMENT, 'Invalid \'from\' date format provided');
}
}

if ($until) {
if ($untilUtc) {
try {
$until = DateTimeHelper::getLocalStringFromUtc($until);
$untilLocal = DateTimeHelper::getLocalStringFromUtc($untilUtc);
} catch (Throwable $e) {
$xmlDocument->addError(OaiDocument::ERROR_BAD_ARGUMENT, 'Invalid \'until\' date format provided');
}
}

if ($resumptionToken) {
try {
$currentPage = ResumptionTokenHelper::getPageFromResumptionToken(
$resumptionToken,
'ListRecords',
$fromUtc,
$untilUtc,
$set
);
} catch (Throwable $e) {
$xmlDocument->addError(OaiDocument::ERROR_BAD_RESUMPTION_TOKEN, $e->getMessage());
}
}

if ($xmlDocument->hasErrors()) {
return $this->getResponseWithDocumentBody($xmlDocument);
}

$oaiRecords = $this->fetchOaiRecords($from, $until, $set, $resumptionToken);
// Grab the Paginated List of records based on our filter criteria
$oaiRecords = $this->fetchOaiRecords($fromLocal, $untilLocal, $set);

if (!$oaiRecords->count()) {
// Set the page length and current page of our Paginated list
$oaiRecords->setPageLength($this->config()->get('oai_records_per_page'));
$oaiRecords->setCurrentPage($currentPage);

// If there are no results after we apply filters and pagination, then we should return an error response
if (!$oaiRecords->Count()) {
$xmlDocument->addError(OaiDocument::ERROR_NO_RECORDS_MATCH);

return $this->getResponseWithDocumentBody($xmlDocument);
@@ -225,6 +267,23 @@ protected function ListRecords(HTTPRequest $request): HTTPResponse
// Start processing whatever OaiRecords we found
$xmlDocument->processOaiRecords($oaiRecords);

// If there are still more records to be processed, then we need to add a new Resumption Token to our response
if ($oaiRecords->TotalPages() > $currentPage) {
$newResumptionToken = ResumptionTokenHelper::generateResumptionToken(
'ListRecords',
$currentPage + 1,
$fromUtc,
$untilUtc,
$set
);

$xmlDocument->setResumptionToken($newResumptionToken);
} elseif ($resumptionToken) {
// If this is the last page of a request that included a Resumption Token, then we specifically need to add
// an empty Token - indicating that the list is now complete
$xmlDocument->setResumptionToken('');
}

return $this->getResponseWithDocumentBody($xmlDocument);
}
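
The token round trip used by `ListRecords` above can be sketched as follows. The `ResumptionTokenHelper` implementation is not part of this diff, so the function names, the JSON-plus-base64 payload, and the exception type below are assumptions made purely for illustration. The only detail confirmed on this page is that the helper decodes the token with `base64_decode($resumptionToken, true)`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for ResumptionTokenHelper::generateResumptionToken().
function sketchGenerateToken(
    string $verb,
    int $page,
    ?string $from = null,
    ?string $until = null,
    ?string $set = null,
    ?int $expiresAt = null
): string {
    $parts = [
        'verb' => $verb,
        'page' => $page,
        'from' => $from,
        'until' => $until,
        'set' => $set,
        'expiry' => $expiresAt,
    ];

    return base64_encode(json_encode($parts, JSON_THROW_ON_ERROR));
}

// Hypothetical stand-in for ResumptionTokenHelper::getPageFromResumptionToken().
function sketchGetPageFromToken(
    string $token,
    string $verb,
    ?string $from = null,
    ?string $until = null,
    ?string $set = null
): int {
    $decoded = base64_decode($token, true);

    // Strict mode returns false for invalid input; an empty token decodes to ''
    if ($decoded === false || $decoded === '') {
        throw new InvalidArgumentException('Invalid resumption token');
    }

    $parts = json_decode($decoded, true);

    if (!is_array($parts) || !isset($parts['verb'], $parts['page'])) {
        throw new InvalidArgumentException('Invalid resumption token');
    }

    // A token is only valid for the verb and filters it was issued for
    $matches = $parts['verb'] === $verb
        && ($parts['from'] ?? null) === $from
        && ($parts['until'] ?? null) === $until
        && ($parts['set'] ?? null) === $set;

    if (!$matches) {
        throw new InvalidArgumentException('Resumption token does not match the current request');
    }

    $expiry = $parts['expiry'] ?? null;

    if ($expiry !== null && $expiry < time()) {
        throw new InvalidArgumentException('Resumption token has expired');
    }

    return (int) $parts['page'];
}

$token = sketchGenerateToken('ListRecords', 2, '2022-01-01T00:00:00Z');
echo sketchGetPageFromToken($token, 'ListRecords', '2022-01-01T00:00:00Z'), PHP_EOL; // 2
```

Storing the original UTC filter values in the token, as the controller does with `$fromUtc` and `$untilUtc`, lets a follow-up request be compared against exactly what the harvester sent the first time.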

@@ -283,15 +342,11 @@ protected function getRepositoryName(): string
}

/**
* Regarding dates, please @see $supportedGranularity docblock. All dates passed to this method should already be
* Regarding dates, please @see $supported_granularity docblock. All dates passed to this method should already be
* adjusted to local server time
*/
protected function fetchOaiRecords(
?string $from = null,
?string $until = null,
?int $set = null,
?string $resumptionToken = null
): DataList {
protected function fetchOaiRecords(?string $from = null, ?string $until = null, ?int $set = null): PaginatedList
{
$filters = [];

if ($from) {
@@ -306,15 +361,15 @@ protected function fetchOaiRecords(
// Set support to be added
}

if ($resumptionToken) {
// Resumption token support to be added
}

if (!$filters) {
return OaiRecord::get();
return PaginatedList::create(OaiRecord::get());
}

return OaiRecord::get()->filter($filters);
$list = OaiRecord::get()
->sort('LastEdited ASC')
->filter($filters);

return PaginatedList::create($list);
}

}
23 changes: 20 additions & 3 deletions src/Documents/ListRecordsDocument.php
@@ -2,8 +2,9 @@

namespace Terraformers\OpenArchive\Documents;

use SilverStripe\ORM\DataList;
use SilverStripe\ORM\PaginatedList;
use Terraformers\OpenArchive\Formatters\OaiRecordFormatter;
use Terraformers\OpenArchive\Helpers\ResumptionTokenHelper;
use Terraformers\OpenArchive\Models\OaiRecord;

class ListRecordsDocument extends OaiDocument
@@ -21,9 +22,9 @@ public function __construct(OaiRecordFormatter $formatter)
}

/**
* @param DataList|OaiRecord[] $oaiRecords
* @param PaginatedList|OaiRecord[] $oaiRecords
*/
public function processOaiRecords(DataList $oaiRecords): void
public function processOaiRecords(PaginatedList $oaiRecords): void
{
$listRecordsElement = $this->findOrCreateElement('ListRecords');

@@ -32,4 +33,20 @@ public function processOaiRecords(DataList $oaiRecords): void
}
}

public function setResumptionToken(string $resumptionToken): void
{
$listRecordsElement = $this->findOrCreateElement('ListRecords');
$resumptionTokenElement = $this->findOrCreateElement('resumptionToken', $listRecordsElement);

$resumptionTokenElement->nodeValue = $resumptionToken;

Contributor


Suggested change
if (!$resumptionToken) {
return;
}

Can we please add this for the empty resumption token scenario? 👀
I just realised that setting an empty resumption token on the last page of the response causes an error in the decode step (`$decode = base64_decode($resumptionToken, true);`). It looks like `base64_decode` is not able to decode an empty string.

Member Author


Thanks, @MelissaWu-SS!

Fixed at the source of the issue:
fad9151
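
A note on the empty-token case raised in this thread: in PHP, `base64_decode('', true)` does not fail. It returns an empty string, which a loose check such as `if (!$decode)` would treat as a failure. Whether the helper used such a check is an assumption, since its code is not shown here. A minimal demonstration:

```php
<?php

$decoded = base64_decode('', true);

var_dump($decoded === false); // bool(false): decoding itself did not fail
var_dump($decoded === '');    // bool(true): the result is an empty string
var_dump(!$decoded);          // bool(true): a loose check sees it as a failure
```

Comparing against `false` with `===` separates "invalid base64" from "empty token", which is one way to handle the final-page case without special-casing it in the document class.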

$tokenExpiry = ResumptionTokenHelper::getExpiryFromResumptionToken($resumptionToken);

if (!$tokenExpiry) {
return;
}

$resumptionTokenElement->setAttribute('expirationDate', $tokenExpiry);
}

}
1 change: 1 addition & 0 deletions src/Helpers/DateTimeHelper.php
@@ -15,6 +15,7 @@ public static function getLocalStringFromUtc(string $utcDateString): string
throw new Exception('Invalid UTC date format provided');
}

// Note: strtotime() already converts UTC date strings (UTC+Z) into local timestamps
return date('Y-m-d H:i:s', strtotime($utcDateString));
}
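
The conversion in `getLocalStringFromUtc()` can be checked in isolation. In the sketch below the `Pacific/Auckland` timezone and the sample date are assumptions chosen for illustration. Strictly speaking, `strtotime()` returns a timezone-independent Unix timestamp, and it is `date()` that formats that timestamp in the server's default timezone.

```php
<?php

// Assumption for illustration: the server's default timezone is Pacific/Auckland
date_default_timezone_set('Pacific/Auckland');

$utc = '2022-06-01T00:00:00Z';

// strtotime() honours the trailing "Z"; date() then formats in the local timezone
$local = date('Y-m-d H:i:s', strtotime($utc));

echo $local, PHP_EOL; // 2022-06-01 12:00:00 (NZST is UTC+12 in June)
```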
