Don't fail on libxml errors if the RSD URL can still be found #35

Merged
merged 1 commit into from
May 3, 2017
23 changes: 19 additions & 4 deletions src/MediawikiApi.php
@@ -79,15 +79,30 @@ public static function newFromApiEndpoint( $apiEndpoint ) {
 	 * @throws RsdException If the RSD URL could not be found in the page's HTML.
 	 */
 	public static function newFromPage( $url ) {
Member

This method is starting to get a little bit big.

Member Author

RSD discovery could conceivably be done in its own package; then here we'd just have to do something like:

$rsd = new Rsd($url);
if (!$rsd->hasApi('MediaWiki')) {
	throw new RsdException( "Unable to find RSD URL in page: $url" );
}
return self::newFromApiEndpoint( $rsd->getApi('MediaWiki')->getApiLink() );

But I reckon methods that can be seen whole on one screen are okay, although I know some people say 20 lines is about the max.

Up to you; I'm happy to rework it. (I don't think splitting parts of it into other methods in this class would be much better than this, though.)
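The Rsd class in the snippet above does not exist in this library. As a hedged illustration of what such a standalone package might contain, here is a minimal sketch that parses an RSD document that has already been fetched (for example from api.php?action=rsd). It assumes the RSD 1.0 layout MediaWiki emits, where each <api> element carries name and apiLink attributes, and it flattens the comment's getApi( 'MediaWiki' )->getApiLink() chain into a single hypothetical getApiLink( $name ) method. HTTP fetching is deliberately left out so the example is self-contained.

```php
<?php
// Hypothetical sketch of the standalone RSD helper floated in the review thread.
// Not part of addwiki; class and method names are illustrative assumptions.
// Input is an already-fetched RSD XML string (HTTP fetching is omitted).

final class Rsd {
	/** @var array<string, string> Map of API name => apiLink endpoint. */
	private $apiLinks = [];

	public function __construct( string $rsdXml ) {
		$doc = new DOMDocument();
		// Collect parse errors quietly and restore the caller's libxml setting.
		$previous = libxml_use_internal_errors( true );
		$loaded = $doc->loadXML( $rsdXml );
		libxml_clear_errors();
		libxml_use_internal_errors( $previous );
		if ( !$loaded ) {
			throw new InvalidArgumentException( 'Not a well-formed RSD document' );
		}
		// Assumption: each <api> element names one API and gives its endpoint in apiLink.
		foreach ( $doc->getElementsByTagName( 'api' ) as $api ) {
			$name = $api->getAttribute( 'name' );
			$link = $api->getAttribute( 'apiLink' );
			if ( $name !== '' && $link !== '' ) {
				$this->apiLinks[ $name ] = $link;
			}
		}
	}

	public function hasApi( string $name ): bool {
		return isset( $this->apiLinks[ $name ] );
	}

	public function getApiLink( string $name ): string {
		if ( !$this->hasApi( $name ) ) {
			throw new OutOfBoundsException( "No '$name' API in RSD document" );
		}
		return $this->apiLinks[ $name ];
	}
}

// Illustrative RSD document in the shape MediaWiki's action=rsd returns.
$rsdXml = <<<XML
<?xml version="1.0"?>
<rsd version="1.0" xmlns="http://archipelago.phrasewise.com/rsd">
	<service>
		<apis>
			<api name="MediaWiki" preferred="true" apiLink="https://example.org/w/api.php" blogID=""/>
		</apis>
	</service>
</rsd>
XML;

$rsd = new Rsd( $rsdXml );
if ( $rsd->hasApi( 'MediaWiki' ) ) {
	echo $rsd->getApiLink( 'MediaWiki' ), PHP_EOL;
}
```

With a helper like this, newFromPage() would only need to find the RSD URL in the page, fetch it, and pass the endpoint to newFromApiEndpoint(), which is roughly the split the comment proposes.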

+		// Set up HTTP client and HTML document.
 		$tempClient = new Client( [ 'headers' => [ 'User-Agent' => 'addwiki-mediawiki-client' ] ] );
-
-		// Get the page HTML and extract the RSD link.
 		$pageHtml = $tempClient->get( $url )->getBody();
 		$pageDoc = new DOMDocument();
+
+		// Try to load the HTML (turn off errors temporarily; most don't matter, and if they do get
+		// in the way of finding the API URL, will be reported in the RsdException below).
+		$internalErrors = libxml_use_internal_errors( true );
 		$pageDoc->loadHTML( $pageHtml );
-		$link = ( new DOMXpath( $pageDoc ) )->query( 'head/link[@type="application/rsd+xml"][@href]' );
+		$libXmlErrors = libxml_get_errors();
+		libxml_use_internal_errors( $internalErrors );
+
+		// Extract the RSD link.
+		$xpath = 'head/link[@type="application/rsd+xml"][@href]';
+		$link = ( new DOMXpath( $pageDoc ) )->query( $xpath );
 		if ( $link->length === 0 ) {
-			throw new RsdException( "Unable to find RSD URL in page: $url" );
+			// Format libxml errors for display.
+			$libXmlErrorStr = array_reduce( $libXmlErrors, function( $prevErr, $err ) {
+				return $prevErr . ', ' . $err->message . ' (line '.$err->line . ')';
+			} );
+			if ( $libXmlErrorStr ) {
+				$libXmlErrorStr = "In addition, libxml had the following errors: $libXmlErrorStr";
+			}
+			throw new RsdException( "Unable to find RSD URL in page: $url $libXmlErrorStr" );
 		}
 		$rsdUrl = $link->item( 0 )->attributes->getnamedItem( 'href' )->nodeValue;

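The pattern this hunk introduces (collect libxml diagnostics instead of letting them surface as PHP warnings, restore the previous setting, and let only the XPath lookup decide success) can be tried outside the library. The following is a minimal standalone sketch, not addwiki's code: the function name findRsdHref() and the sample HTML are illustrative assumptions, it clears the libxml error buffer before parsing so stale errors from earlier calls are not reported, and it uses an absolute //head/link expression rather than the relative one in the diff.

```php
<?php
// Standalone sketch, not addwiki's implementation. findRsdHref() is a
// hypothetical helper name. Requires the DOM extension (enabled by default).

/**
 * Look for an RSD <link> in a page's HTML without failing on libxml errors.
 *
 * @return array{0: ?string, 1: string[]} The RSD href (or null) and any libxml messages.
 */
function findRsdHref( string $html ): array {
	$doc = new DOMDocument();

	// Collect libxml diagnostics instead of emitting PHP warnings,
	// remembering the previous setting so it can be restored afterwards.
	$previous = libxml_use_internal_errors( true );
	libxml_clear_errors(); // Discard errors left over from earlier parsing.
	$doc->loadHTML( $html );
	$errors = array_map(
		function ( LibXMLError $err ): string {
			return trim( $err->message ) . ' (line ' . $err->line . ')';
		},
		libxml_get_errors()
	);
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	// Parse errors are only reported; the XPath lookup alone decides success.
	$links = ( new DOMXPath( $doc ) )->query( '//head/link[@type="application/rsd+xml"][@href]' );
	if ( $links === false || $links->length === 0 ) {
		return [ null, $errors ];
	}
	return [ $links->item( 0 )->getAttribute( 'href' ), $errors ];
}

// Sample page: has an RSD link, plus a duplicated id of the kind that prompted the PR.
$html = '<html><head>'
	. '<link rel="EditURI" type="application/rsd+xml" href="https://example.org/w/api.php?action=rsd">'
	. '</head><body><p id="dup"></p><div id="dup"></div></body></html>';

[ $href, $errors ] = findRsdHref( $html );
echo $href, PHP_EOL;
if ( $errors !== [] ) {
	echo 'libxml reported: ', implode( ', ', $errors ), PHP_EOL;
}
```

Returning the messages as an array and joining them with implode() at display time avoids the leading ", " that array_reduce() produces when it is called without an initial value, and trim() drops the trailing newline that libxml messages usually carry.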
22 changes: 22 additions & 0 deletions tests/Integration/MediawikiApiTest.php
@@ -30,6 +30,28 @@ public function testNewFromPageInvalidHtml() {
 		MediawikiApi::newFromPage( $nonWikiPage );
 	}

+	/**
+	 * Duplicate element IDs break DOMDocument::loadHTML
+	 * @see https://phabricator.wikimedia.org/T163527#3219833
+	 * @covers Mediawiki\Api\MediawikiApi::newFromPage
+	 */
+	public function testNewFromPageWithDuplicateId() {
+		$testPageName = __METHOD__;
+		$testEnv = TestEnvironment::newInstance();
+		$wikiPageUrl = str_replace( 'api.php', "index.php?title=$testPageName", $testEnv->getApiUrl() );
+
+		// Test with no duplicate IDs.
+		$testEnv->savePage( $testPageName, '<p id="unique-id"></p>' );
+		$api1 = MediawikiApi::newFromPage( $wikiPageUrl );
+		$this->assertInstanceOf( MediawikiApi::class, $api1 );
+
+		// Test with duplicate ID.
+		$wikiText = '<p id="duplicated-id"></p><div id="duplicated-id"></div>';
+		$testEnv->savePage( $testPageName, $wikiText );
+		$api2 = MediawikiApi::newFromPage( $wikiPageUrl );
+		$this->assertInstanceOf( MediawikiApi::class, $api2 );
+	}
+
 	/**
 	 * @covers Mediawiki\Api\MediawikiApi::getRequest
 	 * @covers Mediawiki\Api\MediawikiApi::getClientRequestOptions
16 changes: 16 additions & 0 deletions tests/Integration/TestEnvironment.php
@@ -4,6 +4,7 @@

 use Exception;
 use Mediawiki\Api\MediawikiApi;
+use Mediawiki\Api\SimpleRequest;

 /**
  * @author Addshore
@@ -68,4 +69,19 @@ public function getApi() {
 		return $this->api;
 	}

+	/**
+	 * Save a wiki page.
+	 * @param string $title
+	 * @param string $content
+	 */
+	public function savePage( $title, $content ) {
+
+		$params = [
+			'title' => $title,
+			'text' => $content,
+			'md5' => md5( $content ),
+			'token' => $this->api->getToken(),
+		];
+		$this->api->postRequest( new SimpleRequest( 'edit', $params ) );
+	}
 }