Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace idna_convert.class.php with Algo26\IdnaConvert, allow custom idna library #776

Closed
wants to merge 18 commits into from

Conversation

Art4
Copy link
Contributor

@Art4 Art4 commented Jan 25, 2023

Hi all 👋

In #538 we discussed the update or replacement of idna_convert.class.php 0.5.1, which is outdated and probably don't work with PHP 7 or PHP 8 (at least I couldn't make it run). (I made it run. \o/)

This PR will remove the outdated bundled idna_convert.class.php file and will add support for the updated Algo26\IdnaConvert library. This PR fixes #538.

Support for Algo26\IdnaConvert v2 and v3

The idna_convert.class.php has been updated and renamed to Algo26\IdnaConvert, see https://github.com/algo26-matthias/idna-convert
This PR will allow to use version 2 or 3. If the library is installed, it will work out of the box without other configuration. All you need to do is to install the library e.g. with composer:

$ composer require algo26-matthias/idna-convert:^3

I've added the library to the suggest section in the composer.json.

BC: Support for idna_convert v0.5.1

If you would like to use the integrated idna_convert class (v0.5.1) you have to require() the file like before:

require_once($path_to_simplepie_folder . '/idn/idna_convert.class.php');
$simplepie = new \SimplePie\SimplePie();
// ...

This PR will add more jobs in Github Actions to test SimplePie with the Algo26\IdnaConvert library v2 and v3 and v0.5.1 installed.

Support for custom Idna converter

Also this PR adds a new IdnaDomainFilter interface. This will allow us to use a complete other solution by providing an implementation to the Registry.

use SimplePie\Idna\IdnaDomainFilter;

class MyIdnaConverter implements IdnaDomainFilter {
    public function filterDomain(string $decoded): string {
        // do your own encoding here...
        return \idn_to_ascii($decoded);
    }
}

$simplepie = new \SimplePie\SimplePie();
$simplepie->get_registry()->register(IdnaDomainFilter::class, MyIdnaConverter::class);

Follow up

This PR requires #774 to merge, so I can move the idna support from the File class to the HTTP client. On the other hand idna support is broken for PHP 7, so it would possibly ok to just remove it from the File class.

In follow up I plan to use the same concept to improve the mf2 support (#684) and using Requests as HTTP library (#520).

@Art4 Art4 mentioned this pull request Jan 25, 2023
48 tasks
@Art4
Copy link
Contributor Author

Art4 commented Jan 27, 2023

Reading this comment from @rmccue:

The main thing I ask for any changes: Don't break backwards compatibility - BC must be maintained above (almost) everything else. Adding more stuff is fine, refactoring is fine, as is deprecating, but don't remove anything.

I think I should restore the idna_convert.class.php file and make it work to keep BC.

public function set_registry(Registry $registry): void
{
$idnaConverter = $registry->create(IdnaDomainFilter::class);
$parsed = $registry->call(Misc::class, 'parse_url', [$this->link]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replace a simple pure function like this with indirection via registry? It is unlikely that we would want to swap it and it just makes the code more complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, but this is the only possibility to mock the Misc class for tests.

We should discuss the whole concept of the Registry in a separate issue and how we should refactore it.

@Art4
Copy link
Contributor Author

Art4 commented Jan 31, 2023

I've created BC support for idna_convert v0.5.1 and have added new tests in Github Actions to test the bundled library with PHP 8.2.

.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
*
* @return string the decoded domain name
*/
public function filterDomain(string $encoded): string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterDomain does not really say anything about what is being done.

Copy link
Contributor Author

@Art4 Art4 Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming things is hard. 🙂 Any suggestions?

src/Enclosure.php Show resolved Hide resolved
src/Idna/IdnaConverter.php Outdated Show resolved Hide resolved
* @uses idna_convert If available, this will convert an IDN in $url
* @see https://github.com/algo26-matthias/idna-convert
*/
private function create_enclosure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this out? It seems to me like it goes the opposite way from moving the IDN handling into the HTTP request component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it here, because this is the only place where Enclosures are created and Item already implements RegistryAware. If we want the idna_convert inside Enclosure we would have to implement RegistryAware there too.

@Art4
Copy link
Contributor Author

Art4 commented Feb 1, 2023

Seems like that idna-convert@v0.5.1 has problems encoding some characters. 😕

https://github.com/simplepie/simplepie/actions/runs/4062694656/jobs/6994049348#step:7:54

grafik

@jtojnar
Copy link
Contributor

jtojnar commented Feb 15, 2023

Actually, maybe we could just use idn_to_ascii function when present. And suggest https://github.com/symfony/polyfill-intl-idn in case people do not have intl extension.

@Art4
Copy link
Contributor Author

Art4 commented Feb 15, 2023

Actually, maybe we could just use idn_to_ascii function when present. And suggest https://github.com/symfony/polyfill-intl-idn in case people do not have intl extension.

I've tested native idn_to_ascii() but this function does something different then idna_convert (e.g. ß becomes ss). See https://3v4l.org/HBYUa

Thats why I added IdnaDomainFilter so everyone can use whatever one wants:

use SimplePie\Idna\IdnaDomainFilter;

class MyIdnaConverter implements IdnaDomainFilter {
    public function filterDomain(string $decoded): string {
        return \idn_to_ascii($decoded);
    }
}

$simplepie = new \SimplePie\SimplePie();
$simplepie->get_registry()->register(IdnaDomainFilter::class, MyIdnaConverter::class);

@Art4
Copy link
Contributor Author

Art4 commented Feb 15, 2023

Maybe I should even remove support for Algo26\IdnaConvert v2 and v3 again in favor of IdnaDomainFilter. 🤔

@jtojnar
Copy link
Contributor

jtojnar commented Feb 15, 2023

PHP will default to old version of the standard. It needs to be passed IDNA_NONTRANSITIONAL_TO_ASCII (and INTL_IDNA_VARIANT_UTS46 before PHP 7.4), see https://www.php.net/manual/en/function.idn-to-ascii.php#120917.

@Art4
Copy link
Contributor Author

Art4 commented Feb 15, 2023

PHP will default to old version of the standard. It needs to be passed IDNA_NONTRANSITIONAL_TO_ASCII (and INTL_IDNA_VARIANT_UTS46 before PHP 7.4), see https://www.php.net/manual/en/function.idn-to-ascii.php#120917.

Got it. var_dump(idn_to_ascii('weißenbach', \IDNA_NONTRANSITIONAL_TO_ASCII, \INTL_IDNA_VARIANT_UTS46)); returns the expected xn--weienbach-i1a. https://3v4l.org/5HPJf

@jtojnar
Copy link
Contributor

jtojnar commented Feb 15, 2023

I do not really think allowing choosing a custom IDN library is worth the extra piping. I would just keep it simple and replace the existing code and document the intl requirement:

diff --git a/README.markdown b/README.markdown
index 5500f03c..594d1322 100644
--- a/README.markdown
+++ b/README.markdown
@@ -14,6 +14,7 @@ Requirements
 * PHP 7.2+ (Required since SimplePie 1.8.0)
 * libxml2 (certain 2.7.x releases are too buggy for words, and will crash)
 * One of iconv, mbstring or intl extensions
+* intl extension or [symfony/polyfill-intl-idn](https://github.com/symfony/polyfill-intl-idn) to support IDNs
 * cURL or fsockopen()
 * PCRE support
 
@@ -29,10 +30,8 @@ What comes in the package?
    server for required settings.
 6. `demo/` - A basic feed reader demo that shows off some of SimplePie's more
    noticeable features.
-7. `idn/` - A third-party library that SimplePie can optionally use to
-   understand Internationalized Domain Names (IDNs).
-8. `build/` - Scripts related to generating pieces of SimplePie
-9. `test/` - SimplePie's unit test suite.
+7. `build/` - Scripts related to generating pieces of SimplePie
+8. `test/` - SimplePie's unit test suite.
 
 ### Where's `simplepie.inc`?
 Since SimplePie 1.3, we've split the classes into separate files to make it easier
diff --git a/demo/index.php b/demo/index.php
index 97cf2d15..47bc3178 100644
--- a/demo/index.php
+++ b/demo/index.php
@@ -6,7 +6,6 @@ $starttime = $starttime[1] + $starttime[0];
 // Include SimplePie
 // Located in the parent directory
 include_once('../autoloader.php');
-include_once('../idn/idna_convert.class.php');
 
 // Create a new instance of the SimplePie object
 $feed = new \SimplePie\SimplePie();
diff --git a/demo/test.php b/demo/test.php
index 47bed901..18764106 100644
--- a/demo/test.php
+++ b/demo/test.php
@@ -1,6 +1,5 @@
 <?php
 include_once('../autoloader.php');
-include_once('../idn/idna_convert.class.php');
 
 // Parse it
 $feed = new \SimplePie\SimplePie();
diff --git a/src/Enclosure.php b/src/Enclosure.php
index 124d65b6..26e9133c 100644
--- a/src/Enclosure.php
+++ b/src/Enclosure.php
@@ -221,7 +221,7 @@ class Enclosure
      * For documentation on all the parameters, see the corresponding
      * properties and their accessors
      *
-     * @uses idna_convert If available, this will convert an IDN
+     * @uses intl extension If available, this will convert an IDN
      */
     public function __construct($link = null, $type = null, $length = null, $javascript = null, $bitrate = null, $captions = null, $categories = null, $channels = null, $copyright = null, $credits = null, $description = null, $duration = null, $expression = null, $framerate = null, $hashes = null, $height = null, $keywords = null, $lang = null, $medium = null, $player = null, $ratings = null, $restrictions = null, $samplingrate = null, $thumbnails = null, $title = null, $width = null)
     {
@@ -251,10 +251,10 @@ class Enclosure
         $this->type = $type;
         $this->width = $width;
 
-        if (class_exists('idna_convert')) {
-            $idn = new \idna_convert();
+        if (function_exists('idn_to_ascii')) {
             $parsed = \SimplePie\Misc::parse_url($link);
-            $this->link = \SimplePie\Misc::compress_parse_url($parsed['scheme'], $idn->encode($parsed['authority']), $parsed['path'], $parsed['query'], $parsed['fragment']);
+            $authority = \idn_to_ascii($parsed['authority'], IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);
+            $this->link = \SimplePie\Misc::compress_parse_url($parsed['scheme'], $authority, $parsed['path'], $parsed['query'], $parsed['fragment']);
         }
         $this->handler = $this->get_handler(); // Needs to load last
     }
diff --git a/src/File.php b/src/File.php
index bc18b8d8..2a92f1b2 100644
--- a/src/File.php
+++ b/src/File.php
@@ -68,10 +68,10 @@ class File
 
     public function __construct($url, $timeout = 10, $redirects = 5, $headers = null, $useragent = null, $force_fsockopen = false, $curl_options = [])
     {
-        if (class_exists('idna_convert')) {
-            $idn = new \idna_convert();
+        if (function_exists('idn_to_ascii')) {
             $parsed = \SimplePie\Misc::parse_url($url);
-            $url = \SimplePie\Misc::compress_parse_url($parsed['scheme'], $idn->encode($parsed['authority']), $parsed['path'], $parsed['query'], null);
+            $authority = \idn_to_ascii($parsed['authority'], IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);
+            $url = \SimplePie\Misc::compress_parse_url($parsed['scheme'], $authority, $parsed['path'], $parsed['query'], null);
         }
         $this->url = $url;
         $this->permanent_url = $url;

@Art4
Copy link
Contributor Author

Art4 commented Feb 15, 2023

I do not really think allowing choosing a custom IDN library is worth the extra piping. I would just keep it simple and replace the existing code and document the intl requirement:

I like the idea. 👍 This would make everything easier. Because of functions_exists() the dependency to IDNA support will be optional, and we can remove the buggy 0.5.1 class. But I think we will have to modify some tests.

@Art4
Copy link
Contributor Author

Art4 commented Feb 16, 2023

@jtojnar Would you create a PR for your proposal? Then I could close this PR in favor of yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bundled idn/idna_convert.class.php can be replaced with mso/idna-convert
2 participants