
YouTube channels with unicode names are not accepted by YouTube.getChannelExtractor #435

Closed
sadr0b0t opened this issue Oct 19, 2020 · 1 comment
Labels
bug Issue is related to a bug youtube service, https://www.youtube.com/

Comments

@sadr0b0t

Hello, I am using NewPipeExtractor 0.20.1.

I am trying to work with a YouTube channel whose URL contains Unicode characters: https://www.youtube.com/c/СтудияДиафильм

NewPipeExtractor throws a ParsingException for it. At first I thought that java.net.URL did not accept a UTF-8 URL string, so I converted the URL to ASCII, but I got the same exception with the ASCII representation too:

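        // Requires: import static org.schabi.newpipe.extractor.ServiceList.YouTube;
        try {
            String plUrl = "https://www.youtube.com/c/СтудияДиафильм";
            // Percent-encode the non-ASCII path segment
            String plUrlAscii = new java.net.URI(plUrl).toASCIIString();
            System.out.println(plUrlAscii);

            // java.net.URL accepts both the Unicode and the ASCII-escaped form
            java.net.URL url1 = new java.net.URL(plUrl);
            System.out.println(url1.toString());
            java.net.URL url2 = new java.net.URL(plUrlAscii);
            System.out.println(url2.toString());

            // Throws ParsingException: "Malformed unacceptable url"
            YouTube.getChannelExtractor(url2.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }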

Constructing a java.net.URL works both from the UTF-8 URL and from the ASCII-escaped one, but getChannelExtractor still fails:

        System.out: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
        System.out: https://www.youtube.com/c/СтудияДиафильм
        System.out: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
        org.schabi.newpipe.extractor.exceptions.ParsingException: Malformed unacceptable url: https://www.youtube.com/c/%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC
            at org.schabi.newpipe.extractor.linkhandler.LinkHandlerFactory.fromUrl(LinkHandlerFactory.java:54)
            at org.schabi.newpipe.extractor.linkhandler.ListLinkHandlerFactory.fromUrl(ListLinkHandlerFactory.java:43)
            at org.schabi.newpipe.extractor.linkhandler.ListLinkHandlerFactory.fromUrl(ListLinkHandlerFactory.java:36)
            at org.schabi.newpipe.extractor.StreamingService.getChannelExtractor(StreamingService.java:253)
        ...
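To illustrate that the two URLs printed above name the same channel, here is a small standalone sketch using only the JDK (it does not touch NewPipeExtractor). The class and method names are hypothetical; it assumes Java 10+ for `URLDecoder.decode(String, Charset)` and a UTF-8 source encoding for the Cyrillic literal.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {

    // java.net.URI tolerates non-ASCII characters in the path;
    // toASCIIString() percent-encodes them as UTF-8 bytes.
    static String toAscii(String url) {
        return URI.create(url).toASCIIString();
    }

    // Reverses the percent-encoding. Assumes the URL contains no '+',
    // which URLDecoder would turn into a space.
    static String fromAscii(String url) {
        return URLDecoder.decode(url, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String unicodeUrl = "https://www.youtube.com/c/СтудияДиафильм";
        String asciiUrl = toAscii(unicodeUrl);
        System.out.println(asciiUrl);
        // Decoding the escaped form gives back the original Unicode URL
        System.out.println(fromAscii(asciiUrl).equals(unicodeUrl)); // true
    }
}
```

Since the round trip is lossless, an extractor could normalize either form to the same channel name before validating it.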

It looks very much like both the Unicode and the ASCII-escaped URL are rejected here in YoutubeChannelLinkHandlerFactory.getId(url):

            if (id == null || !id.matches("[A-Za-z0-9_-]+")) {
                throw new ParsingException("The given id is not a Youtube-Video-ID");
            }
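The check above explains why both forms fail: Cyrillic letters fall outside `[A-Za-z0-9_-]`, and so does the `%` of the escaped form. The sketch below reproduces that with the quoted pattern and contrasts it with a hypothetical Unicode-aware check (decode percent-escapes, then allow any Unicode letter, mark, or digit plus `_`, `.`, `-`). `UNICODE_ID` and `acceptedAfterDecoding` are illustrative assumptions, not NewPipeExtractor code; the actual fix in #964 may work differently.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class ChannelIdCheck {

    // The ASCII-only pattern quoted from getId(url)
    static final Pattern ASCII_ID = Pattern.compile("[A-Za-z0-9_-]+");

    // Hypothetical Unicode-aware alternative: letters, marks, digits, '_', '.', '-'
    static final Pattern UNICODE_ID = Pattern.compile("[\\p{L}\\p{M}\\p{N}_.-]+");

    static boolean acceptedByAsciiPattern(String id) {
        return ASCII_ID.matcher(id).matches();
    }

    // Hypothetical: decode percent-escapes first, then validate the decoded name
    static boolean acceptedAfterDecoding(String id) {
        String decoded = URLDecoder.decode(id, StandardCharsets.UTF_8);
        return UNICODE_ID.matcher(decoded).matches();
    }

    public static void main(String[] args) {
        String raw = "СтудияДиафильм";
        String escaped = "%D0%A1%D1%82%D1%83%D0%B4%D0%B8%D1%8F%D0%94%D0%B8%D0%B0%D1%84%D0%B8%D0%BB%D1%8C%D0%BC";

        // Cyrillic letters and '%' are both outside [A-Za-z0-9_-]
        System.out.println(acceptedByAsciiPattern(raw));     // false
        System.out.println(acceptedByAsciiPattern(escaped)); // false

        // After decoding, both forms reduce to the same Unicode name
        System.out.println(acceptedAfterDecoding(raw));      // true
        System.out.println(acceptedAfterDecoding(escaped));  // true
    }
}
```

Decoding before validating means the Unicode and ASCII-escaped URLs are treated identically, which is the behavior the report expects.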
@AudricV
Member

AudricV commented Aug 8, 2023

This bug should have been fixed by #964 (commit 61ce041). The channel you provided is now recognized by the extractor and its extraction works properly.

Closing as fixed.

AudricV closed this as completed on Aug 8, 2023