[bunkr] Broken extractor #5151

Closed
Yakabuff opened this issue Feb 2, 2024 · 1 comment

Yakabuff commented Feb 2, 2024

bunkr now inserts a different TLD into every URL, which breaks the extractor.

Currently, we assume that each URL in a bunkr album either

  1. Uses the same TLD as the album (it starts with /), so we need to extract the CDN URL from the HTML
  2. Links directly to the CDN, so the file can be downloaded directly

We need to add a third case, where the URL in the album does not start with / and is not a CDN URL, but still matches the base pattern.
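
The three cases above can be sketched as a small classifier. This is only an illustration, not the extractor's actual code: `classify_album_url`, `BASE_HOST` and `CDN_HOST` are hypothetical names, and the regular expressions are simplified assumptions about what bunkr page and CDN hosts look like.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified host patterns (assumptions for illustration;
# the real extractor's base pattern and CDN detection may differ).
BASE_HOST = re.compile(r"^(?:www\.)?bunkrr?\.[a-z]+$")
CDN_HOST = re.compile(r"^(?:cdn|media-files)[\w-]*\.bunkrr?\.[a-z]+$")


def classify_album_url(url):
    """Return which of the album URL cases a link falls into."""
    # Case 1: relative link, same domain as the album page.
    if url.startswith("/"):
        return "relative"

    host = urlsplit(url).hostname or ""

    # Case 2: direct CDN link, downloadable as-is.
    if CDN_HOST.match(host):
        return "cdn"

    # New case: absolute link to a file page on a (possibly different)
    # bunkr TLD; the CDN URL still has to be extracted from that page.
    if BASE_HOST.match(host):
        return "page"

    return "unknown"
```

With these assumed patterns, a link such as `https://bunkr.sk/v/abc123` would be treated like a relative link (its page has to be fetched to find the CDN URL), except that the domain comes from the link itself rather than from the album.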

Yakabuff commented Feb 2, 2024

                     else:
                         domain = domain.replace("cdn", "media-files", 1)
                     url = urlunsplit((scheme, domain, path, query, fragment))
+                else:
+                    scheme, domain, path, query, fragment = urlsplit(url)
+                    try:
+                        url = self._extract_file(text.unescape(path))
+                    except Exception as exc:
+                        self.log.error("%s: %s", exc.__class__.__name__, exc)
+                        continue

mikf added a commit that referenced this issue Feb 11, 2024
- remove legacy code
- map legacy domains to bunkr.sk
- use input URL domain for newer domains
- update tests (some files got slightly modified or deleted)

mikf closed this as completed Feb 11, 2024