Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wikipedia: support query strings in Wikipedia URLs #2414

Merged
merged 1 commit into from
Feb 20, 2023

Conversation

SnoopJ
Copy link
Contributor

@SnoopJ SnoopJ commented Feb 19, 2023

Description

This PR fixes #2412 by discarding the query string from the name of the article to be searched.

Checklist

  • I have read CONTRIBUTING.md
  • I can and do license this contribution under the EFLv2
  • No issues are reported by make qa (runs make quality and make test)
  • I have tested the functionality of the things this change touches

sopel/modules/wikipedia.py Outdated Show resolved Hide resolved
@dgw dgw changed the title wikipedia: support query strings in Wikipedia URLs (closes #2412) wikipedia: support query strings in Wikipedia URLs Feb 19, 2023
@dgw dgw added this to the 8.0.0 milestone Feb 19, 2023
@dgw dgw added the Bugfix Generally, PRs that reference (and fix) one or more issue(s) label Feb 19, 2023
@dgw
Copy link
Member

dgw commented Feb 19, 2023

I am foolish for not looking more closely at this earlier. Why not use the parsed URL for the fragment, also? Are you trying to avoid touching too much? If you change the URL pattern to https?:\/\/([a-z]+(?:\.m)?\.wikipedia\.org)\/wiki\/(?!File\:)[^ ]+ it should be possible to offload fetching all the bits to urllib.parse instead of the regex.

(Not a review comment because I'm just trying to understand the intended scope; Exi's comment you referenced was about converting all of the link-matching logic to use urllib.parse, stripping the regex down to only what's necessary to make sure what's been given is a Wikipedia article of some kind.)

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're updating comments anyway...

sopel/modules/wikipedia.py Outdated Show resolved Hide resolved
sopel/modules/wikipedia.py Outdated Show resolved Hide resolved
@dgw dgw requested a review from Exirel February 19, 2023 06:47
Copy link
Contributor

@Exirel Exirel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the right moment to push a little further the cleanup of the plugin and remove the use of sopel.tools.web.unquote when it's literally just a shortcut for the stdlib urllib.parse.unquote.

Edit: actually, same for sopel.tools.web.quote. Both are useless now.

sopel/modules/wikipedia.py Outdated Show resolved Hide resolved
sopel/modules/wikipedia.py Outdated Show resolved Hide resolved
@Exirel
Copy link
Contributor

Exirel commented Feb 20, 2023

Alright, all that is left is a squash, since @dgw approved and I checked the changes. I don't think there is enough to require more than one commit, and I don't mind if you lose the co-author from my suggestions (I don't assume you know, but you might, all that to say it's up to you).

…2412)

Co-authored-by: dgw <dgw@technobabbl.es>
Co-authored-by: Florian Strzelecki <florian.strzelecki@gmail.com>
@SnoopJ SnoopJ force-pushed the bugfix/gh2412-support-query-string branch from 2e759af to 9229c20 Compare February 20, 2023 00:03
Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have put changing the existing imports from sopel.tools.web as out of scope for this, but it doesn't ultimately matter. The quote and unquote wrappers there have comments that we should consider deprecating them, anyway.

@dgw dgw merged commit 44af4a4 into sopel-irc:master Feb 20, 2023
@SnoopJ SnoopJ deleted the bugfix/gh2412-support-query-string branch February 21, 2023 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugfix Generally, PRs that reference (and fix) one or more issue(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wikipedia: query parameters cause URL parse failure
3 participants