Some characters are discarded in a title search. #108

Open
Jehan opened this issue Nov 29, 2015 · 2 comments

Comments


Jehan commented Nov 29, 2015

I encountered a funny behavior of your API.
My test was using French Wikipedia.

wikipedia.set_lang('fr')

If I query the pages "Bos taurus" and "Bos_taurus", they both work fine and return the same content. So far, so good:

p1 = wikipedia.page('Bos_taurus')
p2 = wikipedia.page('Bos taurus')
p1.content == p2.content
True

Then if I query "Bœuf (animal)", it still works, but "Bœuf_(animal)" raises a PageError telling me that "bœuf animal" does not exist. It looks as if the parentheses were discarded while the underscore was being processed (or something else; I haven't checked the code).

In [17]: p = wikipedia.page('Bœuf (animal)')
In [18]: p = wikipedia.page('Bœuf_(animal)')
---------------------------------------------------------------------------
PageError                                 Traceback (most recent call last)
<ipython-input-18-108f8ec12884> in <module>()
----> 1 p = wikipedia.page('Bœuf_(animal)')

/home/jehan/.local/lib/python3.4/site-packages/wikipedia/wikipedia.py in page(title, pageid, auto_suggest, redirect, preload)
    274         # if there is no suggestion or search results, the page doesn't exist
    275         raise PageError(title)
--> 276     return WikipediaPage(title, redirect=redirect, preload=preload)
    277   elif pageid is not None:
    278     return WikipediaPage(pageid=pageid, preload=preload)

/home/jehan/.local/lib/python3.4/site-packages/wikipedia/wikipedia.py in __init__(self, title, pageid, redirect, preload, original_title)
    297       raise ValueError("Either a title or a pageid must be specified")
    298 
--> 299     self.__load(redirect=redirect, preload=preload)
    300 
    301     if preload:

/home/jehan/.local/lib/python3.4/site-packages/wikipedia/wikipedia.py in __load(self, redirect, preload)
    343     if 'missing' in page:
    344       if hasattr(self, 'title'):
--> 345         raise PageError(self.title)
    346       else:
    347         raise PageError(pageid=self.pageid)

PageError: Page id "bœuf animal" does not match any pages. Try another id!

co-sty commented Dec 7, 2016

Thanks for pointing this out.

There is no text processing going on; the change in text ('Bœuf_(animal)' -> "bœuf animal") comes from Wikipedia's suggestion system (enabled by the auto_suggest=True parameter of wikipedia.page()). However, the suggestions were not used correctly, which is to be fixed (#131).
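
To make the substitution concrete, here is a rough toy model of how a suggestion can end up replacing the title the caller typed. It is not the library's source: fake_search, resolve_title, and the canned suggestion are hypothetical names and data made up for illustration, and the real search call goes to the Wikipedia API.

```python
# Toy model of the title substitution described in this issue.
# NOT the library's code: fake_search and its canned data are hypothetical.

def fake_search(title):
    """Stand-in for a search call that returns (results, suggestion)."""
    canned = {"Bœuf_(animal)": ([], "bœuf animal")}
    # Assumption: any other title is found as-is, with no suggestion.
    return canned.get(title, ([title], None))


def resolve_title(title, auto_suggest=True):
    """Return the title that would actually be requested."""
    if not auto_suggest:
        return title  # looked up verbatim
    results, suggestion = fake_search(title)
    # A suggestion, when present, wins over the title the caller typed.
    return suggestion or results[0]


print(resolve_title("Bœuf_(animal)"))                      # bœuf animal
print(resolve_title("Bœuf_(animal)", auto_suggest=False))  # Bœuf_(animal)
print(resolve_title("Bœuf (animal)"))                      # Bœuf (animal)
```

Under these assumptions the first call ends up asking for "bœuf animal", which matches the title reported in the PageError above.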

@dakshvar22

Use auto_suggest=False as a parameter to the wikipedia.page() call.
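
A minimal sketch of that workaround follows. The keyword is spelled auto_suggest in the page() signature shown in the traceback above. The helper name page_without_suggestion is hypothetical, and passing the page function in as an argument is just a choice made here so the snippet runs without network access.

```python
def page_without_suggestion(title, page_func):
    """Fetch `title` verbatim by switching off the suggestion step.

    `page_func` is assumed to behave like wikipedia.page, i.e. to accept
    a title and an `auto_suggest` keyword argument.
    """
    return page_func(title, auto_suggest=False)


# With the real library this would look roughly like:
#   import wikipedia
#   wikipedia.set_lang('fr')
#   p = page_without_suggestion('Bœuf_(animal)', wikipedia.page)
```

Calling wikipedia.page('Bœuf_(animal)', auto_suggest=False) directly is equivalent; the wrapper only saves repeating the keyword.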
