
Specify HTML parser for BeautifulSoup to suppress warnings. #112

Open: wants to merge 2 commits into master

@wjoe commented Dec 9, 2015

Fixes #107

Since at least version 4.4.1, BeautifulSoup prints a warning if you don't explicitly specify the parser:

/usr/lib64/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

The warning is triggered when a list is returned, because that list is parsed with BeautifulSoup.

As the warning suggests, I've updated the BeautifulSoup call to specify the parser explicitly. I've used Python's built-in html.parser rather than lxml so that no extra packages are required. lxml may be slightly faster (though I saw no consistent speed difference in my tests) and it handles invalid HTML differently, but neither difference should matter for the way wikipedia uses BeautifulSoup, so I think the built-in parser is the better choice.
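A minimal sketch of the one-line change, assuming beautifulsoup4 is installed. The helper name list_item_links and the sample markup are hypothetical, loosely modeled on the find_all('li') call quoted in this thread; this illustrates the idea and is not the patch itself.

```python
import warnings

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical disambiguation-style markup, for illustration only.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Mercury_(planet)">Mercury (planet)</a></li>
  <li><a href="/wiki/Mercury_(element)">Mercury (element)</a></li>
  <li class="tocsection-1">Contents</li>
</ul>
"""


def list_item_links(html):
    """Return the text of the first link inside each <li>, skipping items without one.

    Naming "html.parser" pins BeautifulSoup to Python's built-in parser, so the
    result no longer depends on whether lxml is installed, and bs4 has no reason
    to emit its "No parser was explicitly specified" warning.
    """
    # Before: BeautifulSoup(html).find_all("li")  -> parser guessed, warning emitted
    lis = BeautifulSoup(html, "html.parser").find_all("li")
    # li.a is the first <a> descendant of the <li>, or None if there is none
    return [li.a.get_text() for li in lis if li.a]


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(list_item_links(SAMPLE_HTML))
    # Only the parser-guessing warning is of interest here.
    guessed = [w for w in caught if "No parser was explicitly specified" in str(w.message)]
    print("parser warnings:", len(guessed))
```

Run directly, it should print the two link texts and report no parser warning. Pinning the built-in parser trades lxml's possible speed advantage for identical behavior on every machine, which is the trade-off described above.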

@elliots-bits commented May 26, 2016

My god, someone please merge it already

@nikicc mentioned this pull request Aug 4, 2016
@kirai commented Jan 6, 2017

The Homebrew version is still not working: the version it installs still has lis = BeautifulSoup(html).find_all('li').

@medecau mentioned this pull request Feb 21, 2023