lxml, google description, qwant and removing yandex. #7

Open
wants to merge 5 commits into
base: master
from

Conversation

fliot

@fliot fliot commented Feb 4, 2023

Hi,
Nice project, happy to contribute.
Best regards.

@EdmundMartin
Owner

Why did you remove Yandex?

@@ -62,12 +63,29 @@ def _check_exceptions(self, res: ScrapeResponse) -> None:
    async def scrape(self, req: ScrapeRequest) -> List[SearchResult]:
        geo = req.geo if req.geo else "en_GB"
        urls = self._paginate(req.term, "", geo, req.count)
        headers = {
Owner

This doesn't actually get applied to the request: on line 83 the headers are overridden by the call to self.user_agent(), which is probably what should be providing the headers.
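
To illustrate the point above, here is a minimal sketch of how a locally built headers dict gets lost when the name is later rebound, and one way to keep both sets of headers. The standalone `user_agent()` function, its return value, and `build_headers()` are hypothetical stand-ins for illustration, not the project's actual code.

```python
from typing import Dict, Optional


def user_agent() -> Dict[str, str]:
    """Hypothetical stand-in for self.user_agent(); assumed to return a headers dict."""
    return {"User-Agent": "Mozilla/5.0 (compatible; example)"}


def overridden_headers() -> Dict[str, str]:
    """Reproduces the pattern described in the review comment."""
    headers = {"Accept-Language": "en-GB"}  # custom headers built first
    headers = user_agent()  # name is rebound, so the dict above is discarded
    return headers


def build_headers(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from the default headers and layer any custom ones on top."""
    headers = user_agent()
    if extra:
        headers.update(extra)
    return headers


lost = overridden_headers()
merged = build_headers({"Accept-Language": "en-GB"})
```

In the first function the `Accept-Language` entry never reaches the request. In the second, a single function owns header construction, which matches the suggestion that `user_agent()` should be the place providing the headers.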

@@ -14,6 +14,7 @@
install_requires = [
'aiohttp>=3.6.2',
'beautifulsoup4>=4.8.2',
'lxml'
Owner

This is an extra dependency that is probably not strictly required to use the package, and its version is not pinned. It would probably be better to let the user choose the parser implementation and default to 'html.parser' if none is provided.
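
A rough sketch of what that suggestion could look like, assuming the parser name is passed straight through to BeautifulSoup. The names `DEFAULT_PARSER`, `resolve_parser`, and `extract_links` are hypothetical and not part of the package.

```python
from typing import List, Optional

# 'html.parser' ships with the Python standard library, so it needs no extra install.
DEFAULT_PARSER = "html.parser"


def resolve_parser(parser: Optional[str] = None) -> str:
    """Return the caller's parser choice, falling back to the stdlib default."""
    return parser or DEFAULT_PARSER


def extract_links(html: str, parser: Optional[str] = None) -> List[str]:
    """Parse HTML with the chosen parser and return every href found on an <a> tag."""
    # Imported here so this sketch can be loaded without bs4 installed;
    # in the package itself this would be a normal top-level import.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, resolve_parser(parser))
    return [a["href"] for a in soup.find_all("a", href=True)]
```

A caller who has lxml installed could then opt in with `extract_links(html, parser="lxml")`, while everyone else gets the standard library parser without an additional dependency.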

@kasnder

kasnder commented Oct 16, 2023

Yandex does not work for me

3 participants