attribute 'group' #9

Open
rcoenen opened this issue Oct 21, 2022 · 2 comments

rcoenen commented Oct 21, 2022

This happens on some homes, not all:

File "/Users/rdm/Dev/funda-scraper-2022/funda/spiders/funda_spider.py", line 32, in parse_dir_contents
postal_code = re.search(r'\d{4} [A-Z]{2}', title).group(0)
AttributeError: 'NoneType' object has no attribute 'group'
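
For context, this traceback means `re.search` returned `None`: the pattern did not match the title of that particular listing, so there was no match object to call `.group(0)` on. Below is a minimal sketch of a guard around that lookup. The helper name `extract_postal_code` and the sample titles are hypothetical and not taken from the project's spider; only the regex comes from the traceback.

```python
import re
from typing import Optional

# Same pattern as in the traceback: four digits, a space, two capital letters.
POSTAL_CODE_RE = re.compile(r'\d{4} [A-Z]{2}')


def extract_postal_code(title: Optional[str]) -> Optional[str]:
    """Return the first postal code found in the title, or None if there is none.

    Hypothetical helper for illustration; it is not part of the original spider.
    """
    if not title:
        return None
    match = POSTAL_CODE_RE.search(title)
    # re.search returns None when nothing matches, so check before calling .group()
    return match.group(0) if match else None


# Made-up titles, only to show both outcomes
print(extract_postal_code("Huis te koop: Voorbeeldstraat 1 1234 AB Amsterdam"))  # 1234 AB
print(extract_postal_code("A title without a postal code"))  # None
```

With a guard like this, the caller can skip or log the listings where no postal code is found instead of crashing on them.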

@jellevankerk

Well, it looks like they put something up to block bots, so the current scraper will not work anymore.

If you run:
```
import requests

# Fetch the homepage with a plain requests call
response = requests.get("https://www.funda.nl/")

print(response.content)
```

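the output contains:
<h1 class="fd-h1 fd-m-none">Je bent bijna op de pagina die je zoekt</h1>\n <p class="fd-text-size-l--bp-m fd-color-dark-3 fd-m-bottom-none fd-m-top fd-p-right-6xl--bp-m">We houden ons platform graag veilig en spamvrij. Daarom moeten we soms verifi\xc3\xabren dat onze bezoekers echte mensen zijn.</p>

(In English: "You are almost on the page you are looking for. We like to keep our platform safe and spam-free. That is why we sometimes need to verify that our visitors are real people.")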
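
To tell this case apart from a genuine parsing failure, a scraper could check whether the response is the verification page before trying to parse it. This is a rough sketch only. It assumes the heading text quoted above is a stable marker, which the site may change at any time, and `is_verification_page` is a hypothetical helper name.

```python
# Heading text copied from the response quoted above; assumed to be a stable marker.
VERIFICATION_MARKER = "Je bent bijna op de pagina die je zoekt"


def is_verification_page(html: str) -> bool:
    """Return True if the HTML looks like the bot-verification page.

    Hypothetical helper for illustration; it only does a substring check.
    """
    return VERIFICATION_MARKER in html


# Made-up HTML snippets, only to show both outcomes
blocked = '<h1 class="fd-h1 fd-m-none">Je bent bijna op de pagina die je zoekt</h1>'
listing = "<title>Voorbeeldstraat 1 1234 AB Amsterdam</title>"

print(is_verification_page(blocked))  # True
print(is_verification_page(listing))  # False
```

With `requests`, the decoded body is available as `response.text`, so the check would be `is_verification_page(response.text)`.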

rcoenen commented Oct 21, 2022

Obviously they try to block bots, but I have definitely seen this scraper work with some modifications to the config.
As per my original comment: it works on most homes. About 1 in 20 fails because of HTML parsing, not because of captcha/blacklist/anti-scraping measures.

The scraper definitely works, but the parser sometimes has issues:
[image]