attribute 'group' #9

rcoenen · 2022-10-21T03:46:09Z

This happens on some homes, not all:

File "/Users/rdm/Dev/funda-scraper-2022/funda/spiders/funda_spider.py", line 32, in parse_dir_contents
postal_code = re.search(r'\d{4} [A-Z]{2}', title).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

jellevankerk · 2022-10-21T05:49:56Z

Well, it looks like they put something up against bots, so the current scraper will not work anymore.

If you run:
```python
import requests

# Fetch the Funda home page and dump the raw response body.
response = requests.get("https://www.funda.nl/")

print(response.content)
```

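the response contains:
<h1 class="fd-h1 fd-m-none">Je bent bijna op de pagina die je zoekt</h1>\n <p class="fd-text-size-l--bp-m fd-color-dark-3 fd-m-bottom-none fd-m-top fd-p-right-6xl--bp-m">We houden ons platform graag veilig en spamvrij. Daarom moeten we soms verifi\xc3\xabren dat onze bezoekers echte mensen zijn.</p>

(In English: "You are almost at the page you're looking for. We like to keep our platform safe and spam-free. That is why we sometimes have to verify that our visitors are real people.")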
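To tell this verification page apart from a real listing page, one option is to look for the heading text in the response body. This is only a rough sketch: `looks_like_bot_check` and `BOT_CHECK_MARKER` are hypothetical names, and it assumes the heading text quoted in this comment stays the same, which Funda could change at any time.

```python
# Heading text of the verification page quoted in this comment.
# Assumption: this text is stable enough to use as a marker.
BOT_CHECK_MARKER = "Je bent bijna op de pagina die je zoekt"


def looks_like_bot_check(body: bytes) -> bool:
    """Return True if the raw response body looks like the verification page."""
    # Decode leniently so a stray invalid byte cannot raise here.
    text = body.decode("utf-8", errors="replace")
    return BOT_CHECK_MARKER in text
```

Passing it `response.content` from a `requests` call would then show whether a given request was served the verification page or a real page.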

rcoenen · 2022-10-21T15:22:08Z

Obviously they try to block bots, but I have definitely seen this scraper work with some modifications to the config.
As I said in my original comment, it works on most homes; about 1 in 20 fails because of HTML parsing, not because of captcha/blacklist/anti-scraping measures.

So the scraper definitely works, but the parser sometimes has issues.
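For the parser side, the `AttributeError` comes from calling `.group(0)` on the `None` that `re.search` returns when the title contains no match. A minimal sketch of a guard, assuming it is acceptable to record a missing postal code as `None` and keep going (`extract_postal_code` is a hypothetical helper, not something in the repo, and why some titles lack a matching postal code is not known from this thread):

```python
import re
from typing import Optional

# Same pattern as in the traceback: four digits, a space, two capital letters.
POSTAL_CODE_RE = re.compile(r"\d{4} [A-Z]{2}")


def extract_postal_code(title: Optional[str]) -> Optional[str]:
    """Return the first postal code found in title, or None if there is none."""
    if not title:
        return None
    match = POSTAL_CODE_RE.search(title)
    # re.search returns None on no match, so check before calling .group().
    return match.group(0) if match else None
```

In `parse_dir_contents` the failing line could then become `postal_code = extract_postal_code(title)`, and the homes that come back as `None` could be logged to see what their titles actually look like.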
