
http error 405 #2

Open
wangxisea opened this issue Mar 30, 2017 · 19 comments

Comments

@wangxisea

I got a 405 when I ran it; it says: "HTTP status code is not handled or not allowed."
Would you mind taking a look? Thanks.

@gijs

gijs commented Apr 5, 2017

This is due to Funda blocking crawlers. Configuring a proxy middleware in Scrapy may help, but I didn't try that. Good luck!
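A proxy middleware along those lines might look like the sketch below (untested against Funda; the proxy address is made up):

```python
# Sketch of a Scrapy downloader middleware that routes every outgoing
# request through a single proxy. The proxy URL is a placeholder.

class ProxyMiddleware:
    """Attach a proxy to each outgoing request."""

    PROXY_URL = "http://user:pass@proxy.example.com:8080"  # hypothetical

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours this meta key.
        request.meta["proxy"] = self.PROXY_URL
```

You'd then enable it with something like `DOWNLOADER_MIDDLEWARES = {'funda.middlewares.ProxyMiddleware': 350}` in settings.py, assuming the class lives in this project's `funda.middlewares` module.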

@jobvisser03

I tried configuring settings.py by including:

```python
DOWNLOADER_MIDDLEWARES = {
    'funda.middlewares.MyCustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```

Unfortunately, this doesn't seem to work. Is there anything else one needs to take care of? Does anyone have experience with this?

@gijs

gijs commented Apr 5, 2017

I think Funda has recently put some sort of rate limiter in place. It detects robots based on several parameters. They suspected me of being a robot anyway, since they prompted me with a captcha.

Scrapy can probably solve the captcha, but I didn't look into that: https://github.com/pombredanne/decaptcha

I'm curious whether anyone can get it to work again.

@aliaamin

aliaamin commented Jul 9, 2017

Same problem here. I'd appreciate it if anyone could share some tips on how to overcome the 405 error.

@gijs

gijs commented Jul 9, 2017

I'm building a Funda scraper based on headless Chrome. Whenever a captcha is detected, the scraper takes a screenshot and sends it to me via Telegram. I reply with the two words, which the scraper then uses to solve the captcha and continue scraping.

@igorkoehne

Is your scraper working properly? I was trying to use Selenium, but I'm not even sure whether that's the right way to go, since I'm just starting out in this world. If you could share your code, it would be the best thing that happened to me this year o/

@gijs

gijs commented Aug 25, 2017

@igorkoehne (I assume you're talking to me) - unfortunately I cannot share this specific codebase at the moment, because it contains a bunch of API keys and needs cleaning up, and I have no time for that now.

In the meantime, Google has come up with Puppeteer. Building your own captcha-evading scraper should be even easier with this high-level API for headless Chrome. I'm going to rewrite my own scraper to use it, too.

@gijs

gijs commented Aug 25, 2017

In other news, detecting unmodified versions of headless Chrome seems easy, mostly because headless Chrome doesn't have WebGL capabilities, which can be sniffed.

If Funda is already detecting headless Chrome, sticking to Selenium's Chrome WebDriver will be a better option.

Good luck scraping them!

@igorkoehne

Thanks for the tips, I will give it a try!

@khpeek
Owner

khpeek commented Sep 6, 2017

As a quick reply: the 405 error appears to be the result of Funda fingerprinting headless browsers. I managed to circumvent it by (1) changing my user agent (using Scrapy Random User Agent) and (2) using the Scrapy Splash plugin.
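For reference, a combined setup would look roughly like the settings.py sketch below. This is based on the documented configuration of scrapy-random-useragent and scrapy-splash, not on the exact code used here; the middleware priorities are the usual README examples, and `useragents.txt` is a file you supply yourself:

```python
# settings.py sketch: random user agents plus Splash rendering.

# Disable Scrapy's default user agent middleware and plug in the
# random-useragent and Splash middlewares at their documented priorities.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "random_useragent.RandomUserAgentMiddleware": 400,
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

SPLASH_URL = "http://localhost:8050"  # address of a running Splash instance
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
USER_AGENT_LIST = "useragents.txt"  # one user agent string per line
```

The spider then issues `SplashRequest`s instead of plain `Request`s so pages are rendered by Splash before parsing.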

@AntoniosMavropoulos

Do both (1) and (2) need to be in place?
If so, could you please post the code you used?
Thanks!

@tangvip

tangvip commented Oct 1, 2017

@khpeek
do you mean that both methods are needed, or can either one of them solve the problem?
Thanks!

@arnabsinha4u

@tangvip with just (1), Scrapy Random User Agent, the error persists. I haven't tried it with the Scrapy Splash plugin.

@MarcDuQuesne

Hi folks, any update?

@arnabsinha4u

I have given up on scraping the website. Instead, I am using RSS feeds, which are parameterized and serve the purpose. The RSS feeds have only the latest details, not historic ones, but of course over time you can build your own history, should that be a need.
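For anyone taking the same route: fetching such a feed and pulling out the items needs only the Python standard library. A sketch, assuming a standard RSS `<item>` layout (the URL follows the partnerapi feed pattern for Amsterdam):

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL pattern for Funda's partner API (koop listings in Amsterdam).
FEED_URL = ("http://partnerapi.funda.nl/feeds/Aanbod.svc/rss/"
            "?type=koop&zo=/amsterdam/")

def parse_listings(xml_text):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

def fetch_listings(url=FEED_URL):
    """Download the feed and parse it (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return parse_listings(resp.read())
```

Polling the feed periodically and storing the items is how you would build up your own history over time.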

@Kalli

Kalli commented Apr 27, 2018

@arnabsinha4u could you please tell me where to find those RSS feeds you mention?
Do you mean something along the lines of this: http://partnerapi.funda.nl/feeds/Aanbod.svc/rss/?type=koop&zo=/amsterdam/

Is there a feed that has the postal codes as well?

@Suidgeest

Hi Kurt, would you mind posting your latest working code (referring to your comment of Sep 6, 2017)? Thank you!

@fab343

fab343 commented Jan 28, 2019

Hi all, any updates on the problem?

@fab343

fab343 commented Jan 28, 2019

> I have done away with scrapping the website. Instead, I am using RSS feeds which are parameterized and serves the purpose.

Are you talking about this RSS feed: http://partnerapi.funda.nl/feeds/Aanbod.svc/rss/?type=koop&zo=/amsterdam/ ?
