HTTP Error 403: Forbidden #4

Open
exportio opened this issue Mar 18, 2019 · 1 comment

Comments

@exportio

Traceback (most recent call last):
  File "main.py", line 21, in <module>
    links = crawler.start()
  File "\crawler.py", line 17, in start
    self.crawl(self.url)
  File "\crawler.py", line 26, in crawl
    response = urllib.request.urlopen(url)
  File "\Python36_64\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "\Python36_64\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "\Python36_64\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "\Python36_64\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "\Python36_64\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "\Python36_64\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden


Aminsaffar commented Apr 14, 2020

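This is probably caused by mod_security or a similar server security feature that blocks known spider/bot user agents. By default urllib identifies itself as Python-urllib, so send a browser-like User-Agent header instead.
Change the urlopen call in crawl() (crawler.py, line 26 in the traceback) to: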

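		# Needs at the top of crawler.py: from urllib.request import Request, urlopen
		# Send a browser-like User-Agent instead of the default Python-urllib one
		req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
		response = urlopen(req)
		#response = urllib.request.urlopen(url)
		page = str(response.read())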
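
For anyone who wants to try the idea outside the crawler, here is a minimal standalone sketch of the same fix. The names `build_request`, `fetch_page`, and `BROWSER_USER_AGENT` are hypothetical and not part of this repository. It also differs from the snippet above in two ways that are my own assumptions: it decodes the body as UTF-8 rather than calling `str()` on the bytes (which would produce a `b'...'` representation), and it catches `HTTPError` so that a server that still answers 403 does not abort the whole crawl.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed value, copied from the suggestion above; some servers may want a fuller string.
BROWSER_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_page(url, opener=urlopen):
    """Return the decoded page body, or None if the server answers with an HTTP error.

    `opener` is injectable so the function can be exercised without network access.
    """
    try:
        with opener(build_request(url)) as response:
            # Assumes UTF-8; undecodable bytes are replaced rather than raising.
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # The server still refused the request (for example another 403).
        print(f"Skipping {url}: HTTP {err.code} {err.reason}")
        return None
```

Calling `fetch_page(url)` inside `crawl` would then return either the page text or `None`, so the caller can skip pages that stay blocked. Changing the User-Agent only helps when the block is based on that header; it is not guaranteed to work against other kinds of filtering.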
