try to handle redirections with empty content from BNF archives (cf #426, to be tested)
boogheta committed Oct 29, 2021
1 parent 72f5f80 commit 5075f5a
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion hyphe_backend/crawler/hcicrawler/spiders/pages.py
@@ -231,7 +231,8 @@ def handle_response(self, response):

         if self.webarchives:
             # Handle transparently redirections from archives to another available timestamp
-            if response.status == 302:
+            if response.status == 302 or \
+              ("archivesinternet.bnf.fr" in self.webarchives["url_prefix"] and 300 <= response.status < 400 and not response.body):
                 redir_url = response.headers['Location']
                 if redir_url.startswith("/"):
                     redir_url = "%s%s" % (self.archivehost, redir_url)
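
The widened condition in this hunk can be read as a standalone predicate. The following is a minimal sketch, not part of the commit: the helper names `is_archive_redirect` and `resolve_redirect` are hypothetical, and plain values stand in for the crawler's response object and `self.webarchives` settings.

```python
# Host string checked by the commit for BNF web archives.
BNF_HOST = "archivesinternet.bnf.fr"


def is_archive_redirect(status, body, url_prefix):
    """Return True when a response should be followed as an archive redirection.

    Hypothetical helper mirroring the condition added in the diff.
    """
    # Previous behaviour: only a 302 was treated as an archive redirection.
    if status == 302:
        return True
    # New behaviour: for BNF archives, any 3xx answer with an empty body
    # is also treated as a redirection to follow.
    return bool(BNF_HOST in url_prefix and 300 <= status < 400 and not body)


def resolve_redirect(location, archivehost):
    """Turn a host-relative Location value into an absolute URL."""
    if location.startswith("/"):
        return "%s%s" % (archivehost, location)
    return location
```

Under these assumptions, a 301 with an empty body is followed only when the archive prefix points at the BNF host, while a 302 is followed for any archive, as before the change.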