Hyphe changes ":" in to-be-crawled URLs into "%3A" #248

Closed
Guillaume-Levrier opened this issue Jan 17, 2018 · 3 comments

Comments

@Guillaume-Levrier

Here is what I get in my web entity:

ID: 4
Name: ##rnhub.com /.../page:4
Status: IN
Home page: http://##rnhub.com/details/news/page%3A4

This obviously goes to a 404, making the web entity uncrawlable. Any solution?

@boogheta
Member

example url: https://www.altmetric.com/details/23373063/news/page:1
notes for later:

  • try adding ':' to safechars, line 260 of hyphe_backend/lib/urllru.py
  • also handle it in the frontend?
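
For illustration only, here is a minimal sketch of why adding ':' to a set of safe characters would stop the colon from being percent-encoded. It uses Python 3's standard `urllib.parse.quote`, which is an assumption: the actual code around line 260 of urllru.py is not shown in this thread and may work differently.

```python
from urllib.parse import quote

# Path taken from the example URL quoted above.
path = "/details/23373063/news/page:1"

# By default quote() treats only "/" as safe, so ":" becomes "%3A".
encoded_default = quote(path)

# Declaring ":" as safe too leaves the colon untouched.
encoded_with_colon = quote(path, safe="/:")

print(encoded_default)
print(encoded_with_colon)
```

If the encoding in Hyphe goes through a similar call, extending its safe characters with ':' should keep URLs like the one above crawlable.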

@Guillaume-Levrier
Author

Problem solved, thx.

@boogheta
Member

let's keep it open so that we can remember to fix it for everyone plz :p

@boogheta reopened this Jan 19, 2018