Capture doesn't actually capture a website. If it isn't already archived #3

Open
Kreijstal opened this issue Jun 15, 2020 · 1 comment

Comments

@Kreijstal

If you look at the line

(link, self.sync_get(self.SAVE_ENDPOINT + link))

you'll see that this only probes archive.org to check whether the site exists there; it doesn't actually tell archive.org to archive it. You need to send a POST request like this:

curl "https://web.archive.org/save/https://example.com/" ^
  -H "authority: web.archive.org" ^
  -H "origin: https://web.archive.org" ^
  -H "content-type: application/x-www-form-urlencoded" ^
  -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" ^
  -H "sec-fetch-site: same-origin" ^
  -H "sec-fetch-mode: navigate" ^
  -H "sec-fetch-user: ?1" ^
  -H "sec-fetch-dest: document" ^
  -H "referer: https://web.archive.org/save/" ^
  --data-raw "url=https^%^3A^%^2F^%^2Fexample.com^%^2F&capture_outlinks=on^&capture_all=on" 
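
For reference, the same POST could be built in Python roughly as follows. This is only a sketch: `build_save_request` is a hypothetical helper, the value of `SAVE_ENDPOINT` is assumed from the curl URL above rather than taken from the project, and only the form fields, content type, and referer from the curl command are carried over. Whether the Wayback Machine accepts the request without the remaining browser headers is untested.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed value, inferred from the curl command; the project's own
# SAVE_ENDPOINT constant may differ.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_request(link: str) -> Request:
    """Build (but do not send) a POST asking archive.org to capture `link`."""
    # Same form fields as the --data-raw payload in the curl command.
    form = urlencode({
        "url": link,
        "capture_outlinks": "on",
        "capture_all": "on",
    })
    return Request(
        SAVE_ENDPOINT + link,
        data=form.encode("ascii"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Referer": SAVE_ENDPOINT,
        },
        method="POST",
    )


req = build_save_request("https://example.com/")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be inspected without touching the network.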
@Kreijstal Kreijstal changed the title Capture doesn't actually capture a website. Capture doesn't actually capture a website. If it isn't already archived Jun 15, 2020
@apurvmishra99
Owner

Thanks for bringing this up! It looks like the Wayback Machine has changed the way they snapshot images again. It should be an easy fix, though. My only concern is that there is no ID for the archive anymore; it's the timestamp + the URL. And that is in the cookies, so it's slightly more difficult to access from requests, but I will try to fix this.
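
To illustrate the "timestamp + URL" identifier: Wayback Machine snapshot addresses generally take the form `https://web.archive.org/web/<14-digit timestamp>/<original URL>`. A minimal sketch of composing one is below. `snapshot_url` and `WAYBACK_PREFIX` are hypothetical names, and the sketch assumes the capture time is already known; it does not show how to read it from the response cookies.

```python
from datetime import datetime

# Prefix of Wayback Machine snapshot addresses (hypothetical constant name).
WAYBACK_PREFIX = "https://web.archive.org/web/"


def snapshot_url(captured_at: datetime, link: str) -> str:
    """Compose a snapshot address from a capture time and the original URL."""
    # Wayback timestamps are written as YYYYMMDDhhmmss.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{link}"


example = snapshot_url(datetime(2020, 6, 15, 12, 0, 0), "https://example.com/")
print(example)
```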

@apurvmishra99 apurvmishra99 self-assigned this Jun 17, 2020
@apurvmishra99 apurvmishra99 added the bug Something isn't working label Jun 17, 2020