I downloaded a website from the Internet Archive using wayback-machine-downloader, then created a WARC with warcit using the following command:
warcit --fixed-dt 20100212221453 http://domainname.com /dirpath
It did create a WARC file. I would like to index it into Solr using webarchive-discovery, but when I try, I get the following warnings:
2018-08-16 18:22:08 WARN WARCIndexer:414 - Invalid status line: null@28005
2018-08-16 18:22:08 WARN WARCIndexer:414 - Invalid status line: null@40193
2018-08-16 18:22:08 WARN WARCIndexer:414 - Invalid status line: null@79054
An example WARC is attached. Can warcit be used to convert snapshots downloaded from the Internet Archive into WARC format? (Unfortunately, the Internet Archive does not provide a way to download WARCs.)
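One way to check what warcit actually wrote is to count the records by `WARC-Type` and see whether each record's block begins with an HTTP status line. The stdlib-only Python sketch below is an illustration under assumptions, not part of warcit or webarchive-discovery: `iter_warc_records`, `count_record_types` and `has_http_status_line` are hypothetical helper names, and the parser assumes well-formed WARC/1.0 records that each carry a `Content-Length` field.

```python
import gzip
import io
from collections import Counter


def iter_warc_records(stream):
    """Yield (headers, block) pairs from a binary WARC stream.

    Minimal sketch: assumes well-formed records with a Content-Length field;
    it does not verify digests or recover from malformed input.
    Header names are lowercased, since WARC field names are case-insensitive.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        for raw in iter(stream.readline, b""):
            if not raw.strip():
                break  # a blank line ends the WARC header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def count_record_types(stream):
    """Count records by WARC-Type (e.g. 'warcinfo', 'resource', 'response')."""
    return Counter(h.get("warc-type", "?") for h, _ in iter_warc_records(stream))


def has_http_status_line(block):
    """True if the block starts like an HTTP message, e.g. b'HTTP/1.1 200 OK'."""
    return block.startswith(b"HTTP/")
```

For a `.warc.gz` file, `gzip.open(path, "rb")` reads the concatenated per-record gzip members as one stream, so something like `count_record_types(gzip.open("esports.com.warc.gz", "rb"))` should show whether the file holds mostly `resource` records. Records whose block has no `HTTP/` status line would be consistent with the `Invalid status line: null` warnings above, though that is an inference, not something confirmed from the indexer's code.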
Hm, it seems that AUT does not support resource records; we can let them know. We could also probably generate fake response records, although that's less ideal.
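If re-capturing is not an option, the "fake response record" idea could look roughly like this: keep the resource payload, prepend a synthetic `HTTP/1.1 200 OK` header, and relabel the record as `response`. This is a hypothetical Python sketch; the function names, the fixed 200 status, and the choice to mint a new record ID are all assumptions. Because the HTTP headers are invented rather than captured, the result is less faithful than a real response record.

```python
import uuid


def resource_to_response(warc_headers, payload):
    """Wrap a resource record's payload in a synthetic HTTP 200 response.

    Hypothetical sketch: the status line and HTTP headers are invented,
    not captured from a server.
    """
    # WARC field names are case-insensitive, so look them up in lowercase.
    lower = {k.lower(): v for k, v in warc_headers.items()}
    content_type = lower.get("content-type", "application/octet-stream")
    http_head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("latin-1")
    block = http_head + payload
    # Drop the fields that described the old record/block; keep the rest
    # (WARC-Target-URI, WARC-Date, ...).
    replaced = {"warc-type", "content-type", "content-length",
                "warc-block-digest", "warc-record-id"}
    out = {k: v for k, v in warc_headers.items() if k.lower() not in replaced}
    out["WARC-Type"] = "response"
    out["Content-Type"] = "application/http; msgtype=response"
    out["Content-Length"] = str(len(block))
    out["WARC-Record-ID"] = f"<urn:uuid:{uuid.uuid4()}>"  # new record, new ID
    return out, block


def serialize_record(warc_headers, block):
    """Serialize one uncompressed WARC/1.0 record: version line, fields, block."""
    fields = "".join(f"{k}: {v}\r\n" for k, v in warc_headers.items())
    return b"WARC/1.0\r\n" + fields.encode("utf-8") + b"\r\n" + block + b"\r\n\r\n"
```

The old `WARC-Block-Digest` is dropped because it no longer matches the new block. To produce a `.warc.gz`, each serialized record can be compressed separately with `gzip.compress` and the results concatenated. For real files, a maintained WARC library such as warcio would be a more robust basis than this byte-level sketch.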
But for your use case, you can also use webrecorder.io directly and enter a Wayback Machine URL. Webrecorder will detect that it's a Wayback Machine URL and should handle it correctly.
You'll then be able to download a WARC directly as well.
Can you please provide some additional background on resource record support? Is this related to how they are implementing or using the WARC standard?
I could not load it into AUT either.
esports.com.warc.gz