Introduced in #8: found that a file we depend on from MS-MARCO has mysteriously gone missing (https://msmarco.blob.core.windows.net/msmarcoranking/qidpidtriples.train.full.tar.gz). I contacted the dataset owners but got no response, so I'm self-hosting it for now.
This really shows how fragile the system can be if required files disappear like this. It may be worth mirroring all files (at least those with no licensing restriction on hosting elsewhere) somewhere else, with some method for falling back to the mirrors if the originals go missing. I know this has been an issue for TREC in the past during government shutdowns. This is all the more reason to include hashes for all the files.
Maybe introduce a new class to manage this. Something that could operate like:
```python
dl = util.Downloadable("http://location/1", "http://mirror/1", "http://mirror/2", expected_md5="...")
for line in dl.download_stream():
    ...
```
Some download operations (like saving files directly or as temp files) would work with this paradigm. But how could we handle switching to a mirrored version of streamed content if the hash doesn't match at the end?
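One possible answer to the streaming question, as a rough sketch only: spool each source into a temporary file, check the hash once the download completes, and start yielding lines only after a source has verified. A bad copy is then discarded before the caller sees any of it, at the cost of not being a true stream. The `Downloadable` class below is hypothetical and not an existing API in this project; the `opener` parameter is an assumption added so the fallback logic can be exercised without network access.

```python
import hashlib
import tempfile
import urllib.request


class Downloadable:
    """Hypothetical sketch: try each URL in order until one matches expected_md5."""

    def __init__(self, *urls, expected_md5=None, opener=urllib.request.urlopen):
        self.urls = urls
        self.expected_md5 = expected_md5
        self.opener = opener  # assumption: injectable so the sketch can run offline

    def _fetch_verified(self, url, out):
        """Copy url into out in chunks; return True if the MD5 matches (or none is set)."""
        md5 = hashlib.md5()
        with self.opener(url) as resp:
            for chunk in iter(lambda: resp.read(1 << 16), b""):
                md5.update(chunk)
                out.write(chunk)
        return self.expected_md5 is None or md5.hexdigest() == self.expected_md5

    def download_stream(self):
        """Yield lines (as bytes) only after a source is fully fetched and verified."""
        errors = []
        for url in self.urls:
            with tempfile.TemporaryFile() as tmp:
                try:
                    ok = self._fetch_verified(url, tmp)
                except OSError as exc:  # urllib.error.URLError is a subclass of OSError
                    errors.append(f"{url}: {exc}")
                    continue
                if not ok:
                    errors.append(f"{url}: MD5 mismatch")
                    continue
                tmp.seek(0)
                yield from tmp
                return
        raise OSError("all sources failed: " + "; ".join(errors))
```

The trade-off is latency and disk use: nothing is yielded until the whole file has arrived. A true streaming variant would have to yield unverified lines and raise at the end on a mismatch, leaving the caller to discard what it already consumed.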