Generator for version 5 (name-based, SHA-1) UUIDs that identify records in generated web corpus MapFiles.
To build, run the following inside the source directory:
./gradlew build
The generated JAR file will be written to jar/webis-uuid.jar.
Command-line usage:
java -jar jar/webis-uuid.jar clueweb12 clueweb12-0200wb-93-16911
API usage:
import de.webis.WebisUUID;
// ...
System.out.println(WebisUUID.generateUUID("clueweb12", "clueweb12-0200wb-93-16911"));
Result: 7f476110-58fd-5698-b104-8b29c3ac6d55
The Python standard library supports version 5 UUIDs out of the box, so Python code does not need this utility. The UUID from the example above can be generated in Python with:
import uuid
uuid.uuid5(uuid.NAMESPACE_URL, "clueweb12:clueweb12-0200wb-93-16911")
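If you generate many IDs from Python, the call above can be wrapped in a small helper. The sketch below assumes the scheme implied by the example: the corpus prefix and the record ID are joined with ":" and hashed in the URL namespace. The names `webis_uuid` and `uuid5_by_hand` are hypothetical and not part of this project. The second function rebuilds the same value from a raw SHA-1 digest, only to illustrate what "name-based SHA-1" means.

```python
import hashlib
import uuid


def webis_uuid(prefix: str, identifier: str) -> uuid.UUID:
    """Hypothetical helper: join prefix and record ID with ':' (assumed
    from the example above) and hash the result in the URL namespace."""
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{prefix}:{identifier}")


def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Illustration only: SHA-1 over the namespace bytes followed by the
    UTF-8 name, truncated to 16 bytes."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # version=5 makes the constructor set the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


record_id = webis_uuid("clueweb12", "clueweb12-0200wb-93-16911")
print(record_id)
```

Both functions should return the same UUID for the same input, so the hand-rolled variant is useful only as a cross-check when porting the scheme to a language without built-in version 5 support.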