It seems that WARCRecord isn't always doing the right thing when it reads from the WARC header. I've run into two instances where using the provided methods led to wrong/null content.
See line 201, where WARCRecord reads recordIdentifier. One might assume that this would return the WARC-Record-ID field, but it doesn't. Instead, this looks like a copy-paste error from how ARCRecord does things. It will always be null in practice.
Line 146 has an even more blatant bug: getDigest simply returns null instead of the WARC-Payload-Digest value.
There may be other errors.
The only safe way to access WARC headers is via the getHeaderValue() method. Even that can be tricky if you want to use the constants from WARCConstants, as their names aren't always aligned with the WARC spec (e.g. HEADER_KEY_ID, line 162, really should be HEADER_RECORD_ID).
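To illustrate the workaround, here is a minimal sketch that looks fields up by their WARC spec names instead of relying on the convenience getters or the WARCConstants names. The class `WarcHeaderLookup`, its constants, and `headerString` are hypothetical names made up for this example, and it works on a plain `Map<String, Object>` of header fields so it stays self-contained; it is not the library's API.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not part of the library): reads WARC header
 * fields by their spec names from a map of parsed header fields.
 */
final class WarcHeaderLookup {
    // Field names as written in the WARC specification.
    static final String WARC_RECORD_ID = "WARC-Record-ID";
    static final String WARC_PAYLOAD_DIGEST = "WARC-Payload-Digest";

    private WarcHeaderLookup() {}

    /**
     * Returns the named field as a string, or empty if it is absent.
     * Assumes the map keys use the same casing as the spec names above.
     */
    static Optional<String> headerString(Map<String, Object> fields, String name) {
        Object value = fields.get(name);
        return value == null ? Optional.empty() : Optional.of(value.toString());
    }
}
```

With a real record, the equivalent would presumably be to pass the spec name straight to getHeaderValue(), e.g. `"WARC-Record-ID"`, and to null-check the result rather than trusting recordIdentifier or getDigest.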
Some of this seems to have been made possible by the fact that WARCConstants extends ArchiveFileConstants. It might be best to sever this connection.
WARCRecord also implements a deprecated version of WARCConstants; we should really fix that while we are at it.