Add WARC-Cipher-Suite and WARC-Protocol #44

Closed
NGTmeaty wants to merge 15 commits from warc-http-headers

NGTmeaty
Collaborator

This is based on the zstd changes, so merge those first. It adds two WARC headers, WARC-Cipher-Suite and WARC-Protocol, to records when appropriate.

NGTmeaty and others added 12 commits August 4, 2024 23:27
This commit allows an external ZSTD-generated dictionary to be used in the compression process. This implementation will be spec-compliant with the IIPC spec and currently works with all known ZSTD WARC tools. It is currently a WIP and needs additional testing and validation to ensure everything is working correctly.
warc/client.go:15:25: struct of size 176 could be 144
warc/client.go:31:23: struct of size 232 could be 216
warc/dedupe.go:23:20: struct with 32 pointer bytes could be 24
warc/dedupe.go:31:20: struct with 48 pointer bytes could be 40
warc/dialer.go:24:19: struct with 168 pointer bytes could be 160
warc/random_local_ip.go:16:19: struct with 24 pointer bytes could be 8
warc/spooled.go:40:22: struct of size 80 could be 72
warc.go:15:22: struct with 72 pointer bytes could be 64
write.go:19:13: struct with 64 pointer bytes could be 56
warc/write.go:32:18: struct with 40 pointer bytes could be 32
warc/client.go:15:25: struct with 96 pointer bytes could be 88
warc/client.go:31:23: struct with 176 pointer bytes could be 168
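
The messages above have the shape of struct field-alignment linter output (they look like Go's `fieldalignment` analyzer, though the PR does not name the tool). As a rough sketch of what such a fix involves, the example below compares two hypothetical structs with the same fields in a different order; `padded`, `compact`, `paddedSize`, and `compactSize` are made up for illustration and are not types from this repository.

```go
package main

import (
	"fmt"
	"unsafe"
)

// padded interleaves small and large fields. On 64-bit platforms the
// compiler inserts padding after each bool so that the int64 stays
// 8-byte aligned and the struct size is a multiple of 8.
type padded struct {
	a bool
	b int64
	c bool
}

// compact holds the same fields with the largest first, so the two
// bools share a single trailing word.
type compact struct {
	b int64
	a bool
	c bool
}

// paddedSize reports the in-memory size of padded in bytes.
func paddedSize() uintptr { return unsafe.Sizeof(padded{}) }

// compactSize reports the in-memory size of compact in bytes.
func compactSize() uintptr { return unsafe.Sizeof(compact{}) }

func main() {
	fmt.Printf("padded:  %d bytes\n", paddedSize())
	fmt.Printf("compact: %d bytes\n", compactSize())
}
```

The "pointer bytes" variant of the message is about a related layout concern: grouping pointer-containing fields at the front of a struct shortens the prefix the garbage collector has to scan.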
@NGTmeaty NGTmeaty self-assigned this Sep 13, 2024
@CorentinB CorentinB added enhancement New feature or request labels Sep 13, 2024
@CorentinB CorentinB linked an issue Sep 13, 2024 that may be closed by this pull request
CorentinB and others added 3 commits September 26, 2024 05:14
chore: remove logrus usage
* add: DisableIPv4, DisableIPv6

* remove: unrelated broken test

* add: Payload-Digest checks for DisableIPv4 / IPv6 tests
@NGTmeaty NGTmeaty closed this Sep 26, 2024
@NGTmeaty NGTmeaty deleted the warc-http-headers branch September 26, 2024 09:15
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Add WARC-Cipher-Suite and WARC-Protocol WARC headers
2 participants