swarm/storage: Filehasher prototypes #969

nolash · 2018-10-20T10:02:59Z

This PR outlines prototypes for the next-generation content hashing engine in Swarm.

It includes:

FileHasher - A doubly linked object chain of levels, batches and chunks, where batches are drawn from pools and reserved and released as needed. Hashes asynchronously.
FileSplitter, FilePadder, FileChunker - A collection of writers that attempt to demonstrate the chained writing concept the end result should culminate in. FileSplitter also tries a different approach to asynchronous writing than AltFileHasher and FileHasher.
AltFileHasher - A linear, finite size state object where hashing is done in one batch-size buffer for each level. Hashes asynchronously, and currently performs the tests at about 1/3 the speed of FileHasher
ReferenceFileHasher - A single-buffer object where hashes of higher levels are prepended to the buffer as hashing proceeds. Hashes synchronously and is intended only as a correctness reference.
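To make the level/batch/chunk terminology concrete, here is a minimal synchronous sketch in Go of a chunk-tree hasher in the spirit of ReferenceFileHasher. It is not the code from this PR. Assumptions: 4096-byte chunks and 32-byte references (so 128 references per intermediate chunk), and SHA-256 over an 8-byte little-endian span plus payload as a stand-in for the BMT hash, so the digests will not match real Swarm hashes. The names `node`, `hashChunk` and `treeHash` are hypothetical, and carrying a lone trailing reference up unchanged is our reading of the "dangling chunk" case, not a rule taken from the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Assumed parameters: 4096-byte chunks and 32-byte references,
// giving 128 references per intermediate chunk.
const (
	chunkSize = 4096
	refSize   = sha256.Size
	branches  = chunkSize / refSize
)

// node is a hypothetical reference on one level of the tree, together with
// the number of data bytes it covers (its span).
type node struct {
	ref  []byte
	span uint64
}

// hashChunk stands in for the Swarm BMT hash (an assumption, not the real
// algorithm): SHA-256 over an 8-byte little-endian span followed by the payload.
func hashChunk(span uint64, payload []byte) []byte {
	var s [8]byte
	binary.LittleEndian.PutUint64(s[:], span)
	h := sha256.New()
	h.Write(s[:])
	h.Write(payload)
	return h.Sum(nil)
}

// treeHash hashes data level by level: leaves are data chunks, and each batch
// of up to `branches` references on one level becomes one chunk on the next.
// A batch holding a single reference ("dangling") is carried up unchanged
// instead of being wrapped in a one-child parent.
func treeHash(data []byte) []byte {
	var level []node
	// The off == 0 condition makes empty input still produce one (empty) chunk.
	for off := 0; off == 0 || off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		span := uint64(end - off)
		level = append(level, node{ref: hashChunk(span, data[off:end]), span: span})
	}
	for len(level) > 1 {
		var next []node
		for i := 0; i < len(level); i += branches {
			j := i + branches
			if j > len(level) {
				j = len(level)
			}
			batch := level[i:j]
			if len(batch) == 1 {
				next = append(next, batch[0]) // dangling reference: promote as-is
				continue
			}
			var payload []byte
			var span uint64
			for _, n := range batch {
				payload = append(payload, n.ref...)
				span += n.span
			}
			next = append(next, node{ref: hashChunk(span, payload), span: span})
		}
		level = next
	}
	return level[0].ref
}
```

Under these assumptions, 129 chunks of input yield a root chunk holding two references: one to the parent of the first 128 chunks, and one pointing straight at the 129th chunk.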

NOTE: During this work a bug was found in the previous hasher implementations that produced incorrect hashes in "dangling chunk" configurations. ReferenceFileHasher from this branch is currently considered the source of truth for file hashing in Swarm (perhaps it should be merged into the codebase separately).
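Which input sizes produce a "dangling chunk" can be worked out with plain arithmetic. The sketch below uses a hypothetical helper, `levelWidths`, to count the references on each level of the tree and flag levels whose final batch would hold a single reference. It assumes 4096-byte chunks and 128 references per intermediate chunk (passed in as parameters) and that a lone reference still occupies one slot on the next level; this is an illustration, not code from the PR.

```go
package main

// levelWidths is a hypothetical helper: for dataLen bytes split into
// chunkSize-byte leaves with `branches` references per intermediate chunk,
// it returns how many references each level holds (bottom level first) and
// whether that level ends in a batch with a single reference ("dangling").
func levelWidths(dataLen, chunkSize, branches int) (widths []int, dangling []bool) {
	n := (dataLen + chunkSize - 1) / chunkSize // number of leaf chunks
	if n == 0 {
		n = 1 // empty input still yields one (empty) chunk
	}
	for {
		widths = append(widths, n)
		// The last batch holds exactly one reference when n leaves a
		// remainder of 1 modulo branches and this level is not the root.
		dangling = append(dangling, n > 1 && n%branches == 1)
		if n == 1 {
			return widths, dangling
		}
		// Every batch, including a lone promoted reference, takes one slot
		// on the next level up.
		n = (n + branches - 1) / branches
	}
}
```

For example, `levelWidths(4096*129, 4096, 128)` gives widths 129, 2, 1 with a dangling reference only on the bottom level. 4096*(128^2) bytes fills a three-level tree exactly, while one more chunk, 4096*(128^2+1) bytes, leaves a lone reference on two consecutive levels, which is the kind of boundary exercised by sizes in the commit list below.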

TODO: Add a thorough and legible description of how all this actually works.


nolash added the enhancement, discussion, do-not-merge and hashing labels Oct 20, 2018

nolash self-assigned this Oct 20, 2018

nolash requested a review from zelig as a code owner October 20, 2018 10:02

zelig and others added 23 commits October 20, 2018 12:13

swarm/storage: filehasher = chunker spit + swarm hash

ffd7ff8

chunkhasherstore

5c02b35

swarm/storage: WIP Create splitter test for FileHasher

66d7071

swarm/storage: Refactor with read direct into node buffer

b23cbca

swarm/storage: Add GetBuffer test, level 0 add on fh init

b79bc23

swarm/storage: Use and wrap bmt.SectionWriter for use in filehasher

e73f7cc

swarm/storage: Make Filehasher TestSum pass on complete batch

20a7ae9

returns 5fcbddf3030d1a261b80f5a069b731f1f5e90c52df4b18036b43434cda8f3305 regardless of data

swarm/storage: Remove race condition in nodebuffer

02e5b86

swarm/storage: Hashing completes with both 1 and 2 batches

81403c6

swarm/storage: Filehasher < 1 * batch correct

d559e3c

swarm/storage: Passes sum test

deaac9b

Fails on chunksize*128^2 Also hangs on smaller tree pool in bmt

swarm/storage: Filehasher pass with 4096 * (128^2)

fe6adde

swarm/storage: Possible pyramid fail on chunk*129

9050bc7

swarm/storage: Fix async issue causing different parents for same batch

be400d7

swarm/storage: WIP reference filehasher

8d08b1f

swarm/storage: Proof of bug in Tree/Pyramid for dangling chunks

a55d5e8

swarm/storage: Add missing filehasher ref src file

e044164

swarm/storage: Correct referencehasher

75ff817

swarm/storage: Clean up logging and add comments

a862773

swarm/storage: Add alt filehasher impl, ok up to chunk boundary

d655184

swarm/storage: Correct parent offset calculation

4fa31dc

swarm/storage: Clean up filehasher test

83b3bb5

alt filehasher hangs on dangling chunk, wip fix

swarm/storage: Correct on dangling chunk

fefa180

Hangs intermittently, review concurrency in write state vars

nolash force-pushed the chunker-refactor-deepdirect-alt branch from 84dd25e to 4759c5c October 20, 2018 10:47

nolash mentioned this pull request Oct 20, 2018

swarm/storage: File hasher correctness implementation #918

Closed

nolash added 4 commits October 23, 2018 17:19

WIP Benchmark file hashers

99142ea

swarm/storage: WIP fix missed dangling hang, but hang on *2/*128+n/*129+n

25da853

swarm/storage: WIP pass all tests but altfilehasher sometimes hangs in bench

010cbcc

swarm/storage: Add ReferenceFileHasher benchmark (unnecessary, but nicetoknow)

0b484f4

This was referenced Nov 29, 2018

database rewrite - meta + timeline #1027

Closed

Hashing refactor - meta + timeline #1039

Open

nolash added 4 commits March 7, 2019 15:41

swarm/storage: Resolve hang in AltFileHasher hashing

466ed06

swarm/storage: Prune redundant locks

596dc4a

swarm/storage: Remove more redundant locks

1eef2a9

swarm/storage: WIP hashpool level chan buffer refactor

55b331c

nolash force-pushed the chunker-refactor-deepdirect-alt branch from 3fc60bb to 55b331c March 9, 2019 22:52

nolash added 14 commits March 10, 2019 10:03

swarm/storage: WIP levelWriteC hang on last

07e9efd

swarm/storage: Removed hang up to 4159 bytes

728cc25

swarm/storage: Remove commented code

535365e

swarm/storage: Fixed all hangs, dangle broken

48c0c3e

swarm/storage: Improved trigger propagation

1125d16

swarm/storage: Remove redundant log, avoid reset on cancel

9fc98db

swarm/storage: Add pyramid hasher compare test

fdb12d9

swarm/storage: WIP set up chained writer prototypes

1aa3c0f

swarm/storage: WIP writethrough implemented

3a28588

swarm/storage: WIP disappointing benchmarks

53bd95d

swarm/storage: WIP better benchmark but far off and hashes wrong

25e0d41

swarm/storage: WIP add missing third attempt file, still disappointed

2bd2b8d

swarm/storage: WIP Add comments

10c9e97

swarm/storage: Factor sum to separate function, added write debugs

2c0e5d4

acud removed the do-not-merge label Jul 1, 2019

nolash closed this Nov 10, 2019

nolash deleted the chunker-refactor-deepdirect-alt branch November 10, 2019 13:36
