Dag import functionality only ( silent / no CLI progress ) #7038

ribasushi · 2020-03-26T17:59:11Z

This still works over "loosely defined" .car files
Please refer to the sharness tests for extra info

We can tighten this up if the sentiment is "Postel was wrong"

core/commands/dag/dag.go

ribasushi · 2020-03-28T06:59:14Z

This PR is now retargetted against the export branch to make diff reading easier. Sharness tests are complete, with a few more "pathological cases" pending for a subsequent PR

ribasushi · 2020-03-28T07:43:40Z

:scratchhead:

The following errors were encountered parsing test results:

sharness/sharness.xml
    ParseError at [row,col]:[25710,1] Message: Invalid byte 1 of 1-byte UTF-8 sequence.

Locally the full sharness passes...

ribasushi · 2020-03-28T09:26:06Z

One needs to run make -O -j 10 coverage/sharness_tests.coverprofile test/sharness/test-results/sharness.xml TEST_GENERATE_JUNIT=1 CONTINUE_ON_S_FAILURE=1 locally ( lifted from .circleci ) in order to observe the malformed XML. Regular make test hides all the the STDOUTs.

ribasushi · 2020-03-28T12:32:47Z

Sharness is now properly fixed, I ended up running into another known mis-design along the way:
3e40ce1#diff-15172926c4147422f472139313467179R509-R517

@Stebalien ready for final review, I will re-add the progress stuff once you are happy with the tests / code / etc.

I would also like to hear from @mikeal if he has strong(er) objections to the current general design of the importer: #7038 (comment)

core/commands/dag/dag.go

Stebalien · 2020-03-30T05:14:27Z

core/commands/dag/dag.go

+				if 0 == blockCount%200 {
+					// work around https://github.com/ipfs/go-ds-flatfs/issues/36 for the time being
+					// batch up-to 200 at a time ( fly under MacOS's default of 256 )
+					if err := batch.Commit(); err != nil {


Did you actually run into this problem? ipfs add has the same behavior.

Looking at the code, this is probably because we usually test with the daemon (which raises the file descriptor limit to 8196). Let's not hack a fix in here.

Also note: we can reduce the batch size when constructing the batch.

Yes I have ran into it. Part of the sharness tests runs without a daemon ( as would be seen in the wild in many cases ). If you remove this block - that test will fail.

I didn't realize the batch size can be controlled, will push a fix momentarily.

Welp the batch size can be controlled, and is currently defaulting to 128. But the batch application is carried out async with ParallelBatchCommits concurrent jobs. Combined together they blow through the ulimit since blocks come in at cid-decoding-speed.

Since this is a global variable, the only reasonable way to flip it would be to muck with atomic. and that doesn't sound right.

How should we fix this? Add an option-defaulting-to-0 to ignore the global value of ParallelBatchCommits? Not using a batch at all?

My real concern here is that these are the default limits we're using for ipfs add as well. Given that, we should consider reducing the default batch size (or just fixing flatfs).

I know why...

We flush when we finish adding each individual file.

The default chunk size is 256KiB.

The maximum amount of data we can buffer is 8MiB * NCPU * 2.

That means we can end up with (8196*NCPU*2)/256 outstanding blocks (usually ~256).

Regardless, we should fix the batch: ipfs/go-ipld-format#56.

Then we should benchmark and see how this affects add performance on flatfs.

Do we have prior art for benchmarking? A typical workstation OS with an SSD will not display a difference for obvious reasons.

Unfortunately, no. However, I seem to remember a pretty dramatic difference when messing with these parameters last time, even on an SSD.

That's ok, I'll improvise. Finishing up something else at the moment, will likely not look into this until ~~way later today.~~ I mean tomorrow

Ok, sharness passes now with no ulimit fixes and no workarounds
Had to pull in ipfs/go-merkledag#53 and ipfs/go-ipld-format#55 in order to benefit from ipfs/go-ipld-format#56

Initial testing looks good, will have the final number tomorrow morning, day got too long

core/commands/dag/dag.go

Stebalien · 2020-04-07T22:13:26Z

I've rebased feat/carfile-export-only on master. I was planning on then rebasing this branch on feat/carfile-export-only so I could review the changes, but they appear to have divergent histories.

Stebalien · 2020-04-07T22:19:03Z

I've released a new go-ipld-format that reverts the ErrNotFound changes: https://github.com/ipfs/go-ipld-format/releases/tag/v0.2.0.

ribasushi · 2020-04-08T14:46:47Z

they appear to have divergent histories

Export was merged into it, hence git getting confused. I generally avoid force-pushes during review exactly for this reason, but no biggie.

Something is wrong with sharness locally is why I haven't pushed yet, any moment now...

test/sharness/t0054-dag-car-import-export-data/README.md

test/sharness/t0054-dag-car-import-export.sh

Stebalien · 2020-04-08T18:31:00Z

core/commands/dag/dag.go

+					ret.PinErrorMsg = err.Error()
+				} else if err := node.Pinning.Pin(req.Context, nd, true); err != nil {
+					ret.PinErrorMsg = err.Error()
+				} else if err := node.Pinning.Flush(req.Context); err != nil {


I'd prefer to flush at most once at the end, but let's leave it this way for now. We'll have to think about the safety implications.

Actually you told me I need to pin+flush like that, originally I had it at the end... :)
But yeah, agreed we should punt, knowing more dark corners of the pin infra now. I'll open an issue in a bit referencing this.

I did? I wonder why I said that...

core/commands/dag/dag.go

Stebalien · 2020-04-08T19:02:35Z

LGTM except for the "root seen" flag. It's much easier to remove it now and add it (or something like it) back later. IMO, we'd be better off with ipfs dag import [--dry-run] [--verify=[properties...]] ... (later).

This still works over "loosely defined" .car files Please refer to the sharness tests for extra info We can tighten this up if the sentiment is "Postel was wrong"

ribasushi · 2020-04-08T21:12:40Z

Force-pushed to lose the 3mb of testdata from the commit chain. Addressing the other notes in separate commits shortly.

ribasushi · 2020-04-08T21:43:29Z

@Stebalien the only new thing to review is eff4223

Everything else pushed is sharness cruft

Stebalien

🚀

ribasushi requested review from mikeal, Stebalien and willscott March 26, 2020 17:59

ribasushi mentioned this pull request Mar 26, 2020

DAG Import/Export #7011

Closed

willscott reviewed Mar 26, 2020

View reviewed changes

core/commands/dag/dag.go Outdated Show resolved Hide resolved

core/commands/dag/dag.go Show resolved Hide resolved

core/commands/dag/dag.go Outdated Show resolved Hide resolved

core/commands/dag/dag.go Show resolved Hide resolved

core/commands/dag/dag.go Outdated Show resolved Hide resolved

Stebalien reviewed Mar 27, 2020

View reviewed changes

ribasushi changed the base branch from master to feat/carfile-export-only March 28, 2020 06:33

ribasushi requested a review from Stebalien March 28, 2020 06:59

ribasushi force-pushed the feat/carfile-import-only branch from 29a4ef8 to 5232d39 Compare March 28, 2020 07:04

Stebalien reviewed Mar 30, 2020

View reviewed changes

pooja mentioned this pull request Apr 3, 2020

DAG Import/Export #6870

Closed

ribasushi changed the title ~~Dag import functionality only ( no progress )~~ Dag import functionality only ( silent / no CLI progress ) Apr 6, 2020

ribasushi changed the base branch from feat/carfile-export-only to master April 7, 2020 21:22

Stebalien changed the base branch from master to feat/carfile-export-only April 7, 2020 21:55

Stebalien changed the base branch from feat/carfile-export-only to master April 7, 2020 21:56

ribasushi force-pushed the feat/carfile-import-only branch from acf03d4 to ec08abe Compare April 8, 2020 14:48

ribasushi changed the base branch from master to feat/carfile-export-only April 8, 2020 14:51

ribasushi commented Apr 8, 2020

View reviewed changes

test/sharness/t0054-dag-car-import-export-data/README.md Show resolved Hide resolved

Stebalien requested changes Apr 8, 2020

View reviewed changes

ribasushi force-pushed the feat/carfile-import-only branch from ec08abe to a9c8a23 Compare April 8, 2020 21:09

Dag import functionality only ( no progress )

f1ecf33

This still works over "loosely defined" .car files Please refer to the sharness tests for extra info We can tighten this up if the sentiment is "Postel was wrong"

ribasushi force-pushed the feat/carfile-import-only branch from a9c8a23 to f1ecf33 Compare April 8, 2020 21:11

ribasushi mentioned this pull request Apr 8, 2020

GC concurrent with pinning outputs unknown entries on the CLI #7121

Open

ribasushi added 3 commits April 8, 2020 23:26

Remove PresentInImport tracking as per conversation

eff4223

Harden test - no -z in busybox

911babb

We are running under set -e :picard-facepalm:

befe045

Stebalien approved these changes Apr 8, 2020

View reviewed changes

Stebalien merged commit 2d1d6cd into feat/carfile-export-only Apr 8, 2020

ribasushi deleted the feat/carfile-import-only branch April 8, 2020 21:54

rvagg mentioned this pull request Apr 9, 2020

(WIP) feat: dag import and export to and from CAR files ipfs/js-ipfs#2953

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dag import functionality only ( silent / no CLI progress ) #7038

Dag import functionality only ( silent / no CLI progress ) #7038

ribasushi commented Mar 26, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

Stebalien Mar 30, 2020

Stebalien Mar 30, 2020

ribasushi Mar 30, 2020

ribasushi Mar 30, 2020

Stebalien Mar 30, 2020

Stebalien Mar 30, 2020

ribasushi Mar 30, 2020

Stebalien Mar 30, 2020

ribasushi Mar 30, 2020 •

edited

Loading

ribasushi Apr 7, 2020

Stebalien commented Apr 7, 2020

Stebalien commented Apr 7, 2020

ribasushi commented Apr 8, 2020

Stebalien Apr 8, 2020

ribasushi Apr 8, 2020

Stebalien Apr 8, 2020

Stebalien commented Apr 8, 2020

ribasushi commented Apr 8, 2020

ribasushi commented Apr 8, 2020

Stebalien left a comment

Dag import functionality only ( silent / no CLI progress ) #7038

Dag import functionality only ( silent / no CLI progress ) #7038

Conversation

ribasushi commented Mar 26, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

ribasushi commented Mar 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ribasushi Mar 30, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Stebalien commented Apr 7, 2020

Stebalien commented Apr 7, 2020

ribasushi commented Apr 8, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Stebalien commented Apr 8, 2020

ribasushi commented Apr 8, 2020

ribasushi commented Apr 8, 2020

Stebalien left a comment

Choose a reason for hiding this comment

ribasushi Mar 30, 2020 •

edited

Loading