Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dag import functionality only ( silent / no CLI progress ) #7038

Merged
merged 4 commits into from
Apr 8, 2020

Conversation

ribasushi
Copy link
Contributor

This still works over "loosely defined" .car files
Please refer to the sharness tests for extra info

We can tighten this up if the sentiment is "Postel was wrong"

core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Outdated Show resolved Hide resolved
core/commands/dag/dag.go Show resolved Hide resolved
@ribasushi ribasushi changed the base branch from master to feat/carfile-export-only March 28, 2020 06:33
@ribasushi
Copy link
Contributor Author

This PR is now retargetted against the export branch to make diff reading easier. Sharness tests are complete, with a few more "pathological cases" pending for a subsequent PR

@ribasushi ribasushi requested a review from Stebalien March 28, 2020 06:59
@ribasushi ribasushi force-pushed the feat/carfile-import-only branch from 29a4ef8 to 5232d39 Compare March 28, 2020 07:04
@ribasushi
Copy link
Contributor Author

:scratchhead:

The following errors were encountered parsing test results:

sharness/sharness.xml
    ParseError at [row,col]:[25710,1] Message: Invalid byte 1 of 1-byte UTF-8 sequence.

Locally the full sharness passes...

@ribasushi
Copy link
Contributor Author

One needs to run make -O -j 10 coverage/sharness_tests.coverprofile test/sharness/test-results/sharness.xml TEST_GENERATE_JUNIT=1 CONTINUE_ON_S_FAILURE=1 locally ( lifted from .circleci ) in order to observe the malformed XML. Regular make test hides all the the STDOUTs.

@ribasushi
Copy link
Contributor Author

Sharness is now properly fixed, I ended up running into another known mis-design along the way:
3e40ce1#diff-15172926c4147422f472139313467179R509-R517

@Stebalien ready for final review, I will re-add the progress stuff once you are happy with the tests / code / etc.

I would also like to hear from @mikeal if he has strong(er) objections to the current general design of the importer: #7038 (comment)

core/commands/dag/dag.go Outdated Show resolved Hide resolved
if 0 == blockCount%200 {
// work around https://github.com/ipfs/go-ds-flatfs/issues/36 for the time being
// batch up-to 200 at a time ( fly under MacOS's default of 256 )
if err := batch.Commit(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you actually run into this problem? ipfs add has the same behavior.

Looking at the code, this is probably because we usually test with the daemon (which raises the file descriptor limit to 8196). Let's not hack a fix in here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note: we can reduce the batch size when constructing the batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I have ran into it. Part of the sharness tests runs without a daemon ( as would be seen in the wild in many cases ). If you remove this block - that test will fail.

I didn't realize the batch size can be controlled, will push a fix momentarily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp the batch size can be controlled, and is currently defaulting to 128. But the batch application is carried out async with ParallelBatchCommits concurrent jobs. Combined together they blow through the ulimit since blocks come in at cid-decoding-speed.

Since this is a global variable, the only reasonable way to flip it would be to muck with atomic. and that doesn't sound right.

How should we fix this? Add an option-defaulting-to-0 to ignore the global value of ParallelBatchCommits? Not using a batch at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My real concern here is that these are the default limits we're using for ipfs add as well. Given that, we should consider reducing the default batch size (or just fixing flatfs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know why...

  1. We flush when we finish adding each individual file.
  2. The default chunk size is 256KiB.
  3. The maximum amount of data we can buffer is 8MiB * NCPU * 2.

That means we can end up with (8196*NCPU*2)/256 outstanding blocks (usually ~256).

Regardless, we should fix the batch: ipfs/go-ipld-format#56.

Then we should benchmark and see how this affects add performance on flatfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have prior art for benchmarking? A typical workstation OS with an SSD will not display a difference for obvious reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no. However, I seem to remember a pretty dramatic difference when messing with these parameters last time, even on an SSD.

Copy link
Contributor Author

@ribasushi ribasushi Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok, I'll improvise. Finishing up something else at the moment, will likely not look into this until way later today. I mean tomorrow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sharness passes now with no ulimit fixes and no workarounds
Had to pull in ipfs/go-merkledag#53 and ipfs/go-ipld-format#55 in order to benefit from ipfs/go-ipld-format#56

Initial testing looks good, will have the final number tomorrow morning, day got too long

core/commands/dag/dag.go Outdated Show resolved Hide resolved
@pooja pooja mentioned this pull request Apr 3, 2020
@ribasushi ribasushi changed the title Dag import functionality only ( no progress ) Dag import functionality only ( silent / no CLI progress ) Apr 6, 2020
@ribasushi ribasushi changed the base branch from feat/carfile-export-only to master April 7, 2020 21:22
@Stebalien Stebalien changed the base branch from master to feat/carfile-export-only April 7, 2020 21:55
@Stebalien Stebalien changed the base branch from feat/carfile-export-only to master April 7, 2020 21:56
@Stebalien
Copy link
Member

I've rebased feat/carfile-export-only on master. I was planning on then rebasing this branch on feat/carfile-export-only so I could review the changes, but they appear to have divergent histories.

@Stebalien
Copy link
Member

I've released a new go-ipld-format that reverts the ErrNotFound changes: https://github.com/ipfs/go-ipld-format/releases/tag/v0.2.0.

@ribasushi
Copy link
Contributor Author

they appear to have divergent histories

Export was merged into it, hence git getting confused. I generally avoid force-pushes during review exactly for this reason, but no biggie.

Something is wrong with sharness locally is why I haven't pushed yet, any moment now...

@ribasushi ribasushi force-pushed the feat/carfile-import-only branch from acf03d4 to ec08abe Compare April 8, 2020 14:48
@ribasushi ribasushi changed the base branch from master to feat/carfile-export-only April 8, 2020 14:51
test/sharness/t0054-dag-car-import-export.sh Outdated Show resolved Hide resolved
ret.PinErrorMsg = err.Error()
} else if err := node.Pinning.Pin(req.Context, nd, true); err != nil {
ret.PinErrorMsg = err.Error()
} else if err := node.Pinning.Flush(req.Context); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to flush at most once at the end, but let's leave it this way for now. We'll have to think about the safety implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you told me I need to pin+flush like that, originally I had it at the end... :)
But yeah, agreed we should punt, knowing more dark corners of the pin infra now. I'll open an issue in a bit referencing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did? I wonder why I said that...

core/commands/dag/dag.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

LGTM except for the "root seen" flag. It's much easier to remove it now and add it (or something like it) back later. IMO, we'd be better off with ipfs dag import [--dry-run] [--verify=[properties...]] ... (later).

@ribasushi ribasushi force-pushed the feat/carfile-import-only branch from ec08abe to a9c8a23 Compare April 8, 2020 21:09
This still works over "loosely defined" .car files
Please refer to the sharness tests for extra info

We can tighten this up if the sentiment is "Postel was wrong"
@ribasushi ribasushi force-pushed the feat/carfile-import-only branch from a9c8a23 to f1ecf33 Compare April 8, 2020 21:11
@ribasushi
Copy link
Contributor Author

Force-pushed to lose the 3mb of testdata from the commit chain. Addressing the other notes in separate commits shortly.

@ribasushi
Copy link
Contributor Author

@Stebalien the only new thing to review is eff4223

Everything else pushed is sharness cruft

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@Stebalien Stebalien merged commit 2d1d6cd into feat/carfile-export-only Apr 8, 2020
@ribasushi ribasushi deleted the feat/carfile-import-only branch April 8, 2020 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants