Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Wallet synchronization] Introduce BlockDownloadingService #12184

Conversation

kiminuo
Copy link
Contributor

@kiminuo kiminuo commented Jan 4, 2024

Depends on #12236
Part of #10014

The PR adds BlockDownloadService (BDS) that

  • can download blocks that queues blocks and downloads them in parallel,
  • BDS is aware of block repository file-system cache so it fetches requested blocks from there if possible.

note: BDS is intended to power #12148 (or any other approach).

@kiminuo kiminuo changed the title Introduce ParallelBlockDownloadingService [Wallet synchronization] Introduce ParallelBlockDownloadingService Jan 4, 2024
@kiminuo kiminuo requested a review from turbolay January 4, 2024 12:44
@kiminuo kiminuo force-pushed the feature/2024-01-04-ParallelBlockDownloadingService branch from 70743c5 to 455e3e4 Compare January 4, 2024 12:47
Copy link
Contributor

@molnard molnard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstraction LGTM - before this gets to a real PR I would like to see how this will fit into the final goal. Maybe merging back into turbolay's PR if that makes sense.

Copy link
Collaborator

@turbolay turbolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge cACK, really good code.
I didn't find problem with the logic, I can see that usage will be great.
Sorry for taking long time to review!

/// Guarded by <see cref="Lock"/>.
/// <para>Internal for testing purposes.</para>
/// </remarks>
internal PriorityQueue<Request, uint> BlocksToDownload { get; } = new(Comparer<uint>.Default); // TODO: Turbo requests should have precedence over non-turbo.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Turbo requests should have precedence over non-turbo.

Given how your queue works, this is required, otherwise you might push Turbo all the time, because when we are in this case it means that the wallet is opened and we are doing the second synchronization in the background. You can use the priority used in WalletFilterProcessor, which should be almost the same

{
if (result.Attempts >= MaxFailedAttempts)
{
Logger.LogInfo($"Attempt to download block {result.BlockHash} (height: {result.BlockHeight}) failed {MaxFailedAttempts} times. Dropping the request.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it makes sense to have this retry mechanism, which as stated in my previous comment, I'm not sure. Is failure to get a block really a LogInfo? In the usage of this code, what would really happen if we fail to get a block here? Wallet state broken? Or do you defer the responsibility of throwing/logging error to the consumer component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogInfo is probably a bit too much I just find it useful for debugging (no need to change log level) but yes, you are right.

So currently the idea is that PBDS is like a prefetching service and does that on best effort basis. Meaning, it would not be an issue not to download a block. But ideas here are welcome - nothing is set to stone at this point.

Wallet state broken?

No. I don't think that this should happen.

Or do you defer the responsibility of throwing/logging error to the consumer component?

Yes. As of now. That's the idea. But it's a subject to change based on new ideas. So yes but can change if there is a better way.

@turbolay
Copy link
Collaborator

turbolay commented Jan 10, 2024

I made turbolay@3a60c17

It's a merge with my PoC PR with PreProcessing support, removing the attempts, using the correct Priority in the PriorityQueue and general improvements. I commented very well, maybe too much but I wanted to be sure the logic will be understood

The main changes are in files BlockDownloadService.cs, WalletFilterProcessor.cs. Rest is mainly integration, and in the BlockDownloadServiceTests.cs I removed the test related to the attempts.

It's untested and in general has to be improved (it doesn't handle reorgs well, misses cancellation tokens, some locks, P2P nodes management etc...) but the logic should be correct. I really like the code btw I think it's clear and quite good. I think it could be a good base to finish this PR and continue working on the feature.

@kiminuo kiminuo changed the title [Wallet synchronization] Introduce ParallelBlockDownloadingService [Wallet synchronization] Introduce BlockDownloadingService Jan 11, 2024
@kiminuo kiminuo force-pushed the feature/2024-01-04-ParallelBlockDownloadingService branch from 168b94f to 96c9553 Compare January 12, 2024 09:40
@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 23, 2024

@turbolay I removed MaxAttempts.

During this work I discovered quite an ugly case when Task<RequestResponse> task = await Task.WhenAny(activeTasks).ConfigureAwait(false); in BDS.ExecuteAsync does not take into account new requests (so if it is waiting for 1 task to finish, it does not take into account that there might new jobs to do that can be started). That's an imperfection that is probably still OK in the real world, but it's really a nuisance for testing. I have been trying to fix it for quite some time but I'm not happy with any solution I found so far... so hopefully I'll get an idea later on.

@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 24, 2024

@turbolay Added NoSuchProviderResult so that https://github.com/kiminuo/WalletWasabi/pull/1/files#r1463832936 if there is no P2P provider, then GetBlockNoCacheAsync can escape from while (true).

@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 24, 2024

@turbolay Would it make sense to merge this PR even without your integration because then:

WDYT?

@turbolay
Copy link
Collaborator

@kiminuo My bad, I forgot you were not in the meetings and I had to provide you more context. I was asked to do this:

Maybe we could make a full PoC integration prior to merging?

So problems with the API could be caught beforehand. This is why I created kiminuo#1, it was not meant to be merged at the same time.

Can you fix the test?

@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 24, 2024

Can you fix the test?

Fixed

@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 25, 2024

45be3e3 fixes #12184 (comment). At least I hope so. I modified the test to the state that previously was causing issues so it looks OK now.

Copy link
Collaborator

@turbolay turbolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK, code LGTM
Good job with the PR, but especially with 45be3e3

@turbolay turbolay requested a review from molnard January 26, 2024 06:03
@kiminuo
Copy link
Contributor Author

kiminuo commented Jan 26, 2024

@turbolay

btw: I tried that integration test here and it finished in

  • (something like) 1.5 second with BDS [maxParallelTasks=5]
  • (something like) 6.5 seconds with BDS [maxParallelTasks=1]

It was downloaded from a single peer but that in my mind make it even more impressive because my expectation was that the improvement would come from downloading in parallel from multiple P2P nodes.

(Obviously, not statistically correct measurement... but anyway)

Copy link
Contributor

@molnard molnard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK. The diff is huge and introduces many changes. I am not able to review it thoroughly. I am open to suggestions, on what next.

/// <param name="cancellationToken">The cancellation token.</param>
public async Task RemoveAsync(uint256 hash, CancellationToken cancellationToken)
/// <inheritdoc/>
public async Task RemoveAsync(uint256 blockHash, CancellationToken cancellationToken)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to avoid increasing the diff with renamings in the future. If you want to do it, create a separate PR just for this change. 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why was any of the changes in this file was necessary?
  • Why did you change the order of the usings?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We want more control over the nodes, to move the logic of disconnection out of this class etc. In general, I believe that this file might drastically change or disappear. Making ConnectedNodes.Count accessible outside of the class is a first step to decide which nodes to disconnect

  • The order of the using was incorrect, CF requests usings to be alphabetical order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one line: public uint ConnectedNodesCount => (uint)Nodes.ConnectedNodes.Count;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no CF issue generated by the order of the using. #12327

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to do it: #12184 (comment)
If you are not happy with this change can you please revert df095cc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://documentation.help/StyleCop/SA1210.html
It's off for us, that's why CF didn't raise issue with it. His IDE might be configure to change that automatically (mine is).
Please revert


if (response.Result is SuccessResult)
{
_ = request.Tcs.TrySetResult(response.Result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ = request.Tcs.TrySetResult(response.Result);
request.Tcs.TrySetResult(response.Result);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? _ = is good to emphasize that we are not using the result. It's a global standard, and we are using this all throughout codebase, even in recent PRs.

Copy link
Contributor

@molnard molnard Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is something new.
NACK. It looks annoying. Do this only to suppress when you have warnings to use the return value. Please revert.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert yourself, I don't have the rights and Kimi is not here.

Copy link
Collaborator

@turbolay turbolay Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this form is a standard in C# and is used more or less 30 times in the codebase, excluding tests, I don't really understand why the fight for it now.

You even used it yourself recently:

https://github.com/zkSNACKs/WalletWasabi/blob/6a2de0aca907d54a64bef6a8d202e4ccc1f78b97/WalletWasabi/Wallets/WalletFilterProcessor.cs#L195

{
while (BlocksToDownload.TryDequeue(out Request? request, out _))
{
_ = request.Tcs.TrySetResult(CanceledResult.Instance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ = request.Tcs.TrySetResult(CanceledResult.Instance);
request.Tcs.TrySetResult(CanceledResult.Instance);


for (int i = 0; i < toStart; i++)
{
// Dequeue does not provide priority value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't understand this comment. It provides, it is there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a leftover, forgot to be removed from #12184 (comment).

for (int i = 0; i < toStart; i++)
{
// Dequeue does not provide priority value.
if (!BlocksToDownload.TryDequeue(out Request? queuedRequest, out Priority? _))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!BlocksToDownload.TryDequeue(out Request? queuedRequest, out Priority? _))
if (!BlocksToDownload.TryDequeue(out Request? queuedRequest, out _))

catch (Exception ex)
{
Logger.LogError(ex);
throw;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means, game over. What happens with the wallet after this? How does this scenario look like?

  • When loading?
  • When opened?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot happen. Only UnreachableException can be thrown from inside this trycatch block. Probably Exception should be changed to UnreachableException, or the clause removed altogether

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, is it OK to stop in this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no case. UnreachableException means that it's... not reachable. It is made to emphasize that the case can't happen, or that a modification was wrong if it triggers the exception.
You can catch UnreachableException if you want, or remove the clause.

{
if (TrustedFullNodeBlockProviders.Length == 0)
{
return new RequestResponse(request, NoSuchProviderResult.Instance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The plan is to introduce some kind of new behavior? What happens if I set my local node as source and it cannot provides? We are not falling back to random nodes anymore?

Copy link
Collaborator

@turbolay turbolay Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kiminuo#1 (comment)

And the implementation as integration made here:
https://github.com/zkSNACKs/WalletWasabi/pull/12148/files#diff-a3e12639ae8c2e59db996ca151cbe714473c840c9bfa4f6da1d8f30508ab1d6eR334

Basically, we want for that the caller to provide its own method to choose the source from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same as what SmartBlockProvider is currently doing

Copy link
Contributor

@molnard molnard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbolay can you create another PR that you can touch and finish this? Kimi won't be here for a week or more AFAIK

@kiminuo
Copy link
Contributor Author

kiminuo commented Feb 10, 2024

I added changes from #12328 here except the following commits:

because I don't think those changes are good. Adding static property public static TerminateService? Instance { get; private set; }:

  • adds a circular dependency in the sense that BDS depends on TerminateService but TS is a higher level component.
  • static just adds tech debt1
  • writing a unit test for the change would show that it's bad idea but none was added.

We should just let the UnreachableException just bubble up and trigger launching of the crash reporter.

Or alternatively we can just call [Environment.Exit(Int32)](https://learn.microsoft.com/en-us/dotnet/api/system.environment.exit?view=net-8.0) and end without crash reporter to work around the issue at the moment.

Continuing after UnreachableException can be dangerous, we should really crash as fast as possible because UnreachableException means "catastrophic error" and even things like saving settings can break the app in unexpected ways.

tl;dr: UnreachableException means reaching state that should never occur and if it does then we have a serious bug in code. Testing before release should uncover these bugs.

Footnotes

  1. We have some statics in code already and it's very costly to fix them (speed of refactoring efforts to remove them is ~0, so ...).

Copy link
Contributor

@molnard molnard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. I agree to merge with the promise of adding some effective meaning of that throw there - whatever it will be.

@kiminuo kiminuo merged commit 7452bad into WalletWasabi:master Feb 17, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants