Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change applyBlocks to also accept BlockSummary #3172

Merged
merged 4 commits into from
Mar 21, 2022

Conversation

HeinrichApfelmus
Copy link
Contributor

Issue number

ADP-1422, ADP-1428

Overview

Light-mode (Epic ADP-1422) aims to make synchronisation to the blockchain faster by trusting an off-chain source of aggregated blockchain data.

In this pull request, we change the applyBlocks function to work with both a list of Block and a BlockSummary. Specifically, we introduce a function applyBlockData which performs the following steps:

  • discoverFromBlockData: discover addresses and and transaction data (ChainEvents) contained in the sequence of blocks
  • applyBlockEventsToUTxO: update the wallet UTxO from the given ChainEvents.

Details

  • Property tests for discoverFromBlockData check that address discovery gives the same result for both a List of blocks and a Summary.
  • When processing a List of Block, we have to be a bit careful about performance, because we need to traverse ~ 32M transactions (as of Feb 2022). In this case, the transactions contained in BlockEvents are essentially a full copy of the transactions in Block. The best way to copy data is to copy a single reference to the data; for this purpose we introduce a Sublist type which avoids deconstructing and reconstructing the transaction list in the case where we do not filter transactions (All).
  • In order to preserve the current checkpointing mechanism, applyBlocks actually calls applyBlockData repeatedly instead of passing the full list of blocks. I will need to rethink the checkpointing mechanism in the future — the main issue is that in its current form, a consumer of BlockSummary cannot ask for a specific block height in order to set a suitable checkpoint there. In the future, the checkpointing logic will need to mediate between producer and consumer.

Comments

  • I expect a slight regression in benchmarks, due to overhead involved in calling applyBlockData repeatedly. However, the introduction of Sublist should prevent a serious deterioration.

* Construct `ChainEvents` from `NonEmpty Block`.
* Test that `discoverFromBlockData` yields the same address discovery state on `List` and on `Summary`, at least for the mock type `WalletState`.
…to work with `BlockEvents` instead of `Block`.
`Sublist` = a sublist of a list, with an optimized case for the full list.
@HeinrichApfelmus HeinrichApfelmus force-pushed the HeinrichApfelmus/ADP-1422/applyBlockEvents branch from 1d892d1 to 278b9a8 Compare March 15, 2022 13:31
@HeinrichApfelmus
Copy link
Contributor Author

bors try

iohk-bors bot added a commit that referenced this pull request Mar 15, 2022
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Mar 15, 2022

try

Build failed:

-- The main purpose of this data type is optimization:
-- When processing whole 'Block', we want to avoid copying
-- and redecorating the entire list of transactions in that 'Block';
-- instead, we want to copy a pointer to this list.
Copy link
Contributor

@paweljakubas paweljakubas Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the intent: when All then we process all content, then Some then only those pointed with ix (being map fst content), right?

Copy link
Contributor Author

@HeinrichApfelmus HeinrichApfelmus Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though the focus is on the producer here: fromEntireBlock uses All and avoids redecorating. An external blockchain source will return Some with specific indices.

nullBlockEvents BlockEvents{transactions,delegations}
= null transactions && null delegations

-- | Merge two 'Sublist' assuming that they are sublists of the /same/ list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, when sublists of the same list are considered then this is perfectly true. The question is can we outlaw calling this function with sublists of different lists? Maybe it would be worth adding some invariant avalilable for export here checking this? Possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered the same thing, but I don't see how to do that in this context: We would need a run-time proof that the transaction data returned by an external blockchain data source indeed belongs to the same block; but that is out of reach and it's easier to postulate it using unsafeMkSublist.

In other contexts, one could enforce uniqueness by using an ST-like existential type parameter:

wholeList :: forall a. [a] -> exists s. Sublist s a
filterSublist :: forall s a. (a -> Bool) -> Sublist s a -> Sublist s a
mergeSublist :: forall s a. Sublist s a -> Sublist s a -> Sublist s a

In other words, the type parameter s would be a representing of the list that the sublist is a sublist of. But we'd be quickly walking into dependent types territory here, and this does not solve the issue in our context where the whole list s is only known on a remote machine.

As Sublist is a small gimmick used for optimization only, I opted to keep it simple and "unsafe" as it is.

mergeSublist :: Sublist a -> Sublist a -> Sublist a
mergeSublist (All xs) _ = All xs -- result cannot be larger
mergeSublist _ (All ys) = All ys
mergeSublist (Some xs) (Some ys) = Some $ mergeOn fst const xs ys

-- | Merge block events that belong to the same block.
mergeSameBlock :: BlockEvents -> BlockEvents -> BlockEvents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, here it is apparent that sublist are from the same list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'm not too happy about exporting mergeSublist in the first place, but the export is necessary for testing.

, transactions = []
, delegations = []
}
fromFiltered (fblock:_) = fblock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are sure fblock is NonEmpty, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm? fblock is a single block. As for the list, I think it's indeed true that the argument to fromFiltered will be nonempty when applied to the result of the call to applyBlockData, but that is not obvious 🤔 — that is why I have opted to cover the case of [] regardless.

pure (filteredBlocks, (dw, Wallet u1 tip1 s1))

-- | Strict variant of 'mapAccumL'.
mapAccumL' :: (s -> a -> (o,s)) -> s -> [a] -> ([o],s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I see something like this I always would like to make sure that even irrespective of bang patterns applied we have no thunk build up. I wonder if we should not start using nothunk - some reading for inspiration https://well-typed.com/blog/2020/09/nothunks/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. In other parts of our code, nothunks is used, but not here yet. I do want to investigate the space usage of Wallet and WalletState more systematically at some point, but not right now — things are working out ok at the moment, so I figured that I could away with just making sure that I don't accidentally make things worse for the time being.

describe "Sublist" $ do
it "merging is idempotent" $
property prop_idempotent
it "merging has whole list as neutral element" $
Copy link
Contributor

@paweljakubas paweljakubas Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have I=All and A=Some then mergeSublist I A = I is I neutral element really?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right, the nomenclature "neutral element" is not correct. 🙈 The correct name is "absorbing element".

, (===)
)

import Data.Map.Strict as Map

spec :: Spec
spec = do
describe "Sublist" $ do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also distinct the following properties:

  1. for any two sublists of the same list the result of merge is sublist that contains both sublists, is union sublist.
  2. merging is commutative

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! There's not much that can go wrong in this code, but these laws are aesthetically pleasing. 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to write that property 2 follows from property 1 because set union is commutative, but property 2 is slightly stronger due to the order of elements. 🤔

{-------------------------------------------------------------------------------
Generators
-------------------------------------------------------------------------------}
genSublist :: [a] -> Gen (Sublist a)
genSublist xs = unsafeMkSublist <$> sublistOf (zip [0..] xs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sublistOf is from Quickcheck, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

discoverState (Summary discover summary) s
=== discoverState (List blocks) s
where
-- TODO: Test sequence of blocks with more than one block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, would be great to expand this property in this direction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, nice, that's an older comment. In another pull request, I had started setting up this ApplyBlocks type for generating lists of blocks; I'll plug that in now.

Copy link
Contributor

@paweljakubas paweljakubas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the step should not be invasive and impact negatively the current block processing. I like Sublist abstraction and intention behind it that utilizes lazyness, and look inside the list contents selectively (if I understood intent correctly). Left some pointers/questions.

@HeinrichApfelmus
Copy link
Contributor Author

Thanks Pawel! 😊

Sublist does not use laziness, though, it relies on sharing of data structures, which is enabled by immutability / purity. The point is that fromEntireBlock can use

transactions = All transactions

which copies a pointer to the list of transactions. The previous code

transactions = zip [0..] transactions

deconstructs the list of transactions, decorates each list element with its index, and constructs a new list — this is way more expensive, and since the transaction processing is part of a tight loop, of relevance.

Co-authored-by: paweljakubas <pawel.jakubas@iohk.io>
@piotr-iohk
Copy link
Contributor

Started benchmarks against this branch -> https://buildkite.com/input-output-hk/cardano-wallet-nightly/builds/1289

@HeinrichApfelmus
Copy link
Contributor Author

Benchmarks look entirely neutral. 👍

@HeinrichApfelmus
Copy link
Contributor Author

bors merge

iohk-bors bot added a commit that referenced this pull request Mar 21, 2022
3172: Change `applyBlocks` to also accept `BlockSummary` r=HeinrichApfelmus a=HeinrichApfelmus

### Issue number

ADP-1422, ADP-1428

### Overview

[Light-mode][] (Epic ADP-1422) aims to make synchronisation to the blockchain faster by trusting an off-chain source of aggregated blockchain data. 

  [light-mode]: https://input-output-hk.github.io/cardano-wallet/design/specs/light-mode

In this pull request, we change the `applyBlocks` function to work with both a list of `Block` and a `BlockSummary`. Specifically, we introduce a function `applyBlockData` which performs the following steps:

* `discoverFromBlockData`: discover addresses and and transaction data (`ChainEvents`) contained in the sequence of blocks
* `applyBlockEventsToUTxO`: update the wallet UTxO from the given `ChainEvents`.

### Details

* Property tests for `discoverFromBlockData` check that address discovery gives the same result for both a `List` of blocks and a `Summary`.
* When processing a `List` of `Block`, we have to be a bit careful about performance, because we need to traverse ~ 32M transactions (as of Feb 2022). In this case, the `transactions` contained in `BlockEvents` are essentially a full copy of the `transactions` in `Block`. The best way to copy data is to copy a single reference to the data; for this purpose we introduce a `Sublist` type which avoids deconstructing and reconstructing the transaction list in the case where we do not filter transactions (`All`).
* In order to preserve the current checkpointing mechanism, `applyBlocks` actually calls `applyBlockData` repeatedly instead of passing the full list of blocks. I will need to rethink the checkpointing mechanism in the future — the main issue is that in its current form, a consumer of `BlockSummary` cannot ask for a specific block height in order to set a suitable checkpoint there. In the future, the checkpointing logic will need to mediate between producer and consumer.

### Comments

* I expect a slight regression in benchmarks, due to overhead involved in calling `applyBlockData` repeatedly. However, the introduction of `Sublist` should prevent a serious deterioration.

Co-authored-by: Heinrich Apfelmus <heinrich.apfelmus@iohk.io>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Mar 21, 2022

Build failed:

@HeinrichApfelmus
Copy link
Contributor Author

bors retry

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Mar 21, 2022

Build succeeded:

@iohk-bors iohk-bors bot merged commit 46772e1 into master Mar 21, 2022
@iohk-bors iohk-bors bot deleted the HeinrichApfelmus/ADP-1422/applyBlockEvents branch March 21, 2022 14:04
WilliamKingNoel-Bot pushed a commit that referenced this pull request Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants