
Proxy deposit vanished #1404

Open
michalisFr opened this issue Sep 5, 2023 · 13 comments
Labels
I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@michalisFr

There was a case where Account A had unbonding DOT. Account B was set up as an Any proxy of Account A. Account B managed the withdrawal of the unbonded funds and afterwards transferred the entire balance from A to B. However, when the process was completed, the total balance of the two accounts was 20 DOT less. Apparently the proxy deposit vanished into thin air.

This spreadsheet details the balance of both accounts before and after each action.

Timeline:

  1. Account A has 723 DOT unbonding and 0.81 transferable. Free balance = 723.81 DOT. Account B has 70.60 DOT free (rows 1-2)
  2. Account B sends 19.63 DOT to Account A. This is meant to cover the deposit for the proxy (rows 5-6).
  3. A batch call sets up the proxy and does transfer_all to send any transferable balance from Account A to Account B (now the proxy). The balance transferred is 20.12 DOT (what was sent in the previous transaction plus the previously existing transferable minus fees and tip). The deposit is taken from the frozen amount since (apparently) they now can overlap. (rows 8-9)
  4. When the unbonding period ends, Account B issues a batch call to withdraw_unbonded, transfer_all, and remove_proxies (in that order). This is where it gets weird: as the event history shows, there are events to withdraw 723 DOT (the frozen balance), but the transfer event only shows a transfer of 702.959 DOT (the free balance), and there are no events to unreserve the deposit, although the batch item apparently completed successfully. In fact, there's an event to kill the account along with the transfer event. (rows 17-18)

As indicated by these events and the history of the total balance of the two accounts, the proxy deposit seems to have vanished. It was never unreserved and the user has apparently lost 20.041 DOT. It stands to reason that if the user never set up the proxy, but made withdraw_unbonded and transfer_all directly from Account A, they'd have 794 DOT instead of 773 DOT.

So, can someone explain what happened here?

@github-actions github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Sep 5, 2023
@michalisFr
Author

I could not reproduce the issue on Westend.

This is the equivalent of Account A and this of Account B.

As can be seen in the event history of the same batch call as above (withdraw_unbonded, transfer_all, remove_proxies) the unbonding funds (19.5 WND) are withdrawn, then the free balance (18.48795 WND) is sent to Account B, and finally the deposit is unreserved. Subsequently Account A sends that deposit to Account B.

This is what should have happened on Polkadot too, but it didn't.

@shawntabrizi
Member

I do not yet have an exact answer, but there are hints as to what happened here.

It seems the only relevant extrinsic to look deeply at is: https://polkadot.subscan.io/extrinsic/17085482-2

My hunch is that the order of the batch was important, and the problem was that transfer_all was called before remove_proxies.

transfer_all can only calculate and attempt to transfer the available portion of the free balance; reserved balance is never included.

This was called before the proxy was removed, so that 20 DOT reserve was untransferable at the time, which is why only 702 DOT was transferred in the transfer_all. If the remove_proxies call had been put first, then the balance would have been unreserved and would have been part of the transfer_all. The only question left to answer is what condition would lead to the account being reaped, and thus wiped, before the reserved balance was empty.
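The transfer_all arithmetic can be sketched with a toy model (my own simplification, not the actual pallet_balances code; `transferable`, `DOT`, and `ED` are illustrative names):

```rust
const DOT: u128 = 10_000_000_000; // plancks per DOT
const ED: u128 = DOT; // Polkadot's existential deposit is 1 DOT

/// Toy model: transfer_all can only move the free balance. The frozen
/// (staking) requirement may be satisfied by the reserved balance where
/// the two overlap, so only the excess of frozen over reserved is
/// untouchable; the reserved amount itself is never part of the transfer.
fn transferable(free: u128, reserved: u128, frozen: u128, keep_alive: bool) -> u128 {
    let mut untouchable = frozen.saturating_sub(reserved);
    if keep_alive {
        untouchable = untouchable.max(ED);
    }
    free.saturating_sub(untouchable)
}

fn main() {
    // Just after withdraw_unbonded: freeze gone, ~20 DOT still reserved.
    // transfer_all moves only the free part; the deposit stays behind.
    assert_eq!(transferable(703 * DOT, 20 * DOT, 0, false), 703 * DOT);
    // Had remove_proxies run first, the deposit would already be free:
    assert_eq!(transferable(723 * DOT, 0, 0, false), 723 * DOT);
}
```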

My guess is the code just cleans up any reserved balance once the free balance drops below the ED and there are no reference counters.

I will look, but this is the start of the issue.

@shawntabrizi
Member

So what I said is correct: the transfer_all put the free balance to zero even though the account had a reserved balance, and that caused the account to be reaped and the reserved balance to be lost.

This behavior is relatively new due to: paritytech/substrate#12951

And the comment at the top:

Requires Existential Deposit requirements to be made of the free balance, not the total balance.

So in this case, the free balance was the only consideration when reaping the account.
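The reaping rule implied above can be sketched in a few lines (a hedged toy model under my own assumptions; `is_reaped` is a hypothetical helper, not the real frame_system logic):

```rust
/// Toy model of the post-#12951 rule: only the *free* balance is checked
/// against the existential deposit, and only consumer references keep the
/// account alive. Reserved balance is not considered, so it is wiped
/// along with the account when it is reaped.
fn is_reaped(free: u128, consumers: u32, ed: u128) -> bool {
    free < ed && consumers == 0
}

fn main() {
    let ed = 10_000_000_000u128; // 1 DOT in plancks
    // After transfer_all: free == 0, 20 DOT still reserved. With the
    // consumer reference missing, the account is reaped and the reserve lost.
    assert!(is_reaped(0, 0, ed));
    // With a consumer reference still in place, it would have survived:
    assert!(!is_reaped(0, 1, ed));
}
```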

Some solutions / ideas for the problem:

  • the user could have adjusted the order of their operations, that being said, hard to blame them for this mistake.
  • we introduce reference counters into the proxy pallet, so that users with some reserved balance will have a reference counter, and thus will not be reaped when their balance goes below ED
  • we look at reserved balance during the reaping process and stop it (although this seems counter to the explicitly written behavior)

To me, the best answer is to explicitly introduce a reference counter into the proxy pallet. It would require a storage migration, but nothing too crazy. The reference counter would be removed when the account has no deposits / reserved balances in the pallet.
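That reference-counter idea can be sketched as follows (hypothetical names throughout; `take_deposit` / `return_deposit` are not real pallet calls, just a model of the proposed behavior):

```rust
#[derive(Debug, PartialEq)]
struct Account {
    consumers: u32,
    reserved: u128,
}

/// Toy model of the proposed fix: the proxy pallet holds one consumer
/// reference for as long as it has any deposit reserved, so the account
/// cannot be reaped while the deposit exists.
fn take_deposit(acc: &mut Account, amount: u128) {
    if acc.reserved == 0 && amount > 0 {
        acc.consumers += 1; // first deposit: add the consumer reference
    }
    acc.reserved += amount;
}

fn return_deposit(acc: &mut Account, amount: u128) {
    acc.reserved = acc.reserved.saturating_sub(amount);
    if acc.reserved == 0 {
        acc.consumers = acc.consumers.saturating_sub(1); // last deposit gone
    }
}

fn main() {
    let mut alice = Account { consumers: 0, reserved: 0 };
    take_deposit(&mut alice, 20);
    assert_eq!(alice.consumers, 1); // cannot be reaped while the deposit is held
    return_deposit(&mut alice, 20);
    assert_eq!(alice.consumers, 0); // reference released with the deposit
}
```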

As for this user, it is not clear to me that this "lost" balance was correctly tracked in the Polkadot system. My current feeling is that this balance is now missing from (or extra in) the total issuance of Polkadot. In general, because of the behavior quoted above, it seems possible for total issuance to get out of whack, whereas we previously assumed this was impossible.

I would say a simple follow-up would be to try to calculate the total issuance of Polkadot and see if we notice a difference from what we expect.

Let me know if you have any specific questions about what I found here.

@xlc
Contributor

xlc commented Oct 17, 2023

The proxy deposit accounting has always been a mess. I don't know why people ignore all the previous issues until it's too late.

paritytech/substrate#8550
#237

@michalisFr
Author

Thanks for looking into this @shawntabrizi! This does indeed shed some light as to what happened here, but I do have some questions:

  1. Adding a proxy does introduce a consumer reference to the proxied account. As a result (?), when there is simply a proxy, a transfer_all call doesn't reap the account. In that batch call, transfer_all transferred the free balance and then remove_proxies unreserved the deposit, which remained in the account as free, as you would expect.

If we simply do a transfer_all from the proxy, then afterwards free == ED, even though keep_alive was set to false:
[screenshot: account state after transfer_all]

However, if there is also frozen balance when the proxy is created, then transfer_all leaves the account with free == frozen - reserved, which satisfies the condition free + reserved >= frozen, and apparently, the deposit overlaps with the frozen balance?

This is the account's in question state before that extrinsic:
[screenshot: account state before that extrinsic]
and this is it before creating the proxy:
[screenshot: account state before creating the proxy]

It's worth noting that the consumers didn't increase after adding the proxy. As mentioned, with just a proxy you have 1 consumer.

Also, not sure if it's relevant, but with the same setup on Westend, there are 3 consumers:
[screenshot: Westend account state with 3 consumers]
but you also have 3 consumers before the proxy creation:
[screenshot: Westend account state before proxy creation]

So, my question is why does transfer_all behave differently when there is frozen balance (reaping the account) and when there isn't (leaving behind the ED), even though keep_alive was false in both cases?
Also, shouldn't adding the proxy while there's staked balance increase the consumers by 1, as it happens when there's no staked balance?

  2. Would this have been prevented if keep_alive had been set to true in the transfer_all call?

  3. Why can't I reproduce this on Westend? Please see my second comment above.

Finally, regarding the total issuance, I'll try to calculate it if I find some time (although tbh it might be a bit out of my league), but given that there's no event to account for that 20 DOT, I think the on-chain total issuance will differ from the calculated one.

@shawntabrizi
Member

@michalisFr I see no code in the Proxy Pallet which introduces reference counters at any part of the proxy process:

https://github.com/paritytech/polkadot-sdk/blob/master/substrate/frame/proxy/src/lib.rs

So can you clarify what you mean by it DOES add a consumer ref? These ref counters are probably coming from the system pallet simply holding a balance.

I can respond to other comments after this.

@michalisFr
Author

@shawntabrizi I meant that after you create the proxy there's a consumer reference that wasn't there before. 😅 After posting I checked the pallet and assumed the reference comes from try_mutate, but what you said clarified the distinction for me: the reference is not for the proxy itself but for the deposit.

So, is it that when there's frozen balance there's no additional consumer because the deposit overlaps with frozen and there's already a consumer for frozen? And when frozen is removed (from withdraw_unbonded in this case), the consumer is also removed even though there's still a deposit? If so, should that happen?

@shawntabrizi
Member

shawntabrizi commented Oct 17, 2023

There should be a reference counter added when a user's balance is reserved by the proxy pallet, and that ref counter should be managed by the proxy pallet.

I believe that to be the ultimate issue here, leading to the problems.

This same logic should apply to any chain reserving a balance from the user, of which there might be more which do not handle it correctly.

@shawntabrizi
Member

I have created an end to end test which reproduces the behavior, but requires that at some point the user's consumer reference counters decreased incorrectly:

Here is the test, it can be inserted into the Polkadot Runtime next to the other tests: https://github.com/polkadot-fellows/runtimes/blob/main/relay/polkadot/src/lib.rs#L2407

Show Test
#[test]
fn proxy_balance_e2e() {
	let mut t: sp_io::TestExternalities = frame_system::GenesisConfig::<Runtime>::default()
		.build_storage()
		.unwrap()
		.into();
	// set the minimum
	t.execute_with(|| {});

	use frame_support::assert_ok;

	// Alice will be the main user
	let alice = AccountId::from([1u8; 32]);
	// Bob will be a proxy of alice
	let bob = AccountId::from([2u8; 32]);

	// UNITS == 1 DOT
	let proxy_deposit = 20_041 * UNITS / 1_000;

	t.execute_with(|| {
		// user is initialized with some balance
		assert_ok!(Balances::force_set_balance(
			RuntimeOrigin::root(),
			alice.clone().into(),
			800 * UNITS
		));

		// user does a simple staking bond, locking some token
		assert_ok!(Staking::bond(
			RuntimeOrigin::signed(alice.clone()),
			600 * UNITS,
			pallet_staking::RewardDestination::default()
		));

		// sanity check the current state
		let alice_state = System::account(alice.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 3,
			providers: 1,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 800 * UNITS,
				reserved: 0 * UNITS,
				frozen: 600 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(alice_state, expected_state);

		// alice sends some balance to the proxy
		assert_ok!(Balances::transfer_keep_alive(
			RuntimeOrigin::signed(alice.clone()),
			bob.clone().into(),
			100 * UNITS
		));

		// alice sets bob as a proxy
		assert_ok!(Proxy::add_proxy(
			RuntimeOrigin::signed(alice.clone()),
			bob.clone().into(),
			ProxyType::Any,
			0
		));

		// sanity check the current state
		let alice_state = System::account(alice.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 3,
			providers: 1,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 800 * UNITS - 100 * UNITS - proxy_deposit,
				reserved: proxy_deposit,
				frozen: 600 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(alice_state, expected_state);

		// unbond the user
		assert_ok!(Staking::unbond(
			RuntimeOrigin::signed(alice.clone()),
			600 * UNITS
		));
		// trick forward to a future era
		pallet_staking::CurrentEra::<Runtime>::put(u32::MAX);

		// sanity check the current state
		let alice_state = System::account(alice.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 3,
			providers: 1,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 800 * UNITS - 100 * UNITS - proxy_deposit,
				reserved: proxy_deposit,
				frozen: 600 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(alice_state, expected_state);

		// force reduce a consumer, this is the source of the issue
		System::dec_consumers(&alice);

		// sanity check the current state
		let alice_state = System::account(alice.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 2,
			providers: 1,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 800 * UNITS - 100 * UNITS - proxy_deposit,
				reserved: proxy_deposit,
				frozen: 600 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(alice_state, expected_state);

		let withdraw_unbonded =
			pallet_staking::Call::<Runtime>::withdraw_unbonded { num_slashing_spans: 0 };
		let transfer_all = pallet_balances::Call::<Runtime>::transfer_all { dest: bob.clone().into(), keep_alive: false };
		let remove_proxies = pallet_proxy::Call::<Runtime>::remove_proxies {};

		// execute the batch call
		let batch_call: Vec<RuntimeCall> = vec![
			pallet_proxy::Call::<Runtime>::proxy {
				real: alice.clone().into(),
				force_proxy_type: Some(ProxyType::Any),
				call: Box::new(withdraw_unbonded.into()),
			}
			.into(),
			pallet_proxy::Call::<Runtime>::proxy {
				real: alice.clone().into(),
				force_proxy_type: Some(ProxyType::Any),
				call: Box::new(transfer_all.into()),
			}
			.into(),
			pallet_proxy::Call::<Runtime>::proxy {
				real: alice.clone().into(),
				force_proxy_type: Some(ProxyType::Any),
				call: Box::new(remove_proxies.into()),
			}
			.into(),
		];

		// batch call originates from bob
		assert_ok!(Utility::batch_all(RuntimeOrigin::signed(bob.clone()), batch_call));

		// sanity check the current state
		let alice_state = System::account(alice.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 0,
			providers: 0,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 0 * UNITS,
				reserved: 0 * UNITS,
				frozen: 0 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(alice_state, expected_state);

		let bob_state = System::account(bob.clone());
		let expected_state = frame_system::AccountInfo {
			nonce: 0,
			consumers: 0,
			providers: 1,
			sufficients: 0,
			data: pallet_balances::AccountData {
				free: 800 * UNITS - proxy_deposit,
				reserved: 0 * UNITS,
				frozen: 0 * UNITS,
				flags: Default::default(),
			},
		};
		assert_eq!(bob_state, expected_state);

		// The total issuance is incorrect?
		assert_eq!(Balances::total_issuance(), 800 * UNITS);
	});
}

It can be run with: cargo test -p polkadot-runtime proxy_balance_e2e

Note, in order to reproduce the behavior, I have to insert into the test the following line:

// force reduce a consumer, this is the source of the issue
System::dec_consumers(&alice);

Only then was the behavior the same as it was on Polkadot.

I was prompted to do this because the user had the following state:

User: 14MLAat6uQJNHaboXnqtLhzwiQLqRbzueJdaHWLjeZi2nHuS
Block: 0xc28b8e4c971cc1d2ee8a0cb1c213ab50c1dd214edfc196f14acd677725d09296

{
  nonce: 43
  consumers: 2
  providers: 1
  sufficients: 0
  data: {
    free: 7,238,107,222,238
    reserved: 0
    frozen: 7,230,000,000,000
    flags: 170,141,183,460,469,231,731,687,310,945,884,105,728
  }
}

Where you will note they only had 2 consumer reference counters.

However, as you can see in my test, when setting up staking + proxy for the user, 3 reference counters existed on the account.

When running the batch call with the user having 3 consumer counters, there are no errors in the behavior. All funds are accounted for.

However, with only 2 consumer counters, we get the error we saw on Polkadot.

The test can be modified to see the different behaviors.

It is not clear what caused the user to have only 2 consumer reference counters. This is something that you would need to hunt for on-chain, but this is the root of the issue.

@shawntabrizi
Member

shawntabrizi commented Oct 18, 2023

Note that the total issuance is also incorrect here. It seems that, in general, total issuance can be wrong whenever a user's account is dusted by frame_system, since these values are set to zero but the balances pallet is not notified of that.
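The suspected accounting gap can be modeled in a few lines (a toy model of the assumed mechanics; `unaccounted` is an illustrative helper, not chain code):

```rust
/// Toy model: frame_system zeroes the dusted account's data directly, but
/// pallet_balances' TotalIssuance storage is never reduced, so the recorded
/// issuance ends up larger than the sum of all remaining account balances.
fn unaccounted(total_issuance: u128, balances: &[u128]) -> u128 {
    total_issuance.saturating_sub(balances.iter().sum::<u128>())
}

fn main() {
    // Before dusting: issuance 800; Alice holds 20 reserved, Bob 780 free.
    assert_eq!(unaccounted(800, &[20, 780]), 0);
    // After dusting Alice without burning her reserved 20 from issuance:
    assert_eq!(unaccounted(800, &[0, 780]), 20);
}
```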

@shawntabrizi
Member

The user has had only 2 reference counters since over 600 days ago, at block hash: 0x77f30d60799b2623aec1a84b955d59e8e00d528da3a6179c2d1d59a99f09fa41

{
  nonce: 10
  consumers: 2
  providers: 1
  sufficients: 0
  data: {
    free: 2,284,268,159,977
    reserved: 0
    miscFrozen: 2,250,000,000,000
    feeFrozen: 2,250,000,000,000
  }
}

So likely, there was some runtime change which introduced additional consumer counters, and this user was not migrated to it... or something else. I will let you hunt for the real reason.

@michalisFr
Author

Thanks for looking into this @shawntabrizi! Glad we could finally figure out what happened and also glad this only happens under specific conditions and it's not a generalised situation.

I also checked another case a few weeks prior to this one where the same thing happened, and that account also had 2 consumers before this extrinsic.

I'll try to track down when that consumer was removed and post here if I find anything. But the total issuance issue needs to be looked at too I guess.

A couple more questions to wrap this up (for now), if you don't mind:

  1. Having the proxy pallet explicitly add a consumer would prevent this from happening, even if the consumers were fewer than they were supposed to be for some other reason, correct?
  2. Setting the keep_alive flag to true in transfer_all would also prevent reaping the account, because it would keep the provider, correct?

@shawntabrizi
Member

In this case, having the Proxy Pallet add a consumer reference counter would not have helped, because the account would still have had one fewer counter than needed. It does seem to be some kind of migration issue, or a lack of migration to set the correct number of reference counters on accounts with this combination of locks and freezes.

If the user had set keep_alive = true, then this would not have been a problem: the account would have stayed alive and gotten the deposit back.

I tested this by just updating the test.
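The keep_alive effect can be sketched as follows (a toy model of transfer_all semantics under my own assumptions, not the actual pallet code):

```rust
const ED: u128 = 10_000_000_000; // 1 DOT in plancks

/// Toy model: transfer_all with keep_alive = true leaves the existential
/// deposit behind, so the account keeps its provider reference and is never
/// reaped; a later remove_proxies can still unreserve the deposit into the
/// surviving account. Returns (amount sent, free balance left behind).
fn transfer_all(free: u128, keep_alive: bool) -> (u128, u128) {
    if keep_alive {
        let sent = free.saturating_sub(ED);
        (sent, free - sent)
    } else {
        (free, 0)
    }
}

fn main() {
    let dot = 10_000_000_000u128;
    // keep_alive = true: ED stays, account survives, deposit recoverable.
    assert_eq!(transfer_all(703 * dot, true), (702 * dot, ED));
    // keep_alive = false: account emptied and (without consumers) reaped.
    assert_eq!(transfer_all(703 * dot, false), (703 * dot, 0));
}
```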
