Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of FIP-0074: Remove cron-based automatic deal settlement #805

Merged
merged 8 commits into from
Sep 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
260 changes: 260 additions & 0 deletions FIPS/fip-0074.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,260 @@
---
fip: "0074"
title: Remove cron-based automatic deal settlement
author: "Alex North (@anorth), Alex Su (@alexytsu)"
discussions-to: https://github.com/filecoin-project/FIPs/discussions/800
status: Draft
type: Technical
category: Core
created: 2023-08-24
---

## Simple Summary
Add a method to the built-in market actor to allow storage providers to settle deal payments manually.
Stop performing automatic deal settlement in the built-in market actor's cron handler.
This removes risk that cron-based deal settlement work will overtake the chain's processing capacity.

## Abstract
Filecoin tipset execution includes a facility for scheduled execution of actor code at the end of every epoch.
This "cron" activity has a real validation cost, which can be accounted in gas units, but is not paid for by any party
(no tokens are burnt as a gas fee).

The built-in storage market actor uses cron at every epoch to perform maintenance
(mainly the settlement of payments) for active deals.
Since this work is "free" to the deal providers and clients, it is essentially a subsidised service.
The cost of this level of service is unsustainable, and threatens chain progress if utilisation grows.
Similar subsidised service cannot be made available to user-programmed smart contracts,
reducing their competitiveness as storage applications or markets.

This proposal instead provides a method for storage providers to settle deal payments manually,
and disables the automatic settlement for new deals.
Existing deals continue to be settled automatically until they expire.

## Change Motivation
A fast and predictable tipset validation time is important to the ability of validating nodes
to sync the chain quickly (especially when catching up),
and critical for block producers to be able to produce new blocks for timely inclusion.
Cron currently accounts for more than half of the computational cost of tipset validation,
and the built-in market actor accounts for a significant proportion of that.
Although there is a significant buffer in the target tipset validation times to account for
network delays and the variability of expected consensus,
extended cron execution can threaten chain quality and increase minimum hardware specs for validators.

Cron is very convenient for some maintenance operations,
but is essentially a subsidy from the network to whichever actors get the free execution (which is only ever built-in actors).
Subsidised, automatic daily settlement of deals is far from the kind of critical network service for which cron was intended,
and of zero value for the 98% of deals today that have zero fee.

The number of deals brokered by the built-in market has increased greatly over the past year.
There are currently about 45 million active deals (~520 deals per epoch), a number which is growing quickly.

The processing cost of this maintenance nearly caused disaster prior to FIP-0060,
which extended each deal's update period from 24 hours to 30 days.
However, this change only bought some time to design a more permanent solution.
There remains significant risk that this processing could grow beyond the ability of the chain to safely execute it.

### Impact of direct data onboarding
A concurrent proposal is to support skipping the built-in storage market actor for some data onboarding,
anorth marked this conversation as resolved.
Show resolved Hide resolved
which provides a cost reduction for arrangements that don’t need on-chain payments.
As this capability is implemented and adopted, the number of built-in market deals might decrease
even as total data onboarding increases.
If and when adopted, this would mitigate the risk significantly.
However, this benefit depends on participants' adoption of the new onboarding methods, which is uncertain.
The built-in market actor might in any case be host a large number of payment-bearing deals.
Thus, direct data onboarding cannot be relied upon in either the near- or long-term as a final solution.

## Specification
Today, at the end of each epoch, the market actor loads all deals that are scheduled for maintenance.
For each deal, the function:
- removes state for any unactivated deals that have timed out (“expired”);
- settles incremental payments for activated deals; and
- settles payments/refunds/penalties and removes state for any completed or terminated deals.

This proposal is to provide a new, explicit user message to settle payments,
and provide an explicit method to settle deals.
Cron is still used to time out un-activated deals.

### Explicit deal settlement
A new `SettleDealPayments` method is added to the built-in storage market actor.
This may be invoked by a storage provider (or anyone on their behalf) to settle
incremental payments for a specified list of deals.
Settling the final payment for a completed deal also removes the deal metadata from state.

```
// Note transparent serialization for single-item struct.
struct SettleDealPaymentsParams {
Deals: Bitfield,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use upper camel casing for fields instead of the Rust convention, since this seems to be Rust pseudo-code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not Rust, just pseudocode. I've used camel casing (and []array) for all my FIPs.

}

struct SettleDealPaymentsReturn {
// Indicators of success or failure for each deal.
Results: {
SuccessCount: u32,
FailCodes: []{
Index: u32,
ExitCode: ExitCode,
},
}
// Results for those deals that successfully settled.
Settlements: []DealSettlementSummary,
}

struct DealSettlementSummary {
// Incremental amount paid to the provider.
Payment: TokenAmount,
// Whether the deal has settled for the final time.
Completed: bool,
}
```

- If the caller specifies a deal which has never existed, settlement fails with the `USR_NOT_FOUND` code.
- If the caller specifies a deal which has not yet activated nor expired, settlement succeeds but has no effect.
- If the caller specifies a deal which has expired (not activated in time) or already completed,
settlement fails with the `EX_DEAL_EXPIRED` exit code (regardless of whether cron has removed the deal proposal from state).
- It is an error to specify a deal that has been marked for termination but not yet cleaned up by cron,
and doing so will result in an abort with `USR_ILLEGAL_ARGUMENT`.
This situation is only transiently possible in the period immediately after this proposal is deployed.

### Immediate clean-up on termination
The existing built-in market actor `OnMinerSectorsTerminate` method is changed to
immediately make a final payment settlement, charge penalties, and clean up state.
It behaves as if `SettleDealPayments` was invoked immediately after marking deals as terminated.
This replaces the current technique of merely recording the termination epoch and then
waiting for cron to do the work.

The existing `GetDealActivation` method result includes a field for termination epoch.
This field is (already) an unreliable source of termination information.
After a deal is completed, there is no information on-chain to distinguish successfully completed deals
from those terminated early, and `GetDealActivation` will fail.
Immediate clean-up on termination will render this field useless.
To avoid confusion, `GetDealActivation` is changed to always fail with `EX_DEAL_EXPIRED`
for expired, completed, or terminated deals, regardless of whether the state has been cleaned up.

### No automatic settlement for new deals
When a deal is first published, it is added to the cron queue after its start epoch.
If it is not activated prior to that epoch, it is expired and cleaned up in cron.
This remains as it works today.

However, when the cron handler finds deal to have been newly-activated (i.e. has not been settled previously),
it is _not_ re-scheduled for subsequent automatic settlement.
Future settlements must be manual.

### Continued automatic settlement for existing deals
Deals that are already activated prior to this change continue to be settled automatically in cron.
Such deals may also be settled manually by the new `SettleDealPayments` method,
which will compute incremental payments since the automatic or manual settlement.

### Transition
This proposal changes termination processing from deferred to immediate.
For 30 days this new code is deployed, it will be possible for a deal to have been marked as terminated
by the old code, but not yet cleaned up by the scheduled cron handler.

Deals that are marked as terminated are not eligible for manual settlement.
Specifying a terminated deal will cause `SettleDealPayments` to abort with `USR_ILLEGAL_ARGUMENT`.

## Design Rationale
An explicit settlement message aligns the gas cost of settling payments with the party deriving the benefit.
It also bounds the cost, like most other state transitions, by the block gas limit.
Providers might settle payments quite infrequently,
reflecting the relatively low utility of doing so, and will prefer periods of low gas demand.
Less frequent updates result in a lower system-wide amount of computation performed.

Storage providers are not expected to invoke settlement for deals that carry no payment.
The metadata for such newly-created but no-payment deals will thus remain in state indefinitely.
However, with direct data onboarding, there is little benefit and some cost to
involving the built-in market actor in new unpaid deals.
Relatively few additional unpaid deals might be accumulated after that is possible.
Stale blockchain state poses a much lesser risk to blockchain execution than continual processing.
If a large amount does accumulate, it could be dealt with independently.

Since no _new_ deals are scheduled for automatic settlement,
the cron cost of settlement will gradually decline until there are no active deals scheduled.
At that point, a future simplification can remove the code for automatic payment settlement from the cron handler.
The scheme proposed involves no state migration.

Since sector termination may be invoked by the miner actor’s cron handler,
this will sometimes still do settlement work in cron.
This may be resolved by future changes to reduce miner actor's use of cron for terminations.
Explicit sector termination by the SP will account for the cost of settlement to their gas usage.

### Alternatives
One possible extension to manual settlement is to provide gas refunds to callers.
Gas used for these calls would still contribute toward the block gas limit (and so be bounded),
but charge only a fraction of the usage to the caller.
This was rejected since it does not align the cost of settlement with the party deriving the benefit,
charging it instead to users of all other blockchain computation by raising their gas price.
With greatly reduced gas cost, providers might settle payments more frequently than is actually valuable to them,
wasting chain capacity, and could even abuse it to push the base fee higher than any natural market level.
Network-level gas refunds would also be impossible to provide to future user-programmed smart markets,
creating an artificial barrier to their development and adoption.

## Backwards Compatibility
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peer review: should mention impact on Filecoin market implementations. They now need to offer functionality to manually trigger settlement, whereas previously they could rely on the network doing that for them automatically. This implies downstream functional change. cc @dirkmc @nonsense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence does not that SPs who host deals need to change, I'll expand it a little to point out that this implies software changes.

This proposal does not remove or change any exported APIs, nor change any state schemas.
It is backwards compatible for all callers.
Since it requires a change to actor code, a network upgrade is required to deploy it.

Immediate clean-up on termination means that `GetDealActivated` will never return a termination epoch for a deal.
After a deal is terminated, the method will instead abort with `EX_DEAL_EXPIRED`.
This is already the behaviour today when a deal has been cleaned up by cron,
but will become the only possible behaviour.

The proposal does remove a subsidised function that some parties may have relied upon.
SPs who host paid-for deals with the built-in market actor must now settle payments manually,
so this change requires downstream software changes for SP operators.

## Test Cases
To be provided with implementation.

- Settlement is no-op for un-activated, expired, terminated, completed, or non-existent deal
- Settling a deal remits appropriate payment
- Repeated settlement at same epoch doesn't repeat payment
- Same total payment results from single or multiple settlements.
- No rounding errors result from frequent settlement

## Security Considerations
This proposal improves the progress of blockchain computation by reducing the amount of unnecessary processing,
and placing the necessary part under the block gas limit and market for computation.

This proposal does not introduce any reductions in security.

## Incentive Considerations
This proposal has some impact on the incentives to use the built-in storage market actor for deals by
placing the cost of settling those deals on the storage provider, rather than being subsidised by the network.
By subjecting deal settlement to gas marketplace, SPs are incentivised to use block space efficiently.
For example, SPs may settle payments less frequently, and thus incur less cost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume when SPs do have deal payments to settle, the gas costs of triggering settlement will be very small (~a normal send?) such that the payout is (almost) always more than the gas needed to trigger it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to question -- had the same thought and wondered if you considered (and rejected) gas refunds? If yes, it might be worth capturing the rationale here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, deal settlement is relatively expensive (which is why this is a risk to network stability). I have now made an estimate of the cost based on the cost of cron doing similar work. Settlement is about half as expensive as publishing a deal. This is the cost of on-chain payments in the very inefficient implementation in the built-in market actor.

I will add this estimate to incentive considerations. I will also add consideration of gas refunds to the design rationale. I do not think they are a good idea.


At epoch 3055532 (2023-07-22 08:46:00) the network hosted approximately 37,488,000 deals
(according to [Starboard](https://dashboard.starboard.ventures/market-deals)).
Deals are processed every 86,400 epochs, giving 433 per epoch (deals are distributed quite uniformly between epochs).
The built-in market cron invocation in this epoch consumed 25,509,119,291 gas
(note that this subsidy _exceeds the entire target_ budget for a tipset),
which implies an upper bound on per-deal settlement cost of 58,912,000 gas units.
Note that the cron invocation also performs other work, so this bound is conservative.
Note also that this access pattern is random; a future optimisation to market state could group deals by provider,
significantly reducing the state churn cost of settling deals that share a provider.

As a reference, publishing a batch of 18 deals in the same epoch cost 2,117,931,981 gas units (117,662,887 per deal).
This means that performing a single settlement after deal completion will likely cost
a little less than half as much as publishing it.

Note that a concurrent proposal FIP-0076 to support direct data onboarding will remove the key incentive
to use the built-in market actor at all,
being that it is the only permitted way to commit data to the network.
Thus participants will have incentive to use the built-in market actor only when on-chain payments need settlement,
and a disincentive (cost) otherwise.
Participants engaging in direct data onboarding need pay no deal settlment costs.
The cost of both publishing and settlement grow with the total number of active deals hosted by the market.
If direct data onboarding is widely adopted, these costs could drop to a fraction of current values.

## Product Considerations
This proposal reduces the (unsustainable) level of service provided by the network to deal participants.
The level of service poses a critical risk to blockchain progress.

However, it also removes a subsidy to the built-in market actor that could never be provided to user-programmed smart contracts.
This change thus supports the future development of user-programmed markets to compete on a more level playing field.

## Implementation
Implementation of this proposal is in progress in https://github.com/filecoin-project/builtin-actors/pull/1377.

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -111,4 +111,5 @@ This improvement protocol helps achieve that objective for all members of the Fi
|[0071](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0071.md) | Deterministic State Access (IPLD Reachability) | FIP |@stebalien| Draft |
|[0072](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0072.md) | Improved event syscall API | FIP | @fridrik01, @Stebalien | Draft |
|[0073](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0073.md) | Remove beneficiary from the self_destruct syscall | FIP | @Stebalien | Draft |
|[0074](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0074.md) | Remove cron-based automatic deal settlement | FIP | @anorth, @alexytsu| Draft |
|[0075](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0075.md) | Improvements to the FVM randomness syscalls | FIP | @arajasek, @Stebalien | Draft |