
Disable refunds and burn all prepaid gas #107

Open
MaksymZavershynskyi opened this issue Aug 14, 2020 · 37 comments

@MaksymZavershynskyi
Contributor

After a brief discussion with @SkidanovAlex I think this issue is severe enough to be promoted to Phase 1. Also, I think #104 is not framing the right problem: the problem is not that congesting a shard is insufficiently expensive, but that it is possible in the first place. It should not be possible to disable transaction processing of a shard for 2 minutes without losing a large stake in the system; it shouldn't be merely expensive.

Problem description

Suppose there is a contract that burns 300Tgas during its execution. Suppose I create 200 transactions that call this contract and submit them asynchronously from multiple accounts so that they end up in the same block. All 200 transactions are going to be admitted into a single block, because converting a single function call transaction to a receipt only costs ~2.5Tgas. Unfortunately, only 2 such function calls can be processed per block, which means that for the next 100 blocks the shard will be doing nothing but processing delayed receipts and will not be processing new transactions. This results in almost 2 minutes of downtime for clients using our blockchain.

The cost of a single such attack is 60 NEAR, and the attacker can repeat it as soon as the delayed receipts are processed.
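The arithmetic above can be sketched as follows. The ~700 Tgas per-chunk limit and the 1e9 yoctoNEAR/gas price are assumptions chosen only to be consistent with the issue's own figures (2 calls per block, 60 NEAR total); they are not protocol constants.

```rust
/// Blocks needed to drain a backlog of `tx_count` calls when only
/// `chunk_limit_tgas / gas_per_call_tgas` calls fit in each block.
fn blocks_to_drain(tx_count: u64, gas_per_call_tgas: u64, chunk_limit_tgas: u64) -> u64 {
    let calls_per_block = chunk_limit_tgas / gas_per_call_tgas; // 700 / 300 = 2
    tx_count / calls_per_block                                  // 200 / 2 = 100
}

/// Attacker's total cost in NEAR at a given gas price (yoctoNEAR per gas).
fn attack_cost_near(tx_count: u64, gas_per_call_tgas: u64, price_yocto: u128) -> f64 {
    let total_gas = (tx_count * gas_per_call_tgas) as u128 * 1_000_000_000_000; // Tgas -> gas
    (total_gas * price_yocto) as f64 / 1e24 // yoctoNEAR -> NEAR
}

fn main() {
    let blocks = blocks_to_drain(200, 300, 700);
    println!("shard congested for ~{} blocks (~{} s at 1 s/block)", blocks, blocks);
    // Assumed price of 1e9 yoctoNEAR/gas matches the 60 NEAR figure.
    println!("attack cost: ~{:.0} NEAR", attack_cost_near(200, 300, 1_000_000_000));
}
```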

Broken invariant

The root of the problem is that we are breaking the following invariant:

In a given time interval T, it should not be possible to submit transactions whose attached gas can result in more CPU computation than can be processed in a time interval of the same length T.

It is a flow problem: if the source produces more flow than the sink can accept, the excess accumulates somewhere.
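The flow argument can be made concrete with a toy simulation (the numbers are illustrative, not protocol constants): whenever admitted gas per block exceeds processed gas per block, the delayed-receipt backlog grows linearly and never drains.

```rust
/// Toy flow model: `queue` is the backlog of admitted-but-unprocessed gas.
/// If inflow > outflow, the backlog grows by (inflow - outflow) per block.
fn backlog_after(inflow_tgas: u64, outflow_tgas: u64, blocks: u64) -> u64 {
    let mut queue: u64 = 0;
    for _ in 0..blocks {
        queue += inflow_tgas;                       // source: gas admitted this block
        queue = queue.saturating_sub(outflow_tgas); // sink: gas processed this block
    }
    queue
}

fn main() {
    // Inflow at or below capacity: the queue stays empty.
    println!("{}", backlog_after(500, 700, 100)); // 0
    // Inflow above capacity: the backlog grows without bound.
    println!("{}", backlog_after(900, 700, 100)); // 20000
}
```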

Heuristics

As long as the invariant is broken, no amount of heuristics can fix it. Examples of broken heuristics that will not work:

  • Increasing the gas price based on how long or how heavily the shard has been congested.
    Counter-argument to the above heuristic:

    • Since the attacker performs the attack within one block, they are not affected by the price change, so the attack is already viable. The attacker can perform it at random times of day when the gas price is low. For instance, they can cause 1 hour of total downtime per day by paying 3600 NEAR, which is negligible compared to the damage it can cause. And this attack is completely agnostic to how much the gas price hikes after the block that admits the 200 transactions.
  • Receipt priority queue based on gas price. Every function call has a gas price attached to it; receipts in the delayed queue with a higher gas price are processed first.
    Counter-arguments:

    • It does not solve the fundamental issue -- the congestion is still possible. The 2-minute congestion will have exactly the same price. If they do it 36 times at random times during the day, when there is no prior congestion, it will result in a total of 1 hour of downtime per day;
    • The second-order effects are not studied. It might open a whole surface of attack vectors and manipulations that we would need to reason about. It would not be the first time we introduced a seemingly simple change with complex second-order effects that later created significant difficulties. For example, state staking seemed simple at first, until we figured out that some contracts now need attached tokens and can be locked by state exhaustion;
    • Our DevX and UX would need to be significantly more complex. Developers and partners would need to think about how to implement mechanics that boost their receipts when they get stuck. Some of them might need to include additional UI elements and educate their users on "stuck" receipts, the way MetaMask has special functionality to boost transactions. This significantly degrades our UX;
    • Our app layer would need to be reworked. Components like the bridge would need special, complex logic to track all receipts produced by a given transaction and unstick all of them. The wallet would need special UI elements and mechanics;
    • Receipts can be delayed indefinitely;
    • New attack: because receipts can be delayed indefinitely, a user can create lots of receipts when there is a dip in the gas price and permanently use our state to store them. We currently don't make users pay for the state that delayed receipts occupy, which creates an attack angle -- someone can grab a lot of state for indefinitely delayed receipts and make validators store an unlimited amount of data forever.

Solution

It is clear we need to unbreak the invariant. The only way to unbreak it is to make sure that each block only contains transactions and receipts whose prepaid gas is cumulatively less than the block capacity.

Unfortunately, there is no incentive for users not to set the prepaid gas too high, since we reimburse all unused gas. This means people can fill blocks with transactions that have 300Tgas attached but burn only 5Tgas, preventing everyone else from using the blockchain at a very low cost (0.015 NEAR per second, or 54 NEAR per hour). To make sure users do not overestimate the prepaid gas, we need to burn it.
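A minimal sketch of the proposed admission rule (the names and the 700 Tgas capacity are illustrative, not the actual nearcore implementation): a block accepts transactions only while their cumulative prepaid gas stays within the block's gas capacity, so the admitted work can never exceed what one block can execute.

```rust
/// Illustrative transaction: under the proposed rule, only the attached
/// (prepaid) gas matters for admission, not the gas actually burnt.
struct Tx {
    prepaid_tgas: u64,
}

/// Greedily admit transactions until cumulative prepaid gas would exceed
/// the block capacity; the rest wait for later blocks.
fn fill_block(pool: &[Tx], capacity_tgas: u64) -> Vec<usize> {
    let mut used = 0u64;
    let mut admitted = Vec::new();
    for (i, tx) in pool.iter().enumerate() {
        if used + tx.prepaid_tgas <= capacity_tgas {
            used += tx.prepaid_tgas;
            admitted.push(i);
        }
    }
    admitted
}

fn main() {
    // 200 calls with 300 Tgas attached: only 2 fit into a 700 Tgas block,
    // instead of all 200 being converted to receipts at ~2.5 Tgas each.
    let pool: Vec<Tx> = (0..200).map(|_| Tx { prepaid_tgas: 300 }).collect();
    println!("admitted: {}", fill_block(&pool, 700).len()); // 2
}
```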
Advantages:

  • The congestion problem is solved in its entirety;
  • We don't need different gas prices between the shards;
  • Without burning, prepaid gas is a magic number that users don't care about and set to the maximum, which is a weird concept on its own;
  • There are no refunds, which means there are 30% fewer receipts for function call transactions:
    • As a result, contract call TPS is higher;
    • Contract call finality is 50% faster.
  • Receipts cannot be delayed for an infinite amount of time.

Disadvantages:

  • The DevX becomes a pain. You need to precisely estimate gas for cross-contract calls to avoid overcharging the user, but if you underestimate, the chain of calls might fail in an unexpected state.
  • It doesn't fully solve shard congestion, because multiple shards might route receipts to one shard and still create a delayed queue. But this delayed queue can grow by at most N blocks' worth of receipts (where N is the number of shards) per block.
    • Counter-argument: It fully solves shard congestion, because the total source capacity of the blockchain is equal to its total sink. All shard congestion is temporary and resolvable through resharding.

I suggest we go with the full-burn solution. We know it is bullet-proof; if later we come up with a scheme that allows refunds, we can implement it through upgrades. Doing it the other way around -- turning off refunds after the Phase 1 launch -- is going to be significantly more painful for our users.

@MaksymZavershynskyi changed the title from "Disable refunds and burn all gas" to "Disable refunds and burn all prepaid gas" on Aug 14, 2020
@bowenwang1996
Collaborator

Unfortunately, there is no incentive for users not to set the prepaid gas too high, since we reimburse all unused gas. This means people can fill blocks with transactions that have 300Tgas attached but burn only 5Tgas

I don't think that is the case. When we process receipts, we look at burnt gas to determine the limit. This means that if there are a lot of transactions that have 300Tgas attached but only use 5Tgas, we will be able to process all receipts in one block. I agree that it is a problem if each of the transactions actually burns 300Tgas, but if that is the case, I don't think what is proposed here helps either.

@MaksymZavershynskyi
Contributor Author

MaksymZavershynskyi commented Aug 14, 2020

Unfortunately, there is no incentive for users not to set the prepaid gas too high, since we reimburse all unused gas. This means people can fill blocks with transactions that have 300Tgas attached but burn only 5Tgas

I don't think that is the case. When we process receipts, we look at burnt gas to determine the limit. This means that if there are a lot of transactions that have 300Tgas attached but only use 5Tgas, we will be able to process all receipts in one block. I agree that it is a problem if each of the transactions actually burns 300Tgas, but if that is the case, I don't think what is proposed here helps either.

My argument that you quoted explains why users don't have an incentive to care about the prepaid gas. The counter-argument, "This means that if there are a lot of transactions that have 300Tgas attached but only use 5Tgas, we will be able to process all receipts in one block", does not argue that users will care about the prepaid gas.

Here is an example: imagine we've been running the network for 2 months; users do not care about prepaid gas because it is refunded, and as a bonus all transactions are processed without delay. Then a malicious user comes in and exploits the fact that they can stall the system for 2 minutes.

I.e., what I am arguing is: "we need to unbreak the invariant" => "we need to fill blocks using prepaid gas" => "prepaid gas needs to be close to used gas, or else blocks will be underutilized" => "users need to care about prepaid gas" => "we need to burn all prepaid gas". What you are trying to disprove is "users don't care about the gas" => "the blocks are congested", which is not what I am saying.

@bowenwang1996
Collaborator

I don't think I understand. How is burning all prepaid gas different from a transaction that uses all prepaid gas in the current setting?

@MaksymZavershynskyi
Contributor Author

MaksymZavershynskyi commented Aug 14, 2020

I don't think I understand. How is burning all prepaid gas different from a transaction that uses all prepaid gas in the current setting?

The fact that we have occasional transactions that utilize all prepaid gas does not solve the congestion issue, while burning all prepaid gas solves it.

@bowenwang1996
Collaborator

I think I understand your argument now. Let's say that we burn all prepaid gas. Now if an attacker wants to execute the same attack they have to saturate every block with transactions that attach a lot of gas. But the overall cost is the same as the attack in the current system because we charge receipts at the gas price of the block in which they are processed. Is your argument that it is much harder for an attacker to continuously saturate blocks with transactions?

@MaksymZavershynskyi
Contributor Author

MaksymZavershynskyi commented Aug 14, 2020

I think I understand your argument now. Let's say that we burn all prepaid gas. Now if an attacker wants to execute the same attack they have to saturate every block with transactions that attach a lot of gas. But the overall cost is the same as the attack in the current system because we charge receipts at the gas price of the block in which they are processed. Is your argument that it is much harder for an attacker to continuously saturate blocks with transactions?

Close, but not exactly. The argument is that there won't be a difference between the attacker and the regular user -- if a user is willing to saturate the block and pay for it, it does not matter whether they saturate it with useful computation, useless computation, or underutilize the block but still pay for the unused CPU.

@bowenwang1996
Collaborator

they underutilize the block but still pay for the unused CPU.

It seems that you suggest that in the current system the attacker can abuse it without paying for unused CPU. I don't see why that is the case. As I mentioned before, the number of receipts we process depends on their burnt gas, not used gas. So I don't see how the attacker gets away without paying for unused CPU.

@MaksymZavershynskyi
Contributor Author

MaksymZavershynskyi commented Aug 14, 2020

It seems that you suggest that in the current system the attacker can abuse it without paying for unused CPU.

In a system where blocks are filled based on prepaid gas rather than fees, the attacker can abuse it without paying for CPU; therefore, if we fill blocks based on prepaid gas, we also need to burn all the prepaid gas.

We need to fill blocks based on prepaid gas rather than fees to ensure the invariant described in the issue; if the invariant is broken, no heuristics can prevent the abuse.

@bowenwang1996
Collaborator

In a system where blocks are filled based on prepaid gas rather than fees, the attacker can abuse it without paying for CPU; therefore, if we fill blocks based on prepaid gas, we also need to burn all the prepaid gas.

But that is not the case. When we process receipts, the block is filled based on burnt gas, not prepaid gas. So we can process a lot of receipts with 300T prepaid gas but 5T burnt gas.

@evgenykuzyakov
Contributor

Has to go to a NEP

@evgenykuzyakov transferred this issue from near/nearcore on Aug 14, 2020
@evgenykuzyakov
Contributor

@vgrichina @kcole16 @mikedotexe @potatodepaulo for DevX.

@ilblackdragon @SkidanovAlex For comment on whether this is acceptable approach.

@evgenykuzyakov
Contributor

While I think it's the best option so far, I'm not quite sure how to address its downside for contract development. Since our fees right now are about 3 times higher than the real cost, all contracts are incentivized to attach more gas to be on the safe side.

It is also going to affect limited-allowance access keys, because there will be no refund receipts.

@evgenykuzyakov
Contributor

evgenykuzyakov commented Aug 14, 2020

Another big issue is compilation cost. If we don't resolve it with a magical solution, then every call (including cross-contract promises and callbacks) has to attach a ton of gas for a potential cold-cache compilation hit.

@bowenwang1996 suggested ignoring the compilation cost and pre-compiling contracts on deploy. We can increase the deploy cost to do this. Then during sync your node is responsible for compiling all contracts within a shard. Pre-compiled contracts can be stored on disk (not in the trie).

More details for compilation cost fix: #97 (comment)

@SkidanovAlex
Contributor

@evgenykuzyakov pre-compiled contracts don't have to be in-memory, right? If not, I like the approach.

@mikedotexe
Contributor

mikedotexe commented Aug 15, 2020

I can't say that I'm tracking all of this but I think I get the gist.
I don't see this being a huge issue for DevX, honestly. I think we would add an important line item to our Go to Mainnet Checklist to make sure partners take gas estimation seriously.

As far as I am aware we have not delivered a demonstration app showing gas estimation (that line item "Basic Tool for Ballparking" on this doc is slated for the end of the month), but we have delivered docs. We'll definitely want to change this section of that page if we go forward with this plan.

The way I see it, we would instruct partners (or "heavily suggest") to create simulation tests for the most common transactions in their project, gathering the gas costs. During the simulation tests they should add something like println!("Log(s) gas burnt {:?}", execution_outcome.gas_burnt); in a place similar to this line, then use that value to decide how much gas to attach per call. Right now I don't think anyone knows how much gas to add.

Besides that, we would also want to sweep the example repositories (including the near-sdk-rs examples directory) and change the large amounts of gas set as constants. (Or at the very least add a comment that no one should simply use this max value, as it's costly long term.)

@bowenwang1996
Collaborator

@SkidanovAlex yes they will be on disk. See our discussion here #97

@bowenwang1996
Collaborator

@mikedotexe I actually think the opposite. If we commit to this change, it means that if you attach less gas than what is needed, your contract call will fail and you still have to pay for it. It is difficult to estimate the cost precisely because it can depend on the contract state, which may be constantly changing. It becomes really bad when the contract owner redeploys the contract and changes the logic inside. So you almost always have to attach more gas than needed to err on the side of caution, and therefore waste some gas.

@evgenykuzyakov
Contributor

Overall, with free compilation (moved to deploy) the cost of a function call will decrease, so you would mostly attach gas only for compute and storage reads/writes. We'll also subtract the base cost and re-run the param estimator.
There will also be fewer refunds, so the overall cost will be even lower.

The solution will work perfectly fine for a single shard, but we'll need to consider gas price auctions or a max gas price across shards for a sharded system.

So I would vote for doing this together with the full cache of compiled contracts. I think these 2 solutions fully address the congestion and compilation issues in the short term for a single shard. But we'll need to reconsider gas pricing for a multi-sharded version, due to attacks on a single shard without a global gas price increase.

@vgrichina

I think our DevX is already pretty bad around gas, and there is no way to reliably estimate how much gas to attach besides measuring and overshooting by 1-2 orders of magnitude.

Making it even worse (by burning prepaid gas) doesn't look like a good solution from a DevX POV at all. Especially when combined with other fun stuff (like having to send some NEAR with the call as well when using a fungible token contract, etc.).

Is there any way we can make it radically simpler? E.g. maybe attach some fee in NEAR tokens instead of gas?

P.S. I'm not sure why having tokens locked for a while isn't already a pretty big deterrent against attaching too much gas.

Like before we decreased the default gas in near-api-js – I effectively had to lock 2 NEAR for every function call. Isn't this already a big enough deterrent to not set the gas limit too high?

cc @mikedotexe @chadoh @kcole16

@vgrichina

@nearmax I read your post more attentively and I think I understand why even locking 2 NEAR is not a big enough deterrent.

However, this assumes that there are no other incentives not to spam like that. What will happen if we sample transactions based on the attached prepaid gas (lower gas – higher chance to be included in the block)?

@evgenykuzyakov
Contributor

The current solution doesn't address the congestion issue completely. It just makes the gas price at prepay time consistent with the gas price at burn time. It prevents a shard from staying in the delayed queue for too long, since the max total input equals the total possible output.
It's still possible to spam a shard with new transactions, preventing legit transactions from being selected. It's a lottery right now.

So we need a more capitalistic solution which allows people to pay more to get priority, instead of everyone paying the same and not being able to affect the order.

@vgrichina

@evgenykuzyakov what do you think about sampling transactions for inclusion with priority given to lower-gas ones? We can use the transaction hash for deterministic randomness. This should encourage specifying lower gas, as it gets lower latency.
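One possible concretization of this suggestion (purely illustrative, not an endorsed design, and a deterministic priority order rather than true random sampling): order candidates by attached gas, breaking ties with the transaction hash, so every node derives the same order with no extra randomness.

```rust
/// Illustrative transaction shape for this sketch: attached gas plus a
/// hash used only as a deterministic tie-breaker (names are hypothetical).
struct Tx {
    prepaid_tgas: u64,
    hash: [u8; 4],
}

/// Lower attached gas comes first; ties are broken by hash, so selection
/// is deterministic across all nodes.
fn selection_order(pool: &mut [Tx]) {
    pool.sort_by(|a, b| {
        a.prepaid_tgas
            .cmp(&b.prepaid_tgas)
            .then_with(|| a.hash.cmp(&b.hash))
    });
}

fn main() {
    let mut pool = vec![
        Tx { prepaid_tgas: 300, hash: [1, 0, 0, 0] },
        Tx { prepaid_tgas: 5,   hash: [2, 0, 0, 0] },
        Tx { prepaid_tgas: 5,   hash: [0, 0, 0, 0] },
    ];
    selection_order(&mut pool);
    // Low-gas transactions come first; equal-gas ones order by hash.
    println!("{:?}", pool.iter().map(|t| t.prepaid_tgas).collect::<Vec<_>>()); // [5, 5, 300]
}
```

Note the starvation concern raised in the reply below the original comment: transactions that legitimately need a lot of gas would always sort last under this rule.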

@bowenwang1996
Collaborator

bowenwang1996 commented Aug 19, 2020

@vgrichina that is not great because people who legitimately need to run transactions that cost more gas will starve.

@MaksymZavershynskyi
Contributor Author

@vgrichina Our top goal is to make sure our protocol works and is not abusable. We cannot have a node with convenient DevX that is abusable. For example, we cannot argue that our consensus is slow and replace it with a heuristic that works in most but not all cases and has better DevX.

When you propose a heuristic, make sure it fixes the invariant; if it does not, then most likely the node remains abusable.

@evgenykuzyakov

So we need more capitalistic solution which allows people to pay more to get a priority, instead of everyone paying the same and not being able to affect the order.

We can add this feature incrementally. First, we need to make sure people cannot grab part of the network's capacity at a cost lower than the cost of actually using that capacity.

@vgrichina

vgrichina commented Aug 19, 2020

@nearmax I'm assuming the invariant you mention is:

In a given time interval T, it should not be possible to submit transactions whose attached gas can result in more CPU computation than can be processed in a time interval of the same length T.

I don't see how we break it when we use attached gas to estimate how many transactions can be processed. It seems we have the opposite problem – we can have far less CPU computation than expected given the attached gas, i.e. everything will still go on OK, just with a suboptimal number of transactions per block.

This IMO is nowhere near being a blocker for Phase 1. It is effectively a performance degradation issue, of which we have plenty (e.g. heavy RPC load doesn't seem to be handled well either).

We can add this feature incrementally. First, we need to make sure people cannot grab part of the network's capacity at a cost lower than the cost of actually using that capacity.

I think it's exactly the opposite. Being able to pay for priority is more important, as congestion will happen (with real load spikes) no matter how hard we try to protect against all possible attacks on throughput (especially the ones only profitable for validators).

@MaksymZavershynskyi
Contributor Author

I don't see how we break it when we use attached gas to estimate how many transactions can be processed. It seems we have the opposite problem – we can have far less CPU computation than expected given the attached gas, i.e. everything will still go on OK, just with a suboptimal number of transactions per block.

@vgrichina, @bowenwang1996 to clarify: we already directly incentivize the system to work at only 50% capacity, via gas price inflation when blocks are more than 50% full. So our current system is already designed to work at partial capacity.

I don't see how we break it when we use attached gas to estimate how many transactions can be processed.

We currently do not fill blocks based on how much gas is attached.

This IMO is nowhere near being a blocker for Phase 1.

It is not a Phase 1 blocker, we agree on that.

It is effectively a performance degradation issue which we have plenty of (e.g. heavy RPC load doesn't seem to be handled well either).

It is not a performance degradation; it is an explicit vulnerability that allows anyone to create 1 hour of downtime per day for our system at a very low cost.

I think it's exactly the opposite. Being able to pay for priority is more important, as we will have congestion happen (with real load spike) no matter how hard we try to protect from all possible attacks on throughput (especially the ones only profitable for validators).

It is a valid opinion that gas pricing can be more important, but burning all gas is also extremely important, because, as I explained above, without it our system is extremely vulnerable.

@mikedotexe
Contributor

This is going to be rough for partners like Flux that can't accurately determine gas costs.
I suggest, if possible, that we try to allow for some kind of dry run or estimation system before we institute this change.

@vgrichina

@mikedotexe why Flux specifically cannot determine gas cost?

Note that it should be pretty reasonable to always burn, say, 30 Tgas even if some tx only takes 5 Tgas.
As long as we keep the gas cost sufficiently low (i.e. < 1 cent per tx), it should be OK to just estimate the order of magnitude.

@mikedotexe
Contributor

I propose we capture the surplus gas, turn it into Ⓝ, and send it to a community fund account rather than burning it. That community fund can, down the road, be allocated via governance votes to better the ecosystem.

@bowenwang1996
Collaborator

@mikedotexe I suspect that validators will not like it since it means they get less reward.

@mikedotexe
Contributor

@mikedotexe I suspect that validators will not like it since it means they get less reward.

This makes me think that I misunderstand this whole issue. The "burn all prepaid gas" part in the title of this issue leads me to believe the validators would not have gotten this reward, it would be burned. I'm suggesting that whatever was going to be burned should be invested in public goods. Do I fundamentally misunderstand this?

@bowenwang1996
Collaborator

@mikedotexe I don't think we agreed on whether the surplus is burned without contributing to validator reward. But even if that is the case, validators will still prefer burning them since it decreases the total supply.

@MaksymZavershynskyi
Contributor Author

Independently of whether we burn it all or send it to the community fund, it is going to affect contract DevX the same way.

@mikedotexe
Contributor

I think the best suggestion I can give them is to use simulation tests to determine how much each call will be, and for those that may fluctuate (Flux has these), determine some amount (a percentage?) of padding on top.
If there are other ways to help them estimate, please let me know.

@MaksymZavershynskyi
Contributor Author

The problem is that until we enable these fees (near/nearcore#3279 (comment)), their estimation is going to be wrong; however, they will very likely overestimate it significantly. So I suggest they not spend too much time on a sophisticated estimator and just grab some meaningful upper bound.

@mikedotexe
Contributor

The problem is that until we enable these fees (nearprotocol/nearcore#3279 (comment)), their estimation is going to be wrong; however, they will very likely overestimate it significantly. So I suggest they not spend too much time on a sophisticated estimator and just grab some meaningful upper bound.

Thanks, that's very helpful.

@norwnd

norwnd commented Aug 16, 2023

I wonder whether this is still relevant ... ? Anyway, interesting discussion, leaving my 2 cents here.

The source and sink analogy is interesting, but as @evgenykuzyakov pointed out above it's not the whole story.

Analysing the congestion example:

Suppose there is a contract that burns 300Tgas during its execution. Suppose I create 200 transactions that call this contract and submit them asynchronously from multiple accounts so that they end up in the same block. All 200 transactions are going to be admitted into a single block, because converting a single function call transaction to a receipt only costs ~2.5Tgas. Unfortunately, only 2 such function calls can be processed per block, which means that for the next 100 blocks the shard will be doing nothing but processing delayed receipts and will not be processing new transactions. This results in almost 2 minutes of downtime for clients using our blockchain.

The cost of a single such attack is 60 NEAR, and the attacker can repeat it as soon as the delayed receipts are processed.

there are 2 different downsides (to how it's currently, seemingly, handled):

  • the transactions being executed come from a single (perhaps malicious) actor, temporarily denying "democratic" access to the Near network;
  • the shard charges too low a fee (during such congested conditions).

denying "democratic" access

Since we cannot know whether transactions are coming from the same actor (and whether or not they are malicious), there seems to be no way to solve this other than users attaching a gas price to each transaction (so those who need access get it; otherwise they wait in line - which is fine as long as the Near network makes enough on transaction fees; it sells block space, after all).

shard charges too low

60 NEAR is a surprisingly low cost to pull off such an attack, which means that shards should be more "beefy", or transaction execution should be charged at a higher rate, or both.

So it looks like Ethereum with EIP-1559 nailed it?

Edit: one thing the Ethereum fee model lacks is that it doesn't differentiate between block-space consumers and penalizes ALL of them equally; ideally, when lots of NFTs get minted or somebody is rushing to liquidate someone in DeFi, that shouldn't bump my fees for sending stablecoins around (because it's totally unrelated to those congested activities).
