The Pitfalls of ‘eth_estimateGas’

Our Lead Blockchain Engineer conducts a critical examination of gas limit calculations and strategies to address hidden issues in Ethereum transactions.

Do you trust the eth_estimateGas method? If the answer is yes, read on. By the end of this article, you might reconsider. I'll illustrate how this method can unexpectedly turn code that appears flawless into code riddled with bugs.

Context

Imagine a scenario where your platform extends leverage capabilities to smart contract wallets, effectively functioning as margin accounts. The challenge then arises: how do you develop an algorithm capable of efficiently liquidating these accounts? This task necessitates a complex process of closing accounts, which includes withdrawing from various DeFi protocols and executing strategic swaps to the borrowed asset. Consider a portfolio that might include diverse holdings:

  • A position in Convex Finance utilizing Curve Finance’s 3pool LP token, complemented by stETH
  • A mix of multiple tokens paired with Uniswap V3 NFTs in USDC/USDT and WETH/DAI pools.

Tackling such a multifaceted operation requires a specialized approach. At Arkis, we've developed a bespoke Domain-Specific Language (DSL) engineered precisely for this purpose. This language is compiled into an array of commands, where each individual command represents a deferred external call. The structure of a command is as follows:

struct Command {
    address target;
    uint256 value;
    bytes payload;
}

As you may infer, its fields are used as the arguments for the CALL opcode. Once compiled, this command array is executed inside the margin account, thereby closing existing positions and conducting the necessary asset swaps. A significant challenge, however, emerges when dealing with correlated positions influencing the same liquidity pool. This scenario poses a dilemma: do we engineer a solution to calculate and manage the complex slippage associated with these transactions, or do we design the system to allow for a certain degree of failure in initial commands, opting to retry them in subsequent transactions with adjusted values?

In an ideal setup, proactively managing complex slippage would be the preferred strategy, ensuring seamless and efficient transaction execution. However, given the intricate nature of integrating DSL code with our C++ backend, we have, for the time being, chosen the latter approach. This method entails allowing specific commands to fail initially, without reverting the entire transaction. Instead, we isolate and revert only the individual, atomic components of our DSL script that encounter issues. This approach ensures that a single script's failure does not revert the entire transaction, allowing for targeted adjustments and retries in subsequent operations. The code to achieve this looks like the following:

// ...
for (uint256 i = 0; i < self.content.length; i++) {
    Script calldata script = self.content[i];
    Command[] memory cmds = compiler.compile(script);

    // execute() is an external call, so a failure reverts only this script
    try cmds.execute() {} catch {
        return success = false;
    }
}
// ...
return success = true;

The cmds.execute() call is an external call. Consequently, if something goes wrong, all modifications made by the current script are rolled back, while changes from previously executed scripts are kept (thanks to try-catch). Next, we regenerate the DSL for the new portfolio. Since we've already executed the beginning of the submitted DSL code (up to the failed script), the account is partially closed, and there is no need to redo the entire process from scratch.

In the majority of cases, this code yields successful results. However, there's a subtle, yet significant challenge embedded within this approach. Can you identify the potential bug? Here's a clue to guide you: the issue doesn't lie within the Solidity code itself, which is logically sound. Nor is the problem rooted in the EVM. Instead, the key to uncovering this bug lies in understanding the broader context — specifically, how and where the code is executed. The answer lies within the realm of the client’s web3 library, such as ethers.js. This aspect often goes overlooked, yet it plays a crucial role in the functionality and reliability of our implementation.

Calculate gas limit (ask for a friend’s help)

How does an application determine the appropriate gas limit for a specific transaction on the Ethereum blockchain? Calculating the exact gas limit is not straightforward using a web3 library alone, as this would necessitate having the entire current state of the blockchain available locally—possible but not desired for light clients. To navigate this challenge, virtually all web3 libraries leverage the eth_estimateGas method provided by the Ethereum API for a more feasible estimation.

Opting to not specify a gas limit is technically possible but generally ill-advised. This approach could leave your account vulnerable to unforeseen ETH expenses, increasing the risk of transaction failure due to insufficient gas.

But what exactly occurs behind the scenes during this process? How does the Ethereum execution client go about estimating the gas required for a transaction?

Estimation is not trivial

The most obvious (and generally correct) way to know how much gas will be spent during a transaction is to simply execute the transaction and measure the gas consumption by the end of the process. This approach works in the vast majority of cases. But what about the outliers in which this process doesn’t work? For instance, consider the following code snippet:

function setFlag(bool _flag) external {
    // Refuse to run unless plenty of gas is still available
    if (gasleft() < 100000) {
        revert("need more gas");
    }
    flag = _flag;
}

Measured at the end of the function, the call consumes around 45,000 gas units, assuming the flag was initially set to false. To accommodate any unexpected fluctuations, we increase this estimate by 30%, setting our gas limit to 58,500 (45,000 * 1.3). Despite these precautions, the transaction fails with a 'need more gas' error, because the check at the top of the function requires gasleft() to be at least 100,000, and a limit of 58,500 can never satisfy it. While this only occurs in about 1% of cases, it highlights a significant limitation of our algorithm. This issue typically arises when using gasleft() within contracts, a practice generally discouraged due to its potential to introduce bugs and make the contract vulnerable to Miner Extractable Value (MEV) attacks. These problems are exacerbated when the transaction hash, block number, timestamp, gas, transaction origin, and other dynamic transaction values are used, increasing the risk of unforeseen issues.
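The arithmetic above can be replayed with a small Python model. This is only a sketch: set_flag and padded_limit are hypothetical helpers, the 45,000 and 100,000 figures come from the text, and the model assumes that gasleft() at the check is roughly the whole gas limit.

```python
GAS_CHECK = 100_000      # threshold hard-coded in setFlag
MEASURED_COST = 45_000   # consumption measured by simply executing the call


def set_flag(gas_limit: int) -> bool:
    """Return True if the modelled setFlag call succeeds with this gas limit."""
    # Simplification: the whole limit is what gasleft() sees at the check.
    if gas_limit < GAS_CHECK:
        return False  # revert("need more gas")
    return gas_limit >= MEASURED_COST


def padded_limit(measured: int, percent: int = 30) -> int:
    """Measured consumption plus a safety margin, in integer arithmetic."""
    return measured * (100 + percent) // 100


limit = padded_limit(MEASURED_COST)
print(limit, set_flag(limit))   # 58500 False
print(set_flag(GAS_CHECK))      # True
```

The padded limit covers what the function spends, yet it still fails, because the condition checked at the start has nothing to do with the gas consumed by the end.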

However, there's a larger segment of cases where this method fails: transactions that involve external calls and gas refunds. In Ethereum's early development stages, issues like this and this were frequently reported. We've learned that accurate gas limit calculation isn't always feasible for such transactions. In these scenarios, the most reliable method is an optimized form of trial-and-error: a binary search between 21,000 and 30,000,000 (Ethereum's block gas limit), though in our current implementation this maximum value isn't always equal to the block's gas limit, for simplicity. The switch from trivial estimation to a binary search in go-ethereum can be found here.
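A minimal Python sketch of such a binary search is given below. It is a model, not go-ethereum's actual code: estimate_gas and the succeeds callback are hypothetical names, the bounds are the ones quoted above, and the search assumes that once a limit works, every larger limit works too.

```python
from typing import Callable

MIN_GAS = 21_000        # lower bound quoted above
MAX_GAS = 30_000_000    # upper bound quoted above


def estimate_gas(succeeds: Callable[[int], bool],
                 lo: int = MIN_GAS, hi: int = MAX_GAS) -> int:
    """Smallest limit in [lo, hi] for which succeeds(limit) is True."""
    if not succeeds(hi):
        raise ValueError("transaction fails even at the maximum gas limit")
    while lo < hi:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid        # worked: try a lower limit
        else:
            lo = mid + 1    # failed: raise the limit
    return lo


# A transaction that only succeeds with at least 100,000 gas, like setFlag above.
print(estimate_gas(lambda gas: gas >= 100_000))   # 100000
```

The search knows nothing about why a run succeeded or failed. It only sees the boolean returned by succeeds, and that is the detail the rest of the article turns on.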

All but one 64th

We also need to take a look at EIP-150 to fully grasp the intricacies of our gas estimation strategy. This will become clear as we delve into the details. Let's zero in on the critical aspect of this proposal: the 'all but one 64th' rule, which fundamentally altered the mechanics of CALL gas forwarding after its implementation.

Before EIP-150, it was possible to forward all available gas to an external call, subject to a call stack limit of 1024. This approach, however, left the system vulnerable to certain types of attacks. To mitigate these risks and move away from hardcoded constraints, EIP-150 introduced a new rule for gas forwarding: at most 63/64 of the current remaining gas can be forwarded to a subsequent call. This means if you have x amount of gas remaining, the next call in the sequence can receive at most x*(63/64) gas, and the call after that, x*(63/64)*(63/64), and so on. Thus, the maximum gas that can be forwarded at any call depth 'D' is x*(63/64)^D.

The important aspect for us is that at least 1/64 is protected from the CALL opcode. This safeguard ensures that there is always some gas reserved, preventing scenarios where an external call consumes all remaining gas, thereby causing the top-level call to fail due to insufficient gas.
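To make the rule concrete, here is a short Python sketch of the arithmetic. The helper names are hypothetical, and the model ignores the cost of the CALL instructions themselves; EIP-150 defines "all but one 64th" of N as N minus floor(N / 64).

```python
def max_forwarded(gas: int) -> int:
    """Gas cap for a nested call under EIP-150: all but floor(gas / 64)."""
    return gas - gas // 64


def max_at_depth(gas: int, depth: int) -> int:
    """Apply the cap once per nesting level."""
    for _ in range(depth):
        gas = max_forwarded(gas)
    return gas


print(max_forwarded(6_400_000))     # 6300000
print(6_400_000 // 64)              # 100000 stays with the caller
print(max_at_depth(6_400_000, 2))   # 6201563
```

Whatever the callee does with its share, the caller keeps the reserved part and can continue executing after the call returns.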

Unit tests sometimes lie

Take a look at the code below:

contract Main {
    bool public jobDone;

    function doWork(address optionalWorker) external {
        // The result is deliberately ignored: the optional work is allowed to fail
        (bool success, ) = optionalWorker.call(
            abi.encodeCall(OptionalWorker.doOptionalwork, ())
        );

        jobDone = true;
    }
}

contract OptionalWorker {
    uint256[] private worked;
    uint256 public constant WORK_COUNT = 100;

    function doOptionalwork() external {
        for (uint256 i; i < WORK_COUNT; i++) {
            worked.push(i);
        }
    }

    function totalworked() external view returns (uint256) {
        return worked.length;
    }
}

To test the happy path, we’ve written a couple of unit tests:

import {Test} from "forge-std/Test.sol";
import {Main, OptionalWorker} from "../src/Demo.sol";

contract DemoTest is Test {
    Main private immutable main = new Main();
    address private immutable worker = address(new OptionalWorker());

    function test_doWork() external {
        main.doWork(worker);
        assertTrue(main.jobDone());
    }

    function test_totalworked() external {
        main.doWork(worker);
        OptionalWorker w = OptionalWorker(worker);
        assertEq(w.totalworked(), w.WORK_COUNT());
    }
}

We then ran those tests and obtained [PASS] results for both cases, which looks like a success. I have created a minimal demo to play around with here. To try it out, refer to the instructions in the readme on running the Anvil node, deploying the contracts, and executing the doWork function. After executing doWork, inspect the logs in the running Anvil node. The final log from the optional worker is "finished working on 63."

63? But the tests assured us that all 100 units of work were completed, didn't they? If you run the tests with forge test -vvv, you'll observe the log "gas after call: 9079256848776580291". Wow, that's a lot of gas to burn. This is the reason for obtaining [PASS] instead of [FAIL]: the test runner supplies so much gas that the nested call can never run out. On the other hand, when we call the eth_estimateGas method, Anvil uses the familiar algorithm to determine the minimal gas limit for a transaction: a binary search from the minimum to the maximum. If the transaction succeeds, we lower the limit; if it fails, we raise it. You can check the full code here, at the end of do_estimate_gas_with_state.

The fundamental challenge we face isn't with the binary search itself, but rather with the criteria we use to determine the success or failure of a transaction. Our current approach deems a transaction successful if it doesn't get reverted, prompting us to move to the next iteration of the binary search with a reduced gas limit. This methodology, while practical, overlooks scenarios where a nested external call might run out of gas. We rely on the success of the top-level call to gauge the transaction's overall success, which generally aligns with standard practices. Typically, we would revert a transaction if a low-level call was unsuccessful.

But let's revisit the initial problem we encountered: closing margin accounts using our DSL. In dealing with this, our approach diverges from the best practices. When mitigating slippage, we've opted for a more nuanced strategy. Instead of reverting the entire transaction when a specific command in the DSL script fails, we only undo the changes made by that particular failing script. This selective reversal allows the parts of the transaction that have already been executed to be applied to the blockchain state.

Explanation of the bug

We now have enough knowledge and context to understand where the bug was in the DSL execution code. The code works in many cases, but in others the transaction runs without problems and yet DSL execution cannot complete. If you debug the code, you might find that execution stops at a seemingly random opcode, and it is even possible to have something like this:

console.log("Hello from here,");
// Literally zero code between
console.log("and also from here");

but only receive "Hello from here," at runtime. I encountered exactly that while debugging, when I had no idea about the root cause. To explain what is going on, let's change the first snippet a bit:

for (uint256 i; i < self.content.length; i++) {
    Script calldata script = self.content[i];
    Command[] memory cmds = compiler.compile(script);
    try cmds.execute() {} catch {
        return success = false;
    }
}
uint256 gas_1 = gasleft();
// Work after the loop till the end of the transaction
uint256 gas_2 = gasleft();

Now, let's focus on the last iteration of the loop, the final cmds.execute() call. How much gas will be forwarded to the call after we've asked eth_estimateGas and submitted the transaction with the suggested estimate? Is it just enough to finish, or maybe at most roughly 63 * (gas_1 - gas_2)? The answer is, of course, the latter. The binary search settles on the smallest limit for which the 1/64 of gas held back from the call still covers the work after it, gas_1 - gas_2, so the call itself receives only the other 63/64. Regardless of how many cmds there are and how big they are, we always forward a constant gas amount, depending only on the work after the call. So, if there isn't a lot of work to do after the last loop iteration, the last loop's cmds.execute() will probably not finish its work, because during the estimation the client does not care whether some external call finished or ran out of gas: only the status of the top-level call matters in the binary search.
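The effect can be reproduced with a simplified Python model. Everything here is an illustrative assumption rather than real client code: pre is the gas spent before the nested call, need is what the nested call requires to finish, tail is the gas still needed after it returns, and the numbers are made up.

```python
def run(gas_limit: int, pre: int, need: int, tail: int) -> tuple[bool, bool]:
    """Simulate one transaction; return (top_level_ok, nested_ok)."""
    gas = gas_limit - pre                      # gas left when the nested call is made
    if gas < 0:
        return False, False                    # ran out before reaching the call
    forwarded = gas - gas // 64                # EIP-150: floor(gas / 64) is held back
    nested_ok = forwarded >= need
    gas -= need if nested_ok else forwarded    # an out-of-gas call burns all it was given
    return gas >= tail, nested_ok              # the failure is swallowed, as with try/catch


def estimate(pre: int, need: int, tail: int,
             lo: int = 21_000, hi: int = 30_000_000) -> int:
    """Binary search on the top-level status only, like eth_estimateGas."""
    while lo < hi:
        mid = (lo + hi) // 2
        if run(mid, pre, need, tail)[0]:
            hi = mid
        else:
            lo = mid + 1
    return lo


for need in (500_000, 2_000_000):
    limit = estimate(pre=30_000, need=need, tail=5_000)
    gas_at_call = limit - 30_000
    print(need, limit, gas_at_call - gas_at_call // 64, run(limit, 30_000, need, 5_000))
# 500000 350000 315000 (True, False)
# 2000000 350000 315000 (True, False)
```

In both runs the estimate is the same and the nested call receives 315,000 gas, which is 63 times the 5,000 units of tail work. How much the nested call actually needs never enters the result.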

How to fix the code

Now that we’ve identified the root of the problem, how do we fix it? It’s actually just a couple of lines of code. Let’s finally address our example:

// Somewhere above the function
error OutOfGas();

// ...
for (uint256 i; i < self.content.length; i++) {
    Script calldata script = self.content[i];
    Command[] memory cmds = compiler.compile(script);
    uint256 gasBefore = gasleft();
    try cmds.execute() {} catch {
        // Almost all gas is gone after the call: treat it as out-of-gas and fail the whole transaction
        if (gasleft() < gasBefore / 8) revert OutOfGas();
        return success = false;
    }
}
// ...
return success = true;
// ...

By reverting the top-level call if we used almost all the remaining gas in the embedded external call, we ensure that during eth_estimateGas the client produces an overestimate instead of an underestimate. To wrap up, it's worth noting that enforcing an overestimate won't result in spending more gas, because all unused gas is returned to the sender. It simply refines the binary search for us.
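A small Python model shows how the guard shifts the estimate. As before, this is a sketch under stated assumptions: PRE, NEED and TAIL are made-up gas figures for the work before, inside and after the nested call, and the guard flag stands in for the gasBefore / 8 check.

```python
PRE, NEED, TAIL = 30_000, 2_000_000, 5_000   # illustrative gas figures


def run(gas_limit: int, guard: bool) -> bool:
    """Return the top-level status of one simulated transaction."""
    gas = gas_limit - PRE
    if gas < 0:
        return False
    gas_before = gas
    forwarded = gas - gas // 64              # EIP-150 cap on the nested call
    if forwarded >= NEED:
        gas -= NEED                          # nested call finished its work
    else:
        gas -= forwarded                     # nested call ran out of gas
        if guard and gas < gas_before // 8:
            return False                     # revert OutOfGas()
    return gas >= TAIL


def estimate(guard: bool, lo: int = 21_000, hi: int = 30_000_000) -> int:
    """Binary search for the smallest limit with a successful top-level call."""
    while lo < hi:
        mid = (lo + hi) // 2
        if run(mid, guard):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(estimate(guard=False), estimate(guard=True))   # 350000 2061746
```

With the guard in place, the search can no longer settle on a limit at which the nested call starves, so the estimate rises to the first limit that lets it finish.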

References

https://github.com/ethereum/go-ethereum/pull/3587

https://ethereum.org/en/developers/docs/apis/json-rpc/

About Arkis

Arkis offers multichain, undercollateralized leverage powered by portfolio margin. Author Danil Menkin is the Lead Blockchain Engineer.
