Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yul optimisation #2

Open
sillytuna opened this issue Jul 29, 2022 · 8 comments
Open

Yul optimisation #2

sillytuna opened this issue Jul 29, 2022 · 8 comments

Comments

@sillytuna
Copy link

Solidity does a pretty good job generally but some things help, for example:

        //uint8(LOOKUP_TABLE_256[(isolateLS1B256(bb) * DEBRUIJN_256) >> 248]);
        bytes memory table = LOOKUP_TABLE_256;
        assembly {
            bb := and(bb, sub(0, bb))
            bb := mul(bb, DEBRUIJN_256)
            bb := shr(248, bb)
            result := mload(add(table, add(bb, 1)))
            result := and(0xff, result)
        }
    }

Whereas this function was the same either way:

    function set(BitMap storage bitmap, uint256 index) internal {
        uint256 bucket = index >> 8;
        uint256 mask = MASK_INDEX_ZERO >> (index & 0xff);
        bitmap._data[bucket] |= mask;
        //assembly {
        //    let bucket := shr(8, index)
        //    let mask := shr(and(index, 0xff), 0x8000000000000000000000000000000000000000000000000000000000000000)
        //    mstore(0, bucket)
        //    mstore(0x20, bitmap.slot)
        //    let slot := keccak256(0, 0x40)
        //    let current := sload(slot)
        //    sstore(slot, or(current, mask))
        //}
    }

However, I've done a test version with Chiru's 4.2 ERC721A (combined with a lengthless array optimisation around storage for this and another struct) and it was easiest to simply port across the two functions used and merge them in rather than me go through and yul/test the solidity-bits lib.

Happy to help if you want the lib improved though.

@sillytuna
Copy link
Author

Another function I converted:

        bytes memory table = LOOKUP_TABLE_256;
        assembly {
             let bucketIndex
             let bb
            //bucket = index >> 8;
            let bucket := shr(8, index)
            // index within the bucket
            //uint256 bucketIndex = (index & 0xff);
            bucketIndex := and(index, 0xff)

            for {
                // load a bitboard from the bitmap.
                //uint256 bb = bitmap._data[bucket];
                mstore(0, bucket)
                mstore(0x20, bitmap.slot)
                bb := sload(keccak256(0, 0x40))
                // offset the bitboard to scan from `bucketIndex`.
                //bb = bb >> (0xff ^ bucketIndex); // bb >> (255 - bucketIndex)
                bb := shr(xor(0xff, bucketIndex), bb)
                if iszero(bb) {
                    bucketIndex := 255
                }
            } iszero(bb) {
                mstore(0, bucket)
                bb := sload(keccak256(0, 0x40))
            } {
                if iszero(bucket) {
                    revert(0, 0)
                }
                bucket := sub(bucket, 1)
            }

            //bb.bitScanForward256()
            bb := and(bb, sub(0, bb))
            bb := mul(bb, DEBRUIJN_256)
            bb := shr(248, bb)
            bb := mload(add(table, add(bb, 1)))
            bb := and(0xff, bb)
            //result = (bucket << 8) | (bucketIndex - bb.bitScanForward256());
            result := or(shl(8, bucket), sub(bucketIndex, bb))
        }
    }

@sillytuna
Copy link
Author

The 721A version and notes can be found in my comment here:

chiru-labs/ERC721A#391 (comment)

@estarriolvetch
Copy link
Owner

@sillytuna This is interesting. Do you have a gas usage comparison (with solidity optimizer enabled)?

@sillytuna
Copy link
Author

I was testing in situ. Marginal gains from the second two funcs but worth it on something like this, esp if could be called a lot.

Should be quick for you to put in your code and run a quick gas comparison tho.

@estarriolvetch
Copy link
Owner

My concern is it may affect the code readability. I also saw you lengthless array PR for ERC721A.

Maybe shifting from mapping to lengthless array for the bitMap datastructure will be a bigger improvement.

@sillytuna
Copy link
Author

My view on that is it depends on use. Code like this can be needed in gas performant functions and will be write once use many. I'd keep the original solidity in hand to be read for sure and yul would need well testing and gas checks.

Your code is already quite hard to understand just because of the algorithm so I'm not sure yul really changes that. Can be well commented and include solidity comments.

@estarriolvetch
Copy link
Owner

        //uint8(LOOKUP_TABLE_256[(isolateLS1B256(bb) * DEBRUIJN_256) >> 248]);
        bytes memory table = LOOKUP_TABLE_256;
        assembly {
            bb := and(bb, sub(0, bb))
            bb := mul(bb, DEBRUIJN_256)
            bb := shr(248, bb)
            result := mload(add(table, add(bb, 1)))
            result := and(0xff, result)
        }
    }

I took a look and it seems the gas saving comes from two factors:

  • removing the require(bb>0) check.
  • inline the isolateLS1B

Unless there is a good reason, I don't think require(bb>0) should be removed. As for the merging the function, solidity optimizer now have this optimization stage and should take care that automatically based on the optimization parameter you specify. It's a deployment gas vs run time gas problem.

@sillytuna
Copy link
Author

sillytuna commented Jul 29, 2022

Makes sense. I forgot that I'd removed that require because of the specific use case essentially having it covered.

It's certainly worth checking what's worth being yul / inline and what isn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants