Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a config to tune between linear and binary search for segmented aggregation #17941

Open
zacw7 opened this issue Jun 24, 2022 · 0 comments

Comments

@zacw7
Copy link
Member

zacw7 commented Jun 24, 2022

#17886 introduced segmented aggregation. In the current implementation, it uses linear search to look for the start row of the last segment in each page. We tried binary search which was implemented as follows:

private int findLastSegmentStart(PagesHashStrategy pagesHashStrategy, Page page)
{
    int left = 0;
    int right = page.getPositionCount() - 1;
    while (left < right) {
        int middle = left + (right - left) / 2;
        if (pagesHashStrategy.rowEqualsRow(page.getPositionCount() - 1, page, middle, page)) {
            // Same segment - The last segment starts somewhere in [lo, mid]
            right = middle;
        }
        else {
            // Different segment - The last segment starts somewhere in (mid ~ hi]
            left = middle + 1;
        }
    }
    return right;
}

As no significant improvement was observed based on our testing, we switched it back to linear search given that the code would be more readable and concise.

However, ideally we'd like to keep both implementations and introduce a config to let the user decide to use which approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant