Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve pagination performance #879

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

feat: improve pagination performance #879

wants to merge 14 commits into from

Conversation

yajra
Copy link
Owner

@yajra yajra commented Aug 9, 2024

Improve pagination performance by using ROW_NUMBER() and handling default orders.

fix: #878, #870
test: pagination improvement SQL changes
fix: subSelect where in query
fix: pagination when using fromSub

alexc-jeromes and others added 3 commits August 7, 2024 10:35
Improve pagination performance by using ROW_NUMBER() and handling default orderBys.
test: pagination improvement
Co-authored-by: StyleCI Bot <bot@styleci.io>
@yajra
Copy link
Owner Author

yajra commented Aug 9, 2024

Requesting for review @alexc-jeromes and @mixaster, TIA!

@mixaster
Copy link

mixaster commented Aug 9, 2024

So, @yajra I implemented this in our app to check, and got some new errors on previously working queries, specifically an "Invalid Identifier" in the outer query on the "tablename.*" column.

Also, the initial query that was giving me trouble still took more than 30 seconds.

@mixaster
Copy link

mixaster commented Aug 9, 2024

Is there a reason why there's a wrapper query? if I take my regular query and just add something like "FETCH FIRST 20 ROWS ONLY", it comes back in 0.28 seconds on a table of 281K records (in SQL Developer).

Honest question, I really don't know too much about the inner workings of Oracle.

@mixaster
Copy link

mixaster commented Aug 9, 2024

@yajra I did some experimenting....

I replaced compileRowConstraint with the following:

protected function compileRowConstraint($query)
    {
        if ($query->limit == 1 && is_null($query->offset)) {
            return "fetch next {$query->limit} rows only";
        }

        if ($query->offset && is_null($query->limit)) {
            return "offset {$query->offset} rows";
        }

        return "offset {$query->offset} rows fetch next {$query->limit} rows only";
    }

and then compileTableExpression just does a

return "{$sql} {$constraint}";

It seems to work throughout my app in initial testing. Looking for a sanity check. I could make a PR if you'd like.

@yajra
Copy link
Owner Author

yajra commented Aug 10, 2024

Unfortunately, there is a need to support a lower Oracle version down to 10g which is why I can't use fetch syntax.

I considered making this configurable and pagination will depend on the stated Oracle version but I couldn't execute it yet.

@yajra
Copy link
Owner Author

yajra commented Aug 10, 2024

So, @yajra I implemented this in our app to check, and got some new errors on previously working queries, specifically an "Invalid Identifier" in the outer query on the "tablename.*" column.

Also, the initial query that was giving me trouble still took more than 30 seconds.

Can you provide a snippet to reproduce the invalid identifier error?

Based on my testing, some of my queries only improved by a few milliseconds. But I will still have to check on a bigger application.

@mixaster
Copy link

mixaster commented Aug 10, 2024

@yajra This QueryBuilder call:

$studies = QueryBuilder::for(Study::query()->withUserCount())
            ->withCount(['sites'])
            ->paginate(getPerPage(10))
            ->withQueryString();

Produced this SQL:

select "STUDIES".*, (select count(*) from "STUDY_USER" inner join "MODEL_HAS_ROLES" on "STUDY_USER"."USER_ID" = "MODEL_HAS_ROLES"."MODEL_ID" inner join "ROLES" on "ROLES"."ID" = "MODEL_HAS_ROLES"."ROLE_ID" where "ROLES"."NAME" in (*****) and "STUDY_USER"."STATUS" = active and "STUDY_USER"."STUDY_ID" = "STUDIES"."ID") as "STUDY_USER_COUNT", (select count(*) from "SITES" inner join "STUDY_SITE" on "SITES"."ID" = "STUDY_SITE"."SITE_ID" where "STUDIES"."ID" = "STUDY_SITE"."STUDY_ID" and "SITES"."DELETED_AT" is null) as "SITES_COUNT" from (select t1.*, row_number() over (order by "NAME" asc) as rn from (select "STUDIES".*, (select count(*) from "STUDY_USER" inner join "MODEL_HAS_ROLES" on "STUDY_USER"."USER_ID" = "MODEL_HAS_ROLES"."MODEL_ID" inner join "ROLES" on "ROLES"."ID" = "MODEL_HAS_ROLES"."ROLE_ID" where "ROLES"."NAME" in (?, ?, ?, ?, ?) and "STUDY_USER"."STATUS" = ? and "STUDY_USER"."STUDY_ID" = "STUDIES"."ID") as "STUDY_USER_COUNT", (select count(*) from "SITES" inner join "STUDY_SITE" on "SITES"."ID" = "STUDY_SITE"."SITE_ID" where "STUDIES"."ID" = "STUDY_SITE"."STUDY_ID" and "SITES"."DELETED_AT" is null) as "SITES_COUNT" from "STUDIES" where "STUDIES"."DELETED_AT" is null order by "NAME" asc) t1) where rn between 1 and 10

and the error was ORA-00904: "STUDIES": invalid identifier

@yajra
Copy link
Owner Author

yajra commented Aug 11, 2024

I adjusted the table alias to use the original table in $query->from to preserve the compatibility in selected columns. Can you please check again @mixaster.

I will also add some tests using Eloquent with relation count when I can.

Thanks!

@yajra yajra marked this pull request as draft August 11, 2024 05:03
@yajra yajra marked this pull request as ready for review August 11, 2024 05:25
@yajra
Copy link
Owner Author

yajra commented Aug 11, 2024

Initial tests result vs 1.1k records:

Before:

select t2.* from ( select rownum AS "rn", t1.* from (select "ROLES".*, (select count(*) from "USERS" inner join "ROLE_USER" on "USERS"."ID" = "ROLE_USER"."USER_ID" where "ROLES"."ID" = "ROLE_USER"."ROLE_ID") as "USERS_COUNT", (select count(*) from "PERMISSIONS" inner join "PERMISSION_ROLE" on "PERMISSIONS"."ID" = "PERMISSION_ROLE"."PERMISSION_ID" where "ROLES"."ID" = "PERMISSION_ROLE"."ROLE_ID") as "PERMISSIONS_COUNT" from "ROLES" order by "ID" desc) t1 ) t2 where t2."rn" between 1 and 10
  • 1st: 607ms
  • 2nd: 312ms
  • 3rd: 255ms
  • 4th: 259ms

After:

select "ROLES".*, (select count(*) from "USERS" inner join "ROLE_USER" on "USERS"."ID" = "ROLE_USER"."USER_ID" where "ROLES"."ID" = "ROLE_USER"."ROLE_ID") as "USERS_COUNT", (select count(*) from "PERMISSIONS" inner join "PERMISSION_ROLE" on "PERMISSIONS"."ID" = "PERMISSION_ROLE"."PERMISSION_ID" where "ROLES"."ID" = "PERMISSION_ROLE"."ROLE_ID") as "PERMISSIONS_COUNT" from (select "ROLES".*, row_number() over (order by "ID" desc) as rn from (select "ROLES".*, (select count(*) from "USERS" inner join "ROLE_USER" on "USERS"."ID" = "ROLE_USER"."USER_ID" where "ROLES"."ID" = "ROLE_USER"."ROLE_ID") as "USERS_COUNT", (select count(*) from "PERMISSIONS" inner join "PERMISSION_ROLE" on "PERMISSIONS"."ID" = "PERMISSION_ROLE"."PERMISSION_ID" where "ROLES"."ID" = "PERMISSION_ROLE"."ROLE_ID") as "PERMISSIONS_COUNT" from "ROLES" order by "ID" desc) "ROLES") "ROLES" where rn between 1 and 10
  • 1st: 147ms
  • 2nd: 129ms
  • 3rd: 112ms
  • 4th: 114ms

@mixaster
Copy link

I had to wait until we had our dev environment available to test... unfortunately, the pagination was still slow with this version.

This query:

select
  *
from
  (
    select
      "ACTIVITY_LOG".*,
      row_number() over (
        order by
          "CREATED_AT" asc
      ) as rn
    from
      (
        select
          *
        from
          "ACTIVITY_LOG"
        order by
          "CREATED_AT" asc
      ) "ACTIVITY_LOG"
  ) "ACTIVITY_LOG"
where
  rn between 1
  and 20

with 800K rows took 23 seconds.

@alexc-jeromes
Copy link

@mixaster do you have indexes on that table? Curious to see if/how it changes things. Our tables were 5M or so.

@mixaster
Copy link

I do have indexes. it's the inner select I think that's causing the slowdown.

@yajra
Copy link
Owner Author

yajra commented Aug 24, 2024

Oic, we could trim down the query to a single inner select. Something like:

select *
from (select users.*, row_number() over (order by rowid) as rn from USERS) users
where rn between 1 and 10

I will try to implement it.

@mixaster, can you confirm if this trimmed query will improve your use case?

@mixaster
Copy link

There's still a full scan, so in my test of the raw query it took over 14 seconds.

@yajra
Copy link
Owner Author

yajra commented Sep 10, 2024

Apologies, I accidentally tagged v11.6.1 on this branch.

On the good side, this PR got tested and seemed to break #891.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants