Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: perf issues when purging system.web_sessions #88622

Closed
aadityasondhi opened this issue Sep 23, 2022 · 0 comments · Fixed by #88766
Closed

server: perf issues when purging system.web_sessions #88622

aadityasondhi opened this issue Sep 23, 2022 · 0 comments · Fixed by #88766
Assignees
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@aadityasondhi
Copy link
Collaborator

aadityasondhi commented Sep 23, 2022

This issue was seen in: https://github.com/cockroachlabs/support/issues/1812

In summary, the customer experienced increased CPU usage when the system.web_sessions table grew large.

In the investigation, it was found that the suggested fix would be to:

  1. raise the default LIMIT of 10 to something much greater.
  2. remove ORDER BY random() in
    // purgeOldSessions deletes old web session records.
    // Performs three purges: (1) one for sessions with expiration
    // older than the purge TTL, (2) one for sessions with revocation
    // older than the purge TTL, and (3) one for sessions that have
    // timed out since they were last used.
    func (s *authenticationServer) purgeOldSessions(ctx context.Context) {
    var (
    deleteOldExpiredSessionsStmt = `
    DELETE FROM system.web_sessions
    WHERE "expiresAt" < $1
    ORDER BY random()
    LIMIT $2
    RETURNING 1
    `
    deleteOldRevokedSessionsStmt = `
    DELETE FROM system.web_sessions
    WHERE "revokedAt" < $1
    ORDER BY random()
    LIMIT $2
    RETURNING 1
    `
    deleteSessionsAutoLogoutStmt = `
    DELETE FROM system.web_sessions
    WHERE "lastUsedAt" < $1
    ORDER BY random()
    LIMIT $2
    RETURNING 1
    `
    settingsValues = &s.sqlServer.execCfg.Settings.SV
    internalExecutor = s.sqlServer.internalExecutor
    currTime = s.sqlServer.execCfg.Clock.PhysicalTime()
    purgeTTL = webSessionPurgeTTL.Get(settingsValues)
    autoLogoutTimeout = webSessionAutoLogoutTimeout.Get(settingsValues)
    limit = webSessionPurgeLimit.Get(settingsValues)
    purgeTime = currTime.Add(purgeTTL * time.Duration(-1))
    autoLogoutTime = currTime.Add(autoLogoutTimeout * time.Duration(-1))
    )
    if _, err := internalExecutor.ExecEx(
    ctx,
    "delete-old-expired-sessions",
    nil, /* txn */
    sessiondata.InternalExecutorOverride{User: username.RootUserName()},
    deleteOldExpiredSessionsStmt,
    purgeTime,
    limit,
    ); err != nil {
    log.Errorf(ctx, "error while deleting old expired web sessions: %+v", err)
    }
    if _, err := internalExecutor.ExecEx(
    ctx,
    "delete-old-revoked-sessions",
    nil, /* txn */
    sessiondata.InternalExecutorOverride{User: username.RootUserName()},
    deleteOldRevokedSessionsStmt,
    purgeTime,
    limit,
    ); err != nil {
    log.Errorf(ctx, "error while deleting old revoked web sessions: %+v", err)
    }
    if _, err := internalExecutor.ExecEx(
    ctx,
    "delete-sessions-timeout",
    nil, /* txn */
    sessiondata.InternalExecutorOverride{User: username.RootUserName()},
    deleteSessionsAutoLogoutStmt,
    autoLogoutTime,
    limit,
    ); err != nil {
    log.Errorf(ctx, "error while deleting web sessions older than auto-logout timeout: %+v", err)
    }
    }
    as it causes extra CPU load when purging larger tables

Jira issue: CRDB-19912

@aadityasondhi aadityasondhi added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-observability-inf labels Sep 23, 2022
@aadityasondhi aadityasondhi self-assigned this Sep 23, 2022
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Sep 26, 2022
The previous purging limit was found to be too low which resulted in the
`system.web_sessions` table growing rapidly. Additionally, we were using
`ORDER BY random()` when deleting expired sessions from the table which
caused a full table scan of a large table. This resulted in high cpu
usage on every purge attempt with only very few rows actually being
deleted. Over time, this turns into a nonideal feedback loop.

This change fixes the two issues by:
1. Increasing the defauilt purging limit to 1000 from 10.
2. Remove the `ORDER BY random()` when purging the table to avoid full
   table scans on every purge.

support ticket: cockroachlabs/support#1812

resolves cockroachdb#88622

Release note: None
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Sep 29, 2022
The previous purging limit was found to be too low which resulted in the
`system.web_sessions` table growing rapidly. This happened when a
customer-side automation script was generating a large number of
sessions. Additionally, we were using `ORDER BY random()` when deleting
expired sessions from the table which caused a full table scan of a large
table. This resulted in high cpu usage on every purge attempt with only very
few rows actually being deleted. Over time, this turns into a nonideal
feedback loop.

This change fixes the two issues by:
1. Increasing the defauilt purging limit to 1000 from 10.
2. Remove the `ORDER BY random()` when purging the table to avoid full
   table scans on every purge.

support ticket: cockroachlabs/support#1812

resolves cockroachdb#88622

Release note: None
craig bot pushed a commit that referenced this issue Oct 13, 2022
88766: server: fix bottlenecks when purging system.web_sessions r=dhartunian a=aadityasondhi

The previous purging limit was found to be too low which resulted in the
`system.web_sessions` table growing rapidly. This happened when a
customer-side automation script was generating a large number of
sessions. Additionally, we were using `ORDER BY random()` when deleting
expired sessions from the table which caused a full table scan of a large
table. This resulted in high cpu usage on every purge attempt with only very
few rows actually being deleted. Over time, this turns into a nonideal
feedback loop.

This change fixes the two issues by:
1. Increasing the defauilt purging limit to 1000 from 10.
2. Remove the `ORDER BY random()` when purging the table to avoid full
   table scans on every purge.

support ticket: https://github.com/cockroachlabs/support/issues/1812

resolves #88622

Release note: None

Co-authored-by: Aaditya Sondhi <aadityas@cockroachlabs.com>
@craig craig bot closed this as completed in 58f2b9f Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant