
perf: poor performance on small table without index, likely due to contention #16252

Closed
tbg opened this issue May 31, 2017 · 8 comments

Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

tbg commented May 31, 2017

Not sure how actionable this is, but user @Freeaqingme reported very poor behavior in production with the following schema and queries. The main issue seemed to be contention from the full table scan associated with the first SELECT statement below, but the actual behavior was extremely bad. Adding an index on webhostingServerFqdn appears to have completely resolved the situation. When unhappy, reads would routinely have to wait ~2s for single overlapping requests, and would still take ~300ms after having cleared the command queue. The leader node for the processing table would be constantly maxed out on CPU. Since many operations in this setting take >500ms, SendNextTimeout kicks in and makes an even bigger mess.

I can understand that an insertion-heavy schema with full table scans performs poorly due to contention, but I don't understand why individual operations should take ~1s; it seems that the contention somehow causes a more fundamental badness that shouldn't be there. I think one would be able to strip this schema down to the bare necessities and to repro the bad behavior with load similar to the one explained below.

CREATE TABLE request (
  id SERIAL PRIMARY KEY,
  command STRING NOT NULL,
  arguments STRING NOT NULL,
  webhostingServerFqdn STRING NOT NULL,
  workflowId INT NOT NULL,
  workflowRequest STRING NOT NULL,
  version INT NOT NULL,
  requestedAt TIMESTAMP NOT NULL DEFAULT NOW(),
  FAMILY requestFamily (command, arguments, webhostingServerFqdn, version, requestedAt),
  FAMILY responseFamily (workflowId, workflowRequest)
);

CREATE TABLE processing (
  id SERIAL PRIMARY KEY,
  command STRING NOT NULL,
  arguments STRING NOT NULL,
  webhostingServerFqdn STRING NOT NULL,
  workflowId INT NOT NULL,
  workflowRequest STRING NOT NULL,
  version INT NOT NULL,
  requestedAt TIMESTAMP NOT NULL,
  processingAt TIMESTAMP NOT NULL DEFAULT NOW(),
  FAMILY requestFamily (command, arguments, webhostingServerFqdn, version, requestedAt, processingAt),
  FAMILY responseFamily (workflowId, workflowRequest)
);

CREATE TABLE response (
  id SERIAL PRIMARY KEY,
  command STRING NOT NULL,
  arguments STRING NOT NULL,
  webhostingServerFqdn STRING NOT NULL,
  workflowId INT NOT NULL,
  workflowRequest STRING NOT NULL,
  version INT NOT NULL,
  requestedAt TIMESTAMP NOT NULL,
  processingAt TIMESTAMP NOT NULL,
  responseAt TIMESTAMP NOT NULL DEFAULT NOW(),
  responseOutput STRING NOT NULL,
  responseError STRING NOT NULL,
  exitCode INT NOT NULL,
  FAMILY requestFamily (command, arguments, webhostingServerFqdn, version, requestedAt, processingAt),
  FAMILY responseFamily (workflowId, workflowRequest, responseAt, responseOutput, responseError, exitCode)
);
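For reference, the fix the reporter applied was an index on webhostingServerFqdn. The exact definition isn't in the report, so the following is a sketch (the index name is made up; listing id explicitly is optional, since CockroachDB secondary indexes implicitly store the primary key columns):

```sql
-- Hypothetical index; the report only says an index was added on
-- webhostingServerFqdn. With it, the dequeue query below can scan just
-- the matching fqdn prefix in id order instead of the whole table.
CREATE INDEX request_fqdn_idx ON request (webhostingServerFqdn, id);
```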

grep 'SELECT\|UPDATE\|INSERT\|DELETE' QueueProcessor/* | grep -v _test.go  | cut -d '"' -f'2-' | sed 's/)$//g' | sed 's/"$//g'

# SELECT query is executed every 5 seconds from ~100 machines
SELECT id, webhostingServerFqdn, version, command, arguments, workflowId, workflowRequest, requestedAt FROM " + tableRequest + " WHERE webhostingServerFqdn IN (" + webhostingServerFqdnIn + ") ORDER BY id ASC LIMIT 1

# If the query above has a result (which typically only applies to a few machines at a time), the following:
DELETE FROM " + tableRequest + " WHERE id=$1
SELECT id, webhostingServerFqdn, version, command, arguments, workflowId, workflowRequest, requestedAt, processingAt FROM " + tableProcessing + " WHERE id=$1 ORDER BY id ASC LIMIT 1
INSERT INTO " + tableProcessing + " (id, webhostingServerFqdn, version, command, arguments, workflowId, workflowRequest, requestedAt) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
DELETE FROM " + tableProcessing + " WHERE id=$1
INSERT INTO " + tableResponse + " (id, webhostingServerFqdn, version, command, arguments, workflowId, workflowRequest, requestedAt, processingAt, responseOutput, responseError, exitCode) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)

# Logic performed by a handful of nodes every few seconds
SELECT id, webhostingServerFqdn, version, command, arguments, workflowId, workflowRequest, requestedAt, processingAt, responseAt, responseOutput, responseError, exitCode FROM " + tableResponse + " ORDER BY id ASC LIMIT 1
DELETE FROM " + tableResponse + " WHERE id = $1

@petermattis

@spencerkimball Can you perform a bit of initial triage given your recent work on high contention scenarios?

bdarnell commented Jun 1, 2017

SELECT query is executed every 5 seconds from ~100 machines

Without the index, that's 20 full scans of the table per second. This table is being used as a queue, so there are probably more tombstones than there is actual data. These add to the cost of the full table scan but not to the properly-indexed query (because the deleted tombstones generally have lower IDs, adding the index lets us skip them). I'm not sure that we need contention-related effects to explain what's going on here.

@petermattis

Hmm, that's a good point. We might be scanning over a ton of deleted keys to find the first non-deleted key. Should be straightforward to verify that with a simple test. I don't have any good thoughts on how to fix this, though.

bdarnell commented Jun 1, 2017

We're not even looking for the first non-deleted key. We're looking for the first non-deleted key whose webhostingServerFqdn matches the query. I don't think there's anything we could reasonably do here that would improve the performance of this query without an index. Ultimately it's up to the application to use the right indexes for their queries.

We should, on the other hand, think about how we could make it more apparent to the user that this index is necessary. A LIMIT 1 query that the planner can't figure out how to do without a full table scan is usually a bad sign. The postgres protocol allows us to send "notice" messages back with the response to give warnings to the user. The visibility of these warnings depends on the tool and is generally poor, but it's better than nothing (especially if we can show them in our sql shell). We could also let the user pass in some sort of "max tombstones" query hint to have the query abort rather than become expensive if something like this happens.
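In the meantime, the full scan is already visible by hand. A sketch against the schema above (the fqdn values are placeholders; the exact output shape varies across CockroachDB versions):

```sql
-- EXPLAIN shows the access path. Without an index on
-- webhostingServerFqdn, the plan scans request@primary (a full table
-- scan) and then filters, sorts, and limits.
EXPLAIN SELECT id, webhostingServerFqdn, version, command, arguments,
               workflowId, workflowRequest, requestedAt
        FROM request
        WHERE webhostingServerFqdn IN ('web1.example.com', 'web2.example.com')
        ORDER BY id ASC
        LIMIT 1;
```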

@Freeaqingme

Although of course ideally you'd find some sort of optimization that fixes this (besides users adding sane indexes, something we clearly weren't doing), there are two metrics you could perhaps add to the admin console:

  • The number of queries that exceed a certain rows/tombstones scanned ratio
  • The number of queries that required a full scan

Once you drill down to a specific query, the warnings in the sql shell may be of interest (as well as the TRACE and EXPLAIN output, perhaps). But when all you know is "the cluster seems slow and pulling a lot of cpu", some metrics in the admin console would be a first step toward determining what's going on.
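For what it's worth, later CockroachDB releases grew statistics along these lines. A sketch, assuming a modern version whose statement statistics expose a full_scan flag (this virtual table and column did not exist in 2017):

```sql
-- Hypothetical usage on a modern release: list statement fingerprints
-- that performed full scans, most frequent first.
SELECT key, count
FROM crdb_internal.node_statement_statistics
WHERE full_scan
ORDER BY count DESC;
```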

@petermattis petermattis modified the milestone: 1.1 Jun 1, 2017
@petermattis


I wonder if there is some way to keep track of the non-deleted span of keys for a range. Seems a bit challenging to do at the MVCC/Replica level, but perhaps this could be added to the sstable metadata as part of compactions and iterators could indicate that they are interested in non-deleted keys.

@petermattis


Any improvement here will have to wait until 1.2.

@petermattis petermattis modified the milestones: 1.1, 1.2 Sep 26, 2017
@bdarnell bdarnell modified the milestones: 2.0, Later Feb 8, 2018
@petermattis petermattis added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels Jul 21, 2018
tbg commented Jul 31, 2018

Closing this for #17229.

@tbg tbg closed this as completed Jul 31, 2018