Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUMULUS-3642: postgres query adjustment #3731

Merged
merged 21 commits into from
Jul 31, 2024
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
## [Unreleased]

### Replace ElasticSearch Phase 1

- **CUMULUS-3239**
- Updated `executions` list api endpoint and added `ExecutionSearch` class to query postgres
- **CUMULUS-3240**
Expand All @@ -17,6 +18,11 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Removed elasticsearch dependency from granules endpoint
- **CUMULUS-3641**
- Updated `collections` api endpoint to query postgres instead of elasticsearch except if `includeStats` is in the query parameters
- **CUMULUS-3642**
- Adjusted queries to improve performance:
- Used count(*) over count(id) to count rows
- Estimated row count for large tables (granules and executions) by default for basic query
- Updated stats summary to default to the last day
- **CUMULUS-3688**
- Updated `stats` api endpoint to query postgres instead of elasticsearch
- **CUMULUS-3689**
Expand Down
21 changes: 7 additions & 14 deletions packages/api/endpoints/granules.js
Original file line number Diff line number Diff line change
Expand Up @@ -487,7 +487,6 @@ async function patchByGranuleId(req, res) {
granulePgModel = new GranulePgModel(),
knex = await getKnexClient(),
} = req.testContext || {};
let pgGranule;
const body = req.body;
const action = body.action;

Expand All @@ -500,18 +499,11 @@ async function patchByGranuleId(req, res) {
);
}

try {
pgGranule = await await getUniqueGranuleByGranuleId(
knex,
req.params.granuleId,
granulePgModel
);
} catch (error) {
if (error instanceof RecordDoesNotExist) {
log.info('Granule does not exist');
return res.boom.notFound('No record found');
}
}
const pgGranule = await getUniqueGranuleByGranuleId(
knex,
req.params.granuleId,
granulePgModel
);

const collectionPgModel = new CollectionPgModel();
const pgCollection = await collectionPgModel.get(knex, {
Expand Down Expand Up @@ -681,17 +673,18 @@ async function delByGranuleId(req, res) {
const {
knex = await getKnexClient(),
} = req.testContext || {};
let pgGranule;
const granuleId = req.params.granuleId;
log.info(`granules.del ${granuleId}`);

let pgGranule;
try {
pgGranule = await getUniqueGranuleByGranuleId(knex, granuleId);
} catch (error) {
if (error instanceof RecordDoesNotExist) {
log.info('Granule does not exist');
return res.boom.notFound(`Granule ${granuleId} does not exist or was already deleted`);
}
throw error;
}

const deletionDetails = await deleteGranuleAndFiles({
Expand Down
16 changes: 13 additions & 3 deletions packages/api/endpoints/stats.js
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,16 @@ function getType(req) {
* @returns {Promise<Object>} the promise of express response object
*/
async function summary(req, res) {
const stats = new StatsSearch({ queryStringParameters: req.query }, 'granule');
const params = req.query;

const now = Date.now();
params.timestamp__from = Number.parseInt(get(
params,
'timestamp__from',
now - 24 * 3600 * 1000
), 10);
params.timestamp__to = Number.parseInt(get(params, 'timestamp__to', now), 10);
const stats = new StatsSearch({ queryStringParameters: params }, 'granule');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now, it gets the recent day of granule stats summary, can they still have it where it shows the summary of all time? or would that be too slow/time-consuming api-reponse-wise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aggregations (count, avg) take forever without limiting the time range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, it wasn't returning for me when using the lp-db in most cases, thanks for the explanation 👍

const r = await stats.summary();
return res.send(r);
}
Expand All @@ -50,8 +59,9 @@ async function summary(req, res) {
* @returns {Promise<Object>} the promise of express response object
*/
async function aggregate(req, res) {
if (getType(req)) {
const stats = new StatsSearch({ queryStringParameters: omit(req.query, 'type') }, getType(req));
const type = getType(req);
if (type) {
const stats = new StatsSearch({ queryStringParameters: omit(req.query, 'type') }, type);
const r = await stats.aggregate();
return res.send(r);
}
Expand Down
20 changes: 11 additions & 9 deletions packages/api/tests/endpoints/stats.js
Original file line number Diff line number Diff line change
Expand Up @@ -91,8 +91,10 @@ test.before(async (t) => {
granules.push(fakeGranuleRecordFactory({
collection_cumulus_id: num % 20,
status: statuses[num % 4],
created_at: (new Date(2018 + (num % 6), (num % 12), (num % 30))),
updated_at: (new Date(2018 + (num % 6), (num % 12), ((num + 1) % 29))),
created_at: num === 99
? new Date() : (new Date(2018 + (num % 6), (num % 12), (num % 30))),
updated_at: num === 99
? new Date() : (new Date(2018 + (num % 6), (num % 12), ((num + 1) % 29))),
error: errors[num % 5],
duration: num + (num / 10),
}))
Expand Down Expand Up @@ -209,17 +211,17 @@ test('GET /stats/aggregate with an invalid access token returns an unauthorized
assertions.isInvalidAccessTokenResponse(t, response);
});

test('GET /stats returns correct response, defaulted to all', async (t) => {
test('GET /stats returns correct response, defaulted to the last day', async (t) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small nit, instead of the last day id say the last 24 hours, it makes it more clear its from the recent past versus some time frame, not really breaking though

const response = await request(app)
.get('/stats')
.set('Accept', 'application/json')
.set('Authorization', `Bearer ${jwtAuthToken}`)
.expect(200);

t.is(response.body.errors.value, 80);
t.is(response.body.processingTime.value, 54.44999999642372);
t.is(response.body.granules.value, 100);
t.is(response.body.collections.value, 20);
t.is(response.body.errors.value, 0);
t.is(response.body.processingTime.value, 108.9000015258789);
t.is(response.body.granules.value, 1);
t.is(response.body.collections.value, 1);
});

test('GET /stats returns correct response with date params filters values correctly', async (t) => {
Expand Down Expand Up @@ -261,10 +263,10 @@ test('GET /stats/aggregate filters correctly by date', async (t) => {

const expectedCount = [
{ key: 'failed', count: 16 },
{ key: 'running', count: 9 },
{ key: 'completed', count: 8 },
{ key: 'queued', count: 8 },
{ key: 'running', count: 8 },
];
t.is(response.body.meta.count, 41);
t.is(response.body.meta.count, 40);
t.deepEqual(response.body.count, expectedCount);
});
2 changes: 1 addition & 1 deletion packages/api/tests/endpoints/test-executions.js
Original file line number Diff line number Diff line change
Expand Up @@ -332,7 +332,7 @@ test.serial('GET executions returns list of executions by default', async (t) =>
t.is(results.length, 3);
t.is(meta.stack, process.env.stackName);
t.is(meta.table, 'executions');
t.is(meta.count, 3);
t.true(meta.count > 0);
const arns = fakeExecutions.map((i) => i.arn);
results.forEach((r) => {
t.true(arns.includes(r.arn));
Expand Down
2 changes: 1 addition & 1 deletion packages/api/tests/endpoints/test-granules.js
Original file line number Diff line number Diff line change
Expand Up @@ -439,7 +439,7 @@ test.serial('default lists and paginates correctly from querying database', asyn
t.is(results.length, 4);
t.is(meta.stack, process.env.stackName);
t.is(meta.table, 'granules');
t.is(meta.count, 4);
t.true(meta.count > 0);
results.forEach((r) => {
t.true(granuleIds.includes(r.granuleId));
});
Expand Down
48 changes: 46 additions & 2 deletions packages/db/src/search/BaseSearch.ts
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import { Knex } from 'knex';
import get from 'lodash/get';
import omit from 'lodash/omit';
import Logger from '@cumulus/logger';

Expand Down Expand Up @@ -83,6 +84,17 @@ class BaseSearch {
return !!(not?.providerName || term?.providerName || terms?.providerName);
}

/**
* Determine if an estimated row count should be returned
*
* @param countSql - sql statement for count
* @returns whether an estimated row count should be returned
*/
protected shouldEstimateRowcount(countSql: string): boolean {
const isBasicQuery = (countSql === `select count(*) from "${this.tableName}"`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so itll do an estimate if its a basic query, without any where, from, range, etc. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's for returning the table rowcount. With filters, we can't estimate the count and have to run the query.

return this.dbQueryParameters.estimateTableRowCount === true && isBasicQuery;
}

/**
* Build the search query
*
Expand Down Expand Up @@ -439,6 +451,28 @@ class BaseSearch {
throw new Error('translatePostgresRecordsToApiRecords is not implemented');
}

/**
* Get estimated table rowcount
*
* @param params
* @param params.knex - DB client
* @param [params.tableName] - table name
* @returns rowcount
*/
protected async getEstimatedRowcount(params: {
knex: Knex,
tableName? : string,
}) : Promise<number> {
const { knex, tableName = this.tableName } = params;
const query = knex.raw(`EXPLAIN (FORMAT JSON) select * from "${tableName}"`);
log.debug(`Estimating the row count ${query.toSQL().sql}`);
const countResult = await query;
const countPath = 'rows[0]["QUERY PLAN"][0].Plan["Plan Rows"]';
const estimatedCount = get(countResult, countPath);
const count = Number(estimatedCount ?? 0);
return count;
}

/**
* Build and execute search query
*
Expand All @@ -448,12 +482,22 @@ class BaseSearch {
async query(testKnex?: Knex) {
const knex = testKnex ?? await getKnexClient();
const { countQuery, searchQuery } = this.buildSearch(knex);

const shouldEstimateRowcount = countQuery
? this.shouldEstimateRowcount(countQuery?.toSQL().sql)
: false;
const getEstimate = shouldEstimateRowcount
? this.getEstimatedRowcount({ knex })
: undefined;

try {
const [countResult, pgRecords] = await Promise.all([countQuery, searchQuery]);
const [countResult, pgRecords] = await Promise.all([
getEstimate || countQuery, searchQuery,
]);
const meta = this._metaTemplate();
meta.limit = this.dbQueryParameters.limit;
meta.page = this.dbQueryParameters.page;
meta.count = Number(countResult[0]?.count ?? 0);
meta.count = shouldEstimateRowcount ? countResult : Number(countResult[0]?.count ?? 0);

const apiRecords = await this.translatePostgresRecordsToApiRecords(pgRecords, knex);

Expand Down
4 changes: 2 additions & 2 deletions packages/db/src/search/CollectionSearch.ts
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ export class CollectionSearch extends BaseSearch {
searchQuery: Knex.QueryBuilder,
} {
const countQuery = knex(this.tableName)
.count(`${this.tableName}.cumulus_id`);
.count('*');

const searchQuery = knex(this.tableName)
.select(`${this.tableName}.*`);
Expand Down Expand Up @@ -136,7 +136,7 @@ export class CollectionSearch extends BaseSearch {
const granulesTable = TableNames.granules;
const statsQuery = knex(granulesTable)
.select(`${granulesTable}.collection_cumulus_id`, `${granulesTable}.status`)
.count(`${granulesTable}.status`)
.count('*')
.groupBy(`${granulesTable}.collection_cumulus_id`, `${granulesTable}.status`)
.whereIn(`${granulesTable}.collection_cumulus_id`, collectionCumulusIds);

Expand Down
27 changes: 17 additions & 10 deletions packages/db/src/search/ExecutionSearch.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import { Knex } from 'knex';
import Logger from '@cumulus/logger';
import pick from 'lodash/pick';
import set from 'lodash/set';
import { constructCollectionId } from '@cumulus/message/Collections';
import { ApiExecutionRecord } from '@cumulus/types/api/executions';
import { BaseSearch } from './BaseSearch';
Expand All @@ -24,6 +25,10 @@ interface ExecutionRecord extends BaseRecord, PostgresExecutionRecord {
*/
export class ExecutionSearch extends BaseSearch {
constructor(event: QueryEvent) {
// estimate the table rowcount by default
if (event?.queryStringParameters?.estimateTableRowCount !== 'false') {
set(event, 'queryStringParameters.estimateTableRowCount', 'true');
}
super(event, 'execution');
}

Expand All @@ -34,8 +39,7 @@ export class ExecutionSearch extends BaseSearch {
*/
protected searchAsync(): boolean {
const { not, term, terms } = this.dbQueryParameters;
return (!!(not?.asyncOperationId ||
term?.asyncOperationId || terms?.asyncOperationId));
return (!!(not?.asyncOperationId || term?.asyncOperationId || terms?.asyncOperationId));
}

/**
Expand All @@ -45,8 +49,7 @@ export class ExecutionSearch extends BaseSearch {
*/
protected searchParent(): boolean {
const { not, term, terms } = this.dbQueryParameters;
return (!!(not?.parentArn ||
term?.parentArn || terms?.parentArn));
return (!!(not?.parentArn || term?.parentArn || terms?.parentArn));
}

/**
Expand All @@ -66,20 +69,24 @@ export class ExecutionSearch extends BaseSearch {
executions: executionsTable,
} = TableNames;

const searchQuery = knex(`${this.tableName} as ${this.tableName}`)
const searchQuery = knex(`${this.tableName}`)
.select(`${this.tableName}.*`)
.select({
collectionName: `${collectionsTable}.name`,
collectionVersion: `${collectionsTable}.version`,
asyncOperationId: `${asyncOperationsTable}.id`,

});

if (this.searchAsync() || this.dbQueryParameters.includeFullRecord) {
searchQuery.select({ asyncOperationId: `${asyncOperationsTable}.id` });
}

if (this.searchParent() || this.dbQueryParameters.includeFullRecord) {
searchQuery.select({ parentArn: `${executionsTable}_parent.arn` });
}

const countQuery = knex(this.tableName)
.count(`${this.tableName}.cumulus_id`);
.count('*');

if (this.searchCollection()) {
countQuery.innerJoin(collectionsTable, `${this.tableName}.collection_cumulus_id`, `${collectionsTable}.cumulus_id`);
Expand All @@ -91,7 +98,7 @@ export class ExecutionSearch extends BaseSearch {
if (this.searchAsync()) {
countQuery.innerJoin(asyncOperationsTable, `${this.tableName}.async_operation_cumulus_id`, `${asyncOperationsTable}.cumulus_id`);
searchQuery.innerJoin(asyncOperationsTable, `${this.tableName}.async_operation_cumulus_id`, `${asyncOperationsTable}.cumulus_id`);
} else {
} else if (this.dbQueryParameters.includeFullRecord) {
searchQuery.leftJoin(asyncOperationsTable, `${this.tableName}.async_operation_cumulus_id`, `${asyncOperationsTable}.cumulus_id`);
}

Expand Down Expand Up @@ -138,8 +145,8 @@ export class ExecutionSearch extends BaseSearch {
log.debug(`translatePostgresRecordsToApiRecords number of records ${pgRecords.length} `);
const apiRecords = pgRecords.map((executionRecord: ExecutionRecord) => {
const { collectionName, collectionVersion, asyncOperationId, parentArn } = executionRecord;
const collectionId = collectionName && collectionVersion ?
constructCollectionId(collectionName, collectionVersion) : undefined;
const collectionId = collectionName && collectionVersion
? constructCollectionId(collectionName, collectionVersion) : undefined;
const apiRecord = translatePostgresExecutionToApiExecutionWithoutDbQuery({
executionRecord,
collectionId,
Expand Down
7 changes: 6 additions & 1 deletion packages/db/src/search/GranuleSearch.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import { Knex } from 'knex';
import pick from 'lodash/pick';
import set from 'lodash/set';

import { ApiGranuleRecord } from '@cumulus/types/api/granules';
import Logger from '@cumulus/logger';
Expand Down Expand Up @@ -30,6 +31,10 @@ interface GranuleRecord extends BaseRecord, PostgresGranuleRecord {
*/
export class GranuleSearch extends BaseSearch {
constructor(event: QueryEvent) {
// estimate the table rowcount by default
if (event?.queryStringParameters?.estimateTableRowCount !== 'false') {
set(event, 'queryStringParameters.estimateTableRowCount', 'true');
}
super(event, 'granule');
}

Expand All @@ -50,7 +55,7 @@ export class GranuleSearch extends BaseSearch {
pdrs: pdrsTable,
} = TableNames;
const countQuery = knex(this.tableName)
.count(`${this.tableName}.cumulus_id`);
.count('*');

const searchQuery = knex(this.tableName)
.select(`${this.tableName}.*`)
Expand Down
Loading