Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUMULUS-3642: postgres query adjustment #3731

Merged
merged 21 commits into from
Jul 31, 2024
Merged

Conversation

jennyhliu
Copy link
Contributor

@jennyhliu jennyhliu commented Jul 16, 2024

Summary: Summary of changes

Addresses CUMULUS-3642 Granules, collections, execution and stats performance test and query adjustment with large scale database

Changes

  • Adjusted queries to improve performance:
    • Used count(*) over count(id) to count rows
    • Estimated row count for large tables (granules and executions) by default for basic query
  • Updated stats summary to default to the last day
  • Updated ExecutionSearch to not include asyncOperationId by default
  • Fixed lint
  • Put back a few logic in master back to granules endpoints

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

@jennyhliu jennyhliu changed the base branch from master to feature/es-phase-1 July 16, 2024 20:17
@jennyhliu jennyhliu marked this pull request as ready for review July 29, 2024 17:33
@jennyhliu jennyhliu changed the title CUMULUS-3642: query adjustment CUMULUS-3642: postgres query adjustment Jul 29, 2024
now - 24 * 3600 * 1000
), 10);
params.timestamp__to = Number.parseInt(get(params, 'timestamp__to', now), 10);
const stats = new StatsSearch({ queryStringParameters: params }, 'granule');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so now, it gets the recent day of granule stats summary, can they still have it where it shows the summary of all time? or would that be too slow/time-consuming api-reponse-wise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aggregations (count, avg) take forever without limiting the time range

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, it wasn't returning for me when using the lp-db in most cases, thanks for the explanation 👍

* @returns whether an estimated row count should be returned
*/
protected shouldEstimateRowcount(countSql: string): boolean {
const isBasicQuery = (countSql === `select count(*) from "${this.tableName}"`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so itll do an estimate if its a basic query, without any where, from, range, etc. ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's for returning the table rowcount. With filters, we can't estimate the count and have to run the query.

@@ -72,8 +77,8 @@ class StatsSearch extends BaseSearch {
/**
* Formats the postgres records into an API stats/aggregate response
*
* @param {Record<string, Aggregate>} result - the postgres query results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whyd you remove the jsdoc stuff for StatsSearch? just wondering what the reasoning was

Copy link
Contributor Author

@jennyhliu jennyhliu Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cause eslint errors in vscode.

@@ -145,6 +147,7 @@ test('ExecutionSearch returns correct response for basic query', async (t) => {
test('ExecutionSearch supports page and limit params', async (t) => {
const { knex } = t.context;
let queryStringParameters = {
estimateTableRowCount: 'false',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so estimateTableRowCount is by default true unless its specifically passed in as false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For execution and granules, estimateTableRowCount is set to true by default, unless it's passed in as false. I set them to false in these tests, so that the accurate count is returned.

Copy link
Contributor

@Nnaga1 Nnaga1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for answering my questions, this looks great! Great job as usual 🚀 🎊

@@ -209,17 +211,17 @@ test('GET /stats/aggregate with an invalid access token returns an unauthorized
assertions.isInvalidAccessTokenResponse(t, response);
});

test('GET /stats returns correct response, defaulted to all', async (t) => {
test('GET /stats returns correct response, defaulted to the last day', async (t) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small nit, instead of the last day id say the last 24 hours, it makes it more clear its from the recent past versus some time frame, not really breaking though

@jennyhliu jennyhliu merged commit 867b325 into feature/es-phase-1 Jul 31, 2024
3 checks passed
@jennyhliu jennyhliu deleted the CUMULUS-3642 branch July 31, 2024 18:00
jennyhliu added a commit that referenced this pull request Aug 27, 2024
* CUMULUS-3692: Update granules List endpoints to query postgres for basic queries (#3637)

* CUMULUS-3692:Granule list endpoint for basic postgres query

* CUMULUS-3694: Update granules List endpoints to query postgres - filter by field value (#3656)

* CUMULUS-3692:Granule list endpoint for basic postgres query

* refactor

* refactor

* typing

* add changelog entry

* skip search_after

* skip searchafter unit tests

* add granule list test

* rename

* refactor

* build query parameters

* update comment

* add field-mapping

* update jsdoc

* use type over interface,add log

* update test description

* build term/terms

* buildDbQueryParameters

* add unit test no terms search

* add doc

* rename

* add unit test

* add fields test

* add more unit tests

* support error.Error search

* fix lint

* rename functions

* ignore files

* add convert query unit tests

* add all types

* add unit test for fieldmapping types fix timestamp

* update timestamp test

* add multiple term field test

* ignore execution in granule list record

* CUMULUS-3689: Update Stats/Summary and Stats/Aggregate endpoints to use psql (#3659)

* first commit on new branch

* CHANGELOG change

* small fix

* PR feedback

* adding jsdoc + fixing spelling/grammar

* CUMULUS-3693: Update granules List endpoints to query postgres - range (#3660)

* add range query support

* CUMULUS-3695 - Update Granules endpoint to handle SortFields (#3663)

* first committ

* CHANGELOG change

* fixing sortQueryMethod

* simplifying code

* PR feedback

* merge conflicts + improving code

* small jsdoc fix

* PR feedback

* PR feedback

* PR feedback

* fixing test

* PR feedback

* PR feedback

* CUMULUS-3696: Update granules List endpoints to query postgres - match (#3674)

* add methods to convert terms,not,exists

* CUMULUS-3641 - Update Collections LIST endpoint to query Postgres basic  (#3681)

* reopening PR

* PR feedback

* small test fix

* small PR feedbacks

* adding new tests from match queries

* PR feedback/formatting

* temporary reversion to list endpoint for reconreport tests

* reverting changes

* adding logging

* more logging

* more logging

* removing logging + commenting reconrep test temp

* commenting out failing createReconReport spec

* removing comment

* reverting changes to reconReport test

* reverting previous change

* adding ts-check

* PR feedback

* PR feedback

* adding in test

* PR feedback fix

* PR feedback

* CUMULUS-3699 - Update collection List endpoints to query postgres - includeStats (#3688)

* first commit

* CHANGELOG

* fixing small things

* changes + fixes

* PR feedback

* splitting queries separately

* PR feedback

* PR feedback

* PR feedback

* CUMULUS-3639: Add support to db/CollectionSearch to retrieve active collections (#3693)

* CUMULUS-3639:Add support to db/CollectionSearch to retrieve active collections

* add test active collections

* add ts-check

* update /collections/active unit test

* test snyk

* fix field mapping

* parallel search and fix urlPath

* add cumulus-lp stack

* add limit 1 to subquery

* CUMULUS-3239 - Update Executions LIST endpoint to query Postgres basic (#3684)

* first commit

* adding changes

* storing changes

* updating progress

* linting + small fixes

* small fix

* changing timestamp to string in tests

* fixing timestamp

* commenting out tests failing in CI but not locally

* saving changes

* collection support

* adding async_ops support

* changing endpoint + tests

* fixing test

* uncommenting tests + adding var

* commenting out tests failing in CI but not locally

* adding parentArn support + changing tests

* added parentArn support + fixing tests

* small endpoint test fix

* Pr feedback + code improvements

* small CHANGELOG fix

* PR feedback

* PR feedback + linting

* PR feedback

* PR feedback

* fixing test

* fixing execution tests after removing asyncCumulusOPId from mapping

* PR feedback

* removed includeFullRecord from search classes

* PR feedback

* PR feedback

* reverting change

* fix changelog

* Cumulus 3640/3242- Update granule non-LIST endpoints and other granule related es queries to query postgres (#3727)

* removing granules

* fixing lint

* fixing test

* small change

* adding back in some deleted things

* removing more

* lint fix

* removing tests

* skipping execution search-by-granules tests

* skipping execution tests

* removing tests

* more removing

* adding in deleted test

* removing more

* adding back in needed code

* removing ES_HOST, query, and index from bulk_ops

* fixing bulk_ops tests

* adding back in ELK stack refs

* changing reconreports test to skip/adding back in getGranulesByPayload

* PR feedback

* adding back in skipped tests

* CHANGELOG

* PR feedback

* PR feedback' .
;

* PR feedback + syntax check

* adding back sort to write-granules test

* CUMULUS-3240: Remove Elasticsearch dependency from executions endpoints (#3723)

* CUMULUS-3240:Remove ElasticSearch dependency from Executions endpoints

* update test-executions

* remove es dependencies for execution

* update changelog

* fix lint and warning

* address PR feedback

* remove esClient from createExecutionRecords

* CUMULUS-3642: postgres query adjustment (#3731)

* estimate table row count

* add more indexes

* use count *

* fix test

* fix test

* fix lint and add file index

* add more index

* fix lint

* fix lint update test

* execution asyncOperationId is optional

* defautl stats last day

* fix granule patchByGranuleId put logic back

* vaccum tables

* update changelog

* remove sql script to another pr

* update changelog

* update active collection query

* Revert "update active collection query"

This reverts commit 88024c2.

* CUMULUS-3238: Remove ElasticSearch dependency from Collection POST, PUT, and DEL endpoints (#3746)

* first commit

* CHANGELOG change

* PR feedback

* PR feedback

* CUMULUS-3792: Add db indexes to improve search performance (#3751)

* CUMULUS-3792:Add table indexes to improve search performance

* fix changelog

---------

Co-authored-by: Naga Nages <66387215+Nnaga1@users.noreply.github.com>
Co-authored-by: Nate Pauzenga <npauzenga@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants