feat(tsi): optimize series iteration #20544

Merged (1 commit) on Jan 25, 2021

@lesam lesam commented Jan 19, 2021

When using queries like `select count(_seriesKey) from bigmeasurement`, we
should iterate over the TSI structures to serve the query instead of loading
all the series into memory up front.

Closes #20543
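
A minimal sketch of the difference described above, assuming a pull-style iterator. `SeriesIterator`, `sliceIterator`, `countMaterialized`, and `countStreaming` are hypothetical names for illustration and are not the tsdb or tsi1 API; the slice-backed iterator only stands in for an index-backed one.

```go
package main

import "fmt"

// SeriesIterator is a hypothetical pull-style iterator over series keys.
// It is an assumption for this sketch, not the actual tsdb interface.
type SeriesIterator interface {
	// Next returns the next series key, or ok == false when exhausted.
	Next() (key string, ok bool)
}

// sliceIterator stands in for an index-backed iterator in this sketch.
type sliceIterator struct {
	keys []string
	pos  int
}

func (it *sliceIterator) Next() (string, bool) {
	if it.pos >= len(it.keys) {
		return "", false
	}
	k := it.keys[it.pos]
	it.pos++
	return k, true
}

// countMaterialized collects every key before counting, so memory use
// grows with the number of series.
func countMaterialized(it SeriesIterator) int {
	var all []string
	for k, ok := it.Next(); ok; k, ok = it.Next() {
		all = append(all, k)
	}
	return len(all)
}

// countStreaming keeps only a counter while walking the iterator, so
// memory use does not depend on the number of series.
func countStreaming(it SeriesIterator) int {
	n := 0
	for _, ok := it.Next(); ok; _, ok = it.Next() {
		n++
	}
	return n
}

func main() {
	keys := []string{"cpu,host=a", "cpu,host=b", "cpu,host=c"}
	fmt.Println(countMaterialized(&sliceIterator{keys: keys}))
	fmt.Println(countStreaming(&sliceIterator{keys: keys}))
}
```

Both functions return the same count; only the amount of data held in memory at once differs.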


seriesOpt := opt
if len(opt.Dimensions) == 0 {
	// no point ordering the series if we are just aggregating them
	seriesOpt.Ordered = false
Contributor Author

Not sure on this particular check - maybe needs to be more granular depending on the type of call iterator?

Contributor

Yeah, I think there might be some call iterators that need sorting.
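
To illustrate the concern in this thread, here is a small sketch of why dropping the ordering is harmless for some aggregates and not for others. `count` and `firstKey` are hypothetical stand-ins, not real InfluxQL call iterators, and the thread does not say which actual call iterators depend on sorted input.

```go
package main

import "fmt"

// count ignores the order of its input.
func count(keys []string) int { return len(keys) }

// firstKey returns the first key it sees, so its result depends on the
// order in which keys arrive. It is a stand-in for any order-sensitive
// aggregate, not a real InfluxQL call.
func firstKey(keys []string) string {
	if len(keys) == 0 {
		return ""
	}
	return keys[0]
}

func main() {
	sorted := []string{"cpu,host=a", "cpu,host=b", "cpu,host=c"}
	unsorted := []string{"cpu,host=c", "cpu,host=a", "cpu,host=b"}

	// An order-insensitive aggregate gives the same answer either way.
	fmt.Println(count(sorted) == count(unsorted))
	// An order-sensitive aggregate can change with iteration order.
	fmt.Println(firstKey(sorted) == firstKey(unsorted))
}
```

Under these assumptions, a check that looks only at `len(opt.Dimensions)` would be safe for aggregates shaped like `count` but not for ones shaped like `firstKey`, which is why a check per call type may be needed.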

@@ -2588,6 +2649,18 @@ func (e *Engine) createVarRefIterator(ctx context.Context, measurement string, o
return nil, nil
}

// check for optimized series iteration for tsi index
if e.index.Type() == tsdb.TSI1IndexName {
Contributor Author

@lesam lesam Jan 19, 2021

This has a unit test, but I'm not sure there's actually a way to hit it from an InfluxQL query without opt.Ordered=true, so this section might need to be removed.


lesam commented Jan 19, 2021

Note that we also have the _series system iterator for SHOW SERIES queries; it also requires everything to be loaded into memory per measurement.

@lesam lesam force-pushed the series-iteration-optimization branch from b4bc184 to 2174d54 Compare January 19, 2021 17:10

lesam commented Jan 19, 2021

Now with gofmt run


lesam commented Jan 21, 2021

We're going to do this in Flux instead of InfluxQL, so this can be closed.

@lesam lesam closed this Jan 21, 2021
@lesam lesam reopened this Jan 21, 2021

lesam commented Jan 21, 2021

@benbjohnson We're back to planning on doing this in InfluxQL, so this PR is back on the menu.

benbjohnson previously approved these changes Jan 22, 2021
Contributor

@benbjohnson benbjohnson left a comment

Overall I think the code makes sense. I agree that disabling sorting when there is a group by may not always be safe. Otherwise 👍

@lesam lesam force-pushed the series-iteration-optimization branch from ab9c1f6 to 98a76a1 Compare January 25, 2021 19:45
@lesam lesam requested a review from davidby-influx January 25, 2021 19:49
Contributor

@davidby-influx davidby-influx left a comment

This may not require changes, just verification of the questions I ask below: should a cursor that failed to open be closed, and should iterators that failed to open not be closed?

tsdb/engine/tsm1/engine.go (three resolved review threads)
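
For context on the cleanup question, here is a common Go pattern for opening several resources so that a failure partway through closes whatever was already opened. This is a sketch under assumptions: `resource`, `opener`, `openAll`, and `errOpen` are hypothetical and do not reflect the code in tsdb/engine/tsm1/engine.go or how the resolved threads settled the question.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errOpen is a hypothetical error used to simulate a failed open.
var errOpen = errors.New("open failed")

// resource is a hypothetical stand-in for a cursor or iterator.
type resource struct {
	name   string
	closed bool
}

func (r *resource) Close() error {
	r.closed = true
	return nil
}

// opener is a hypothetical function that opens one resource or fails.
type opener func() (*resource, error)

// openAll opens every resource or none. If any open fails, the resources
// opened so far are closed before the error is returned. The resource
// whose open failed was never returned, so there is nothing to close.
func openAll(openers []opener) (_ []io.Closer, err error) {
	opened := make([]io.Closer, 0, len(openers))
	defer func() {
		if err != nil {
			for _, c := range opened {
				c.Close() // best effort; close errors are ignored in this sketch
			}
		}
	}()
	for _, open := range openers {
		r, openErr := open()
		if openErr != nil {
			return nil, openErr
		}
		opened = append(opened, r)
	}
	return opened, nil
}

func main() {
	a := &resource{name: "a"}
	ok := func() (*resource, error) { return a, nil }
	fail := func() (*resource, error) { return nil, errOpen }

	_, err := openAll([]opener{ok, fail})
	fmt.Println(err, a.closed)
}
```

The deferred cleanup keys off the named `err` result, so every early return on failure goes through the same close path.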
Contributor

@davidby-influx davidby-influx left a comment

Thanks for assuaging my doubts.

@lesam lesam merged commit d28bcb8 into influxdata:master-1.x Jan 25, 2021