sql/rowflow: excessive memory allocation in setup flow #42770

Closed · ajwerner opened this issue Nov 26, 2019 · 6 comments

@ajwerner (Contributor)

What is your situation?

Looking at a heap profile while running KV95, I observe that 16.12% of allocated space is due to rowflow.(*rowBasedFlow).Setup. There are a few delinquent calls here.

[screenshot: heap profile, 2019-11-25]

  1. The first is that we seem to fail to release the underlying tableReader in flow.Cleanup.

The tableReaders are allocated from a sync.Pool and should be released with Release:

// Release releases this tableReader back to the pool.
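
For context, the pooling contract looks roughly like this (a minimal sketch with a stand-in struct; the real rowexec.tableReader is far larger, which is what makes reuse worthwhile):

package rowexecsketch

import "sync"

// tableReader stands in for rowexec.tableReader.
type tableReader struct {
	spans []int
}

var trPool = sync.Pool{
	New: func() interface{} { return &tableReader{} },
}

// newTableReader reuses a pooled object instead of allocating a fresh one.
func newTableReader() *tableReader {
	return trPool.Get().(*tableReader)
}

// Release zeroes the tableReader and returns it to the pool.
func (tr *tableReader) Release() {
	*tr = tableReader{}
	trPool.Put(tr)
}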

The flowinfra.FlowBase generally releases its underlying processors in Cleanup:

for _, p := range f.processors {
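
Presumably the body of that loop boils down to a type assertion along these lines (a sketch, not the literal CockroachDB code):

// Releasable mirrors the one-method interface that pooled processors
// implement (flowinfra.Releasable at the time of this issue).
type Releasable interface {
	Release()
}

// releaseAll is roughly what Cleanup's loop does: any processor removed
// from f.processors beforehand is never Released.
func releaseAll(processors []interface{}) {
	for _, p := range processors {
		if r, ok := p.(Releasable); ok {
			r.Release()
		}
	}
}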

The problem is that we remove the head processor from f.processors in FlowBase.Run:

f.processors = f.processors[:len(f.processors)-1]

Unfortunately it's not clear that the head processor can be freed with the other processors. The following patch didn't seem to work:

index 88e86df63e..02fe9c8746 100644
--- a/pkg/sql/flowinfra/flow.go
+++ b/pkg/sql/flowinfra/flow.go
@@ -284,8 +284,9 @@ func (f *FlowBase) GetLocalProcessors() []execinfra.LocalProcessor {
 // listen for a cancellation on the same context.
 func (f *FlowBase) startInternal(ctx context.Context, doneFn func()) (context.Context, error) {
        f.doneFn = doneFn
+       processors := f.processors[1:]
        log.VEventf(
-               ctx, 1, "starting (%d processors, %d startables)", len(f.processors), len(f.startables),
+               ctx, 1, "starting (%d processors, %d startables)", len(processors), len(f.startables),
        )

        ctx, f.ctxCancel = contextutil.WithCancel(ctx)
@@ -315,14 +316,14 @@ func (f *FlowBase) startInternal(ctx context.Context, doneFn func()) (context.Co
        for _, s := range f.startables {
                s.Start(ctx, &f.waitGroup, f.ctxCancel)
        }
-       for i := 0; i < len(f.processors); i++ {
+       for i := 0; i < len(processors); i++ {
                f.waitGroup.Add(1)
                go func(i int) {
-                       f.processors[i].Run(ctx)
+                       processors[i].Run(ctx)
                        f.waitGroup.Done()
                }(i)
        }
-       f.startedGoroutines = len(f.startables) > 0 || len(f.processors) > 0 || !f.IsLocal()
+       f.startedGoroutines = len(f.startables) > 0 || len(processors) > 0 || !f.IsLocal()
        return ctx, nil
 }

@@ -360,7 +361,6 @@ func (f *FlowBase) Run(ctx context.Context, doneFn func()) error {
                return errors.AssertionFailedf("no processors in flow")
        }
        headProc = f.processors[len(f.processors)-1]
-       f.processors = f.processors[:len(f.processors)-1]

        var err error
        if ctx, err = f.startInternal(ctx, doneFn); err != nil {
  2. We always construct a new ImmutableTableDescriptor. These things are meant to be cached. I'm not sure how best to plumb a cache (or which one).

  3. We could pool the rowBasedFlow objects (right?); a sketch of the idea follows below.

Maybe there's something I'm missing about lifecycles that makes this pooling hard. Either way, this is the most obvious offender in the memory profile, and it looks like mostly low-hanging fruit.
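
As a strawman for point 3, the flow struct could get the same sync.Pool treatment as the tableReader sketch above (hypothetical names and a stand-in struct, not the actual rowflow code):

package rowflowsketch

import (
	"context"
	"sync"
)

// rowBasedFlow stands in for the real struct in pkg/sql/rowflow.
type rowBasedFlow struct{ started bool }

var flowPool = sync.Pool{
	New: func() interface{} { return &rowBasedFlow{} },
}

// Release zeroes the flow and hands it back to the pool.
func (f *rowBasedFlow) Release() {
	*f = rowBasedFlow{}
	flowPool.Put(f)
}

// Cleanup is the natural place to return the struct once the flow is done.
func (f *rowBasedFlow) Cleanup(ctx context.Context) {
	// ... release processors, stop memory monitors, etc. ...
	f.Release()
}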

[screenshot: second heap profile, 2019-11-25]

@ajwerner (Contributor, Author)

@yuzefovich any thoughts?

@jordanlewis (Member)

We disabled some of the pooling because it was broken, but forgot to re-enable it. We should get back to this. You're not wrong that the lifecycles are kind of tricky - that's why we had the bugs.

@yuzefovich (Member)

We had some issues with correctly releasing the specs (see #38952), but I believe we never ended up merging anything that touched the releasing of the specs. (I took a quick peek at the release-19.2 branch as it was in June and confirmed that the logic by which headProc was never Released was already there.) The problems Jordan referred to were probably caused by the fallback logic from vectorized to row-based flows, and I'm not sure why the diff Andrew pasted didn't work. I'll dig in.

@yuzefovich (Member)

Oh, I see - the diff is slightly messed up: headProc is confusingly not the first but the last processor, and I think the local copy of f.processors needs to be made in a different place.
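
In other words, the copy should drop the tail rather than the head — something along these lines, reusing the diff's variable names:

processors := f.processors[:len(f.processors)-1] // exclude headProc, the last element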

@yuzefovich (Member)

I addressed points 1 and 3 in #42809 (hopefully, the build is green). I'm not sure about point 2 though.

craig bot pushed a commit that referenced this issue Nov 27, 2019
42809: sql: pool some allocations during flow setup r=yuzefovich a=yuzefovich

**flowinfra: slightly tweak the setup of processors in the flow**

Previously, the last processor ('headProc') would be removed from
f.processors before flow.startInternal so that it could be run in the
same goroutine as the one doing the setup. However, this was
problematic: if that processor implements the 'Releasable' interface, it
would not be returned to the pool on flow cleanup. Now this is fixed.

Addresses: #42770.

Release note: None

**sql: pool flow allocations**

Previously, new structs for rowBasedFlow and vectorizedFlow would be
allocated upon creation. This commit creates pools for both of them.
The flowinfra.Releasable interface is moved into the execinfra package
because components from the rowflow, rowexec, and colflow packages now
implement it.

In order to actually be able to release the flow structs, I needed to
create separate Cleanup methods (which still share most of the logic),
which in turn allowed the vectorized memory monitoring logic to be
removed from the shared FlowCtx.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
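
For context, the shape of the first fix is roughly the following sketch of FlowBase.Run's body, reconstructed from the commit description (not the literal diff), assuming startInternal is changed to take the processors to start as an argument:

// Run sketch: hand startInternal an explicit slice instead of mutating
// f.processors, so Cleanup still sees (and Releases) every processor,
// including the head one.
headProc := f.processors[len(f.processors)-1]
otherProcessors := f.processors[:len(f.processors)-1]

var err error
if ctx, err = f.startInternal(ctx, otherProcessors, doneFn); err != nil {
	return err
}
headProc.Run(ctx) // run the head processor on the calling goroutine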
@yuzefovich (Member)

Points 1 and 3 have been addressed, and I opened up #55960 for the second point.
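
For the second point, the general shape would be some version-keyed cache so that flow setup can reuse an ImmutableTableDescriptor instead of constructing one per flow. The following is purely illustrative (stand-in types, hypothetical key; the real design is what #55960 tracks):

package descsketch

import "sync"

// key identifies a descriptor version; a real cache would key on the
// descriptor's ID and version.
type key struct {
	id, version uint32
}

// ImmutableTableDescriptor stands in for the real (much larger) type.
type ImmutableTableDescriptor struct{ k key }

var (
	mu    sync.Mutex
	cache = map[key]*ImmutableTableDescriptor{}
)

// getDescriptor returns a cached descriptor, building one only on a miss.
// Immutability is what makes sharing a descriptor across flows safe.
func getDescriptor(k key) *ImmutableTableDescriptor {
	mu.Lock()
	defer mu.Unlock()
	if d, ok := cache[k]; ok {
		return d
	}
	d := &ImmutableTableDescriptor{k: k}
	cache[k] = d
	return d
}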
