Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ptstorage, *: deprecate storage.ex with txn-bound internal executor #91404

Closed
wants to merge 11 commits into from

Conversation

ZhouXing19
Copy link
Collaborator

@ZhouXing19 ZhouXing19 commented Nov 7, 2022

This is part of the effort to reduce the usage of free-floating internal executors and use those bound to the outer txn.

Most of the changes in this PR are mechanical. The one that may interest you is the last commit that passes the outer txn
from the internal executor to the conn executor's planner. Without it, it will fail when creating a job while committing the txn.

Informs: #91004

@cockroach-teamcity
Copy link
Member

This change is Reviewable

So that we can create an internal executor bound to the txn.

Release note: None
So that we can create an internal executor bound to the txn.

Release note: None
So that we can create an internal executor bound to the txn.

Release note: None
….GetState()

Internal executor should always bound to the txn. They should be created
and passed to functions together.

Release note: None
…etRecord()

This is part of the effort to reduce free floating internal executor, and use
those bound to the outer txn.

Release note: None
…UpdateTimestamp()

This is part of the effort to reduce free floating internal executor, and use
those bound to the outer txn.

Release note: None
…rkVerified()

IE means sqlutil.InternalExecutor.

This is part of the effort to reduce free floating internal executor, and use
those bound to the outer txn.

Release note: None
@ZhouXing19 ZhouXing19 changed the title storage, job: migrate usages of internal executor ptstorage, *: deprecate storage.ex with txn-bound internal executor Nov 15, 2022
@ZhouXing19 ZhouXing19 marked this pull request as ready for review November 15, 2022 18:14
@ZhouXing19 ZhouXing19 requested a review from a team November 15, 2022 18:14
@ZhouXing19 ZhouXing19 requested review from a team as code owners November 15, 2022 18:14
@ZhouXing19 ZhouXing19 requested a review from a team November 15, 2022 18:14
@ZhouXing19 ZhouXing19 requested review from a team as code owners November 15, 2022 18:14
@ZhouXing19 ZhouXing19 requested review from herkolategan, srosenberg, ajwerner, a team, smg260, renatolabs and rafiss and removed request for a team, herkolategan and srosenberg November 15, 2022 18:14
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's some serious plumbing. The fatalist part of me is inclined to accept it, because it moves on a goal, and, hey we can always clean things up later. The question I feel required to ask is whether this is going to make such future cleanup all the more worse.
This is a change which I understand the motivation for, but is causing me anxiety because of the sprawl it is forced to contend with. At its core, the cruft feels like it's coming from a failure to separate dependencies from argument in interfaces forcing dependency injection to flow through many layers.

I wonder if we might been better off re-thinking the API which took a *kv.Txn to make them indirect through an object to hold the transaction or what not.

For example, if today we have (*jobs.Job).Update imagine if we had:

package jobs

func (j *Job) WithTxn(txn *kv.Txn, ie sqlutil.InternalExecutor) Updater {
   return Updater{j: j, txn: txn, ie: ie}
}

func (j *Job) NoTxn() Updater {
   return Updater{j: j}
}

func (u Updater) Update(fn UpdateFn) error {
   // ...
}

Then we could hang a whole lot of other methods off this structure too. I think the same pattern could be applied elsewhere. Consider the protected timestamp stuff. What if that had layers such that there was an interface which just took arguments and then mechanisms to construct those interfaces either with no txn, meaning the methods would create their own transactions, or with the txn and ie.


separately, when this is going to merge, squash the commits.

Reviewed 6 of 6 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, 12 of 12 files at r4, 5 of 5 files at r5, 27 of 27 files at r6, 7 of 7 files at r7, 105 of 105 files at r8, 19 of 23 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @rafiss, @renatolabs, @rhu713, @smg260, and @ZhouXing19)


pkg/ccl/changefeedccl/alter_changefeed_stmt.go line 100 at r8 (raw file):

		var job *jobs.Job
		if err := p.WithInternalExecutor(ctx, func(ctx context.Context, txn *kv.Txn, ie sqlutil.InternalExecutor) error {
			job, err = p.ExecCfg().JobRegistry.LoadJobWithTxn(ctx, jobID, p.Txn(), ie)

I feel confused by a lot of this. Why are we using p.WithInternalExecutor if the JobRegistry has these dependencies already? I'm starting to form a view that ambiguity about whether at Txn/IE combo is allowed to be nil is a problem. If it is allowed to be nil, why are we created a txn here? If it's not then that's its own problem.


pkg/jobs/scheduled_job_executor.go line 30 at r8 (raw file):

	// Executes scheduled job;  Implementation may use provided transaction.
	// Modifications to the ScheduledJob object will be persisted.
	ExecuteJob(ctx context.Context, cfg *scheduledjobs.JobExecutionConfig, env scheduledjobs.JobSchedulerEnv, schedule *ScheduledJob, txn *kv.Txn, ie sqlutil.InternalExecutor) error

nit: rewrap this


pkg/jobs/scheduled_job_executor.go line 68 at r8 (raw file):

	// SCHEDULE` query. OnDrop may drop the schedule's dependent schedules and will
	// return the number of additional schedules it drops.
	OnDrop(ctx context.Context, scheduleControllerEnv scheduledjobs.ScheduleControllerEnv, env scheduledjobs.JobSchedulerEnv, schedule *ScheduledJob, txn *kv.Txn, descsCol *descs.Collection, ie sqlutil.InternalExecutor) (int, error)

nit: rewrap this


pkg/sql/job_exec_context.go line 121 at r8 (raw file):

		ctx context.Context,
		run func(ctx context.Context, txn *kv.Txn, ie sqlutil.InternalExecutor) error,
	) error

if we have an ExecCfg, why this?

Code quote:

	WithInternalExecutor(
		ctx context.Context,
		run func(ctx context.Context, txn *kv.Txn, ie sqlutil.InternalExecutor) error,
	) error

pkg/sql/planhook.go line 131 at r6 (raw file):

	BufferClientNotice(ctx context.Context, notice pgnotice.Notice)
	Txn() *kv.Txn
	WithInternalExecutor(

do we really need this given we have the internal executor factory already in the ExecCfg?

Code quote:

		ctx context.Context,
		run func(ctx context.Context, txn *kv.Txn, ie sqlutil.InternalExecutor) error,
	) error

pkg/sql/schemachanger/scdeps/exec_deps.go line 48 at r8 (raw file):

	MakeJobID() jobspb.JobID
	CreateJobWithTxn(ctx context.Context, record jobs.Record, jobID jobspb.JobID, txn *kv.Txn) (*jobs.Job, error)
	UpdateJobWithTxn(ctx context.Context, jobID jobspb.JobID, txn *kv.Txn, ie sqlutil.InternalExecutor, useReadLock bool, updateFunc jobs.UpdateFn) error

nit: wrap this

@ZhouXing19
Copy link
Collaborator Author

I wonder if we might been better off re-thinking the API which took a *kv.Txn to make them indirect through an object to hold the transaction or what not.

I agree -- passing the ie along the txn through layers is torturing :)
I tried modified structs that might match what you mentioned:

// SQLTranslator is the concrete implementation of spanconfig.SQLTranslator.
type SQLTranslator struct {
	ptsProvider protectedts.Provider
	codec       keys.SQLCodec
	knobs       *spanconfig.TestingKnobs
	txnBundle txnBundle
}

// txnBundle is created to emphasize that the SQL translator is correspond to
// a certain txn, and all fields here are a whole. It essentially keeps the
// semantics of “translate at a snapshot in time”. This means that this
// txnBundle should be written only in `NewTranslator`.
type txnBundle struct {
	txn      *kv.Txn
	descsCol *descs.Collection
	ie sqlutil.InternalExecutor
}

Do you think this is what you would prefer?

@ajwerner
Copy link
Contributor

That commit itself is fine, but I don't really understand the exported getters. I think the next step would then be to give the protected timestamp interfaces a similar treatment such that when interacting with protected timestamps you don't think about which txn you're using either.

@ZhouXing19
Copy link
Collaborator Author

Is this roughly close to what you were thinking? ZhouXing19@4de105e#diff-25c1841c99ae0297b18c357c13b913e454b2cafb70cc7fa557326dc11a1ff892R49

@ajwerner
Copy link
Contributor

Not quite what I had in mind. I was not expecting there to be any setters or any exported logic that is aware of whether anything was set.

@ajwerner
Copy link
Contributor

I guess it is like what I had in mind in that you removed the Txn argument from GetMetadata

@ajwerner
Copy link
Contributor

Imagine if the Txn argument was removed from all of the methods in that interface.

@rafiss
Copy link
Collaborator

rafiss commented Nov 15, 2022

I wonder where we should limit the scope of this refactor / updating internalExecutor usages. The redesign being proposed here seems like something that would be better suited for the owners of the protectedts code. Part of the reason I was advocating for an earlier announcement of the new internal executor interfaces was so that the work to use the new interfaces and discover opportunities to improve interfaces would come to the teams that own the various parts of the code that use the internal executor.

It does not seem tenable to have Jane (or more generally, SQL engineers) make somewhat substantial refactors in any area of the code just because it so happens to touch the internal executor. I recall earlier when Jane pushed back against having to make the updates for all the internal executor usages because it would be a giant undertaking, the response was that it was a lot of changes, but all mechanical, and mechanical refactors are good. It seems to me that we're going further away from just mechanical refactors.

Concretely: I would like Jane's project to not continue to balloon in scope too much. I feel that a way to achieve that is to keep the refactoring here mechanical, and leave any deeper questions of how the protectedts package should plumb different objects up to the code owners. Thoughts?

@ajwerner
Copy link
Contributor

I feel that a way to achieve that is to keep the refactoring here mechanical, and leave any deeper questions of how the protectedts package should plumb different objects up to the code owners. Thoughts?

I buy the sentiment. It was where I started when I first saw the PR. I was going to write that you have my stamp. I have a harder time imagining that unnamed codeowners are going to come around and do anything unless there are names and deadlines.

What really got me to rethink here was that it feels to me like the mechanical refactors could be made smaller with some initial refactors elsewhere. These nominally mechanical changes do introduce new txn loops right-and-left and then require capturing things in closure scope that dilutes the intention of the code. It seems pretty clear how to rework these interfaces to avoid the fundamental flaws and if we don't do it, this codebase is reaching the point where we'll never do them.

I think that if we want to ease the burden, what we should do is create two tasks:

  1. Move the optional txn from the sprawling set of methods on the *jobs.Job object to an intermediate structure
  2. Remove the optional txn from the sprawing set of methods on the protectedts.Storage interface out to an intermediate structure

Then say that progress is blocked on that. I'm interested in helping with that. It's hard for me to find the time to do that in the next week.

I reference fatalism above because it's a feeling that I get when thinking about these various choices. I want to at least have some courage to ask us to understand more deeply why this refactor hurts and what other future refactor we might need to do will hurt just as much, or more because we're not attacking the root cause of the inflexibility. What's going on here is to couple, philosophically, sql transactional concepts with other parts of the sql subsystem like via the internal executor.

Having written the above statement, perhaps what we ought to do is to make a new struct which can hold these two objects. That might make this change more mechanical. I still don't feel like it's better than decomposing the dependencies from the arguments, but it'd be something.

@@ -264,7 +264,9 @@ func (p *storage) GetRecord(
return &r, nil
}

func (p *storage) MarkVerified(ctx context.Context, txn *kv.Txn, id uuid.UUID) error {
func (p *storage) MarkVerified(
ctx context.Context, txn *kv.Txn, executor sqlutil.InternalExecutor, id uuid.UUID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is executor intended to replace p.ex a few lines below? Below (in Release) the same arg. is called ie.

As a person unfamiliar with this code, I find the new interface utterly confusing. Can p.ex be injected directly with the correct InternalExecutor across method boundaries as opposed to threading it? The fact that both executors are now in the same lexical scope makes it error-prone.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is executor intended to replace p.ex a few lines below?

Yes, and it replaced the p.ex in a later commit. Eventually, this PR removed all usages of p.ex, and removed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jane, I didn't realize I was looking at an intermediate commit. Feel free to ignore my follow-up comment, but I am still curious why we're threading the InternalExecutor dependency instead of embedding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can p.ex be injected directly with the correct InternalExecutor across method boundaries as opposed to threading it?

I believe this is what Andrew also suggested here, so I replied below.

@ZhouXing19
Copy link
Collaborator Author

ZhouXing19 commented Nov 16, 2022

I think that if we want to ease the burden, what we should do is create two tasks:

  • Move the optional txn from the sprawling set of methods on the *jobs.Job object to an intermediate structure
  • Remove the optional txn from the sprawing set of methods on the protectedts.Storage interface out to an intermediate structure

Maybe we can just replace the txn in those methods' argument with sqlutil.InternalExecutor?

For the 2nd part, I think we can remove the txn from the arguments of those methods, as the txn is only used for internal executor query functions. What about we have a ie.GetTxn() as a (temporary) replacement in these query functions?

I.e. can we have

func (p *storage) UpdateTimestamp(
	ctx context.Context,  id uuid.UUID, timestamp hlc.Timestamp, ie sqlutil.InternalExecutor,
) error {
	row, err := ie.QueryRowEx(ctx, "protectedts-update", ie.GetTxn(),
		sessiondata.InternalExecutorOverride{User: username.NodeUserName()},
		updateTimestampQuery, id.GetBytesMut(), timestamp.WithSynthetic(false).AsOfSystemTime())
	if err != nil {
		return errors.Wrapf(err, "failed to update record %v", id)
	}
	if len(row) == 0 {
		return protectedts.ErrNotExists
	}
	return nil
}

For the 1st part I think it's the same, but a bit more implicit. I looked at all usages of Job.Update() and most of txn arguments are useless, or they got passed to methods in protectedts.Storage. So similarly, we can just have

func (j *Job) Update(ctx context.Context, ie sqlutil.InternalExecutor, updateFn UpdateFn) error {
	const useReadLock = false
	return j.update(ctx, ie, useReadLock, updateFn)
}

If we want to do this, it seems still pretty mechanical to me and I don't think intermediate structures are needed anymore.
But still, this means that we're still refactoring stuff through the usages of methods/functions. It just makes it look cleaner , but the number of changes, I assume, is pretty much the same.

@ZhouXing19
Copy link
Collaborator Author

Maybe we can just replace the txn in those methods' argument with sqlutil.InternalExecutor?

Continuing on this but on a higher level, do you think it makes sense that txn should always be encompassed by an internal executor, rather than being passed in parallel with an internal executor to methods?

@ZhouXing19
Copy link
Collaborator Author

ZhouXing19 commented Nov 16, 2022

I wonder if we might been better off re-thinking the API which took a *kv.Txn to make them indirect through an object to hold the transaction or what not.

Can p.ex be injected directly with the correct InternalExecutor across method boundaries as opposed to threading it?

Given that txn might not be needed for Job.Update() and methods of protectedts.Storage, can we do this:

// Job manages logging the progress of long-running system processes, like
// backups and restores, to the system.jobs table.
type Job struct {
	// TODO(benesch): avoid giving Job a reference to Registry. This will likely
	// require inverting control: rather than having the worker call Created,
	// Started, etc., have Registry call a setupFn and a workFn as appropriate.
	registry *Registry
	ex *sqlutil.InternalExecutor

	id        jobspb.JobID
	createdBy *CreatedByInfo
	session   sqlliveness.Session
	mu        struct {
		syncutil.Mutex
		payload  jobspb.Payload
		progress jobspb.Progress
		status   Status
		runStats *RunStats
	}
}

func (j *Job) WithTxn(txnBoundInternalExecutor sqlutil.InternalExecutor, f func() error) error {
	j.ex = txnBoundInternalExecutor
	defer func() { j.ex = nil }()
	return f()
}

// storage interacts with the durable state of the protectedts subsystem.
type storage struct {
	settings *cluster.Settings

	knobs *protectedts.TestingKnobs
	ie sqlutil.InternalExecutor
}

func (s *storage) WithTxn(txnBoundInternalExecutor sqlutil.InternalExecutor, f func()) {
	s.ie = txnBoundInternalExecutor
	defer func() { s.ie = nil }()
	f()
}

So that the internal executor hangs off Job or storage only in a certain closure, to avoid getting misused under a txn that it's not bound to.

@dhartunian dhartunian removed the request for review from a team November 21, 2022 16:18
@ZhouXing19
Copy link
Collaborator Author

Per an offline convo with @ajwerner we've decided to pause on this work. We need refactoring of the internal executor interface and related functions in sqlutil. Essentially, there needs to be a better binding of *kv.Txn and sqlutil.InternalExecutor, and fewer functions with closures.
I'll go back to this PR and the remaining migration after the refactoring is done.

@ajwerner
Copy link
Contributor

I've been hacking on it in the background. It'll take a little more focused time, but I think the end state that's coming together is starting to feel good. Thanks for kicking off the conversation with this.

@ZhouXing19
Copy link
Collaborator Author

Closing this as it has been done by #93218.

@ZhouXing19 ZhouXing19 closed this Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants