
Add reminder service with empty sendReminders logic #2638

Merged: 2 commits into mindersec:main on Apr 23, 2024

Conversation

@Vyom-Yadav (Collaborator) commented Mar 13, 2024

Summary

Provide a brief overview of the changes and the issue being addressed.
Explain the rationale and any background necessary for understanding the changes.
List dependencies required by this change, if any.

Issue #2262

Does not fully fix the issue; it addresses:

  1. Continuously update the evaluation status for entities against registered rules.

Does not address:

  1. Continuously evaluate the state of the webhook created by Minder on GitHub (as the user may accidentally delete it).

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Outline how the changes were tested, including steps to reproduce and any relevant configurations.
Attach screenshots if helpful.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@coveralls commented Mar 13, 2024

Coverage Status

coverage: 49.765% (+0.07%) from 49.695% when pulling dfe85c1 on Vyom-Yadav:empty-reminder-service into 17412a2 on stacklok:main.

@Vyom-Yadav force-pushed the empty-reminder-service branch 2 times, most recently from 5e05a0f to 6571395 on March 23, 2024 16:29
@evankanderson (Member) left a comment

Thanks for breaking this up, and sorry it took me a while to get around to the review.

I'm trying to figure out how we can make Minder and Reminder as similar as possible, so that it's easier for people in one context to get into the other as well.

os.Exit(1)
}

err = reminderconfig.ValidateConfig(cfgFileData)
@evankanderson (Member):

I have a general preference for having reasonable defaults rather than throwing an error (modulo the above "null keys when we need a value" check). It seems like we have two different "check config" endpoints here -- can we consolidate one into the other?

@evankanderson (Member):

(Ideally, the code for initConfig would look similar for minder, server and reminder...)

@Vyom-Yadav (Collaborator, author):

modulo the above "null keys when we need a value" check

The null values check serves a different purpose: it is there to prevent corruption of the viper key-value store, and it is present in all the other CLIs too.

can we consolidate one into the other?

I would rather not. The validation portion has a different method signature than the null-value check, and working around that would break the pattern across CLIs rather than providing any uniformity.

Ideally, the code for initConfig would look similar for minder, server and reminder...

It is almost the same, except reminder does validation.

Comment on lines +36 to +43
sql_connection:
dbhost: "reminder-event-postgres"
dbport: 5432
dbuser: postgres
dbpass: postgres
dbname: reminder
sslmode: disable
@evankanderson (Member):

Oh, this is interesting... by using events here, we're requiring the use of the watermill-SQL driver in Minder for reminder to work (the default is go-channel). That's probably okay, but I hadn't put it together as a limitation of this approach until now.

@Vyom-Yadav (Collaborator, author):

Nope, it's not like that. Minder's message queue system would be untouched; we would just add a new subscriber to the server for reminder events. Minder can continue to run using go-channel and would connect to reminder using a separate config.

@evankanderson (Member):

I'm confused as to what this event configuration is for if it's not being used to trigger events in the minder server. Is this a separate SQL-only eventing system stand-alone for reminder?

@Vyom-Yadav (Collaborator, author):

Yes, it is a standalone publishing system for reminder that minder would subscribe to.
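For illustration only, a minimal sketch of how such a standalone SQL-backed publisher could be wired up with watermill-sql; the helper name, DSN handling, and module version path are assumptions, not the PR's actual code.

import (
	stdsql "database/sql"

	"github.com/ThreeDotsLabs/watermill"
	wmsql "github.com/ThreeDotsLabs/watermill-sql/v2/pkg/sql"
	_ "github.com/lib/pq" // Postgres driver
)

// newReminderPublisher (hypothetical helper) opens the reminder-event Postgres
// database and builds a watermill SQL publisher. Minder would run a matching
// SQL subscriber against the same database to consume reminder events, while
// its own internal eventing can keep using the go-channel driver.
func newReminderPublisher(dsn string) (*wmsql.Publisher, func() error, error) {
	db, err := stdsql.Open("postgres", dsn)
	if err != nil {
		return nil, nil, err
	}

	pub, err := wmsql.NewPublisher(
		db,
		wmsql.PublisherConfig{
			SchemaAdapter:        wmsql.DefaultPostgreSQLSchema{},
			AutoInitializeSchema: true, // create the messages table on first use
		},
		watermill.NewStdLogger(false, false),
	)
	if err != nil {
		return nil, nil, err
	}
	return pub, db.Close, nil
}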

@evankanderson (Member):

Does this mean that we'll have two different messaging SQL databases, or was your thought to have this end up landing on the same database we use for watermill messaging in Minder today?

@evankanderson (Member):

Hmm, interesting.

We could theoretically run multiple reminder servers which would interoperate with each other.

I can see this happening, but I don't have the design pattern for it, and it isn't required now, so I won't brainstorm on that side.

The idea would be that we could run multiple servers for high availability. I agree that it's not necessary for right now, as long as we make sure that it's not strictly required to only run one Reminder for correctness.

(You good with storing in memory for now (cache) ?)

Yes.

@evankanderson (Member):

Also, do we want to modify type RepoReconcilerEvent struct to have a source string field, for visibility and to help differentiate whether the event was generated by reminder or minder? (For topic: TopicQueueReconcileRepoInit)

Rather than putting it in the type RepoReconcilerEvent struct, can you put something like periodic=true or trigger=revisit in the message metadata?

Ideally, we'd have a good split between routing metadata (message subject, event type, etc) and the message contents (diff of the change, for example). Having the metadata in the outer envelope allows us to generically do things like 2-lane queues (handle revisits at lower priority than live notifications) and record success rates for revisits separately. The generic top-level handling layer should not know about the payload schema -- that's for the actual endpoint to decode.
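For illustration, a hedged sketch of what that envelope-level tagging could look like with watermill; the topic and metadata key names are assumptions, not the PR's final choices.

import (
	"github.com/ThreeDotsLabs/watermill"
	"github.com/ThreeDotsLabs/watermill/message"
)

// publishRevisit leaves the reconciler payload untouched and marks the trigger
// in the message metadata, so generic routing layers can de-prioritize or
// separately meter revisits without ever decoding the payload schema.
func publishRevisit(pub message.Publisher, payload []byte) error {
	msg := message.NewMessage(watermill.NewUUID(), payload)
	msg.Metadata.Set("trigger", "revisit") // as opposed to a live webhook notification
	return pub.Publish("internal.repo.reconciler.event", msg) // illustrative topic name
}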

@evankanderson (Member):

Adding cursors to the db would look something like:

CREATE TABLE reminder_project_cursor (
  id SERIAL PRIMARY KEY, -- Some sort of ID to differentiate between different reminders
  cursor TEXT NOT NULL
);

CREATE TABLE reminder_repository_cursor (
  id SERIAL PRIMARY KEY, -- Some sort of ID to differentiate between different reminders
  project_id UUID NOT NULL,
  provider TEXT NOT NULL,
  cursor TEXT NOT NULL,
  FOREIGN KEY (project_id, provider) REFERENCES providers(project_id, name) ON DELETE CASCADE,
  UNIQUE (project_id, provider)
);

Using this approach, every reminder service would have some unique identifier associated with it. I had a few concerns:

  1. Are we simplifying by storing in the db, or complicating it more than required? (With the in-memory cache and all that stuff)
  2. Do we even need to run multiple reminder instances? You told me a lot of other things would break before we scale.
  3. If yes, wouldn't it be simpler to have a reminder service operate on a shard? Or set a limit somehow? (Taking a wild guess here)
  4. Are we over-engineering this 😬 ?

Do we need fairness at both the project and the repo level? My naive thought would be to iterate across all the entities -- projects which had more repos registered would get more "revisits / day" than smaller projects, but that seems appropriate. We probably want to limit how many concurrent reconciliations are happening in a project, but that might make more sense to do at the Minder level, since we could have a project that genuinely gets changes to 50 repos at the same time.

It's hard to guarantee fairness until we know what fairness looks like. 😁

In terms of running multiple reminder instances, my goal is that we should be able to upgrade Reminder as follows:

  1. Reminder v.123 is running one instance
  2. We start a Reminder v.124
  3. We shut down Reminder v.123

This is easy to do with a Kubernetes Deployment with maxSurge: 1, and avoids getting into a situation where v.124 doesn't work, but we've corrupted state needed for v.123 to operate, so we end up needing to do manual maintenance. This is also the model we use with the core Minder binary, so it's less surprising than having a different model. As long as we can use that manner of deployment, I'm happy with many different solutions:

a. Iterate over the database several times faster than the reminder interval, but honor some pacing on actually sending the events. In this case, we might not need to keep a cursor at all.
b. Add a database table to track the Reminder cursors.
c. Use "continuation messages" in Watermill to describe what we just processed. e.g. at the end of processing up to row N, send a "now process >N" message, then acknowledge the "now process >N-10" message that was in queue earlier. If there's no message, start at a random point.
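As a purely illustrative sketch of option (c), assuming a raw watermill subscriber (message package imported as above) and a hypothetical processRowsAfter helper:

// processContinuation handles one "now process >cursor" message: it works
// through the next window of rows, publishes the follow-up continuation, and
// only then acks the one it consumed, so a crash always leaves a resumable
// marker in the queue.
func processContinuation(msg *message.Message, pub message.Publisher) {
	cursor := msg.Metadata.Get("cursor")  // e.g. the last row/entity ID processed
	next, err := processRowsAfter(cursor) // hypothetical batch-processing helper
	if err != nil {
		msg.Nack() // redeliver this continuation later
		return
	}

	cont := message.NewMessage(watermill.NewUUID(), nil)
	cont.Metadata.Set("cursor", next)
	if err := pub.Publish("reminder.continuation", cont); err != nil {
		msg.Nack()
		return
	}
	msg.Ack()
}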

@Vyom-Yadav (Collaborator, author):

Do we need fairness at both the project and the repo level? My naive thought would be to iterate across all the entities -- projects which had more repos registered would get more "revisits / day" than smaller projects, but that seems appropriate. We probably want to limit how many concurrent reconciliations are happening in a project, but that might make more sense to do at the Minder level, since we could have a project that genuinely gets changes to 50 repos at the same time.

It's hard to guarantee fairness until we know what fairness looks like. 😁

Exactly. My main idea behind this was to ensure fair sharing among projects. If we reconcile a large number of entities in a single project, we might hit the rate limits (improbable, but possible). The current model (which is yet to be pushed) will work by picking a minimum number of repos from each project (to ensure some fair share) and picking additional repos if there are more spots.

Let's discuss cursors in a Discord thread. I would also like reminder to run as a Deployment rather than a StatefulSet.
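A hedged sketch of the fair-share selection just described; the Project/Repo types and the UnvisitedRepos accessor are hypothetical, not Minder's real types:

// buildBatch gives every project up to minPerProject of its not-yet-visited
// repos first, then fills any remaining capacity up to batchSize with the
// leftover repos, so no single large project can monopolize an iteration.
func buildBatch(projects []Project, batchSize, minPerProject int) []Repo {
	batch := make([]Repo, 0, batchSize)
	var extras []Repo

	for _, p := range projects {
		repos := p.UnvisitedRepos() // hypothetical accessor
		take := minPerProject
		if take > len(repos) {
			take = len(repos)
		}
		batch = append(batch, repos[:take]...)
		extras = append(extras, repos[take:]...)
		if len(batch) >= batchSize {
			return batch[:batchSize]
		}
	}

	// Spend whatever capacity is left on repos beyond each project's minimum share.
	for _, r := range extras {
		if len(batch) >= batchSize {
			break
		}
		batch = append(batch, r)
	}
	return batch
}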

docker/reminder/Dockerfile (resolved)
Comment on lines 63 to 65
if cfg.RecurrenceConfig.BatchSize <
cfg.RecurrenceConfig.MaxPerProject*cfg.RecurrenceConfig.MinProjectFetchLimit {
return fmt.Errorf("batch_size %d cannot be less than max_per_project(%d)*min_project_fetch_limit(%d)=%d",
@evankanderson (Member):

Rather than erroring, I have a slight preference for printing an Error-level message and continuing with a larger BatchSize.

@evankanderson (Member):

(Also, it feels like we should add these to RecurrenceConfig.Validate, since they deal with only those fields.)

@Vyom-Yadav (Collaborator, author):

Done.

internal/config/reminder/config.go (outdated, resolved)
internal/reminder/reminder.go (outdated, resolved)
r.logger.Debug().Msg("storing cursor state")

var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
@evankanderson (Member):

Why use gob rather than simply JSON?

@Vyom-Yadav (Collaborator, author):

I read in a few benchmark results that gob performs better for large data, so I chose it.

@evankanderson (Member):

How large is this data going to be? I see that it's stored in a map, so I'm wondering how many keys there will be in the map.

@evankanderson (Member):

If it's just the two keys, then performance really doesn't matter. If the file is going to be e.g. 100MB, we need to talk more generally about storing state. 😁

@Vyom-Yadav (Collaborator, author):

It is directly proportional to the number of projects being processed in an iteration, which is configurable, so it can get as large as you'd want.

@evankanderson (Member):

Why do we need to keep track of all the projects processed in an iteration? It should be fine to re-run a small number of projects, so we could keep just the first (or last) item in the batch.

Binary formats are a pain for humans to work with (we need to build extra tools), so I'd prefer JSON or YAML if possible so we can clean up intermediate state in an emergency with simple tools.

@Vyom-Yadav (Collaborator, author):

Why do we need to keep track of all the projects processed in an iteration? It should be fine to re-run a small number of projects, so we could keep just the first (or last) item in the batch.

We have a project cursor and a repo cursor for every project we traverse. The repo cursor list grows as we traverse more projects, because we only pick a small number of repos from each project to get a sort of fair-share algorithm and avoid hitting rate limits for that project.

We have to keep track of all repo cursors because, when we return to the same project in subsequent loops, we want to pick the repos that haven't been picked yet.

Binary formats are a pain for humans to work with (we need to build extra tools), so I'd prefer JSON or YAML if possible so we can clean up intermediate state in an emergency with simple tools.

I just had some performance concerns (though they may not matter that much); I'll change it to JSON or YAML.
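A minimal sketch of what the JSON variant might look like; the state fields and file layout here are assumptions for illustration, not the PR's final shape.

import (
	"encoding/json"
	"os"
)

// cursorState mirrors the in-memory cursors: one cursor over the project list
// plus a repo cursor per project. JSON keeps the file inspectable and editable
// with standard tools if intermediate state ever needs manual cleanup.
type cursorState struct {
	ProjectCursor string            `json:"project_cursor"`
	RepoCursors   map[string]string `json:"repo_cursors"` // keyed by project ID
}

func writeCursorState(path string, state cursorState) error {
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0600)
}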

@evankanderson (Member):

Why do we need to keep track of all the projects processed in an iteration? It should be fine to re-run a small number of projects, so we could keep just the first (or last) item in the batch.

We have a project cursor and a repo cursor for every project we traverse. The repo cursor list grows as we traverse more projects, because we only pick a small number of repos from each project to get a sort of fair-share algorithm and avoid hitting rate limits for that project.

If we were to traverse the lists in order of repo ID and rely on Minder to limit reconciliations during contention, would that serve the same result? I'm worried that we only have a partial view of what's going on in terms of quota usage here, so I'd rather have Reminder act as a reliable clock but not get too fancy in terms of trying to push back because Minder will still need to deal with quota exhaustion in other cases. (We had one about a week ago due to poor cache behavior, for example.)

@Vyom-Yadav (Collaborator, author):

If we were to traverse the lists in order of repo ID and rely on Minder to limit reconciliations during contention, would that serve the same result?

Well, yes, technically we can achieve the same results by just pushing all repos for reconciliation and having Minder create the batch of what actually gets reconciled. But wasn't the original plan to have this as a separate microservice? I would just like Minder to process the events and return errors if it can't due to rate limits. In that case, those entities would be picked up in the next complete cycle.

return err
}

return os.WriteFile(r.cfg.CursorFile, buf.Bytes(), 0600)
@evankanderson (Member):

Do you need to make sure that storeCursorState isn't called concurrently?

@Vyom-Yadav (Collaborator, author):

It isn't called concurrently. First, reminder is initialized (NewReminder), and then Start is called in a separate goroutine, so it shouldn't be required.

@evankanderson (Member):

Can you add a comment to the method that it's not safe to call this method from multiple threads?

@Vyom-Yadav (Collaborator, author):

Done. But only three methods are exposed (Start, Stop, New), so it shouldn't be called from multiple threads unless someone decides to call Start multiple times in different goroutines. Should we prohibit that?

@evankanderson (Member):

I didn't see storeCursorState being called at all, so I had to guess about where and how the call chain to it happened.

})
}

func (r *reminder) storeCursorState() error {
@evankanderson (Member):

I don't see where this is called. Should it be called in the loop from Start?

@Vyom-Yadav (Collaborator, author):

No, this is not being used right now; it would be used by the batch-building methods. restoreCursorState() is used and present, so it made sense to add this too (for completeness and testing).

internal/reminder/reminder.go Outdated Show resolved Hide resolved
@Vyom-Yadav force-pushed the empty-reminder-service branch 7 times, most recently from 43c00cf to c6ac1db on March 27, 2024 09:32
@evankanderson (Member) left a comment

Just a couple of comments -- this is looking like a good skeleton to build from.

.mk/build.mk (outdated, resolved)
internal/reminder/reminder.go (resolved)
r.logger.Error().Err(err).Msg("error restoring cursor state")
}

pub, cl, err := r.setupSQLPublisher(context.Background())
@evankanderson (Member):

I still think you want to pass in ctx here.

Suggested change:
- pub, cl, err := r.setupSQLPublisher(context.Background())
+ pub, cl, err := r.setupSQLPublisher(ctx)



@Vyom-Yadav mentioned this pull request on Mar 28, 2024
@Vyom-Yadav force-pushed the empty-reminder-service branch 2 times, most recently from d95eba8 to 43b9177 on April 4, 2024 18:59
@Vyom-Yadav (Collaborator, author):

@evankanderson ping, in case you missed the review request.

@evankanderson (Member) left a comment

Sorry about the delay -- I was chaperoning a solo-parent-with-two-kids trip last week, and we've been crunching a bit in preparation for Open Source Summit. My ability to review will be a bit spotty for the next 3-ish days due to the same, and then I should be able to give this some more full-time attention.


func init() {
cobra.OnInitialize(initConfig)
reminderconfig.SetViperDefaults(viper.GetViper())
@evankanderson (Member):

Why is this in init here, but in initConfig in server/app/root.go?

@Vyom-Yadav (Collaborator, author):

This won't cause any difference in functionality. It is in init because, when we RegisterReminderFlags, we actually look up the default from viper, which populates it using the config struct's fields. In the server CLI code, the default values for flags are hardcoded, so this is actually better.
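A hedged sketch of the pattern being described; RegisterReminderFlags exists in the PR, but this body and the cursor-file flag/key are invented purely for illustration.

import (
	"github.com/spf13/pflag"
	"github.com/spf13/viper"
)

// RegisterReminderFlags declares CLI flags whose default values are looked up
// from the viper store (already seeded from the config struct's defaults by
// SetViperDefaults), instead of hardcoding defaults at the declaration site.
func RegisterReminderFlags(v *viper.Viper, flags *pflag.FlagSet) error {
	flags.String("cursor-file", v.GetString("cursor_file"),
		"file used to persist reminder cursor state")
	return v.BindPFlag("cursor_file", flags.Lookup("cursor-file"))
}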

@evankanderson (Member):

Can you file an issue to move server to this pattern? (Don't do it in this PR, but we should track these cleanup opportunities)

@Vyom-Yadav (Collaborator, author):

Created #3145

Comment on lines 58 to 59
RootCmd.PrintErrln("Error registering reminder flags")
os.Exit(1)
@evankanderson (Member):

It looks like we use log.Fatal().Err(err).Msg("...") instead of RootCmd.PrintErrln("..."); os.Exit(1) in server/app/root.go. Can we use the same pattern here (for consistency)?

@Vyom-Yadav (Collaborator, author):

I did it, but the server CLI uses that pattern in just a place or two; in the rest of the places it uses os.Exit. We specifically set:

RootCmd.SetOut(os.Stdout)
RootCmd.SetErr(os.Stderr)

So, it introduces some inconsistency if we want to write logs to a file, but I changed it to use zerolog.

os.Exit(1)
}

RootCmd.PersistentFlags().String("config", "", fmt.Sprintf("config file (default is $PWD/%s)", configFileName))
@evankanderson (Member):

Can we put this flag declaration right after OnInitialize(initConfig) (again, just for consistency, not because it will change behavior)?

@Vyom-Yadav (Collaborator, author):

Done.

Comment on lines 54 to 57
err = cfg.Validate()
if err != nil {
	var batchSizeErr *reminderconfig.InvalidBatchSizeError
	if errors.As(err, &batchSizeErr) {
		// Update Batch Size in viper store
		updateBatchSize(cmd, batchSizeErr)

		// Complete config is read again to update the batch size.
		cfg, err = config.ReadConfigFromViper[reminderconfig.Config](viper.GetViper())
		if err != nil {
			return fmt.Errorf("unable to read config: %w", err)
		}
	} else {
		return fmt.Errorf("invalid config: %w", err)
	}
}
@evankanderson (Member):

If we do this as an accessor in config.GetBatchSize(), then we might be able to avoid needing cfg.Validate() at all, right?

(I'm asking because I worry about this block which is outside of test coverage growing over time... app/serve.go is also bigger than I'd want.)

@Vyom-Yadav (Collaborator, author):

I modified the logic to be out of the CLI (so we can test it). Should be fine now, ptal :)

Comment on lines +290 to +295
// Try to parse it as a time.Duration
var parseErr error
defaultValue, parseErr = time.ParseDuration(value)
if parseErr == nil {
return defaultValue, nil
}
@evankanderson (Member):

This is because we lose the type aliasing at compile-time, right?

@evankanderson (Member):

It's worth commenting, because it technically also allows us to write noncePeriod = 1h in config/server, which means something very different (it would be 1 billion hours). This seems like an acceptable risk, since this is only setting defaults.

We might also want to figure out how to migrate noncePeriod, but that's a different discussion, and definitely doesn't belong in this PR.

@Vyom-Yadav (Collaborator, author):

This is because we lose the type aliasing at compile-time, right?

Yes, we cannot get the user-defined type using reflection.

// SetViperDefaults sets the default values for the configuration to be picked up by viper
func SetViperDefaults(v *viper.Viper) {
v.SetEnvPrefix("reminder")
v.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
@evankanderson (Member):

We replace - with _ in the server/config.go version of this -- any harm in doing the same here?

@Vyom-Yadav (Collaborator, author):

Done
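For reference, a sketch of the combined replacer; the function name and the example key in the comment are illustrative, only the replacer itself is the agreed change.

import (
	"strings"

	"github.com/spf13/viper"
)

// Mapping both dots and dashes to underscores means a key like
// recurrence.batch-size would be read from REMINDER_RECURRENCE_BATCH_SIZE.
func setEnvLookup(v *viper.Viper) {
	v.SetEnvPrefix("reminder")
	v.SetEnvKeyReplacer(strings.NewReplacer(".", "_", "-", "_"))
}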

Comment on lines +43 to +75
// InvalidBatchSizeError is a custom error type for the case when batch_size is less than
// max_per_project * min_project_fetch_limit
type InvalidBatchSizeError struct {
	BatchSize            int
	MaxPerProject        int
	MinProjectFetchLimit int
}

func (e *InvalidBatchSizeError) Error() string {
	return fmt.Sprintf("batch_size %d cannot be less than max_per_project(%d)*min_project_fetch_limit(%d)=%d",
		e.BatchSize, e.MaxPerProject, e.MinProjectFetchLimit, e.MaxPerProject*e.MinProjectFetchLimit)
}

// Validate checks that the recurrence config is valid
func (r RecurrenceConfig) Validate() error {
	if r.MinElapsed < 0 {
		return fmt.Errorf("min_elapsed %s cannot be negative", r.MinElapsed)
	}

	if r.Interval < 0 {
		return fmt.Errorf("interval %s cannot be negative", r.Interval)
	}

	if r.BatchSize < r.MaxPerProject*r.MinProjectFetchLimit {
		return &InvalidBatchSizeError{
			BatchSize:            r.BatchSize,
			MaxPerProject:        r.MaxPerProject,
			MinProjectFetchLimit: r.MinProjectFetchLimit,
		}
	}

	return nil
}
@evankanderson (Member):

I think you could also do this as:

Suggested change:

// InvalidBatchSizeError is a custom error type for the case when batch_size is less than
// max_per_project * min_project_fetch_limit
type InvalidBatchSizeError struct {
	BatchSize            int
	MaxPerProject        int
	MinProjectFetchLimit int
}

func (e *InvalidBatchSizeError) Error() string {
	return fmt.Sprintf("batch_size %d cannot be less than max_per_project(%d)*min_project_fetch_limit(%d)=%d",
		e.BatchSize, e.MaxPerProject, e.MinProjectFetchLimit, e.MaxPerProject*e.MinProjectFetchLimit)
}

// Validate checks that the recurrence config is valid
func (r RecurrenceConfig) Validate() error {
	if r.MinElapsed < 0 {
		return fmt.Errorf("min_elapsed %s cannot be negative", r.MinElapsed)
	}

	if r.Interval < 0 {
		return fmt.Errorf("interval %s cannot be negative", r.Interval)
	}

	if r.BatchSize < r.MaxPerProject*r.MinProjectFetchLimit {
		return &InvalidBatchSizeError{
			BatchSize:            r.BatchSize,
			MaxPerProject:        r.MaxPerProject,
			MinProjectFetchLimit: r.MinProjectFetchLimit,
		}
	}

	return nil
}

func (r RecurrenceConfig) GetInterval() time.Duration {
	// Values this low should only be used for testing...
	minInterval := 1 * time.Minute
	if r.Interval < minInterval {
		return minInterval
	}
	return r.Interval
}

func (r RecurrenceConfig) GetBatchSize() int {
	if r.BatchSize < r.MaxPerProject*r.MinProjectFetchLimit {
		return r.MaxPerProject * r.MinProjectFetchLimit
	}
	return r.BatchSize
}

func (r RecurrenceConfig) GetMinElapsed() time.Duration {
	minElapsed := 90 * time.Second // 1.5 minutes
	if r.MinElapsed < minElapsed {
		return minElapsed
	}
	return r.MinElapsed
}

And then you could avoid needing specific Validate and error-handling code elsewhere.

@Vyom-Yadav (Collaborator, author):

I feel like this would be too much patching. If the time specified is < 0, it's very fair to give an error. If we patch every user mistake, bad configs might creep into git repos, build scripts, etc. An error would force the user to fix it. wdyt?

@evankanderson (Member):

I'm willing to respect that. We'll go with validation here.

Comment on lines +142 to +150
func (r *reminder) Stop() {
	if r.ticker != nil {
		defer r.ticker.Stop()
	}
	r.stopOnce.Do(func() {
		close(r.stop)
		r.eventDBCloser()
	})
}
@evankanderson (Member):

So, a couple things you can do:

  1. You can return a cancel function from Start. This lets you ensure that the caller can't get a handle on the function before Start is called.
  2. You can use something like a condition variable or a mutex to protect the fields managed by Stop.
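A hedged sketch of option 1, reusing the fields from the Stop snippet above (context and sync imports assumed); the loop method is hypothetical:

// Start kicks off the reminder loop and returns a cancel function. Because the
// only way to stop the service is via the function returned here, a caller
// cannot stop a reminder that was never started, and sync.Once makes repeated
// cancels harmless.
func (r *reminder) Start(ctx context.Context) (cancel func(), err error) {
	ctx, cancelCtx := context.WithCancel(ctx)

	var once sync.Once
	stop := func() {
		once.Do(func() {
			cancelCtx()
			if r.ticker != nil {
				r.ticker.Stop()
			}
			r.eventDBCloser()
		})
	}

	go r.loop(ctx) // hypothetical main loop that exits when ctx is done
	return stop, nil
}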



@Vyom-Yadav force-pushed the empty-reminder-service branch 2 times, most recently from f6d9810 to f73e03f on April 17, 2024 19:55
Signed-off-by: Vyom-Yadav <jackhammervyom@gmail.com>
@evankanderson (Member) left a comment

I'm willing to merge this and revisit the cursor storage in a subsequent PR if that helps -- I do want to get some progress on this, but I'm hoping that we can simplify the design and amount of state that Reminder is carrying, since it doesn't have direct visibility into things like "how much API quota has this app installation used in the last hour".

With that said, the point I stand firm on is making sure that it's simple and reliable to roll Reminder forward and back while it's running, so I'd strongly prefer using a Kubernetes Deployment over a StatefulSet.


@Vyom-Yadav (Collaborator, author):

@evankanderson Ready when you are. We can discuss enhancements and cursors in a Discord thread for a separate PR.

@evankanderson merged commit 388da06 into mindersec:main on Apr 23, 2024 (20 checks passed)