Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ludicrous Mode #4872

Merged
merged 26 commits into from
Mar 9, 2020
Merged

Ludicrous Mode #4872

merged 26 commits into from
Mar 9, 2020

Conversation

animesh2049
Copy link
Contributor

@animesh2049 animesh2049 commented Feb 28, 2020

This change is Reviewable

@animesh2049 animesh2049 requested review from manishrjain, martinmr and a team as code owners February 28, 2020 07:50
worker/draft.go Outdated
if x.WorkerConfig.LudicrousMode {
txnTimeStamp := proposal.Mutations.StartTs
maxTs := posting.Oracle().MaxAssigned()
n.commitOrAbort(proposal.Key, &pb.OracleDelta{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of n.commitOrAbort is not checked (from errcheck)

Copy link
Contributor

@harshil-goel harshil-goel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 15 unresolved discussions (waiting on @animesh2049, @manishrjain, and @martinmr)


dgraph/cmd/alpha/run.go, line 192 at r7 (raw file):

	flag.Bool("graphql_introspection", true, "Set to false for no GraphQL schema introspection")
	flag.Bool("ludicrous-mode", false, "Run alpha in ludicrous mode")

Should we think of ways where users can turn it off and on without restarting the alpha?


dgraph/cmd/live/batch.go, line 240 at r7 (raw file):

func (l *loader) conflictKeysForNQuad(nq *api.NQuad) ([]uint64, error) {
	return nil, nil

Should only happen when in ludicrous mode.


dgraph/cmd/zero/raft.go, line 687 at r7 (raw file):

				}
				span.Annotatef(nil, "Saved to storage")
			}

Spin another thread to do it in the background?


edgraph/server.go, line 332 at r7 (raw file):

	// what are the cases in which a transaction could be aborted. Doesn it make sense
	// to abort txn in ludicrous mode ??

This comment looks wrong.
Can you write a comment explaining why we return in case of ludicrous mode instead?


edgraph/server.go, line 812 at r7 (raw file):

		} else {
			req.StartTs = posting.Oracle().MaxAssigned()
			if len(qc.req.GetMutations()) > 0 {

If it is inside isMutation, do we really need this?


edgraph/server.go, line 880 at r7 (raw file):

		assignTimestampStart := time.Now()
		if !x.WorkerConfig.LudicrousMode {
			qc.req.StartTs = worker.State.GetTimestamp(qc.req.ReadOnly)

Can you write a comment explaining what's happening here.


posting/writer.go, line 117 at r7 (raw file):

// Flush waits until all operations are done and all data is written to disk.
func (w *TxnWriter) Flush() error {
	if x.WorkerConfig.LudicrousMode {

Should we block it from the place Flush() is being called?


worker/draft.go, line 329 at r7 (raw file):

		}(start, end)
	}

Extra line.


worker/draft.go, line 356 at r7 (raw file):

			n.commitOrAbort(proposal.Key, &pb.OracleDelta{
				Txns: []*pb.TxnStatus{
					{StartTs: txnTimeStamp, CommitTs: maxTs},

Why maxTs? I thought we were doing Commit at StartTs (commitNow). Committing at maxTs might create unpredictable behavior. As, the user won't know when it would be committed and hence won't be understand the cases when there might be a data loss.
Is my understanding correct?


worker/draft.go, line 922 at r7 (raw file):

			n.SaveToStorage(&rd.HardState, rd.Entries, &rd.Snapshot)
			if !x.WorkerConfig.LudicrousMode {
				timer.Record("disk")

Spin a background thread to do this?


worker/draft.go, line 1166 at r7 (raw file):

func (n *node) blockingAbort(req *pb.TxnTimestamps) error {
	if x.WorkerConfig.LudicrousMode {
		return nil

This isn't required.


worker/draft.go, line 1211 at r7 (raw file):

// abort. Note that only the leader runs this function.
func (n *node) abortOldTransactions() {
	if x.WorkerConfig.LudicrousMode {

This isn't required.


worker/task.go, line 801 at r7 (raw file):

	span.Annotatef(nil, "Waiting for startTs: %d at node: %d, gid: %d",
		q.ReadTs, groups().Node.Id, gid)

Extra line.


worker/task.go, line 805 at r7 (raw file):

		return &pb.Result{}, err
	}

Extra line.

Copy link

@arijitAD arijitAD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 19 unresolved discussions (waiting on @animesh2049, @manishrjain, and @martinmr)


dgraph/cmd/live/batch.go, line 240 at r7 (raw file):

func (l *loader) conflictKeysForNQuad(nq *api.NQuad) ([]uint64, error) {
	return nil, nil

Add ludicrous mode flag in live loader as well


dgraph/cmd/zero/raft.go, line 680 at r7 (raw file):

			if !x.WorkerConfig.LudicrousMode {
				timer.Record("disk")
				if rd.MustSync {

We should periodically call sync or maybe sync in background?
Also, is the ludicrous mode flag exposed for zero?


edgraph/server.go, line 805 at r7 (raw file):

	// For mutations, we update the startTs if necessary.
	var startTs uint64
	if isMutation && req.StartTs == 0 {

Is this needed? isMutation is false when doing a query. This value would be already set in processQuery() because startTs would be zero.


posting/writer.go, line 116 at r7 (raw file):

// Flush waits until all operations are done and all data is written to disk.
func (w *TxnWriter) Flush() error {

Should we flush the data periodically? I think this might be the reason we were hitting OOM. Since the alpha badger data stays in-memory.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @harshil-goel from 5 discussions.
Reviewable status: 0 of 7 files reviewed, 14 unresolved discussions (waiting on @animesh2049, @manishrjain, and @martinmr)


worker/draft.go, line 329 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Extra line.

Done.


worker/draft.go, line 1166 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

This isn't required.

Done.


worker/draft.go, line 1211 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

This isn't required.

Done.


worker/task.go, line 801 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Extra line.

Done.


worker/task.go, line 805 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Extra line.

Done.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @harshil-goel from a discussion.
Reviewable status: 0 of 7 files reviewed, 13 unresolved discussions (waiting on @animesh2049, @manishrjain, and @martinmr)


worker/draft.go, line 356 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Why maxTs? I thought we were doing Commit at StartTs (commitNow). Committing at maxTs might create unpredictable behavior. As, the user won't know when it would be committed and hence won't be understand the cases when there might be a data loss.
Is my understanding correct?

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 17 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/alpha/run.go, line 192 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Should we think of ways where users can turn it off and on without restarting the alpha?

ludicrous_mode with underscore. "Run Alpha in .... Uses eventual consistency for better performance. Adds other optimization hacks."


dgraph/cmd/live/batch.go, line 240 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Should only happen when in ludicrous mode.

Don't think we need to do any changes in live loader.


dgraph/cmd/zero/raft.go, line 680 at r7 (raw file):

Previously, arijitAD (Arijit Das) wrote…

We should periodically call sync or maybe sync in background?
Also, is the ludicrous mode flag exposed for zero?

Yeah. We need a background goroutine. Add a TODO then.


dgraph/cmd/zero/raft.go, line 678 at r9 (raw file):

			}
			n.SaveToStorage(&rd.HardState, rd.Entries, &rd.Snapshot)
			if !x.WorkerConfig.LudicrousMode {

if !ludicrousmode && rd.MustSync { ... }


edgraph/server.go, line 327 at r9 (raw file):

	qc.span.Annotatef(nil, "Txn Context: %+v. Err=%v", resp.Txn, err)

	if x.WorkerConfig.LudicrousMode {

It should ensure that the client knows that the txn got committed. Otherwise, it would try to commit it.


edgraph/server.go, line 810 at r9 (raw file):

			startTs = worker.State.GetTimestamp(false)
			req.StartTs = startTs
		} else {

I'd try to avoid all this here, and do the timestamp logic within the functions (processQuery) and others.


edgraph/server.go, line 824 at r9 (raw file):

	if x.WorkerConfig.LudicrousMode {
		req.StartTs = startTs

req.StartTs = getnewts(1)

// TODO: We should also try to see if we can get a bunch of ts upfront and then use them.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @arijitAD from a discussion.
Reviewable status: 0 of 7 files reviewed, 16 unresolved discussions (waiting on @animesh2049, @arijitAD, @golangcibot, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/alpha/run.go, line 192 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ludicrous_mode with underscore. "Run Alpha in .... Uses eventual consistency for better performance. Adds other optimization hacks."

Done.


dgraph/cmd/live/batch.go, line 240 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't think we need to do any changes in live loader.

Done.


dgraph/cmd/zero/raft.go, line 680 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Yeah. We need a background goroutine. Add a TODO then.

Done.


dgraph/cmd/zero/raft.go, line 687 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Spin another thread to do it in the background?

Done.


dgraph/cmd/zero/raft.go, line 678 at r9 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if !ludicrousmode && rd.MustSync { ... }

Done.


edgraph/server.go, line 332 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

This comment looks wrong.
Can you write a comment explaining why we return in case of ludicrous mode instead?

Done.


edgraph/server.go, line 327 at r9 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

It should ensure that the client knows that the txn got committed. Otherwise, it would try to commit it.

Done.


edgraph/server.go, line 810 at r9 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I'd try to avoid all this here, and do the timestamp logic within the functions (processQuery) and others.

I gave this some thought, but I can always see in which we will need some logic here. I am not able to think other neat way to do it.


posting/writer.go, line 116 at r7 (raw file):

Previously, arijitAD (Arijit Das) wrote…

Should we flush the data periodically? I think this might be the reason we were hitting OOM. Since the alpha badger data stays in-memory.

Done.


posting/writer.go, line 117 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Should we block it from the place Flush() is being called?

Done.


worker/draft.go, line 354 at r7 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

Error return value of n.commitOrAbort is not checked (from errcheck)

Done.


worker/draft.go, line 922 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Spin a background thread to do this?

Done.

Copy link
Contributor

@harshil-goel harshil-goel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 17 unresolved discussions (waiting on @animesh2049, @arijitAD, @golangcibot, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/alpha/run.go, line 581 at r10 (raw file):

		AbortOlderThan:      abortDur,
		StartTime:           startTime,
		LudicrousMode:       Alpha.Conf.GetBool("ludicrous-mode"),

ludicrous_mode


dgraph/cmd/zero/raft.go, line 687 at r10 (raw file):

			} else if x.WorkerConfig.LudicrousMode && rd.MustSync {
				go func() {
					_ = n.Store.Sync()

Don't ignore the error.


dgraph/cmd/zero/run.go, line 98 at r10 (raw file):

	flag.String("datadog.collector", "", "Send opencensus traces to Datadog. As of now, the trace"+
		" exporter does not support annotation logs and would discard them.")
	flag.Bool("ludicrous_mode", false, "Run alpha in ludicrous mode")

Run zero


dgraph/cmd/zero/run.go, line 175 at r10 (raw file):

	x.WorkerConfig = x.WorkerOptions{
		LudicrousMode: Zero.Conf.GetBool("ludicrous_mode"),

Maybe just make it a constant? It's bound to lead to mistakes.


worker/draft.go, line 567 at r10 (raw file):

	if x.WorkerConfig.LudicrousMode {
		go func() {
			writer.Flush()

Log the error.


worker/draft.go, line 939 at r10 (raw file):

			} else if x.WorkerConfig.LudicrousMode && rd.MustSync {
				go func() {
					_ = n.Store.Sync()

Log the error.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 17 unresolved discussions (waiting on @animesh2049, @arijitAD, @golangcibot, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/alpha/run.go, line 581 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

ludicrous_mode

Done.


dgraph/cmd/zero/raft.go, line 687 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Don't ignore the error.

Done.


dgraph/cmd/zero/run.go, line 98 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Run zero

Done.


worker/draft.go, line 567 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Log the error.

Done.


worker/draft.go, line 939 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Log the error.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1, 1 of 3 files at r2, 1 of 2 files at r9, 4 of 4 files at r11, 1 of 1 files at r12.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @animesh2049, @arijitAD, @golangcibot, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/zero/raft.go, line 684 at r11 (raw file):

				}
				timer.Record("sync")
				span.Annotatef(nil, "Saved to storage")

This should be moved out.


dgraph/cmd/zero/raft.go, line 687 at r11 (raw file):

			} else if x.WorkerConfig.LudicrousMode && rd.MustSync {
				go func() {
					if err := n.Store.Sync(); err != nil {

No. Do it periodically in the background with one goroutine -- with a 1s ticker. It can call sync on both the p and w.


edgraph/server.go, line 825 at r11 (raw file):

	}

	if x.WorkerConfig.LudicrousMode {

might not need this.


worker/draft.go, line 351 at r12 (raw file):

		}
		if x.WorkerConfig.LudicrousMode {
			txnTimeStamp := proposal.Mutations.StartTs

ts :=


worker/draft.go, line 941 at r12 (raw file):

			} else if x.WorkerConfig.LudicrousMode && rd.MustSync {
				go func() {
					if err := n.Store.Sync(); err != nil {

Nope.


worker/draft.go, line 982 at r12 (raw file):

						if x.WorkerConfig.LudicrousMode {
							// Assuming that there will be no error
							// while proposing.

Move this to one liner.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @harshil-goel, and @harshil-goel from 7 discussions.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @animesh2049, @harshil-goel, @manishrjain, and @martinmr)

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 14 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/zero/raft.go, line 684 at r11 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This should be moved out.

Done.


dgraph/cmd/zero/raft.go, line 687 at r11 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No. Do it periodically in the background with one goroutine -- with a 1s ticker. It can call sync on both the p and w.

Done.


dgraph/cmd/zero/run.go, line 175 at r10 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Maybe just make it a constant? It's bound to lead to mistakes.

Not sure about this. Leaving it as it is for now.


edgraph/server.go, line 805 at r7 (raw file):

Previously, arijitAD (Arijit Das) wrote…

Is this needed? isMutation is false when doing a query. This value would be already set in processQuery() because startTs would be zero.

Done.


edgraph/server.go, line 812 at r7 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

If it is inside isMutation, do we really need this?

Done.


edgraph/server.go, line 810 at r9 (raw file):

Previously, animesh2049 (Animesh Chandra Pathak) wrote…

I gave this some thought, but I can always see in which we will need some logic here. I am not able to think other neat way to do it.

Done.


edgraph/server.go, line 824 at r9 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

req.StartTs = getnewts(1)

// TODO: We should also try to see if we can get a bunch of ts upfront and then use them.

Done.


edgraph/server.go, line 825 at r11 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

might not need this.

Done.


worker/draft.go, line 351 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ts :=

Done.


worker/draft.go, line 941 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Nope.

Done.


worker/draft.go, line 982 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move this to one liner.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I'll take over.

Reviewed 2 of 2 files at r13.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/zero/raft.go, line 620 at r13 (raw file):

func (n *node) StoreSync(closer *y.Closer) {
	closer.AddRunning(1)

Move all the adds on sync.Waitgroup or closer, etc. outside of the goroutine.


dgraph/cmd/zero/raft.go, line 655 at r13 (raw file):

	go n.checkQuorum(closer)
	go n.RunReadIndexLoop(closer, readStateCh)
	if x.WorkerConfig.LudicrousMode {

closer.Add(1) here.


worker/draft.go, line 567 at r13 (raw file):

	if x.WorkerConfig.LudicrousMode {
		go func() {
			if err := writer.Flush(); err != nil {

Might create a writer.Wait(), which doesn't call db.Sync. And do the db.Sync in a goroutine.


worker/draft.go, line 578 at r13 (raw file):

	g := groups()
	if delta.GroupChecksums != nil {

Also check if it has a greater than zero checksum.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few changes.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/zero/raft.go, line 619 at r13 (raw file):

}

func (n *node) StoreSync(closer *y.Closer) {

This is repeated code. The common func can just be written once and reused.


worker/draft.go, line 794 at r13 (raw file):

}

func (n *node) StoreSync(closer *y.Closer) {

A func which takes a DB and closer.

SyncPeriodically(db *badger.DB, closer *y.Closer) Then you can call it on n.Store and pstore.

Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 17 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/zero/raft.go, line 619 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This is repeated code. The common func can just be written once and reused.

Done.


dgraph/cmd/zero/raft.go, line 620 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move all the adds on sync.Waitgroup or closer, etc. outside of the goroutine.

Done.


dgraph/cmd/zero/raft.go, line 655 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

closer.Add(1) here.

Done.


worker/draft.go, line 567 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Might create a writer.Wait(), which doesn't call db.Sync. And do the db.Sync in a goroutine.

Done.


worker/draft.go, line 578 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Also check if it has a greater than zero checksum.

Done.


worker/draft.go, line 794 at r13 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

A func which takes a DB and closer.

SyncPeriodically(db *badger.DB, closer *y.Closer) Then you can call it on n.Store and pstore.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice! Good to merge.

Reviewed 4 of 4 files at r14.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @animesh2049, @arijitAD, @harshil-goel, @manishrjain, and @martinmr)

@animesh2049 animesh2049 merged commit 139ece3 into master Mar 9, 2020
@joshua-goldstein joshua-goldstein deleted the animesh2049/lud-mode-2 branch August 11, 2022 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants