Update violates set semantics #2134

Closed
kim opened this issue Jan 17, 2025 · 11 comments · Fixed by #2156

@kim
Contributor

kim commented Jan 17, 2025

Using the update method on a #[primary_key]'ed table inserts a new row even if the row is unchanged.
This prevents the database from restarting from the commitlog:

internal error: failed to get or launch module host: Error inserting row ProductValue { elements: [U64(1), U64(1)] } during transaction 6 playback: TableError: Duplicate insertion of row RowPointer(r: 0, pi: 0, po: 0, so: 1) violates set semantics: Duplicate insertion of row RowPointer(r: 0, pi: 0, po: 0, so: 1) violates set semantics    

To reproduce, create a module like so:

use spacetimedb::{ReducerContext, Table};

#[spacetimedb::table(name = boeuf)]
pub struct Boeuf {
    #[primary_key]
    pub id: u64,
    pub n: u64,
}

#[spacetimedb::reducer]
pub fn add(ctx: &ReducerContext, id: u64, n: u64) {
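    // Upsert by primary key: update the existing row, or insert a new one if absent.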
    match ctx.db.boeuf().id().find(id) {
        Some(row) => ctx.db.boeuf().id().update(Boeuf { n, ..row }),
        None => ctx.db.boeuf().insert(Boeuf { id, n }),
    };
}
  • Publish the module, then call the add reducer twice with the same arguments.
  • Restart the server
  • Access the database, e.g. by running an SQL statement or calling the reducer again
  • Observe the error mentioned above
@gefjon
Contributor

gefjon commented Jan 17, 2025

Reduced repro to:

use spacetimedb::{ReducerContext, Table};

#[spacetimedb::table(name = boeuf)]
pub struct Boeuf {
    #[primary_key]
    pub id: u64,
    pub n: u64,
}

#[spacetimedb::reducer]
pub fn add(ctx: &ReducerContext, id: u64, n: u64) {
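    // Manual delete-then-insert by primary key: effectively what `update` does.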
    ctx.db.boeuf().id().delete(id);
    ctx.db.boeuf().insert(Boeuf { id, n });
}
spacetime start &
spacetime publish kim-test
spacetime call kim-test add 1 1
spacetime call kim-test add 1 1
fg
^C
spacetime start &
spacetime sql kim-test 'select * from boeuf'

Unsurprising, since update is delete followed by insert, but still potentially interesting.

@Shubham8287
Contributor

+1, this is failing the replication cluster restart smoketest.

@kim
Contributor Author

kim commented Jan 20, 2025

I'm marking this P0.

@cloutiertyler
Contributor

@kim could you post the result of your bisect here even if you don't believe it was correct?

@kim
Contributor Author

kim commented Jan 21, 2025

f136670

@kim
Contributor Author

kim commented Jan 22, 2025

The issue is not fully fixed, potentially due to special handling of connect/disconnect transactions.

imho, replaying should only warn, but never error on set semantic violations.

@kim kim reopened this Jan 22, 2025
@kim
Contributor Author

kim commented Jan 22, 2025

Steps to reproduce:

  • run a standalone server
  • publish the quickstart-chat example
  • run the quickstart-chat example and exit it
    This will only create connect/disconnect transactions.
  • stop the server
  • remove the snapshot of the quickstart-chat replica
  • start the server

Observe:

2025-01-22T12:16:13.955109Z  INFO try_init: crates/core/src/db/relational_db.rs:311: [c2004ab29bcd85ef4bb8d9a5f974b40a5261ba4d5b568c5c940c592eaea38a67] DATABASE: durable_tx_offset is Some(3)    
2025-01-22T12:16:13.955199Z  INFO try_init: crates/core/src/db/relational_db.rs:462: [c2004ab29bcd85ef4bb8d9a5f974b40a5261ba4d5b568c5c940c592eaea38a67] DATABASE: no snapshot on disk    
2025-01-22T12:16:13.958685Z  INFO try_init: crates/core/src/db/relational_db.rs:1207: [c2004ab29bcd85ef4bb8d9a5f974b40a5261ba4d5b568c5c940c592eaea38a67] DATABASE: applying transaction history...    
2025-01-22T12:16:13.958800Z  WARN try_init: /tank/src/clockworklabs/spacetimedbprivate/public/crates/commitlog/src/commitlog.rs:649: commitlog offset index is not used at segment 0: Asked key is smaller than the first entry in the index    
2025-01-22T12:16:13.960646Z  INFO try_init: crates/core/src/db/relational_db.rs:1221: [c2004ab29bcd85ef4bb8d9a5f974b40a5261ba4d5b568c5c940c592eaea38a67] Loaded 100% (3/3)    
2025-01-22T12:16:13.960801Z ERROR try_init: crates/core/src/host/host_controller.rs:695: error=Error deleting row ProductValue { elements: [U256(87750066309256334258040029134888392372784772464136912353719315292725071624392), U128(Packed(334798706132543748226464558050192239161))] } during transaction 3 playback
2025-01-22T12:16:13.960865Z ERROR /tank/src/clockworklabs/spacetimedbprivate/public/crates/client-api/src/lib.rs:342: internal error: failed to get or launch module host: Error deleting row ProductValue { elements: [U256(87750066309256334258040029134888392372784772464136912353719315292725071624392), U128(Packed(334798706132543748226464558050192239161))] } during transaction 3 playback: Delete for non-existent row when replaying transaction   

The offending transaction 3 is another identity_disconnected call.

This is explicitly allowed because we cannot guarantee the connect/disconnect pairing transactionally.

Replaying must support this, either by special-casing connect/disconnect, or generally not erroring due to set semantic violations.
I would prefer the latter.

@gefjon
Contributor

gefjon commented Jan 22, 2025

Replaying must support this, either by special-casing connect/disconnect, or generally not erroring due to set semantic violations. I would prefer the latter.

To be clear, what should the behavior be in the case of an insert of an already-present row in a commitlog? Should the second insert:

  1. Be processed, resulting in two copies of the row in the in-memory state?
  2. Be ignored, resulting in a single copy of the row in the in-memory state?

Should we similarly ignore deletes of not-present rows?

I will also point out that, due to our rejecting duplicate rows during commitlog replay, we identified and fixed an actual bug in the datastore which we might otherwise not have noticed. If removing that protection is the only or best option, so be it, but there is a cost to it.

@kim
Contributor Author

kim commented Jan 22, 2025

due to our rejecting duplicate rows during commitlog replay, we identified and fixed an actual bug in the datastore

That’s nice, but it would have resulted in prolonged unavailability had it surfaced in production. Since no automated testing caught it, this could very well have happened.

I would prefer the replay to be as lenient as possible, ignoring duplicate inserts (leaving the first row in memory) and deletes (the row is already gone).

I would also accept a solution which either special-cases connect/disconnect transactions, or revises the implementation and requirements to not require a potentially-duplicate disconnect.
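
A minimal sketch of that lenient behavior, assuming hypothetical ReplayTable / apply_insert / apply_delete helpers and a simplified row type (none of this is the actual datastore API):

use std::collections::HashSet;

/// Stand-in for ProductValue; the real row type is richer.
type Row = Vec<u64>;

#[derive(Default)]
struct ReplayTable {
    rows: HashSet<Row>,
}

impl ReplayTable {
    /// Insert during replay: on a duplicate, keep the first copy and only warn.
    fn apply_insert(&mut self, row: Row) {
        if !self.rows.insert(row.clone()) {
            eprintln!("warning: duplicate insert of {row:?} during replay; ignoring");
        }
    }

    /// Delete during replay: if the row is absent, warn instead of erroring.
    fn apply_delete(&mut self, row: &Row) {
        if !self.rows.remove(row) {
            eprintln!("warning: delete of non-existent row {row:?} during replay; ignoring");
        }
    }
}

fn main() {
    let mut table = ReplayTable::default();
    table.apply_insert(vec![1, 1]);
    table.apply_insert(vec![1, 1]);  // duplicate: warns, a single copy remains
    table.apply_delete(&vec![2, 2]); // missing row: warns, nothing to remove
    assert_eq!(table.rows.len(), 1);
}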

@gefjon
Contributor

gefjon commented Jan 23, 2025

An update on this: there was no duplicate insert; there was a correct and consistent commitlog containing insert/delete pairs, as it should have. We were unable to replay due to a bug in the datastore, described in #2161. Kim's assertion that it is safe to ignore a failed delete because "the row is already gone" did not hold in this case; the row would still be present after the failed delete. If we had been "as lenient as possible," we would have reconstructed a state that was objectively incorrect, in that it would contain rows which, according to the commitlog, should have been deleted. I believe quite strongly that bringing up a database in the wrong state in this way would be worse than erroring loudly.

@gefjon
Contributor

gefjon commented Jan 24, 2025

Closed by #2161 and #2156.

@gefjon gefjon closed this as completed Jan 24, 2025