Add MaybeTransaction for better writing speed when supported #584

Atreyagaurav · 2024-11-02T00:14:46Z

I agree to follow the project's code of conduct.
I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Fixes #582

lnicola · 2024-11-03T09:11:23Z

src/vector/transaction.rs

+
+impl Dataset {
+    pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> {
+        let force = 1;


We should add at some point that parameter for force, but if you want this for better performance, you absolutely don't want to enable it.

I think it only works for FileGDB, and it's implemented by making a file system-level copy of the whole dataset, and restoring it on rollback.

Yeah that won't help with performance, I think if we want transaction for rollback purposes vs want transaction for speed purposes, I think we should have different methods.

The MaybeTransaction doesn't have rollback method at all, that'd help at compile time to let people know they can't rollback. Since it could be working on the database directly without transaction as well.

That's why I was thinking maybe some other name would be better. The current interpretation of MaybeTransaction is it is maybe a transaction or not, hence it has to be committed on drop, and no rollback. To make the Transaction and non transaction database do same thing.

On that note I think we can add the force thing to normal Transaction or add another ForceTransaction that does filesystem copy ourselves when Transaction fails.

weiznich · 2024-11-13T08:16:35Z

src/vector/transaction.rs

+}
+
+impl Dataset {
+    pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> {


It might be reasonable to go with a callback based approach here instead of having that guard object. This should make it much easier to handle errors on committing.

That would give you the following function signature:

fn maybe_transaction<R,E: From<gdal::Error>>(&self, impl Fn(&Dataset) -> Result<R, E>) -> Result<R,E>

For the implementation of that pattern see diesels transaction method.

Sorry for late response. I made it quickly, just copying the mechanism that is being already used by the transaction. It gives uniformity to the transaction API.

lnicola · 2024-11-27T20:33:11Z

I can see why the callback approach is better: a commit method doesn't really make sense, because the transaction will get committed anyway (we could offer an API where rollback can fail and still commit, but that seems harder to understand), and we can't expose. On the other hand, the callback API is much harder to use:

// Nb.: untested, and I'm probably misremembering the API

let mut tx = ds.maybe_transaction()?;
let mut batch_len = 0; // slice::chunks would probably be even nicer
for row in df {
    write_feature(&mut tx, row)?;
    batch_len += 1;
    if batch_len == BATCH_SIZE {
        tx.commit()?; // just to check for errors
        tx = ds.maybe_transaction()?;
        batch_len = 0;
    }
}

if batch_len > 0 {
    tx.commit()?;
}

For me, at least, this feels pretty natural to write. With a callback, I think it would look like this:

// Nb.: even more wrong, probably

let mut it = df.into_iter().peekable();
while it.peek().is_some() {
    ds.maybe_transaction(|mut tx| {
        for row in it.take(BATCH_LIMIT) {
            write_feature(&mut tx, row)?;
        }
        Ok(())
    })?;
}

Which isn't that bad, but seems more tricky to write.

And if we use a callback approach here, we should probably change start_transaction for symmetry. @weiznich I'm pretty sure you've spent a lot of time on this API question, you might have more arguments in favor of the callback approach.

Then there's the question of naming, MaybeTransaction isn't that bad, but feels a bit unserious. I don't have a better alternative, though.

Atreyagaurav · 2024-11-27T20:54:43Z

I haven't thought that much into it. But I think, looking at your examples, the callback does make sense. Although, I don't know if there is going to be any problem with the borrow checker for that example. But making it take a closure and then calling that either on transaction or dataset seems like a good idea, but yeah it'll be harder to write than running things on a MaybeTransaction.

What do you think about this implementation?

@@ -177,6 +177,22 @@ impl Dataset {
         }
         Ok(Transaction::new(self))
     }
+
+    pub fn maybe_transaction(&mut self, func: impl Fn(&Dataset) -> Result<()>) -> Result<()> {
+	let force = 0;
+	let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
+	let res = func(self);
+        if rv == OGRErr::OGRERR_NONE {
+	    let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
+            if rv != OGRErr::OGRERR_NONE {
+		return Err(GdalError::OgrError {
+                    err: rv,
+                    method_name: "GDALDatasetCommitTransaction",
+		});
+            }
+	}
+	res
+    }
 }

Atreyagaurav · 2024-11-27T21:06:01Z

@lnicola what do you think of this:
Atreyagaurav@eaf509d

It is the above implementation with tests, I think this is good. The test cases are simple (I'm not doing the BATCH SIZE things you are doing).

EDIT: I need to do force=0, since this is for speed up reason.

EDIT2: I don't think I can switch branches on the pull request, so I can create a new pull request with it, if we go that direction.

lnicola · 2024-11-27T21:44:04Z

I think it's fine. Feel free to force-push to this branch.

Atreyagaurav · 2024-11-27T21:56:06Z

I thought renaming the branches would be a fun way to work around the branch swap, but it closed it.

Please refer to #591 for the new implementation based on closure.

Atreyagaurav force-pushed the maybe-transaction branch from 6e3ef42 to 128641c Compare November 2, 2024 00:22

Atreyagaurav marked this pull request as ready for review November 2, 2024 00:22

lnicola reviewed Nov 3, 2024

View reviewed changes

weiznich reviewed Nov 13, 2024

View reviewed changes

lnicola mentioned this pull request Nov 23, 2024

Bump gdal-src to 3.10.0 #574

Merged

2 tasks

Atreyagaurav added 2 commits November 27, 2024 22:10

Add MaybeTransaction for better writing speed when supported

a962b73

Update CHANGES.md; spelling

21d783d

lnicola force-pushed the maybe-transaction branch from 128641c to 21d783d Compare November 27, 2024 20:10

Atreyagaurav closed this Nov 27, 2024

Atreyagaurav deleted the maybe-transaction branch November 27, 2024 21:48

Atreyagaurav mentioned this pull request Nov 27, 2024

Add Dataset::maybe_batch for faster database write #591

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MaybeTransaction for better writing speed when supported #584

Add MaybeTransaction for better writing speed when supported #584

Atreyagaurav commented Nov 2, 2024

lnicola Nov 3, 2024

Atreyagaurav Nov 3, 2024

weiznich Nov 13, 2024

Atreyagaurav Nov 27, 2024

lnicola commented Nov 27, 2024 •

edited

Loading

Atreyagaurav commented Nov 27, 2024 •

edited

Loading

Atreyagaurav commented Nov 27, 2024 •

edited

Loading

lnicola commented Nov 27, 2024

Atreyagaurav commented Nov 27, 2024

Add MaybeTransaction for better writing speed when supported #584

Add MaybeTransaction for better writing speed when supported #584

Conversation

Atreyagaurav commented Nov 2, 2024

lnicola Nov 3, 2024

Choose a reason for hiding this comment

Atreyagaurav Nov 3, 2024

Choose a reason for hiding this comment

weiznich Nov 13, 2024

Choose a reason for hiding this comment

Atreyagaurav Nov 27, 2024

Choose a reason for hiding this comment

lnicola commented Nov 27, 2024 • edited Loading

Atreyagaurav commented Nov 27, 2024 • edited Loading

Atreyagaurav commented Nov 27, 2024 • edited Loading

lnicola commented Nov 27, 2024

Atreyagaurav commented Nov 27, 2024

lnicola commented Nov 27, 2024 •

edited

Loading

Atreyagaurav commented Nov 27, 2024 •

edited

Loading

Atreyagaurav commented Nov 27, 2024 •

edited

Loading