Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MaybeTransaction for better writing speed when supported #584

Closed
wants to merge 2 commits into from

Conversation

Atreyagaurav
Copy link
Contributor

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Fixes #582

@Atreyagaurav Atreyagaurav marked this pull request as ready for review November 2, 2024 00:22

impl Dataset {
pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> {
let force = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add at some point that parameter for force, but if you want this for better performance, you absolutely don't want to enable it.

I think it only works for FileGDB, and it's implemented by making a file system-level copy of the whole dataset, and restoring it on rollback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that won't help with performance, I think if we want transaction for rollback purposes vs want transaction for speed purposes, I think we should have different methods.

The MaybeTransaction doesn't have rollback method at all, that'd help at compile time to let people know they can't rollback. Since it could be working on the database directly without transaction as well.

That's why I was thinking maybe some other name would be better. The current interpretation of MaybeTransaction is it is maybe a transaction or not, hence it has to be committed on drop, and no rollback. To make the Transaction and non transaction database do same thing.

On that note I think we can add the force thing to normal Transaction or add another ForceTransaction that does filesystem copy ourselves when Transaction fails.

}

impl Dataset {
pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be reasonable to go with a callback based approach here instead of having that guard object. This should make it much easier to handle errors on committing.

That would give you the following function signature:

fn maybe_transaction<R,E: From<gdal::Error>>(&self, impl Fn(&Dataset) -> Result<R, E>) -> Result<R,E>

For the implementation of that pattern see diesels transaction method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response. I made it quickly, just copying the mechanism that is being already used by the transaction. It gives uniformity to the transaction API.

@lnicola lnicola mentioned this pull request Nov 23, 2024
2 tasks
@lnicola
Copy link
Member

lnicola commented Nov 27, 2024

I can see why the callback approach is better: a commit method doesn't really make sense, because the transaction will get committed anyway (we could offer an API where rollback can fail and still commit, but that seems harder to understand), and we can't expose. On the other hand, the callback API is much harder to use:

// Nb.: untested, and I'm probably misremembering the API

let mut tx = ds.maybe_transaction()?;
let mut batch_len = 0; // slice::chunks would probably be even nicer
for row in df {
    write_feature(&mut tx, row)?;
    batch_len += 1;
    if batch_len == BATCH_SIZE {
        tx.commit()?; // just to check for errors
        tx = ds.maybe_transaction()?;
        batch_len = 0;
    }
}

if batch_len > 0 {
    tx.commit()?;
}

For me, at least, this feels pretty natural to write. With a callback, I think it would look like this:

// Nb.: even more wrong, probably

let mut it = df.into_iter().peekable();
while it.peek().is_some() {
    ds.maybe_transaction(|mut tx| {
        for row in it.take(BATCH_LIMIT) {
            write_feature(&mut tx, row)?;
        }
        Ok(())
    })?;
}

Which isn't that bad, but seems more tricky to write.

And if we use a callback approach here, we should probably change start_transaction for symmetry. @weiznich I'm pretty sure you've spent a lot of time on this API question, you might have more arguments in favor of the callback approach.


Then there's the question of naming, MaybeTransaction isn't that bad, but feels a bit unserious. I don't have a better alternative, though.

@Atreyagaurav
Copy link
Contributor Author

Atreyagaurav commented Nov 27, 2024

I haven't thought that much into it. But I think, looking at your examples, the callback does make sense. Although, I don't know if there is going to be any problem with the borrow checker for that example. But making it take a closure and then calling that either on transaction or dataset seems like a good idea, but yeah it'll be harder to write than running things on a MaybeTransaction.

What do you think about this implementation?

@@ -177,6 +177,22 @@ impl Dataset {
         }
         Ok(Transaction::new(self))
     }
+
+    pub fn maybe_transaction(&mut self, func: impl Fn(&Dataset) -> Result<()>) -> Result<()> {
+	let force = 0;
+	let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
+	let res = func(self);
+        if rv == OGRErr::OGRERR_NONE {
+	    let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
+            if rv != OGRErr::OGRERR_NONE {
+		return Err(GdalError::OgrError {
+                    err: rv,
+                    method_name: "GDALDatasetCommitTransaction",
+		});
+            }
+	}
+	res
+    }
 }

@Atreyagaurav
Copy link
Contributor Author

Atreyagaurav commented Nov 27, 2024

@lnicola what do you think of this:
Atreyagaurav@eaf509d

It is the above implementation with tests, I think this is good. The test cases are simple (I'm not doing the BATCH SIZE things you are doing).

EDIT: I need to do force=0, since this is for speed up reason.

EDIT2: I don't think I can switch branches on the pull request, so I can create a new pull request with it, if we go that direction.

@lnicola
Copy link
Member

lnicola commented Nov 27, 2024

I think it's fine. Feel free to force-push to this branch.

@Atreyagaurav
Copy link
Contributor Author

I thought renaming the branches would be a fun way to work around the branch swap, but it closed it.

Please refer to #591 for the new implementation based on closure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Req] Optionally use Transaction if supported
3 participants