-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MaybeTransaction for better writing speed when supported #584
Conversation
6e3ef42
to
128641c
Compare
|
||
impl Dataset { | ||
pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> { | ||
let force = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add at some point that parameter for force
, but if you want this for better performance, you absolutely don't want to enable it.
I think it only works for FileGDB, and it's implemented by making a file system-level copy of the whole dataset, and restoring it on rollback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that won't help with performance, I think if we want transaction for rollback purposes vs want transaction for speed purposes, I think we should have different methods.
The MaybeTransaction
doesn't have rollback
method at all, that'd help at compile time to let people know they can't rollback. Since it could be working on the database directly without transaction as well.
That's why I was thinking maybe some other name would be better. The current interpretation of MaybeTransaction
is it is maybe a transaction or not, hence it has to be committed on drop, and no rollback. To make the Transaction and non transaction database do same thing.
On that note I think we can add the force thing to normal Transaction
or add another ForceTransaction
that does filesystem copy ourselves when Transaction fails.
} | ||
|
||
impl Dataset { | ||
pub fn maybe_start_transaction(&mut self) -> MaybeTransaction<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be reasonable to go with a callback based approach here instead of having that guard object. This should make it much easier to handle errors on committing.
That would give you the following function signature:
fn maybe_transaction<R,E: From<gdal::Error>>(&self, impl Fn(&Dataset) -> Result<R, E>) -> Result<R,E>
For the implementation of that pattern see diesels transaction method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late response. I made it quickly, just copying the mechanism that is being already used by the transaction. It gives uniformity to the transaction API.
128641c
to
21d783d
Compare
I can see why the callback approach is better: a // Nb.: untested, and I'm probably misremembering the API
let mut tx = ds.maybe_transaction()?;
let mut batch_len = 0; // slice::chunks would probably be even nicer
for row in df {
write_feature(&mut tx, row)?;
batch_len += 1;
if batch_len == BATCH_SIZE {
tx.commit()?; // just to check for errors
tx = ds.maybe_transaction()?;
batch_len = 0;
}
}
if batch_len > 0 {
tx.commit()?;
} For me, at least, this feels pretty natural to write. With a callback, I think it would look like this: // Nb.: even more wrong, probably
let mut it = df.into_iter().peekable();
while it.peek().is_some() {
ds.maybe_transaction(|mut tx| {
for row in it.take(BATCH_LIMIT) {
write_feature(&mut tx, row)?;
}
Ok(())
})?;
} Which isn't that bad, but seems more tricky to write. And if we use a callback approach here, we should probably change Then there's the question of naming, |
I haven't thought that much into it. But I think, looking at your examples, the callback does make sense. Although, I don't know if there is going to be any problem with the borrow checker for that example. But making it take a closure and then calling that either on transaction or dataset seems like a good idea, but yeah it'll be harder to write than running things on a What do you think about this implementation? @@ -177,6 +177,22 @@ impl Dataset {
}
Ok(Transaction::new(self))
}
+
+ pub fn maybe_transaction(&mut self, func: impl Fn(&Dataset) -> Result<()>) -> Result<()> {
+ let force = 0;
+ let rv = unsafe { gdal_sys::GDALDatasetStartTransaction(self.c_dataset(), force) };
+ let res = func(self);
+ if rv == OGRErr::OGRERR_NONE {
+ let rv = unsafe { gdal_sys::GDALDatasetCommitTransaction(self.c_dataset()) };
+ if rv != OGRErr::OGRERR_NONE {
+ return Err(GdalError::OgrError {
+ err: rv,
+ method_name: "GDALDatasetCommitTransaction",
+ });
+ }
+ }
+ res
+ }
} |
@lnicola what do you think of this: It is the above implementation with tests, I think this is good. The test cases are simple (I'm not doing the BATCH SIZE things you are doing). EDIT: I need to do force=0, since this is for speed up reason. EDIT2: I don't think I can switch branches on the pull request, so I can create a new pull request with it, if we go that direction. |
I think it's fine. Feel free to force-push to this branch. |
I thought renaming the branches would be a fun way to work around the branch swap, but it closed it. Please refer to #591 for the new implementation based on closure. |
CHANGES.md
if knowledge of this change could be valuable to users.Fixes #582