Add locking functionality #14

grooverdan · 2012-09-10T02:41:16Z

This code adds the option locking to filesystem so that consistent content of calendars is given. It uses an external package locking. If you're ok with this I'll happily update the website documentation.

Cheers

Daniel.
PS: also noticed git_folder in radicale/config.py ? Is this something in progress. I'm definitely interested in a git based storage and may implement this regardless.

grooverdan · 2012-09-10T02:51:56Z

hmm. I need to think a bit more there. Locking needs to be in radicale/init.py and wrapped around the manipulation of an item. So perhaps the storage needs to expose a locking api.

rngtng · 2012-09-10T09:33:06Z

yeah +1 on gitbased storage!

liZe · 2012-09-10T12:39:14Z

tl;dr: locking is great but complicated, Radicale already handles concurrent edition with etags in a standard way.

Exposing the locking API in the storage seems to be a very good idea, but it's a bit harder. Adding a method in ical.Calendar (and in the different storages overriding this class) should work. By the way, checking the etag is enough to avoid concurrent edition, if you use Radciale only, with no multithreading. It's already done for PUT requests, with a dedicated HTTP response status to handle this in clients.

The git based storage is a feature request for a long time: http://redmine.kozea.fr/issues/775. Now that we have an architecture allowing different storage mechanisms, it should be quite easy to write.

liZe · 2013-04-26T11:56:55Z

I've added the (experimental) database storage, it should be easy to add a git-based one now. Anyone interested ;)?

By the way, databases probably are "the solution"™ if you're interested in consistency, I'm not sure anymore that locking files is useful then.

grooverdan · 2013-04-27T00:10:33Z

tl;dr: locking is great but complicated,

Agree. I dislike my solution too and it was a work-around for the fact I was adding to the ical file outside of Radicale.

Radicale already handles concurrent edition with etags in a standard way.

I think Radicale handles concurrent edition only because it is fixed to a single thread. Even with the database code all the items are pulled into a structure, edited and all written out. While this was suitable for a flat file for a database its really poor. The exposure of the CalDAV manipulation API at the storage layer is required to achieve any concurrency and this should work for database and git. The storage engine itself should do its own locking, as databases are very capable of doing, and its retries in the even of failure (transaction failure) are in the storage layer or higher.

If you want to be able to handle higher concurrent actions without a single deadlock the storage layer needs to be altered to represent this.

liZe · 2013-04-30T11:04:37Z

The really simple and probably buggy database storage creates a transaction for each request. I think that it's enough, but I'm not sure at all. What do you think of that?

grooverdan · 2013-04-30T22:22:23Z

Correct me if I'm wrong, as I understand it the storage engine loads a whole calendar for use, manipulates this within the CalDAV interfaces, and if its a modification, writes the entire calendar out in full.

The difficulty is when more than one web request is operating on the same calendar. This can be multi user or the same client doing concurrent updates. At the moment the single threaded option is saving the data by ensuring the a single modification or retrieval at a time. This will become a bottleneck for larger concurrent usages where entry point in to Radicale has requests queued.

Assume for a bit that Radicale is no longer single threaded as it will need to be this way to handle a volume. Within the transaction whole calendar is deleted, and then reconstructed. Its hard to say here which bits are occurring in SQLAlchemy and which bits are occurring on the database however neither is ideal. Assuming its on the database then the isolation level of the database as to what the outcomes could be. A database is read uncommitted level will expose a deleted calendar to another thread where it might only be doing a modification leaving a client looking at an empty calendar. Even going to the top isolation level of serializable and PUT operation could be modifying once event in the calendar (correct me if I've got my CalDAV) wrong and another thread is modifying another event. The results are likely that one of these modifications will be lost.

To handle the maximum throughput and still be consistent Radicale needs to push as minimal changes as possible into the database and ensure the transaction is begun at the beginning, do the minimal modifications required, and push the commit. Depending on what data is read/reread within the code of a transaction may dictate what transaction isolation level is required. At the end, a commit may still and Radicale should retry a few times before returning an error to the user.

I'm sure some bits of this are going to be hard. I'm not sure how minimal a set of changes CalDAV clients push but for the most part the diffs that are pushed are what should appear in the database transactions.

Sorry for being a sideline critic. I'm hoping this helps.

liZe · 2013-05-03T12:22:51Z

Sorry for being a sideline critic. I'm hoping this helps.

No problem, it's very useful!

Correct me if I'm wrong, as I understand it the storage engine loads a whole calendar for use, manipulates this within the CalDAV interfaces, and if its a modification, writes the entire calendar out in full.

That's true.

The difficulty is when more than one web request is operating on the same calendar. This can be multi user or the same client doing concurrent updates. At the moment the single threaded option is saving the data by ensuring the a single modification or retrieval at a time. This will become a bottleneck for larger concurrent usages where entry point in to Radicale has requests queued.

You're right.

Assume for a bit that Radicale is no longer single threaded as it will need to be this way to handle a volume. Within the transaction whole calendar is deleted, and then reconstructed. Its hard to say here which bits are occurring in SQLAlchemy and which bits are occurring on the database however neither is ideal. Assuming its on the database then the isolation level of the database as to what the outcomes could be. A database is read uncommitted level will expose a deleted calendar to another thread where it might only be doing a modification leaving a client looking at an empty calendar. Even going to the top isolation level of serializable and PUT operation could be modifying once event in the calendar (correct me if I've got my CalDAV wrong) and another thread is modifying another event. The results are likely that one of these modifications will be lost.

I'm sometimes a bit stupid :/. One day, I'll finally understand that transactions cannot magically solve my problems.

It should be possible to use locks to prevent data losses and corruptions. Well, according to what you wrote in the previous paragraph, locking the whole database for each request would mean just setting the bottleneck one step deeper. But at least, it would allow Radicale to be safely (and slowly) launched multithreaded.

To handle the maximum throughput and still be consistent Radicale needs to push as minimal changes as possible into the database and ensure the transaction is begun at the beginning, do the minimal modifications required, and push the commit. Depending on what data is read/reread within the code of a transaction may dictate what transaction isolation level is required. At the end, a commit may still and Radicale should retry a few times before returning an error to the user.

Even this cannot be done without changing the way Radicale works. For example, checking the etag of an event and modifying it is done in two different function calls, with no easy way to check that the data didn't change between them.

I'm pretty confident about consistency, but I'm not sure that Radicale can provide a high performance storage without sacrificing what it's been written for: work out of the box, be small, be simple. Data consistency with multithreading can be a funny goal, at least for database (and git, etc) storage. Better performance can also be a goal, easy to reach when you look how stupid some algorithms are.

But other CalDAV servers can handle zillions of requests in a couple of seconds, and they're really good at it. They're also bigger, hard to install and to configure, that's why Radicale has been created.

I'm sure some bits of this are going to be hard. I'm not sure how minimal a set of changes CalDAV clients push but for the most part the diffs that are pushed are what should appear in the database transactions.

That's definitely the first step, and it's quite easy to reach (I've only spent a couple of hours on the database storage, for fun, and I don't know databases seriously). Anyone interested? :)

Oh, and before I forget: thanks a lot for this thread.

grooverdan · 2013-05-04T02:20:27Z

I'll finally understand that transactions cannot magically solve my problems.

Not magically without thought, but with a bit of coaxing, minimal changes that corresponds to the business logic, and realizing they can fail, they can become very magic quickly.

I suspect using the merge() function of sqlalemcy may produce a more minimal set of changes ( http://docs.sqlalchemy.org/en/latest/orm/session.html#session-faq ). Increase the log level on sqlalchemy and see what sql it does on the database.

You may also need cascade definitions in your database relationships (easy with the sqlalchemy)

It should be possible to use locks to prevent data losses and corruptions. Well, according to what you wrote in the previous paragraph, locking the whole database for each request would mean just setting the bottleneck one step deeper. But at least, it would allow Radicale to be safely (and slowly) launched multithreaded.

Locks, as in explicit locks (LOCK TABLE, DATABASE, SELECT WITH READ LOCK, etc), in databases will almost always cause more problems than transactions.

For example, checking the etag of an event and modifying it is done in two different function calls, with no easy way to check that the data didn't change between them.

Try the get, the merge and then commit. With default transactions it might work without too much change.

But other CalDAV servers can handle zillions of requests in a couple of seconds, and they're really good at it. They're also bigger, hard to install and to configure, that's why Radicale has been created.

Yes. I quite liked the simplicity of the filesystem storage which is why I used it. I probably need to get the other bit of my application to talk CalDAV rather than manipulating the storage and writing my dodgy lock patch.

There's no reason why with the right logic Radicale could do quite well on a DB back-end.

liZe · 2013-09-12T21:11:07Z

FYI, a git-based storage has been added. Install dulwich, use an empty git repository as root folder for collections, it should work. Issues are welcome!

liZe · 2016-05-25T14:42:30Z

Locking has been implemented in #402.

add locking functionality

3dfc91e

liZe mentioned this pull request Mar 25, 2014

Discussion: Proposed changes to storage api for 1.0 #130

Closed

liZe mentioned this pull request Jul 28, 2014

Radicale on GoogleAppEngine #184

Closed

liZe mentioned this pull request Mar 9, 2016

Define what's in Radicale 2.0 #364

Closed

liZe modified the milestones: 2.0, 3.0 Mar 31, 2016

liZe mentioned this pull request Apr 4, 2016

Follow roadmap for 2.0 #372

Closed

39 tasks

liZe closed this May 25, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add locking functionality #14

Add locking functionality #14

grooverdan commented Sep 10, 2012

grooverdan commented Sep 10, 2012

rngtng commented Sep 10, 2012

liZe commented Sep 10, 2012

liZe commented Apr 26, 2013

grooverdan commented Apr 27, 2013

liZe commented Apr 30, 2013

grooverdan commented Apr 30, 2013

liZe commented May 3, 2013

grooverdan commented May 4, 2013

liZe commented Sep 12, 2013

liZe commented May 25, 2016

Add locking functionality #14

Add locking functionality #14

Conversation

grooverdan commented Sep 10, 2012

grooverdan commented Sep 10, 2012

rngtng commented Sep 10, 2012

liZe commented Sep 10, 2012

liZe commented Apr 26, 2013

grooverdan commented Apr 27, 2013

liZe commented Apr 30, 2013

grooverdan commented Apr 30, 2013

liZe commented May 3, 2013

grooverdan commented May 4, 2013

liZe commented Sep 12, 2013

liZe commented May 25, 2016