-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add locking functionality #14
Conversation
hmm. I need to think a bit more there. Locking needs to be in radicale/init.py and wrapped around the manipulation of an item. So perhaps the storage needs to expose a locking api. |
yeah +1 on gitbased storage! |
tl;dr: locking is great but complicated, Radicale already handles concurrent edition with etags in a standard way. Exposing the locking API in the storage seems to be a very good idea, but it's a bit harder. Adding a method in ical.Calendar (and in the different storages overriding this class) should work. By the way, checking the etag is enough to avoid concurrent edition, if you use Radciale only, with no multithreading. It's already done for PUT requests, with a dedicated HTTP response status to handle this in clients. The git based storage is a feature request for a long time: http://redmine.kozea.fr/issues/775. Now that we have an architecture allowing different storage mechanisms, it should be quite easy to write. |
I've added the (experimental) database storage, it should be easy to add a git-based one now. Anyone interested ;)? By the way, databases probably are "the solution"™ if you're interested in consistency, I'm not sure anymore that locking files is useful then. |
Agree. I dislike my solution too and it was a work-around for the fact I was adding to the ical file outside of Radicale.
I think Radicale handles concurrent edition only because it is fixed to a single thread. Even with the database code all the items are pulled into a structure, edited and all written out. While this was suitable for a flat file for a database its really poor. The exposure of the CalDAV manipulation API at the storage layer is required to achieve any concurrency and this should work for database and git. The storage engine itself should do its own locking, as databases are very capable of doing, and its retries in the even of failure (transaction failure) are in the storage layer or higher. If you want to be able to handle higher concurrent actions without a single deadlock the storage layer needs to be altered to represent this. |
The really simple and probably buggy database storage creates a transaction for each request. I think that it's enough, but I'm not sure at all. What do you think of that? |
Correct me if I'm wrong, as I understand it the storage engine loads a whole calendar for use, manipulates this within the CalDAV interfaces, and if its a modification, writes the entire calendar out in full. The difficulty is when more than one web request is operating on the same calendar. This can be multi user or the same client doing concurrent updates. At the moment the single threaded option is saving the data by ensuring the a single modification or retrieval at a time. This will become a bottleneck for larger concurrent usages where entry point in to Radicale has requests queued. Assume for a bit that Radicale is no longer single threaded as it will need to be this way to handle a volume. Within the transaction whole calendar is deleted, and then reconstructed. Its hard to say here which bits are occurring in SQLAlchemy and which bits are occurring on the database however neither is ideal. Assuming its on the database then the isolation level of the database as to what the outcomes could be. A database is read uncommitted level will expose a deleted calendar to another thread where it might only be doing a modification leaving a client looking at an empty calendar. Even going to the top isolation level of serializable and PUT operation could be modifying once event in the calendar (correct me if I've got my CalDAV) wrong and another thread is modifying another event. The results are likely that one of these modifications will be lost. To handle the maximum throughput and still be consistent Radicale needs to push as minimal changes as possible into the database and ensure the transaction is begun at the beginning, do the minimal modifications required, and push the commit. Depending on what data is read/reread within the code of a transaction may dictate what transaction isolation level is required. At the end, a commit may still and Radicale should retry a few times before returning an error to the user. I'm sure some bits of this are going to be hard. I'm not sure how minimal a set of changes CalDAV clients push but for the most part the diffs that are pushed are what should appear in the database transactions. Sorry for being a sideline critic. I'm hoping this helps. |
No problem, it's very useful!
That's true.
You're right.
I'm sometimes a bit stupid :/. One day, I'll finally understand that transactions cannot magically solve my problems. It should be possible to use locks to prevent data losses and corruptions. Well, according to what you wrote in the previous paragraph, locking the whole database for each request would mean just setting the bottleneck one step deeper. But at least, it would allow Radicale to be safely (and slowly) launched multithreaded.
Even this cannot be done without changing the way Radicale works. For example, checking the etag of an event and modifying it is done in two different function calls, with no easy way to check that the data didn't change between them. I'm pretty confident about consistency, but I'm not sure that Radicale can provide a high performance storage without sacrificing what it's been written for: work out of the box, be small, be simple. Data consistency with multithreading can be a funny goal, at least for database (and git, etc) storage. Better performance can also be a goal, easy to reach when you look how stupid some algorithms are. But other CalDAV servers can handle zillions of requests in a couple of seconds, and they're really good at it. They're also bigger, hard to install and to configure, that's why Radicale has been created.
That's definitely the first step, and it's quite easy to reach (I've only spent a couple of hours on the database storage, for fun, and I don't know databases seriously). Anyone interested? :) Oh, and before I forget: thanks a lot for this thread. |
Not magically without thought, but with a bit of coaxing, minimal changes that corresponds to the business logic, and realizing they can fail, they can become very magic quickly. I suspect using the merge() function of sqlalemcy may produce a more minimal set of changes ( http://docs.sqlalchemy.org/en/latest/orm/session.html#session-faq ). Increase the log level on sqlalchemy and see what sql it does on the database. You may also need cascade definitions in your database relationships (easy with the sqlalchemy)
Locks, as in explicit locks (LOCK TABLE, DATABASE, SELECT WITH READ LOCK, etc), in databases will almost always cause more problems than transactions.
Try the get, the merge and then commit. With default transactions it might work without too much change.
Yes. I quite liked the simplicity of the filesystem storage which is why I used it. I probably need to get the other bit of my application to talk CalDAV rather than manipulating the storage and writing my dodgy lock patch. There's no reason why with the right logic Radicale could do quite well on a DB back-end. |
FYI, a git-based storage has been added. Install dulwich, use an empty git repository as root folder for collections, it should work. Issues are welcome! |
Locking has been implemented in #402. |
This code adds the option locking to filesystem so that consistent content of calendars is given. It uses an external package locking. If you're ok with this I'll happily update the website documentation.
Cheers
Daniel.
PS: also noticed git_folder in radicale/config.py ? Is this something in progress. I'm definitely interested in a git based storage and may implement this regardless.