apavlo edited this page May 30, 2011 · 1 revision

Authors

Irina Calciu

Alex Gillmor

Implementation

We denormalized the tuples by encoding relations in the primary key. Here are some high-level examples:

  • WAREHOUSE.W_ID: One warehouse
  • WAREHOUSE.W_ID.DISTRICT.D_ID: One district in that warehouse
  • WAREHOUSE.W_ID.DISTRICT.D_ID.ORDER_LINE.ORDER_ID: One order line for an order in a district of a warehouse
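The key scheme above can be sketched with a few string helpers. This is a minimal sketch under our own assumptions: the helper names are hypothetical, every key is assumed to start with the WAREHOUSE prefix (as in the first two examples), and IDs are rendered as plain decimal strings. Because the warehouse ID is always the second component, any key in the hierarchy also identifies the warehouse it belongs to.

```python
# Hypothetical helpers for the dotted, hierarchical primary keys.
def warehouse_key(w_id: int) -> str:
    return f"WAREHOUSE.{w_id}"


def district_key(w_id: int, d_id: int) -> str:
    # A district key extends its warehouse key.
    return f"{warehouse_key(w_id)}.DISTRICT.{d_id}"


def order_line_key(w_id: int, d_id: int, order_id: int) -> str:
    # An order-line key extends its district key.
    return f"{district_key(w_id, d_id)}.ORDER_LINE.{order_id}"


def warehouse_of(key: str) -> int:
    """Recover the owning warehouse ID from any key in the hierarchy."""
    parts = key.split(".")
    if len(parts) < 2 or parts[0] != "WAREHOUSE":
        raise ValueError(f"not a warehouse-rooted key: {key!r}")
    return int(parts[1])
```

For example, `order_line_key(1, 3, 42)` yields `WAREHOUSE.1.DISTRICT.3.ORDER_LINE.42`, and all keys for warehouse 1 share the `WAREHOUSE.1.` prefix.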

Relations are managed hierarchically in this way, and tables are partitioned the same way. For secondary indexes, we generally stored only what was strictly required, as a list under a single key. For instance:

WAREHOUSE.W_ID.STOCK

This key returns a list of objects holding item IDs and quantities, used by the New-Order and Stock-Level transactions.
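A minimal sketch of how such a stock list might be read and updated, assuming a plain Python dict stands in for the key-value store (this is not the Scalaris API) and that each list entry is a dict with hypothetical `i_id` and `quantity` fields. The New-Order update follows the TPC-C stock rule (subtract the ordered quantity, and add 91 if fewer than 10 units would remain); the Stock-Level helper counts listed items below a threshold.

```python
from typing import Dict, Iterable, List

# Assumption: a plain dict stands in for the key-value store; against Scalaris the
# list would be read, modified, and written back inside one transaction.
Store = Dict[str, List[dict]]


def stock_key(w_id: int) -> str:
    """Key holding the per-warehouse stock list."""
    return f"WAREHOUSE.{w_id}.STOCK"


def new_order_update(store: Store, w_id: int, i_id: int, ol_quantity: int) -> int:
    """Apply the TPC-C New-Order stock rule to one item; return the new quantity."""
    for entry in store[stock_key(w_id)]:
        if entry["i_id"] == i_id:
            if entry["quantity"] - ol_quantity >= 10:
                entry["quantity"] -= ol_quantity
            else:
                entry["quantity"] = entry["quantity"] - ol_quantity + 91
            return entry["quantity"]
    raise KeyError(f"item {i_id} not stocked in warehouse {w_id}")


def stock_level(store: Store, w_id: int, item_ids: Iterable[int], threshold: int) -> int:
    """Count distinct items from `item_ids` whose stock is below `threshold`
    (assumes the list holds one entry per item)."""
    wanted = set(item_ids)
    return sum(
        1
        for entry in store[stock_key(w_id)]
        if entry["i_id"] in wanted and entry["quantity"] < threshold
    )
```

Keeping the whole list under one key lets Stock-Level answer from a single read, at the cost of rewriting the entire list on every New-Order, which also makes each stored value larger.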

Driver Dependencies

We need a modified Scalaris driver for Python, because the official one does not handle datetime objects.
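JSON has no datetime type, so Python's standard `json` module raises `TypeError` on `datetime` values. Assuming the driver serializes values as JSON (the Python API talks to the JSON RPC service), one way a modified driver could cope is sketched below: tag each datetime as an ISO 8601 string on the way out and convert it back on the way in. This is an assumption about the approach, not a description of our actual driver changes; the `__datetime__` tag and the helper names are hypothetical.

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """Encode datetimes as a tagged ISO 8601 string (hypothetical tag)."""

    def default(self, o):
        if isinstance(o, datetime):
            return {"__datetime__": o.isoformat()}
        return super().default(o)  # raises TypeError for other unsupported types


def decode_hook(d: dict):
    # Turn a tagged object back into a datetime; leave other objects untouched.
    if set(d) == {"__datetime__"}:
        return datetime.fromisoformat(d["__datetime__"])
    return d


def dumps(value) -> str:
    return json.dumps(value, cls=DateTimeEncoder)


def loads(text: str):
    return json.loads(text, object_hook=decode_hook)
```

Both peers must agree on the tag; any other client reading the same keys would see a plain object unless it applies the same decoding.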

Known Issues

The main issue with benchmarking Scalaris has been our inability to load the data successfully. This problem is compounded by the fact that both the Python API and the Scalaris JSON RPC web service appear to be terribly slow. Most of the other systems being benchmarked completed a full load for 2 nodes (20 warehouses with 10 districts each) within 20-30 minutes, whereas Scalaris takes roughly the same time to load a much smaller dataset (2 warehouses with 2 districts) through the Python API, and even loading that dataset has run into many issues. Among them:

  • Transactions fail once the transaction log reaches 10 kilobytes.
  • Transactions with large values fail because of a problem with the JSON connection and TCP, which forces us to reconnect to the server repeatedly while loading data.
  • Large values fail with timeouts even when large timeouts are set on both the client and the server, meaning the system times out independently of our settings.

We have also hit smaller, non-critical bugs that we work around, but the issues listed above are hard to deal with, and even as we hack together workarounds there is no guarantee that another bug is not waiting around the corner. Our transactions now work with the much smaller datasets, but we still do not know how long it would take to load a full dataset for 2 nodes, much less 4 or 8.
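The first two issues suggest one kind of workaround: split the load into many small transactions and reconnect whenever the connection breaks. The sketch below is hypothetical throughout: the 8 KB budget is an assumed margin under the reported 10 KB limit, summed JSON size is only a rough proxy for transaction-log size, and `connect` and `write_batch` are stand-ins for whatever client calls the driver provides, not the Scalaris Python API. Which exceptions the real driver raises is also an assumption.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple

Item = Tuple[str, object]

# Assumed budget: a margin under the ~10 KB log size at which commits failed.
LOG_BUDGET_BYTES = 8 * 1024


def batch_by_size(items: Iterable[Item], limit: int = LOG_BUDGET_BYTES) -> Iterator[List[Item]]:
    """Group (key, value) writes so each batch's summed JSON size stays within `limit` bytes."""
    batch: List[Item] = []
    size = 0
    for key, value in items:
        item_size = len(json.dumps([key, value]).encode("utf-8"))
        if item_size > limit:
            raise ValueError(f"value for {key!r} alone exceeds {limit} bytes")
        if batch and size + item_size > limit:
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += item_size
    if batch:
        yield batch


def load(
    batches: Iterable[List[Item]],
    connect: Callable[[], object],
    write_batch: Callable[[object, List[Item]], None],
    attempts: int = 3,
) -> int:
    """Write each batch in its own transaction, reconnecting after a failure.

    `connect` and `write_batch` are hypothetical stand-ins for the client calls.
    Returns the number of batches written.
    """
    conn = connect()
    written = 0
    for batch in batches:
        for attempt in range(attempts):
            try:
                write_batch(conn, batch)
                written += 1
                break
            except (ConnectionError, TimeoutError):
                if attempt == attempts - 1:
                    raise
                conn = connect()  # replace the broken connection with a fresh one
    return written
```

Retrying a whole batch is safe here only because each write is a plain put of a fixed value; a batch that read and modified existing data would need more care on retry.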

We have tried to work around the Python API issues by using the Java API through Jython, since Java can communicate natively with Erlang and is regarded as faster and better at communicating with multiple nodes, but we have been unsuccessful there as well. With a base install of Scalaris we hit a bug that is not described in the documentation, and we cannot successfully run the Java tests on the server, much less get one of our clients to work with it.

We have done our best to resolve our issues using the current documentation and the community. However, the community is small (some 140 people are subscribed to the mailing list), and because the project is under active development, the documentation is not great. The development team's responsiveness has been lukewarm: they have been very quick on some issues but not on others. We feel that Scalaris has a lot of potential, but the lack of expertise outside the core developers makes it hard to use effectively.

Future Work

None until Scalaris is significantly improved.