DmitryMezhensky edited this page Nov 28, 2012 · 3 revisions

Why we need this

Today’s view of the ‘future’ of computing revolves around data storage and elastic, distributed platforms. Cloud computing offers good manageability and resource utilization, so organizations tend to move their services to clouds, either public ones or private ones they build themselves. Several platforms exist for building private clouds: CloudStack, OpenStack, Eucalyptus and VMware. Despite the benefits of private clouds, they often have compatibility problems with existing systems, and this restricts their use. Our main idea was to make OpenStack Swift compatible with the most popular distributed computation framework, Hadoop.

Hadoop

Hadoop supports the main storage systems, such as Amazon S3, Kosmos and the Hadoop Archive Filesystem. Hadoop can run MapReduce programs over these storages, and the integration is fully compatible. However, many other storage systems are not supported by Hadoop. Developers can write a driver to support a custom filesystem, but some features will be missing, for instance Hadoop data locality. Without locality, tasks have to fetch their input over the network, which causes network, CPU or memory overhead.
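Data locality means that Hadoop tries to schedule each map task on a node that already stores a replica of the task's input block. A filesystem implementation reports those locations through `FileSystem.getFileBlockLocations`. If a custom driver cannot report real block locations, the scheduler has nothing to match against. The Python sketch below is a simplified, hypothetical model of that preference. The function `pick_worker` and the node names are illustrative only. This is not Hadoop's actual scheduler or API.

```python
from typing import Optional, Sequence, Tuple


def pick_worker(block_hosts: Sequence[str],
                free_workers: Sequence[str]) -> Tuple[Optional[str], bool]:
    """Hypothetical locality-aware placement for one input split.

    Returns (worker, is_local). A free worker that already stores a replica
    of the block is preferred (node-local read). Otherwise the first free
    worker is chosen, and it must pull the block over the network.
    """
    hosts = set(block_hosts)
    for worker in free_workers:
        if worker in hosts:
            return worker, True          # task runs next to its data
    if free_workers:
        return free_workers[0], False    # remote read: extra network traffic
    return None, False                   # no free worker at all


# Storage that reports where each block lives: a node-local worker is found.
print(pick_worker(["node2", "node5"], ["node1", "node2", "node3"]))  # ('node2', True)

# Driver that reports no block hosts: every task falls back to a remote read.
print(pick_worker([], ["node1", "node2", "node3"]))  # ('node1', False)
```

In the second call, no worker can ever be local, regardless of where the data physically sits. This is the overhead described above for drivers without locality support.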

Swift

Swift is a cloud storage system developed by the OpenStack community. Swift is an analogue of Amazon S3. Both systems have accounts, accounts contain buckets (called containers in Swift), and buckets (containers) contain objects. Objects are arbitrary files of any kind. For data processing on S3, Amazon offers the Elastic MapReduce service, but Swift has no comparable integration.
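Swift's REST API maps the account/container/object hierarchy directly onto the request path, `/v1/<account>/<container>/<object>`. This mirrors how S3 addresses an object by bucket and key. The minimal Python sketch below builds such a path. It assumes a hypothetical proxy endpoint `http://swift.example.com:8080` and a placeholder account `AUTH_test`. Both are illustrative and not taken from this project.

```python
from urllib.parse import quote


def swift_object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build <endpoint>/v1/<account>/<container>/<object> for one Swift object."""
    return "/".join([
        endpoint.rstrip("/"),
        "v1",
        quote(account, safe=""),      # account and container are single path segments
        quote(container, safe=""),
        quote(obj, safe="/"),         # object names may use "/" as pseudo-directories
    ])


# Hypothetical endpoint and account, for illustration only.
url = swift_object_url("http://swift.example.com:8080", "AUTH_test",
                       "logs", "2012/11/access log.txt")
print(url)  # http://swift.example.com:8080/v1/AUTH_test/logs/2012/11/access%20log.txt
```

Slashes stay unescaped in the object name because Swift stores the name as a flat string. Clients can treat `2012/11/` like a directory prefix, much as S3 key prefixes work.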
