HyperStore Analytics Platform (HAP) is a project that allows Cloudian HyperStore users to run analytics on their existing Cloudian HyperStore clusters. Since the fundamental building block is HyperStoreFileSystem, our own Hadoop FileSystem implementation, users can use any Hadoop-compatible libraries.
We recommend running Spark rather than Hadoop MapReduce to avoid resource contention.
If you are new to Hadoop/Spark, please visit our wiki pages to learn the basics:
- Deep Learning with DL4J
- Machine Learning on Spark
- ETL and SQL on Spark
- ETL with Pig on Hadoop
- SQL with Hive on Hadoop
This platform was tested against the following versions.
| Name | Version |
| --- | --- |
| Cloudian HyperStore | 5.2.1 |
| Spark | 1.5.2 |
| DL4J | 0.4 |
| Hadoop | 2.7.1 |
| Pig | 0.15.0 |
| Hive | 1.2.1 |
Please follow the Installation guide on our wiki.
You can use the `hsfs` protocol to access files stored in Cloudian HyperStore as follows. If you have used `s3n` or `s3a` before, simply replace `s3n`/`s3a` with `hsfs`.
```
hsfs://BUCKET_NAME/OBJECT_KEY
```
Please note that `hsfs` does not support write operations because it is designed to provide data locality, which does not apply to writes. For writes, please use `s3a` instead (no additional configuration is required).
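For example, here is a minimal sketch of a Spark job that reads an object through `hsfs` and writes its results back through `s3a`. The bucket names, object keys, and application name are placeholders, and the sketch assumes the HyperStoreFileSystem jar and your S3A credentials are already configured as described in the Installation guide.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HsfsWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HsfsWordCount")
    val sc = new SparkContext(conf)

    // Reads go through hsfs to benefit from data locality.
    // "logs-bucket" and the object key are placeholders.
    val lines = sc.textFile("hsfs://logs-bucket/access/2016-01-01.log")

    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // hsfs is read-only, so results are written back through s3a.
    counts.saveAsTextFile("s3a://results-bucket/wordcount/2016-01-01")

    sc.stop()
  }
}
```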