MapR
Instructions for MapR
MapR has a new one-step installer for MapR 3.1.1. Some of this info is still useful, but it hasn't been updated in a while.
We don't really support standalone H2O talking to MapR anymore. You should launch H2O on MapR like this, while on a node in a MapR cluster. This assumes you built the jars using 'make' in the h2o directory:
cd /home/kevin/h2o/hadoop/target
hadoop dfs -rmr /user/$USER/hdfsOutputDirName
hadoop jar h2odriver_mapr2.1.3.jar water.hadoop.h2odriver -jt 192.168.1.173:9001 -libjars ../../target/h2o.jar -mapperXmx 16g -nodes 5 -output hdfsOutputDirName
If standalone H2O works: there are currently two choices of -hdfs_version for MapR:
-hdfs_version mapr3.0.1
or
-hdfs_version mapr2.1.3
If you specify
-hdfs_version mapr
you get mapr2.1.3.
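For example, if the target cluster is MapR 3.0.1, a standalone launch would look like the in-house example further down but with the newer -hdfs_version value (the IP and -Xmx are placeholders taken from that example):
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://192.168.1.171 -hdfs_version mapr3.0.1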
You may need to copy the /opt/mapr/lib tree from the target MapR cluster (one of its machines) to the machines that will be running H2O. (They may be the same machines as the MapR-FS nodes, in which case there's no issue.)
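A rough sketch of that copy, assuming ssh access from the H2O machine to one of the cluster nodes (the hostname mapr-node is a placeholder):
# copy the MapR native/client libraries from a cluster node to this machine
sudo mkdir -p /opt/mapr
sudo rsync -a mapr-node:/opt/mapr/lib/ /opt/mapr/lib/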
-hdfs_root doesn't need to be specified.
In house:
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://192.168.1.171 -hdfs_version 0.20.2mapr
In AWS:
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://mapr/mapr_0xdata/ -hdfs_version 0.20.2mapr
In house (note that https is required):
https://192.168.1.171:8443/#dashboard
username: mapr
passwd: mapr
192.168.1.171 has the hostname "mr-0x1" so equivalently:
https://mr-0x1:8443/#dashboard
If you get no response, the MapR processes may not be up. Log into 192.168.1.171 (root/0xdata) and run:
ps aux | grep mapr
If no Java processes are running, you can start MapR with:
cd /home/root/mapr_install_stuff
./mapr_down.sh
./mapr_up.sh
Result should look like:
root@mr-0x1:~/mapr_install_stuff# ./mapr_up.sh
<actually a lot different now with 5 nodes, but it's something like this>
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
Starting zookeeper ... STARTED
Starting WARDEN, logging to /opt/mapr/logs/warden.log
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files
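For reference, judging by the output above the script is essentially restarting ZooKeeper and then the warden; doing the same by hand would look something like the following (init-script names assumed from a standard MapR layout):
sudo /etc/init.d/mapr-zookeeper start   # start ZooKeeper first
sudo /etc/init.d/mapr-warden start      # the warden then brings up the other services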
In AWS:
https://23.21.217.170:8443/#dashboard (instance BigInstance21 provides mapr-webserver)
username: mapr
passwd: mapr
Best to look at the sample script on 192.168.1.171 in /root/mapr_install_stuff.
cd /home/root/mapr_install_stuff
./mapr_down.sh
./mapr_up.sh
MapR's docs talk about waiting after starting things, and about sequencing the start across multiple nodes. The script can be modified for other clusters. (It just runs on one node, 192.168.1.171; it's a no-op on the other nodes.)
Switch to the mapr user: su - mapr
$ hadoop fs -ls /var/mapr
Found 1 items
drwxr-xr-x - mapr mapr 4 2012-10-28 13:32 /var/mapr/metrics
$ hadoop fs -copyFromLocal prostate_long.csv.gz /var/mapr
$ hadoop fs -ls /var/mapr
Found 2 items
drwxr-xr-x - mapr mapr 4 2012-10-28 13:32 /var/mapr/metrics
-rwxr-xr-x 3 mapr mapr 110707 2012-10-28 15:51 /var/mapr/prostate_long.csv.gz
MapR-FS uses the underlying Linux permissions (UID/GID). That means if a user wants to read/write a file, it has to have access to that location. When a user stores a file, the user's UID/GID is assigned to the file. TODO: I am not sure about umask.
http://www.mapr.com/doc/display/MapR/Setting+Up+the+Client#SettingUptheClient
After installation:
$ hadoop fs -ls maprfs://192.168.1.170/
$ hadoop fs -ls maprfs://192.168.1.170:7222/
12/10/28 17:11:09 WARN fs.MapRFileSystem: Could not find any cluster, defaulting to localhost
Found 5 items
-rwxr-xr-x 3 1004 1005 3555 2012-10-26 17:59 /.bashrc
drwxr-xr-x - 1004 1005 0 2012-10-28 13:29 /datasets
-rwxr-xr-x 3 1004 1005 110707 2012-10-26 18:00 /prostate_long.csv.gz
drwxrw-r-x - 1004 1005 3 2012-10-28 07:57 /user
drwxr-xr-x - 1004 1005 1 2012-10-24 19:13 /var
Edit the warden config (sudo vi /opt/mapr/conf/warden.conf) and set:
service.command.mfs.heapsize.percent=10
then restart the file server:
$ sudo /etc/init.d/mapr-mfs restart
Also see,
http://www.mapr.com/blog/keeping-hadoop-cluster-healthy-through-memory-management
service.command.jt.heapsize.max=5000
service.command.tt.heapsize.max=325
service.command.hbmaster.heapsize.max=512
service.command.hbregion.heapsize.max=4000
service.command.cldb.heapsize.max=4000
service.command.webserver.heapsize.max=750
service.command.nfs.heapsize.max=1000
service.command.os.heapsize.max=750
service.command.warden.heapsize.max=750
service.command.zk.heapsize.max=1500
How to figure out the MapR config:
cat /opt/mapr/conf/mapr-clusters.conf
mapr_0xdata 10.188.77.216:7222
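With the cluster name known, a quick sanity check against the filesystem (same URI style as the AWS example above) might be:
$ hadoop fs -ls maprfs:///mapr/mapr_0xdata/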
MapR relies strictly on Linux privileges (user UID / group GID) and tries to interpret them across machines. Hence it is necessary to have a mapr user (or similar) on each machine that accesses MapR-FS, and its UID has to be the same on all machines.
According to the documentation:
Before installing MapR, decide on the name, user id (UID) and group id (GID) for the MapR user. The MapR user must exist on each node, and the user name, UID and primary GID must match on all nodes.
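A minimal sketch of creating that user consistently on each node, assuming the uid/gid 2000 used in the AWS install notes further down:
sudo groupadd -g 2000 mapr
sudo useradd -u 2000 -g 2000 -m -s /bin/bash mapr
id mapr   # should report uid=2000(mapr) gid=2000(mapr) on every node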
Here is a summary of the MapR-FS privileges configuration.
MapR-FS:
- users: mapr (~the MapR-FS administrator), hduser (~a user of MapR-FS)
- volume configuration (created by the mapr user; see the maprcli sketch below):
  - volume 'users' mounted into /user
  - volume 'hduser_home' mounted into /user/hduser
  - volume 'mapr_home' mounted into /user/mapr
- the privileges are configured as follows:
  drwxr-x--- - hduser mapr 0 2012-10-29 17:42 /user/hduser
  drwxr-x--- - mapr mapr 2 2012-10-29 18:09 /user/mapr
  (directories have 0750 permissions to catch a wrong client-side setup; perhaps 0700 would be better)
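Those volumes can be created with maprcli as the mapr user; a rough sketch using just the names and mount paths from the list above (all other options left at their defaults):
maprcli volume create -name users -path /user
maprcli volume create -name hduser_home -path /user/hduser
maprcli volume create -name mapr_home -path /user/mapr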
H2O side:
- configure mapr-clusters.conf (located in /opt/mapr/conf/mapr-clusters.conf). For example,
mapr1_0xdata 192.168.1.170:7222
- H2O has to be run as hduser (with the same UID as on the MapR-FS server):
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root iced
Notes:
- if -hdfs_root is a relative path (e.g., -hdfs_root iced), then it is relative to the user's home (/user/hduser/iced); see the sketch after these notes
- if -hdfs_root is an absolute path, then it is interpreted from the root of the cluster FS (-hdfs_root /user/hduser/iced has the same effect as the relative path above). HOWEVER, in this case be careful, because it is possible to point into a directory which is not readable/writable by the current user (i.e., hduser)
- try not to refer to the cluster via IP (in that case URI resolution is a little bit different)
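A quick illustration of the two equivalent forms, reusing the hduser invocation above:
# relative -hdfs_root: resolves to /user/hduser/iced
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root iced
# absolute -hdfs_root: same location, spelled out from the cluster root
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root /user/hduser/iced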
The config file /opt/mapr/conf/mfs.conf (http://www.mapr.com/doc/display/MapR/mfs.conf) allows us:
- to control MapR-FS access (based on subnets)
- to tune the LRU cache for the FS
- the documentation is poor, but according to the MapR-FS FAQ (http://answers.mapr.com/questions/2183/nfs-performance) it is possible to tune use-specific performance of MapR-FS (i.e., tuned for MapReduce use vs. just for NFS)
Furthermore, it seems that MapR-FS compresses all data by default:
- we can turn off the feature globally
- or for a specific folder (http://www.mapr.com/doc/display/MapR/.dfs_attributes); see the sketch below
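For the per-folder case, MapR's hadoop mfs extension can toggle the compression attribute; a small sketch (the directory path is just an example):
hadoop mfs -setcompression off /user/hduser/iced   # disable compression on this directory
hadoop mfs -ls /user/hduser/iced                   # the listing shows the compression attribute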
Regarding memory allocation, the article http://www.mapr.com/blog?Itemid=265 says:
"MapR takes care of allocating memory among all the installed services on a node based on percentage of available physical memory. When the package is installed and the configure.sh script is run, MapR generates /opt/mapr/conf/warden.conf file that has details about how much percentage of physical memory needs to be allocated for each service. For example, if you install FileSystem and MapReduce on a node, the memory allocation is distributed between those two services. On the other hand if node also has HBase services, configure.sh makes sure that the available memory is divided among FileSystem, MapReduce as well as HBase services."
MapR tries to take 75% of physical memory and distribute it among the running services. This can be tuned with mapreduce.tasktracker.reserved.physicalmemory.mb (size specified in MB), BUT there is a question: do we need the MapReduce layer at all?
Step-by-step notes for MapR installation on AWS instances.
For each node:
- configure disks: a 96GB volume split into 32GB disks per node (MapR requires at least 3 disks per node; a flat file can be used as well)
- add the mapr user and group (uid=2000, gid=2000)
- update the Ubuntu repository list /etc/apt/sources.list:
  # MapR 1.2.9
  deb http://package.mapr.com/releases/v1.2.9/ubuntu/ mapr optional
  deb http://package.mapr.com/releases/ecosystem/ubuntu binary/
- install Sun JDK6 via https://github.com/flexiondotorg/oab-java6 (it is recommended by MapR)
- run apt-get update
Notes: one CLDB node, at least 3 ZooKeeper nodes, and each node should include mapr-fileserver:
- nodeA: mapr-cldb mapr-webserver mapr-zookeeper mapr-fileserver
- nodeB: mapr-fileserver mapr-zookeeper
- nodeC: mapr-fileserver mapr-zookeeper
- nodeD: mapr-fileserver
Run apt-get install for the packages listed above on the corresponding nodes.
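For example, on nodeA that would be something like the following (the other nodes install their shorter package lists the same way):
sudo apt-get install mapr-cldb mapr-webserver mapr-zookeeper mapr-fileserver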
#### Configure cloud
At each node run: ./configure.sh -C <cldb node list> -Z <zookeeper node list> -N mapr_0xdata
#### Configure disks
/opt/install-mapr/disks.txt contains a list of the disks used by MapR-FS. Run:
/opt/mapr/server/disksetup -F disks.txt
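disks.txt is just one device (or flat file) per line; a hypothetical example matching the 3-disk layout above (device names will differ per instance):
/dev/xvdf
/dev/xvdg
/dev/xvdh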
Start the services mapr-cldb, mapr-zookeeper, and mapr-mfs (on each node where they are installed).
Go to https://<webserver node>:8443/ to see if the cluster is accessible.
#### Configure volumes
TODO
It's a little confusing keeping track of all these Hadoop/Cloudera/MapR command variants.
MapR uses the original "hadoop" command name (instead of Cloudera's newer hdfs).
Note the scheme is maprfs, not hdfs. I got distracted trying to use MapR's NFS support, but dropped that. Then I got distracted trying to install just their client on other boxes to talk to the cluster. Finally I just decided to use the client on the MapR box that got installed with the node; that worked.
$ hadoop fs -ls maprfs://192.168.1.170:7222/
Found 1 items
drwxr-xr-x - mapr mapr 1 2012-10-24 19:13 /var
$ hadoop fs -ls maprfs://192.168.1.170:7222/var
Found 1 items
drwxr-xr-x - mapr mapr 1 2012-10-24 19:13 /var/mapr
$ hadoop fs -ls maprfs://192.168.1.170:7222/var/mapr
Found 1 items
drwxr-xr-x - mapr mapr 3 2012-10-26 17:02 /var/mapr/metrics
$ hadoop fs -ls maprfs://192.168.1.170:7222/var/mapr/metrics
Found 3 items
-rw-r--r-- 3 mapr mapr 8951 2012-10-25 16:32 /var/mapr/metrics/October.25.2012
-rw-r--r-- 3 mapr mapr 9993 2012-10-26 16:32 /var/mapr/metrics/October.26.2012
-rw-r--r-- 3 mapr mapr 215 2012-10-26 17:02 /var/mapr/metrics/October.27.2012