MapR
Instructions for MapR
MapR has a new one-step installer for MapR 3.1.1. Some of this info is still useful, but it hasn't been updated in a while.
We don't really support standalone H2O talking to MapR anymore. You should launch H2O on MapR like this, while on a node in a MapR cluster. This assumes you built the jars using 'make' in the h2o directory:
cd /home/kevin/h2o/hadoop/target
hadoop dfs -rmr /user/$USER/hdfsOutputDirName
hadoop jar h2odriver_mapr2.1.3.jar water.hadoop.h2odriver -jt 192.168.1.173:9001 -libjars ../../target/h2o.jar -mapperXmx 16g -nodes 5 -output hdfsOutputDirName
If standalone H2O works: there are currently two choices of -hdfs_version for MapR:
-hdfs_version mapr3.0.1
or
-hdfs_version mapr2.1.3
If you specify
-hdfs_version mapr
you get mapr2.1.3.
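For example, if the target cluster is MapR 3.0.1, a standalone launch would look like the in-house example further down but with the newer -hdfs_version value (the IP and -Xmx are placeholders taken from that example):
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://192.168.1.171 -hdfs_version mapr3.0.1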
You may need to copy the /opt/mapr/lib tree from the target MapR cluster (one of its machines) to the machines that will be running H2O. (They may be the same machines as the MapR-FS nodes, in which case there's no issue.)
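A rough sketch of that copy, assuming ssh access from the H2O machine to one of the cluster nodes (the hostname mapr-node is a placeholder):
# copy the MapR native/client libraries from a cluster node to this machine
sudo mkdir -p /opt/mapr
sudo rsync -a mapr-node:/opt/mapr/lib/ /opt/mapr/lib/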
-hdfs_root doesn't need to be specified.
In house:
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://192.168.1.171 -hdfs_version 0.20.2mapr
In AWS:
java -Djava.library.path=/opt/mapr/lib -Xmx2g -jar h2o.jar -hdfs maprfs://mapr/mapr_0xdata/ -hdfs_version 0.20.2mapr
In house (note that https is required):
https://192.168.1.171:8443/#dashboard
username: mapr
passwd: mapr
192.168.1.171 has the hostname "mr-0x1" so equivalently:
https://mr-0x1:8443/#dashboard
If you get no response, the MapR processes may not be up. Log into 192.168.1.171 (root/0xdata) and run:
ps aux | grep mapr
If no Java processes are running, you can start MapR with:
cd /home/root/mapr_install_stuff
./mapr_down.sh
./mapr_up.sh
Result should look like:
root@mr-0x1:~/mapr_install_stuff# ./mapr_up.sh
<actually a lot different now with 5 nodes, but it's something like this>
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
Starting zookeeper ... STARTED
Starting WARDEN, logging to /opt/mapr/logs/warden.log
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files
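For reference, judging by the output above the script is essentially restarting ZooKeeper and then the warden; doing the same by hand would look something like the following (init-script names assumed from a standard MapR layout):
sudo /etc/init.d/mapr-zookeeper start   # start ZooKeeper first
sudo /etc/init.d/mapr-warden start      # the warden then brings up the other services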
In AWS:
https://23.21.217.170:8443/#dashboard (instance BigInstance21 provides mapr-webserver)
username: mapr
passwd: mapr
Best to look at the sample script on 192.168.1.171 in /root/mapr_install_stuff.
cd /home/root/mapr_install_stuff
./mapr_down.sh
./mapr_up.sh
MapR's docs talk about waiting after starting things, and about sequencing the start across multiple nodes. The script can be modified for other clusters. (It just runs on one node, 192.168.1.171; it's a no-op on the other nodes.)
Switch to the mapr user: su - mapr
$ hadoop fs -ls /var/mapr
Found 1 items
drwxr-xr-x - mapr mapr 4 2012-10-28 13:32 /var/mapr/metrics
$ hadoop fs -copyFromLocal prostate_long.csv.gz /var/mapr
$ hadoop fs -ls /var/mapr
Found 2 items
drwxr-xr-x - mapr mapr 4 2012-10-28 13:32 /var/mapr/metrics
-rwxr-xr-x 3 mapr mapr 110707 2012-10-28 15:51 /var/mapr/prostate_long.csv.gz
MapR-FS uses the underlying Linux permissions (UID/GID). That means if a user wants to read/write a file, it has to have access to that location. When a user stores a file, the user's UID/GID is assigned to the file. TODO: I am not sure about umask.
http://www.mapr.com/doc/display/MapR/Setting+Up+the+Client#SettingUptheClient
After installation:
$ hadoop fs -ls maprfs://192.168.1.170/
$ hadoop fs -ls maprfs://192.168.1.170:7222/
12/10/28 17:11:09 WARN fs.MapRFileSystem: Could not find any cluster, defaulting to localhost
Found 5 items
-rwxr-xr-x 3 1004 1005 3555 2012-10-26 17:59 /.bashrc
drwxr-xr-x - 1004 1005 0 2012-10-28 13:29 /datasets
-rwxr-xr-x 3 1004 1005 110707 2012-10-26 18:00 /prostate_long.csv.gz
drwxrw-r-x - 1004 1005 3 2012-10-28 07:57 /user
drwxr-xr-x - 1004 1005 1 2012-10-24 19:13 /var
Edit the warden config (sudo vi /opt/mapr/conf/warden.conf) and set:
service.command.mfs.heapsize.percent=10
then restart the file server:
$ sudo /etc/init.d/mapr-mfs restart
Also see,
http://www.mapr.com/blog/keeping-hadoop-cluster-healthy-through-memory-management
service.command.jt.heapsize.max=5000
service.command.tt.heapsize.max=325
service.command.hbmaster.heapsize.max=512
service.command.hbregion.heapsize.max=4000
service.command.cldb.heapsize.max=4000
service.command.webserver.heapsize.max=750
service.command.nfs.heapsize.max=1000
service.command.os.heapsize.max=750
service.command.warden.heapsize.max=750
service.command.zk.heapsize.max=1500
How to figure out the MapR config:
cat /opt/mapr/conf/mapr-clusters.conf
mapr_0xdata 10.188.77.216:7222
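With the cluster name known, a quick sanity check against the filesystem (same URI style as the AWS example above) might be:
$ hadoop fs -ls maprfs:///mapr/mapr_0xdata/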
MapR relies strictly on Linux privileges (user UID / group GID) and tries to interpret them across machines. Hence it is necessary to have a mapr user (or similar) on each machine that accesses MapR-FS, and its UID has to be the same on all machines.
According to the documentation:
Before installing MapR, decide on the name, user id (UID) and group id (GID) for the MapR user. The MapR user must exist on each node, and the user name, UID and primary GID must match on all nodes.
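A minimal sketch of creating that user consistently on each node, assuming the uid/gid 2000 used in the AWS install notes further down:
sudo groupadd -g 2000 mapr
sudo useradd -u 2000 -g 2000 -m -s /bin/bash mapr
id mapr   # should report uid=2000(mapr) gid=2000(mapr) on every node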
Here is a summary of the MapR-FS privileges configuration.
MapR-FS:
- users: mapr (~the MapR-FS administrator), hduser (~a user of MapR-FS)
- volume configuration (created by the mapr user; see the maprcli sketch below):
  - volume 'users' mounted into /user
  - volume 'hduser_home' mounted into /user/hduser
  - volume 'mapr_home' mounted into /user/mapr
- the privileges are configured as follows:
  drwxr-x--- - hduser mapr 0 2012-10-29 17:42 /user/hduser
  drwxr-x--- - mapr mapr 2 2012-10-29 18:09 /user/mapr
  (directories have 0750 permissions to catch a wrong client-side setup; perhaps 0700 would be better)
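Those volumes can be created with maprcli as the mapr user; a rough sketch using just the names and mount paths from the list above (all other options left at their defaults):
maprcli volume create -name users -path /user
maprcli volume create -name hduser_home -path /user/hduser
maprcli volume create -name mapr_home -path /user/mapr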
H2O side:
- configure mapr-clusters.conf (located in /opt/mapr/conf/mapr-clusters.conf). For example,
mapr1_0xdata 192.168.1.170:7222
- H2O has to be run as hduser (with the same UID as on the MapR-FS server):
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root iced
Notes:
- if -hdfs_root is a relative path (e.g., -hdfs_root iced), then it is relative to the user's home (/user/hduser/iced); see the sketch after these notes
- if -hdfs_root is an absolute path, then it is interpreted from the root of the cluster FS (-hdfs_root /user/hduser/iced has the same effect as the relative path above). HOWEVER, in this case be careful, because it is possible to point into a directory which is not readable/writable by the current user (i.e., hduser)
- try not to refer to the cluster via IP (in that case URI resolution is a little bit different)
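A quick illustration of the two equivalent forms, reusing the hduser invocation above:
# relative -hdfs_root: resolves to /user/hduser/iced
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root iced
# absolute -hdfs_root: same location, spelled out from the cluster root
java -Djava.library.path=/opt/mapr/lib -jar build/h2o.jar -hdfs maprfs:///mapr/mapr1_0xdata/ -hdfs_root /user/hduser/iced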
The config file /opt/mapr/conf/mfs.conf (http://www.mapr.com/doc/display/MapR/mfs.conf) allows us:
- to control MapR-FS access (based on subnets)
- to tune the LRU cache for the FS
- the documentation is poor, but according to the MapR-FS FAQ (http://answers.mapr.com/questions/2183/nfs-performance) it is possible to tune use-specific performance of MapR-FS (i.e., tuned for MapReduce use vs. just for NFS)
Furthermore, it seems that MapR-FS compresses all data by default:
- we can turn off the feature globally
- or for a specific folder (http://www.mapr.com/doc/display/MapR/.dfs_attributes); see the sketch below
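For the per-folder case, MapR's hadoop mfs extension can toggle the compression attribute; a small sketch (the directory path is just an example):
hadoop mfs -setcompression off /user/hduser/iced   # disable compression on this directory
hadoop mfs -ls /user/hduser/iced                   # the listing shows the compression attribute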
Regarding memory allocation, the article http://www.mapr.com/blog?Itemid=265 says:
"MapR takes care of allocating memory among all the installed services on a node based on percentage of available physical memory. When the package is installed and the configure.sh script is run, MapR generates /opt/mapr/conf/warden.conf file that has details about how much percentage of physical memory needs to be allocated for each service. For example, if you install FileSystem and MapReduce on a node, the memory allocation is distributed between those two services. On the other hand if node also has HBase services, configure.sh makes sure that the available memory is divided among FileSystem, MapReduce as well as HBase services."
MapR tries to take 75% of physical memory and distribute it among the running services. This can be tuned with mapreduce.tasktracker.reserved.physicalmemory.mb (size specified in MB), BUT there is a question: do we need the MapReduce layer at all?
Step-by-step notes for MapR installation on AWS instances.
For each node:
- configure disks: a 96GB volume split into 32GB disks per node (MapR requires at least 3 disks per node; a flat file can be used as well)
- add the mapr user and group (uid=2000, gid=2000)
- update the Ubuntu repository list /etc/apt/sources.list:
  # MapR 1.2.9
  deb http://package.mapr.com/releases/v1.2.9/ubuntu/ mapr optional
  deb http://package.mapr.com/releases/ecosystem/ubuntu binary/
- install Sun JDK6 via https://github.com/flexiondotorg/oab-java6 (it is recommended by MapR)
- run apt-get update
Notes: one CLDB node, at least 3 ZooKeeper nodes, and each node should include mapr-fileserver:
- nodeA: mapr-cldb mapr-webserver mapr-zookeeper mapr-fileserver
- nodeB: mapr-fileserver mapr-zookeeper
- nodeC: mapr-fileserver mapr-zookeeper
- nodeD: mapr-fileserver
Run apt-get install for the packages listed above on the corresponding nodes.
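For example, on nodeA that would be something like the following (the other nodes install their shorter package lists the same way):
sudo apt-get install mapr-cldb mapr-webserver mapr-zookeeper mapr-fileserver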
#### Configure cloud
At each node run: ./configure.sh -C <cldb node list> -Z <zookeeper node list> -N mapr_0xdata
#### Configure disks
/opt/install-mapr/disks.txt contains a list of the disks used by MapR-FS. Run:
/opt/mapr/server/disksetup -F disks.txt
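disks.txt is just one device (or flat file) per line; a hypothetical example matching the 3-disk layout above (device names will differ per instance):
/dev/xvdf
/dev/xvdg
/dev/xvdh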
Start the services mapr-cldb, mapr-zookeeper, and mapr-mfs (on each node where they are installed).
Go to https://<webserver node>:8443/ to see if the cluster is accessible.
#### Configure volumes
TODO
It's a little confusing keeping track of all these Hadoop/Cloudera/MapR command variants.
MapR uses the original "hadoop" command name (instead of Cloudera's newer hdfs).
Note the scheme is maprfs, not hdfs. I got distracted trying to use MapR's NFS support, but dropped that. Then I got distracted trying to install just their client on other boxes to talk to the cluster. Finally I just decided to use the client on the MapR box that got installed with the node; that worked.
$ hadoop fs -ls maprfs://192.168.1.170:7222/
Found 1 items
drwxr-xr-x - mapr mapr 1 2012-10-24 19:13 /var
$ hadoop fs -ls maprfs://192.168.1.170:7222/var
Found 1 items
drwxr-xr-x - mapr mapr 1 2012-10-24 19:13 /var/mapr
$ hadoop fs -ls maprfs://192.168.1.170:7222/var/mapr
Found 1 items
drwxr-xr-x - mapr mapr 3 2012-10-26 17:02 /var/mapr/metrics
$ hadoop fs -ls maprfs://192.168.1.170:7222/var/mapr/metrics
Found 3 items
-rw-r--r-- 3 mapr mapr 8951 2012-10-25 16:32 /var/mapr/metrics/October.25.2012
-rw-r--r-- 3 mapr mapr 9993 2012-10-26 16:32 /var/mapr/metrics/October.26.2012
-rw-r--r-- 3 mapr mapr 215 2012-10-26 17:02 /var/mapr/metrics/October.27.2012