LXC Linux Containers. Lightweight isolation. Create more hadoop clusters on a set of machines
http://en.wikipedia.org/wiki/LXC
LXC has really only become production worthy in the last year? It seems better than in the 2010-12 era. Here's a 2013 article talking up container tech: http://www.linuxjournal.com/content/containers%E2%80%94not-virtual-machines%E2%80%94are-future-cloud
(This is the same as Docker technology)
LXC (Linux Containers) is an operating system–level virtualization method for running multiple isolated Linux systems (containers) on a single control host.
The Linux kernel provides cgroups for resource limiting (CPU, memory, block I/O, network, etc.) without requiring any virtual machines to be started. Separately, the kernel's namespaces isolate an application's view of the operating environment, including process trees, network interfaces, user ids and mounted file systems.
LXC combines cgroups and namespace support to provide an isolated environment for applications. Docker can also use LXC as one of its execution drivers, enabling image management and providing deployment services.
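Those namespaces are visible per-process under /proc; a quick way to see the set the kernel provides (a container's init process gets its own copy of each):

```shell
# Each entry is a namespace the current process belongs to
# (net, pid, mnt, uts, ipc, and user on modern kernels).
ls /proc/self/ns
```

To inspect a container's namespaces instead, get its init PID from lxc-info and list /proc/&lt;pid&gt;/ns the same way.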
LXC provides operating system-level virtualization through a virtual environment that has its own process and network space, instead of creating a full-fledged virtual machine. LXC relies on the Linux kernel cgroups functionality that was released in version 2.6.24. It also relies on other kinds of namespace-isolation functionality, which were developed and integrated into the mainline Linux kernel.
Oracle has some nice user level documentation here: http://docs.oracle.com/cd/E37670_01/E37355/html/ol_containers.html
some deeper stuff with vagrant I might explore http://containerops.org/2013/11/19/lxc-networking/
More powerful use of lxc https://www.stgraber.org/2013/12/21/lxc-1-0-your-second-container/
I'm going to try attaching raw disk devices for use in mapr. Not sure if it will work. Like this (but with your partition names). The container has to be running:
sudo lxc-device -n p1 add /dev/sdb /dev/sdb
sudo lxc-device -n p1 add /dev/sdb5 /dev/sdb5
sudo lxc-device -n p1 add /dev/sdb6 /dev/sdb6
sudo lxc-device -n p1 add /dev/sdb7 /dev/sdb7
gparted sees them if I do the above, but can't seem to find a superblock? Maybe that won't hurt MapR.
https://github.com/lxc/lxc/blob/master/src/lxc/lxc-device
Didn't seem to work with MapR? With libvirt you get a disk attach rather than a device attach.
This page seems to show a method with mounts thru fstab http://it.randomthemes.com/2012/07/16/how-to-mount-disk-to-lxc-container/ Normal nfs mounts to the host would work I guess.
but we need it to be a block device..from http://s3hh.wordpress.com/2012/10/22/easily-making-a-blockdev-available-to-a-container/
But he says lxc-device now should work?
screens
apt-get install byobu
# to run
byobu
apt-get install clusterssh
# to run
clusterssh -o "-X" -l root 192.168.1.171 192.168.1.172 192.168.1.173 192.168.1.174 192.168.1.175 192.168.1.176 192.168.1.177 192.168.1.178 192.168.1.179 192.168.1.180
I typically add -X for X11 forwarding, so I can get X applications back on my local machine if needed.
byobu: nicely, if you detach, you can ssh back, run byobu again, and reattach to the old session.
I rarely ssh directly to the container IPs, but you can since the setup below makes them public just like any other machine
I use clusterssh to ssh to all machines, then run byobu. Then I press F2 to create a new set of screens, and run lxc-stop -n cntr1 and lxc-start -n cntr1 on all the machines. Then I can log in. Then I have all ten of the virtual machines, and I can press F3 to shift byobu back to the ten host machines.
Warning: keep track of what machines you're on! It's really easy to trash the wrong machines.
All of the container machines I create end in -cntr, so mr-0x1-cntr1 is on mr-0x1.
Here's ifconfig on the host with a running container. The lxcbr0 is left over, because I didn't delete it as described above. It doesn't hurt anyone. Note that the eth0 doesn't get the ip address, the br0 does.
The veth* is the ethernet device for the container.
I left in tun0, which is my vpn tunnel from home. I had to use vpnc-connect for the vpn, because the ubuntu network-manager gets disabled when it sees I modified /etc/network/interfaces, so I can't use the gui for the vpn start. But with vpnc, I can vpn even with the bridged eth0.
I show all this, to show that it's robust even in a more complicated case.
br0 Link encap:Ethernet HWaddr d4:3d:7e:18:db:22
inet addr:192.168.0.34 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::d63d:7eff:fe18:db22/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
...
eth0 Link encap:Ethernet HWaddr d4:3d:7e:18:db:22
inet6 addr: fe80::d63d:7eff:fe18:db22/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
...
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
...
lxcbr0 Link encap:Ethernet HWaddr 2e:2c:9c:54:39:a3
inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
inet6 addr: fe80::2c2c:9cff:fe54:39a3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
...
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.1.226 P-t-P:192.168.1.226 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1412 Metric:1
...
vethL7EZtF Link encap:Ethernet HWaddr fa:8f:c0:f6:36:f7
inet6 addr: fe80::f88f:c0ff:fef6:36f7/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
...
Links are included for additional research if you're curious. What I actually use is not exactly described at any one site, so treat these as research notes.
https://www.digitalocean.com/community/tutorials/getting-started-with-lxc-on-an-ubuntu-13-04-vps
I use the latest backport from the developers for lxc install. I need the lxc-include functionality in the config, for network config maintenance. That wasn't in the normal ubuntu apt-get install. Here's the ppa:
This PPA contains backports of stable version of LXC for all supported Ubuntu releases. Note this is NOT the lts release, which I think is 1.0
I think the stable release is lxc 1.1.x? https://launchpad.net/~ubuntu-lxc/+archive/ubuntu/lxc-stable
lts release https://launchpad.net/~ubuntu-lxc/+archive/ubuntu/lxc-lts
to setup the ppa for the lts release as root:
apt-get update
add-apt-repository ppa:ubuntu-lxc/lts
apt-get update
apt-get upgrade
That creates entries for stable first, then lts, in /etc/apt/sources.list.d/ubuntu-lxc-stable-precise.list. It's good to be careful and pay attention on any apt upgrade that updates lxc. You may want to take the new configuration files and modify them to include your old changes, rather than just keeping your old ones (when prompted).
I had to go back to the lts release. I couldn't get /proc/meminfo, cpuinfo, top and ps working in the container (not sure how the 1.1 lxc is doing things there).
deb http://ppa.launchpad.net/ubuntu-lxc/lts/ubuntu precise main
deb-src http://ppa.launchpad.net/ubuntu-lxc/lts/ubuntu precise main
Here are the versions that get installed as of 4/18/15 from stable that I couldn't use; I had to apt-get purge and reinstall:
Setting up libseccomp2 (2.1.1-1~ubuntu12.04.1~ppa1) ...
Setting up liblxc1 (1.1.2-0ubuntu3~ubuntu12.04.1~ppa1) ...
Setting up python3-lxc (1.1.2-0ubuntu3~ubuntu12.04.1~ppa1) ...
Setting up lxc (1.1.2-0ubuntu3~ubuntu12.04.1~ppa1) ...
Installing new version of config file /etc/apparmor.d/lxc/lxc-default ...
Installing new version of config file /etc/apparmor.d/usr.bin.lxc-start ...
Installing new version of config file /etc/default/lxc ...
Installing new version of config file /etc/init/lxc-net.conf ...
Installing new version of config file /etc/init/lxc.conf ...
Preserving user changes to /etc/dnsmasq.d-available/lxc (renamed from /etc/dnsmasq.d/lxc)...
The dnsmasq configuration has been migrated twice, fixing it.
Setting up lxc dnsmasq configuration.
Setting up lxc-templates (1.1.2-0ubuntu3~ubuntu12.04.1~ppa1) ...
Setting up lxcfs (0.7-0ubuntu2~ubuntu12.04.1~ppa1) ...
Note that lxc-shutdown is gone with the latest 1.1 lxc. lxc-stop is what you want: a clean shutdown, followed by a kill if that doesn't work. It has --nokill if you want to avoid the kill.
http://man7.org/linux/man-pages/man1/lxc-stop.1.html
apt-get install lxc lxctl bridge-utils lxc-templates
lxctl and lxc-templates might not be needed. I believe the above installs the lxc web panel; nothing extra is needed if you want to use it (I don't).
also see:
http://sylvain.fankhauser.name/setting-up-lxc-containers-in-30-minutes-debian-wheezy.html
To check:
lxc-checkconfig
Old note (I don't have this issue any more): I think the dnsmasq that's running for the containers' lxcbr0 was messing with my external dns server setup. I pkill'ed it, and my dns started working. Ended up having to put the dns server in the container resolv.conf manually:
sudo vi /etc/resolvconf/resolv.conf.d/base
nameserver 172.16.0.200
LXC config files. I have to sort out why there are so many now. Some say they are autogenerated. Maybe I'm not restarting the LXC service when I should.
http://bj0z.wordpress.com/2011/08/19/howto-build-a-base-lxc-container-in-ubuntu-11-04/
edit these files:
/etc/default/lxc
# update: apparently these files arrived with latest lxc?
/etc/init/lxc-net.conf
/etc/default/lxc-net
change this line
USE_LXC_BRIDGE="true"
to
USE_LXC_BRIDGE="false"
and restart lxc (or reboot?)
On a running system, after you've done everything below, you can remove the (unused) lxcbr0 bridge if you forgot this step (although if you don't do the above, it will be recreated on reboot?)
The bridge-utils package provides brctl and the bridge_ports extension to /etc/network/interfaces
apt-get install bridge-utils
Some say libvirt should be used instead, because of suspected bug in bridge-utils when used with some other VM environments? I've not had any issues with bridge-utils, and don't have another VM environment.
Removing existing lxcbr0 bridge (not critical)
ip link set lxcbr0 down
brctl delbr lxcbr0
ifconfig
Before booting a new kernel, you can check its configuration
CONFIG=/path/to/config /usr/bin/lxc-checkconfig
https://help.ubuntu.com/12.04/serverguide/lxc.html
I have /home2 on 171-175 and /home3 on 176-180 and 181-190, partitioned for use by lxc. It's just a normal filesystem, so it's visible from the host. It's not like the hidden mapr partitions.
I replace the normal lxc directories with symbolic links. I could make it home3 and sit on home2, but that can be confusing and create hidden files if home2 isn't mounted.
NEWDIR=home2
sudo mkdir /$NEWDIR/lxclib /$NEWDIR/lxccache
NEWDIR=home3
sudo mkdir /$NEWDIR/lxclib /$NEWDIR/lxccache
# don't do these if the directories already exist above! assume these links were created
sudo rm -rf /var/lib/lxc /var/cache/lxc
sudo ln -s /$NEWDIR/lxclib /var/lib/lxc
sudo ln -s /$NEWDIR/lxccache /var/cache/lxc
My current setup only requires /etc/hostname to be changed (because it wants to be globally unique) and the my_cntr1_network file to be modified with ip/mac info. The mac is randomly generated on lxc-create, and you should carry it over into my_cntr1_network (cut it out of the generated config).
as root:
cd /var/lib/lxc
lxc-create -n cntr1 -t ubuntu
or (change subsequent names to trusty1 if this container is used)
lxc-create -t download -n trusty1 -- --dist ubuntu --release trusty --arch amd64
cd cntr1
# edit config. Cut/save these lines out to ../my_cntr1_network. Make it one dir above so not lost if lxc-destroy
# Network configuration. first 3 lines are unique. kbn 9/9/14
lxc.utsname = mr-0x10-cntr1
lxc.network.hwaddr = 00:16:3e:8d:77:2b
lxc.network.ipv4 = 172.16.2.211/16 172.16.255.255
lxc.network.ipv4.gateway = 172.16.0.1
lxc.network.type = veth
lxc.network.link = br0
# up should be last
lxc.network.flags = up
Then add this to cntr1/config (remember you have the latest lxc installed via ppa, to use lxc.include)
lxc.include = /var/lib/lxc/my_cntr1_network
Don't mistype it as lxc-include. There's no error detection; it will just be ignored.
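With one include file per container, stamping them out by hand gets tedious. A sketch of a generator (the gen_cntr_network name and /tmp output path are mine; the /16 subnet, broadcast, and gateway match my setup above and need adjusting for yours):

```shell
# Write a per-container network include file from name/mac/ip.
# Assumes the 172.16/16 network used elsewhere in these notes.
gen_cntr_network() {
  name=$1; hwaddr=$2; ipv4=$3; out=$4
  cat > "$out" <<EOF
# Network configuration. first 3 lines are unique.
lxc.utsname = $name
lxc.network.hwaddr = $hwaddr
lxc.network.ipv4 = $ipv4/16 172.16.255.255
lxc.network.ipv4.gateway = 172.16.0.1
lxc.network.type = veth
lxc.network.link = br0
# up should be last
lxc.network.flags = up
EOF
}

# Example: regenerate the file shown above (real target would be
# /var/lib/lxc/my_cntr1_network, one dir above the container).
gen_cntr_network mr-0x10-cntr1 00:16:3e:8d:77:2b 172.16.2.211 /tmp/my_cntr1_network
```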
Host eth0 is changed to be a bridge. I had problems adding a separate bridge; it requires NAT forwarding and maybe promiscuous mode, which has performance issues? This seems robust (tried on both 12.04 LTS and hwe trusty, kernels 3.2 and 3.13).
edit /etc/network/interfaces on the host. Best to reboot after, but /etc/init.d/networking restart is sometimes enough. MAKE SURE THIS IS EXACTLY RIGHT (substitute the correct addresses for your machine) and that bridge-utils is installed. Otherwise bridge_ports won't work, you'll have no network access, and you'll need to plug in a console to fix it! Hard to do if remote with no IPMI!
<on host>
sudo vi /etc/network/interfaces
auto lo
iface lo inet loopback
# double check that your device is eth0! it might be eth1, eth2 or eth3 due to renaming.
# Use ifconfig.
auto eth0
iface eth0 inet manual
auto br0
iface br0 inet static
# double check that your device is eth0! it might be eth1, eth2 or eth3 due to renaming
bridge_ports eth0
bridge_fd 0
bridge_maxwait 0
bridge_stp off
address 172.16.2.180
# Note my network uses a supernetted netmask here. Adjust as necessary
netmask 255.255.0.0
# inline comments might break things. Don't do. These are not needed.
# network 172.16.0.0
# broadcast 172.16.255.255
gateway 172.16.0.1
dns-nameservers 172.16.0.200
Heads up: if you do this on your home machine, the network-manager won't be usable anymore, so you can't start vpn with it. I installed vpnc which is nice, and use vpnc-connect and vpnc-disconnect. vpnc uses a config file, search for "vpnc package" on this page https://help.ubuntu.com/community/VPNClient for instructions (or google).
So at home I can test LXC and still have a VPN connected, even though network-manager disables itself because it saw /etc/network/interfaces was in use (the default Ubuntu /etc/NetworkManager/NetworkManager.conf has managed=true)
The 0xdata machines have static ips, and network-manager removed
apt-get purge network-manager
Some of that might be okay with default settings, but I include all for clarity. See resulting ifconfig above
/etc/init.d/networking restart
Might be enough, but to be sure, you should reboot.
lxc-create -n cntr1 -t ubuntu
With the latest lxc stuff, you'll see error messages from some services being terminated. That's okay.
lxc-start -n cntr1
The output looks like this. Because of dhcp delays? you might have to wait 30 secs to log in.
root@mr-0x5:/var/lib/lxc/cntr1# lxc-start -n cntr1
<4>init: hwclock main process (7) terminated with status 77
<4>init: ureadahead main process (8) terminated with status 5
<4>init: udev-fallback-graphics main process (53) terminated with status 1
<4>init: setvtrgb main process (71) terminated with status 1
<4>init: console-setup main process (100) terminated with status 1
<30>udevd[149]: starting version 175
Stopping (this does a clean stop, and kills if necessary. This is in the new lxc; you don't need the two older commands)
lxc-stop -n cntr1
lxc-restart exists in the old lxc, which combines stop and start. But the release I've installed above doesn't have it. Don't install lxc with apt-get if prompted. You want to use lxc-stop and lxc-start instead.
I never use this console attach
lxc-console -n cntr1
I do use this to get to the command line in the container if networking is broken there and there's no ssh. You run it from the host, and it gets you to the container command line:
lxc-attach -n cntr1
UPDATE: I'm having problems with the network config when I clone. I now copy the original config (cntr1/config) to the clone (cntr2/config) and then edit it with s/cntr1/cntr2. The cloned config looks very different from the original; I suspect the latest lxc has some issues there? This hand copy/edit of the config seems to make the clone work. I sometimes also have to add the network stuff to /etc/network/interfaces inside the clone and stop/start. Not sure if I always need that, yet.
For rapid provisioning, you may wish to customize a canonical container according to your needs and then make multiple copies of it. This can be done with the lxc-clone program. Given an existing container called C1, a new container called C2 can be created using
# but I have to hardwire the IP's correctly? and hostname maybe
sudo lxc-clone -o C1 -n C2
from http://docs.oracle.com/cd/E37670_01/E37355/html/ol_shutdown_containers.html
To display the containers that are configured, use the lxc-ls (or lxc-ls -f) command on the host.
[root@host ~]# lxc-ls
ol6ctr1
ol6ctr2
To display the containers that are running on the host system, specify the --active option.
[root@host ~]# lxc-ls --active
ol6ctr1
To display the state of a container, use the lxc-info command on the host.
[root@host ~]# lxc-info -n ol6ctr1
state: RUNNING
pid: 10171
To view the state of the processes in the container from the host, either run ps -ef --forest and look for the process tree below the lxc-start process or use the lxc-attach command to run the ps command in the container.
[root@host ~]# ps -ef --forest
If you were logged into the container, the output from the ps -ef command would look similar to the following.
[root@ol6ctr1 ~]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 07:58 ? 00:00:00 /sbin/init
root 183 1 0 07:58 ? 00:00:00 /sbin/dhclient -H ol6ctr1 ...
root 206 1 0 07:58 ? 00:00:00 /sbin/rsyslogd -i ...
root 247 1 0 07:58 ? 00:00:00 /usr/sbin/sshd
root 254 1 0 07:58 lxc/console 00:00:00 /sbin/mingetty /dev/console
root 258 1 0 07:58 ? 00:00:00 login -- root
root 260 1 0 07:58 lxc/tty2 00:00:00 /sbin/mingetty /dev/tty2
root 262 1 0 07:58 lxc/tty3 00:00:00 /sbin/mingetty /dev/tty3
root 264 1 0 07:58 lxc/tty4 00:00:00 /sbin/mingetty /dev/tty4
root 268 258 0 08:04 lxc/tty1 00:00:00 -bash
root 279 268 0 08:04 lxc/tty1 00:00:00 ps -ef
Note that the process numbers differ from those of the same processes on the host, and that they all descend from the process 1, /sbin/init, in the container.
To suspend or resume the execution of a container, use the lxc-freeze and lxc-unfreeze commands on the host.
[root@host ~]# lxc-freeze -n ol6ctr1
[root@host ~]# lxc-unfreeze -n ol6ctr1
remove this line from the /etc/hosts
127.0.1.1 cntr1
The container's /etc/network/interfaces looks like this. Leave it with dhcp. It will get the static ip from the lxc config, but get the dns server (192.168.1.200) from dhcp:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet dhcp
UPDATE: I've been just changing it to this, and it works. Have to not race against dhcp? (first time seems fine; 2nd not, without this)
auto eth0
iface eth0 inet manual
Old recommendation: check the eth0 ip with ifconfig. I have to manually set /etc/network/interfaces in the container for some reason. Sometimes it seems like the only reliable way to get static ips (regardless of the outside settings) and dns nameservers. Like this:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 172.16.2.110 # change
# Note my network uses a supernetted netmask here. Adjust as necessary
netmask 255.255.0.0
# inline comments might break things. don't do that. These are unneeded.
# network 172.16.0.0
# broadcast 172.16.255.255
gateway 172.16.0.1
dns-nameservers 172.16.0.200
Then /etc/init.d/networking restart
make hostname unique.
hostname <hostname>
vi /etc/hostname
In sudo sh, use passwd to give root a password.
apt-get install vim
edit ~/.vimrc
:set expandtab
au BufEnter * set tabstop=4 shiftwidth=4
au BufEnter *.java set tabstop=2 shiftwidth=2
set timezone with
sudo dpkg-reconfigure tzdata
or command line only (PST)
echo "America/Los_Angeles" | sudo tee /etc/timezone
sudo dpkg-reconfigure --frontend noninteractive tzdata
gives
Current default time zone: 'America/Los_Angeles'
Local time is now: Wed Oct 8 21:18:11 PDT 2014.
Universal Time is now: Thu Oct 9 04:18:11 UTC 2014.
Can install ntp if you want
apt-get install ntp
Because of typing 'y' or 'yes' to lots of parallel machines, and getting wedged if a machine was not in sync with the others and didn't expect 'y' or 'yes' (yes repeatedly outputs y), I add this to the .bashrc for root and maybe kevin and jenkins:
alias y=/bin/ls
alias yes=/bin/ls
I shorten the failsafe timeout to get fast boot http://tech.pedersen-live.com/2012/05/disable-waiting-for-network-configuration-messages-on-ubuntu-boot/
vi /etc/init/failsafe.conf
change the first sleep 20 to sleep 5
change the sleep 40 to sleep 15
change the sleep 59 to sleep 15
This is about waiting for the network to be 'configured'
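The three sleep edits above can be done non-interactively with sed. A sketch that operates on a copy in /tmp with a stand-in stanza; point it at the real /etc/init/failsafe.conf (after taking a backup) to apply it:

```shell
# Stand-in for the relevant lines of /etc/init/failsafe.conf.
cat > /tmp/failsafe.conf <<'EOF'
sleep 20
sleep 40
sleep 59
EOF

# Shorten the failsafe sleeps: 20 -> 5, 40 -> 15, 59 -> 15.
# (sed replaces every match; in the real file only the first
# "sleep 20" matters, per the note above.)
sed -i -e 's/sleep 20$/sleep 5/' \
       -e 's/sleep 40$/sleep 15/' \
       -e 's/sleep 59$/sleep 15/' /tmp/failsafe.conf
cat /tmp/failsafe.conf
```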
need showmount
apt-get install nfs-common
so I can do this
showmount -e mr-0xs3
Export list for mr-0xs3:
/mnt/mr-0xs3-pool/hdp2.1_hdfs_datasets (everyone)
Install java with ppa per http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html
ubuntu 12.04?
apt-get update
sudo apt-get install python-software-properties
ubuntu 14.04
apt-get update
sudo apt-get install software-properties-common
both:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default
sudo apt-get install oracle-java7-set-default
other things I like for our 0xdata "standard machines"
ssh:
apt-get install openssh-server
copy things in /root from mr-0xe1 (some shell scripts) and run some of them. Set up a vnc server? Maybe, yeah. Set up user passwords.
http://askubuntu.com/questions/246323/why-does-sshs-password-prompt-take-so-long-to-appear I turn off reverse dns lookup on ssh login
There are several things that can go wrong. Add -vvv to make ssh print a detailed trace of what it's doing, and see where it's pausing.
I turn off reverse DNS lookup in /etc/ssh/sshd_config. And GSSAPI while you're at it.
UseDNS no
GSSAPIAuthentication no
then
service ssh restart
It's possible that the initial lxc container setup, then changing the dns, leaves a bad dns entry or something in /etc/resolv.conf or somewhere? Not sure.
I used to install autofs. Maybe not on newer machines. (and not on LXC machines?) I did have it on the host machines.
copy /etc/ssh/sshd_config from an existing 0xdata machine (details about max starts/sessions)
in case you need to get stuff from s3
apt-get install s3cmd
latest from source forge..as of 6/15 at http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.2/
wget http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.2/s3cmd-1.5.2.tar.gz
tar -xvzf s3cmd*tar.gz
cd s3cmd*
python ./setup.py install
s3cmd --version
aptitude is good
apt-get install aptitude
aptitude update -y
path resolution
apt-get install realpath
monitoring tools:
apt-get install htop
apt-get install iotop
apt-get install saidar
in /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
/opt/who.sh #custom script to look for h2o processes
For R/numpy/scipy
sudo apt-get install libcurl4-openssl-dev
Probably want these for liblinear etc
sudo apt-get install libblas3gf -y
sudo apt-get install libblas-doc -y
sudo apt-get install libblas-dev -y
sudo apt-get install liblapack3gf -y --reinstall
sudo apt-get install liblapack3gf-base -y --reinstall
sudo apt-get install liblapack-doc -y
sudo apt-get install liblapack-dev -y --reinstall
sudo ln -s /usr/lib/liblapack.so.3gf /usr/lib/liblapack.so.3
sudo ln -s /usr/lib/libblas.so.3gf /usr/lib/libblas.so.3
I seem to require these links; I can't find something that installs them right, and some R packages look for them.
If I copy /usr/local/lib/R/site-library and try to do library("LiblineaR") or others in R, sometimes it can't find libRblas.so (and maybe others: libRlapack.so).
It seems like if I copy these two files to /usr/local/lib/R/lib and create links to them in /usr/lib, things are okay. But I'm not sure why my install didn't set them up right. Do I have old copies of packages, and would this go right if I install.packages("..") them in R rather than copying the site-library? Or ??
cd /usr/local/lib/R
scp -p -r <another machine's copy>/usr/local/lib/R/lib .
cd /usr/lib
ln -s /usr/local/lib/R/lib/libRlapack.so
ln -s /usr/local/lib/R/lib/libRblas.so
add to /etc/apt/sources.list
deb http://cran.stat.ucla.edu/bin/linux/ubuntu precise/
deb-src http://cran.stat.ucla.edu/bin/linux/ubuntu precise/
then
apt-get update
you get
GPG error: http://cran.stat.ucla.edu precise/ Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9
reload the missing key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9
then
apt-get update
apt-get install r-base
The setuptools install is platform dependent; see here: https://pypi.python.org/pypi/setuptoo
I suppose on ubuntu
apt-get install python-setuptools
should work. But not sure what version you get. Better to do
wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python
platform dependent ways of installing pip are at https://pip.pypa.io/en/latest/installing.html
on ubuntu
sudo apt-get install python-pip
pip install pip --upgrade
Weird: that leaves pip not visible, because it changed paths. Need to start a new shell to see the new path.
# pip install pip --upgrade
bash: /usr/bin/pip: No such file or directory
this fixes it
apt-get install python-pip --reinstall
# you can't do --force-reinstall with the version installed above.. just upgrade first
pip install pip --upgrade
# complains about /usr/bin/pip. It's at /usr/local/bin/pip ..force the path to make it work
# probably need to start a new shell to get path right
bash
/usr/local/bin/pip install pip --upgrade --force-reinstall
To check the version: apparently you want to update distribute too. Got this from pip list after the above:
Warning: cannot find svn location for distribute===0.6.24dev-r0
So I make sure that's the most recent too. But this wipes out setuptools, and means distribute can't install again:
pip install distribute --upgrade --force-reinstall
so just do
pip install distribute --upgrade
# pip list | egrep '(distribute|pip|setuptools)'
distribute (0.7.3)
pip (7.0.3)
setuptools (17.0)
I'm thinking maybe we want people to install the latest setuptools also, doing both the pip install and the upgrade. We should warn that with the default platform builds (like apt-get install python-pip), you don't get the latest pip and have to upgrade it yourself.
I like the reinstall, because it makes sure everything's clean and you can see the version. And it's all different if you have virtualenv or a private place for python packages?
Also, we should note that this is all just for python 2.7; python 3 instructions are different.
as they say " On Linux, pip will generally be available for the system install of python using the system package manager, although often the latest version will be unavailable."
# sudo apt-get install python-numpy python-scipy
sudo apt-get install python-matplotlib python-nose
sudo apt-get install python-dev
# this should have been done above
# sudo apt-get install python-setuptools
# easy_install pip
# newer versions?
pip install -U numpy scipy scikit-learn statsmodels
# upgrades?
# already done above
# pip install -U pip
pip install -U numpy scipy statsmodels
other python packages
pip install requests simplejson paramiko psutil
remove libreoffice stuff
apt-get remove libreoffice*
Current R packages we use can be copied from /usr/local/lib/R/site-library on another machine. Maybe need to install.packages("LiblineaR") in R.
If rgl didn't install because of GL/gl.h in ubuntu, do this install first
sudo apt-get install r-base-dev xorg-dev libglu1-mesa-dev mesa-common-dev
sudo apt-get build-dep r-cran-rgl
probably need to add universe and multiverse repos first /etc/apt/sources.list
deb http://us.archive.ubuntu.com/ubuntu/ precise universe
deb-src http://us.archive.ubuntu.com/ubuntu/ precise universe
deb http://us.archive.ubuntu.com/ubuntu/ precise-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ precise-updates universe
apt-get update
apt-get install r-cran-rgl
Then go into R and just do install.packages("rgl") and see that it completes without error
apt-get install iputils-tracepath
# should be bigger than 1500 by default
tracepath localhost
add this to /etc/network/interfaces (under localhost) and reboot
auto lo
iface lo inet loopback
# see explanation of 1500 mtu issue at https://0xdata.atlassian.net/wiki/pages/viewpage.action?pageId=31916232
post-up /sbin/ethtool --offload lo tso off
post-up /sbin/ethtool --offload lo ufo off
post-up /sbin/ethtool --offload lo gso off
post-up /sbin/ethtool --offload lo gro off
post-up /sbin/ifconfig lo mtu 1500
test after reboot. should see 1500
tracepath localhost
Open a browser to http://localhost:5000 (username: admin, password: admin).
Web panel config file: /srv/lwp/lwp.conf
current info
http://man7.org/linux/man-pages/man1/lxc-autostart.1.html
says to set lxc.start.auto == 1 in the config
can check with lxc-ls --fancy
old info: To make a container autostart, you simply need to symlink its config file into the /etc/lxc/auto directory:
(I've not checked whether they really restart on reboot with this)
# with the new lxc, have to create the auto directory?
mkdir /etc/lxc/auto
ln -s /var/lib/lxc/cntr1/config /etc/lxc/auto/cntr1.conf
LXC 0.8 allows
lxc.network.ipv4.gateway = 172.16.0.1
so you can specify the gateway.
You can specify the broadcast if necessary on the same line as the ipv4
lxc.network.ipv4 = 172.16.2.112/16 172.16.2.255
https://gist.github.com/gionn/7585324 How to enable bind mount inside lxc container
When mount is returning:
STDERR: mount: block device /srv/database-data/postgres is write-protected, mounting read-only
mount: cannot mount block device /srv/database-data/postgres read-only
and dmesg shows:
[ 6944.194280] type=1400 audit(1385049795.420:32): apparmor="DENIED" operation="mount" info="failed type match" error=-13 parent=6631 profile="lxc-container-default" name="/var/lib/postgresql/9.1/main/" pid=6632 comm="mount" srcname="/srv/database-data/postgres/" flags="rw, bind"
AppArmor is blocking mount -o bind inside the LXC container.
To enable it add in /etc/apparmor.d/lxc/lxc-default:
profile lxc-container-default flags=(attach_disconnected,mediate_deleted) {
...
mount options=(rw, bind),
...
Reload apparmor:
# /etc/init.d/apparmor reload
To ensure read-only mounts work, you'll want mount options to be:
mount options=(rw, bind, ro),
Rather than this, I put hard mounts in /etc/fstab for /mnt things to the underlying system's /mnt, which is actually autofs. That seemed to work. How to do that is described in another section lower down.
Using NFS/Autofs in a LXC container http://bridge.grumpy-troll.org/2014/03/lxc-routed-on-ubuntu/ You can mount NFS from outside the container, which is the approach I use with NAT’d containers, although then the container is unaware of the mount-point and you’re not using the same uid space.
To mount NFS inside the container, you need to tell AppArmor to allow this
# vi /etc/apparmor.d/abstractions/lxc/container-base
...
# service apparmor restart
The rules I add are:
# allow NFS
mount fstype=nfs,
mount fstype=nfs4,
mount fstype=rpc_pipefs,
You can then just add NFS mount-points to the /etc/fstab inside the container’s rootfs.
This is in /var/lib/lxc/precise1/fstab
home-0xdiag-datasets was bind'ed to /exports for nfs mount reasons, so we can reuse it here (rather than /home/0xdiag/home-0xdiag-datasets). Does it matter?
To create the directory (for the mount) automatically in the container, you can also add the create=dir option in the fstab :
/exports/home-0xdiag-datasets /var/lib/lxc/precise1/rootfs/home/0xdiag/home-0xdiag-datasets none bind,create=dir
This is specific to LXC. https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-December/006444.html
Just like we already had "optional", this adds two new LXC-specific mount flags:
create=dir (will do a mkdir_p on the path)
create=file (will do a mkdir_p on the dirname + a fopen on the path)
This was motivated by some of the needed bind-mounts for the unprivileged containers.
cribbed from digitalocean, thanks https://www.digitalocean.com/community/tutorials/getting-started-with-lxc-on-an-ubuntu-13-04-vps
Besides just isolation, another massive benefit of using LXC is its ability to apply cgroup limits to the processes within a container.
Limits for a container are defined in its config file, which for our container can be found at /var/lib/lxc/test-container/config.
Memory limits can be used to set a maximum RAM usage for a container. In this case, we'll limit our container to 50MB of memory:
lxc.cgroup.memory.limit_in_bytes = 50000000
CPU limits are defined slightly differently; unlike with memory, where physical limits are defined, CPU limits operate with CPU 'shares':
lxc.cgroup.cpu.shares = 100
These shares are not linked to any physical quantity; they represent relative allocations of CPU resources, so a container with more shares gets higher CPU priority. The numbers themselves are arbitrary: giving one container 10 shares and another 20 is the same as giving them 1000 and 2000 respectively, since all it expresses is that the second container has twice the CPU priority. Just make sure you're consistent with your scale across containers.
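To make the proportionality concrete, here's a quick sketch (plain shell arithmetic, not an LXC command) of what fraction of contended CPU two hypothetical containers would get from their shares:

```shell
# cpu.shares only matters under contention; the split is proportional.
# Hypothetical containers: A with 10 shares, B with 20 shares.
a=10
b=20
total=$((a + b))
echo "A gets $((100 * a / total))%"   # 33%
echo "B gets $((100 * b / total))%"   # 66%
```

Scaling both values by the same factor (100/200, 1000/2000) produces the same split.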
Once you've changed the cgroup limits in the config file, you'll need to shutdown and restart the container for the changes to take effect.
Alternatively, limits can be set temporarily on a running container with the lxc-cgroup command:
lxc-cgroup -n test-container cpu.shares 100
It is often the case that you'll want the containers to autostart after a reboot, particularly if they are hosting services. By default, containers will not be started after a reboot, even if they were running prior to the shutdown.
To make a container autostart, you simply need to symlink its config file into the /etc/lxc/auto directory:
ln -s /var/lib/lxc/test-container/config /etc/lxc/auto/test-container.conf
Now running lxc-ls -f again will show that our container is set up to autostart:
# lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
----------------------------------------------------
test-container RUNNING 10.0.3.143 - YES
I just had a headache with a copied container "working" but then ssh getting interrupted and ntpd's ipv6 binding not working. Also note that apparently you can't disable ipv6, or things stop working. It seems my problem was reusing a MAC address from another container on the new, copied container; something in my network didn't like that (even though I think all my IPs and MACs were unique). Things were fine after I incremented the MAC address by one. The lesson: treat MAC + container as one-use. Whenever you create a new container, create a new MAC to go with it. I set the MAC in the LXC config on the host (ipv4 only), so my container's /etc/network/interfaces now just says "manual".
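One way to guarantee a fresh MAC per container (a sketch; 00:16:3e is the Xen OUI that LXC's templates use for randomly generated addresses) is to generate one each time and paste it into the container's config as lxc.network.hwaddr:

```shell
# Generate a random MAC in the 00:16:3e prefix for a new container.
# Put the result in /var/lib/lxc/<name>/config as:
#   lxc.network.hwaddr = <generated MAC>
mac=$(printf '00:16:3e:%02x:%02x:%02x' \
    $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256)))
echo "$mac"
```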
I was using lxc.network.ipv4.gateway = auto and it stopped working: 'ip route' showed the default route going to the host machine instead of my desired 172.16.0.1 gateway.
It looks like I have to specify both the IP address and the gateway in the container's config, not just in /etc/network/interfaces.
I also specify the broadcast address, just to be safe, i.e. for 172.16.2.211:
lxc.network.ipv4 = 172.16.2.211/16 172.16.255.255
lxc.network.ipv4.gateway = 172.16.0.1
I also have it correct in the /etc/network/interfaces inside the container
now it works
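Putting it together, the working combination looks like this on the host side (the hwaddr value is a made-up example; the addresses are the ones from above):

```
# host side: /var/lib/lxc/<container>/config
lxc.network.hwaddr = 00:16:3e:12:34:56
lxc.network.ipv4 = 172.16.2.211/16 172.16.255.255
lxc.network.ipv4.gateway = 172.16.0.1
```

Inside the container, /etc/network/interfaces then either repeats the same static settings or just marks eth0 as manual and lets the LXC config drive it (both setups appear in these notes).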
Docker drops LXC as default execution environment by Chris Swan on Mar 13, 2014
With the release of version 0.9 Docker.io have dropped LXC as the default execution environment, replacing it with their own libcontainer. At the same time Docker now supports a much broader range of isolation tools through the use of execution drivers, which include: OpenVZ, systemd-nspawn, libvirt-lxc, libvirt-sandbox, qemu/kvm, BSD Jails, Solaris Zones, and chroot.
Libcontainer is a library written in Go that provides direct access for Docker to Linux container APIs:
Docker out of the box can now manipulate namespaces, control groups, capabilities, apparmor profiles, network interfaces and firewalling rules - all in a consistent and predictable way, and without depending on LXC or any other userland package. This drastically reduces the number of moving parts, and insulates Docker from the side-effects introduced across versions and distributions of LXC. In fact, libcontainer delivered such a boost to stability that we decided to make it the default. In other words, as of Docker 0.9, LXC is now optional.
LXC itself recently announced the release of version 1.0. Whilst Docker can still be used in combination with LXC it’s likely that most users will run with the new default that omits LXC.
trusty kernel backport. I do this on the host machine only, not the container. CHECK FIRST! If apt-get update after a new install is already pulling trusty packages, and uname -r says 3.13.x, you don't need/want this. I did a 12.04.5 install on a haswell-e box, and it installed a 3.13.x kernel for an Ubuntu 12.04.5 LTS install from flash (iso).
sudo apt-get install xserver-xorg-lts-precise
hwe-support-status --verbose
sudo apt-get install linux-generic-lts-trusty xserver-xorg-lts-trusty libgl1-mesa-glx-lts-trusty linux-image-generic-lts-trusty
# or this?
# apt-get install --install-recommends linux-generic-lts-trusty xserver-xorg-lts-trusty libgl1-mesa-glx-lts-trusty
reboot
After uninstalling and reinstalling gdm and lightdm, I was dead in the water on my haswell-e box until I did this:
sudo apt-get install xserver-xorg-lts-precise
apt-get install --install-recommends linux-generic-lts-trusty xserver-xorg-lts-trusty libgl1-mesa-glx-lts-trusty
after apt-get install gdm
useful for switching between:
sudo dpkg-reconfigure gdm
or
sudo dpkg-reconfigure lightdm
picking gdm as default display manager
service gdm start
had the right effect of restarting the display then
service lightdm start
restarted me back to the login, which was good.
When I removed gdm and went back to lightdm, a restart didn't work well, although
startx
fixed that. The second time, service lightdm restart worked (after getting to a CTRL-ALT-F1 terminal).
Seems better with gdm though..ah! now I got the gdm login greeter
disabled the user list at login with
apt-get install gconf-editor
gconf-editor
apps -> gdm -> simple-greeter
check the 'disable user list' box
(look at the ubuntu install notes on confluence under network infrastructure for more details about how to get this in lightdm conf correctly on ubuntu 14.04)
To get the full text to see why things are delayed: You would need to edit the file /etc/default/grub. In this file you'll find an entry called GRUB_CMDLINE_LINUX_DEFAULT. This entry must be edited to control the display of the splash screen.
The presence of the word splash in this entry enables the splash screen, with condensed text output. Adding quiet as well, results in just the splash screen; which is the default for the desktop edition since 10.04 (Lucid Lynx). In order to enable the "normal" text start up, you would remove both of these.
So, the default for the desktop, (i.e. splash screen only):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
For the traditional, text display:
GRUB_CMDLINE_LINUX_DEFAULT=""
After editing the file, you need to run update-grub.
sudo update-grub
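A scripted version of that edit, as a sketch operating on a throwaway copy (on a real system you'd run the sed against /etc/default/grub itself, then sudo update-grub):

```shell
# Clear the splash/quiet options to get the traditional text boot display.
f=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$f"
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT=""/' "$f"
grep '^GRUB_CMDLINE_LINUX_DEFAULT' "$f"   # prints GRUB_CMDLINE_LINUX_DEFAULT=""
rm -f "$f"
```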
To reduce the network timeout delays for dhcp (why can't I disable dhcp when I'm static everywhere??), you can try setting these in /etc/dhcp/dhclient.conf:
timeout 10;
backoff-cutoff 0;
initial-interval 0;
retry 15;
See dhclient.conf manpage (man dhclient.conf) for reference.
msr-tools only on host machine (for rdmsr/wrmsr, i.e. turbo mode state)
apt-get install msr-tools
tmpreaper
apt-get install tmpreaper
Copy /etc/tmpreaper.conf from existing machine
turbostat only on the host machine
sudo apt-get install linux-tools-common
sudo modprobe msr
sudo turbostat
tools. only do these on the host machine
add-apt-repository -y ppa:yannubuntu/boot-repair
apt-get update
apt-get install boot-repair
apt-get install smartmontools
apt-get install hddtemp
apt-get install hdparm
This PPA contains the latest release of Grub Customizer.
sudo add-apt-repository ppa:danielrichter2007/grub-customizer
sudo apt-get update
sudo apt-get install grub-customizer
ipmi. I only do this on the host machine
apt-get install freeipmi-tools
apt-get install ipmitool
modprobe ipmi_devintf
modprobe ipmi_si
You can add these to /etc/modules to have them loaded automatically (just list the module names):
ipmi_devintf
ipmi_si
sensors. I only do this on the host machine
apt-get install lm-sensors
sensors-detect
sensors
edac-util. I only do this on the host machine
apt-get install edac-utils
not working on the haswell system? mcelog is the replacement? (does mcelog work at the same time as edac-utils on older systems?)
apt-get install mcelog
Only do these on the host machine
apt-get install dstat
apt-get install cpufrequtils