Author Archives: Chris Alexander

Redis + systemd-nspawn

This is me revisiting something I built a few years back for a production workload, using utilities that were “not ready for production”. What could go wrong?

The goal was thousands of Redis processes with some isolation. Previously there was none and everything ran as root. This isn’t the entire process, but it covers the main configuration points.

## Unmess sh
cd /bin                                                                            
sudo rm sh                                                                         
sudo ln -s bash sh                                                                 
                                                                                   
## The usual suspects                                                              
sudo apt-get update                                                                
sudo apt-get install -y \                                                          
    build-essential \                                                              
    git-core \                                                                     
    xfsprogs \                                                                     
    dbus \                                                                         
    debootstrap \                                                                  
    curl \                                                                         
    lsof        

Here we are just creating the container directories and bootstrapping the root filesystem. I don’t know why logs are stored in /mnt; it wasn’t my decision.

## Build the container                                                          
sudo mkdir -p /var/lib/machines/redis                                           
sudo mkdir /opt/null                                                            
sudo debootstrap --arch=amd64 jessie /var/lib/machines/redis/                   
                                                                                
## Create redis user                                                            
sudo useradd -u 9000 -s /bin/false -m redis                                     
sudo useradd -u 9000 -s /bin/false -R /var/lib/machines/redis/ redis            
                                                                                
## Modify /mnt permissions                                                      
sudo chmod g+w /mnt                                                             
sudo chown :redis /mnt                                                          
                                                                                
## Modify redis ulimits                                                         
sudo tee /etc/security/limits.d/redistogo.conf <<'EOF'                          
redis   soft    nofile  100000                                                  
redis   hard    nofile  100000                                                  
EOF                                                                   
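
To confirm the limits.d drop-in actually took effect, a quick check is worth doing. Note this is a sketch: pam_limits must be enabled for su in /etc/pam.d/su for limits.d to apply here; otherwise check from a real login session.

```shell
# Open-file limit as seen by the redis user (override its /bin/false shell).
# Only reflects limits.d if pam_limits runs for su on this box.
sudo su -s /bin/bash - redis -c 'ulimit -n'
```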

systemd-nspawn uses D-Bus, and D-Bus limits the number of connections a single user, including root, can make. Hitting this max caused the system to die and have to be restarted; I was never able to recover it once that happened, and I lost the link to the response from the developers. The defaults are compiled in, but they can be overridden with a config file.

## Configure D-Bus to not suck
sudo tee /etc/dbus-1/system.d/redisprov.conf <<'EOF'
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
    <limit name="service_start_timeout">120000</limit>                             
    <limit name="auth_timeout">240000</limit>                                      
    <limit name="pending_fd_timeout">150000</limit>                                
    <limit name="max_completed_connections">100000</limit>                      
    <limit name="max_incomplete_connections">10000</limit>                      
    <limit name="max_connections_per_user">100000000</limit>                    
    <limit name="max_pending_service_starts">10000</limit>                      
    <limit name="max_names_per_connection">50000</limit>                        
    <limit name="max_match_rules_per_connection">50000</limit>                  
    <limit name="max_replies_per_connection">50000</limit>                      
</busconfig>                                                                    
EOF         
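
After dropping that file in place, the system bus needs to re-read its configuration. A sketch of the reload-and-verify step, using only the stock tooling:

```shell
# Ask dbus-daemon to reload its config without restarting the bus
sudo systemctl reload dbus

# Rough gauge of bus load: count names currently held on the system bus
dbus-send --system --dest=org.freedesktop.DBus --type=method_call \
    --print-reply /org/freedesktop/DBus org.freedesktop.DBus.ListNames | grep -c string
```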

Now we write a few unit files that launch the container. Symlinking the include file under different names lets one template start different Redis versions (%p, the unit prefix) on different ports (%i, the instance).

##                                                                              
# Install Redis systemd unit files                                              
#                                                                               
#   Note(crainte): Not certain why but a direct symlink to the service file     
#                  will not allow the dynamic version to start. We have to      
#                  do the include for it to pick up the correct variable.       
#                                                                               
#   Examples:                                                                   
#       systemctl status 2.8.21@9000                                            
#                                                                               
#       systemctl start 2.8.21@9000                                             
#                                                                               
#       systemctl stop 2.8.21@9000                                              
#                                                                               
#       machinectl status redis-2.8.21-9000                                     
##                                                                              
                                                                                
sudo tee /etc/systemd/system/redisprov.include <<'EOF'                          
.include /etc/systemd/system/redisprov.service                                  
EOF                                                                             
                                                                                
sudo tee /etc/systemd/system/redisprov.service <<'EOF'                          
[Unit]                                                                          
Description=redisprov                                                           
                                                                                
[Service]                                                                       
LimitNOFILE=100000                                                              
ExecStart=/usr/bin/systemd-nspawn --user=redis --keep-unit --machine=redis-%p-%i -j --directory=/var/lib/machines/redis/ --bind=/usr/local/bin/redis/%p/:/bin/ --bind=/mnt --bind=/opt/null:/sbin --bind=/home/redis/%i redis-server /home/redis/%i/rprov.conf
Restart=always                                                                  
SuccessExitStatus=1                                                             
                                                                                
[Install]                                                                       
Also=dbus.service                                                               
EOF
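
The notes above never show the actual symlink step. As best I can reconstruct it, the include file gets linked under a versioned template name so %p and %i resolve the way the examples in the header comment suggest:

```shell
# Reconstructed wiring (not in the original notes): expose the include file as
# a template unit named after the redis version, so "2.8.21@9000" resolves with
# %p=2.8.21 (binary path under /usr/local/bin/redis/) and %i=9000 (port/home).
sudo ln -s /etc/systemd/system/redisprov.include /etc/systemd/system/2.8.21@.service
sudo systemctl daemon-reload
```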

Now, after "systemctl start 2.8.21@9000" and "systemctl start 2.8.21@9001", you will have two Redis processes running under nspawn. As I mentioned, these are notes from a few years back. If I were revisiting this I could most likely drop the debootstrap step, since Redis isn't really using that root filesystem. But if you want to run a more full-featured application under nspawn, you might need it.
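
The unit expects a per-instance config at /home/redis/<port>/rprov.conf; the real file wasn't in my notes, but a hypothetical minimal version would look something like:

```shell
# Hypothetical minimal per-instance config; the real rprov.conf is lost.
# The unit binds /home/redis/9000 into the container at the same path.
sudo mkdir -p /home/redis/9000
sudo tee /home/redis/9000/rprov.conf <<'EOF'
port 9000
dir /home/redis/9000
EOF
sudo chown -R redis:redis /home/redis/9000
```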

Good luck.

Ubuntu 12 + lxc + Rackspace

First, let’s build an Ubuntu 12 server:

cloud> servers create
Server name: lxc-host
Image ID: 125
Flavor ID: 3
server{
   "server" : {
      "status" : "BUILD",
      "progress" : 0,
      "name" : "lxc-host",
      "imageId" : 125,
      "addresses" : {
         "private" : [
            "127.0.0.1"
         ],
         "public" : [
            "127.0.0.2"
         ]
      },
      "flavorId" : 3,
      "hostId" : "c395c82dd1963e52098e77819fda2b86",
      "metadata" : {},
      "id" : 20898849
   }
}

Make sure everything is up-to-date

root@lxc-host:~# apt-get update
root@lxc-host:~# apt-get upgrade

Install some necessary utilities for this.

root@lxc-host:~# apt-get install lxc debootstrap bridge-utils screen

Now we need to make the bridge in /etc/network/interfaces

# lxc bridge
auto lxcbr0
iface lxcbr0 inet static
        address 192.168.0.1
        netmask 255.255.255.0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
        pre-down echo 0 > /proc/sys/net/ipv4/ip_forward
        pre-down iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
        bridge_ports none
        bridge_stp off

Activating the bridge…

root@lxc-host:~# ifup lxcbr0
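
A quick sanity check that the bridge came up with the right address and that the post-up forwarding and NAT rules are in place:

```shell
ip addr show lxcbr0                       # should list 192.168.0.1/24
cat /proc/sys/net/ipv4/ip_forward         # 1 after the post-up hook
iptables -t nat -L POSTROUTING -n         # MASQUERADE rule out eth0
```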

Now it’s time to create the first container. The available templates should be listed here:

root@lxc-host:~# ls /usr/lib/lxc/templates/
lxc-busybox  lxc-debian  lxc-fedora  lxc-opensuse  lxc-sshd  lxc-ubuntu  lxc-ubuntu-cloud

To be special, let’s make a Fedora guest.

root@lxc-host:~# vim /etc/lxc/auto/fedora.conf
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.ipv4 = 192.168.0.2/24
lxc.network.name = eth0
lxc.cgroup.cpu.shares = 512
lxc.cgroup.memory.limit_in_bytes = 1024M
lxc.cgroup.memory.memsw.limit_in_bytes = 3072M

But first we need something on the host, since the fedora template bootstraps the guest with yum:

root@lxc-host:~# apt-get install yum

The actual lxc-create syntax

root@lxc-host:~# lxc-create -n fedora-guest -t fedora -f /etc/lxc/auto/fedora.conf 
root@lxc-host:~# vim /var/lib/lxc/fedora-guest/rootfs/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
#BOOTPROTO=dhcp
ONBOOT=yes
HOSTNAME=fedora-guest
NM_CONTROLLED=no
TYPE=Ethernet
MTU=
BOOTPROTO=static
IPADDR=192.168.0.2
NETMASK=255.255.255.0
GATEWAY=192.168.0.1
root@lxc-host:~# screen -dmS init-fedora-guest lxc-start -n fedora-guest
root@lxc-host:~# screen -ls
There is a screen on:
	19959.init-fedora-guest	(06/06/2012 09:55:39 AM)	(Detached)
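
With the screen session detached, a few checks confirm the guest actually came up, using the standard lxc tooling of that era:

```shell
lxc-info -n fedora-guest                          # state should be RUNNING
ping -c 1 192.168.0.2                             # the guest IP set above
lxc-cgroup -n fedora-guest memory.limit_in_bytes  # confirm the 1024M cap
```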

And voila! I’ll admit this fedora install seems somewhat messed up, but that’s a problem with the template itself. You can correct the install or use something else like ubuntu/ubuntu-cloud for better results. Let me know if you have questions.

iSCSI performance in the cloud

This is one of the more entertaining things I’ve done at work and, judging by the emails I received from co-workers, one of the most controversial. Someone asked if this could be done, and I figured I’d give it a shot. This will be in two parts, the first being FileIO, since I was having problems getting partition resizes to work in the cloud. Part two will be BlockIO once I get around to fixing the kernels.

I used CentOS 6 for all of the servers involved in this setup and all network traffic is over the internal service network.

[root@vsan1 ~]# dd if=/dev/zero of=/root/block.img bs=4M count=1250

The other iSCSI device is a RAID 0 across two targets whose backing files were created with a much smaller dd block size, for comparison.

[root@vsan2 ~]# dd if=/dev/zero of=/root/block.img bs=512K count=30720 
[root@vsan3 ~]# dd if=/dev/zero of=/root/block.img bs=512K count=30720
[root@master ~]# mdadm -Cv /dev/md0 -l0 -n2 -c128 /dev/sdb1 /dev/sdc1

All of the targets are setup the same way:

Target iqn.2012-03.local.node:vsan2 
Lun 0 Path=/root/block.img,Type=fileio,IOMode=wb 
Alias vsan2 
ImmediateData Yes 
MaxConnections 2 
InitialR2T No
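
The initiator side isn’t shown above. Assuming the stock open-iscsi tools on the master, discovery and login would look something like this (with vsan2/vsan3 resolving over the service network):

```shell
# Discover and log in to each target; repeat per vsan host
iscsiadm -m discovery -t sendtargets -p vsan2
iscsiadm -m node -T iqn.2012-03.local.node:vsan2 -p vsan2 --login

# The new LUNs then appear as /dev/sdb, /dev/sdc, etc., ready for mdadm
```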

I’m just using hdparm for basic tests; I wasn’t really inspired to test further. For comparison, here is the actual block device offered by XenServer:

[root@master ~]# hdparm -tT /dev/xvda 
/dev/xvda: 
Timing cached reads: 1948 MB in 1.99 seconds = 977.19 MB/sec 
Timing buffered disk reads: 82 MB in 3.10 seconds = 26.45 MB/sec

That is actually lower than what I got in my earlier tests, but it’s what I measured while writing this post. Now to test the single-drive, 4M-block-size device:

[root@master ~]# hdparm -tT /dev/sda 
/dev/sda: 
Timing cached reads: 1696 MB in 1.99 seconds = 850.29 MB/sec 
Timing buffered disk reads: 16 MB in 3.40 seconds = 4.71 MB/sec

And finally for the RAID0 performance results:

[root@master ~]# hdparm -tT /dev/md0 
/dev/md0: 
Timing cached reads: 1968 MB in 1.99 seconds = 987.01 MB/sec 
Timing buffered disk reads: 28 MB in 3.04 seconds = 9.21 MB/sec

Kinda meh results, but that’s what you can expect from fileio. I’m really looking forward to the blockio tests, but the kernels on the Gentoo images are 3.0+ and I need a 2.6 kernel for the iSCSI target to work. When I get around to it I’ll finish setting that up.