How to build a custom CentOS DigitalOcean image

Unified instance images are critical to a solid cloud deployment. There's nothing more annoying than having to manage dozens of different image types just because a provider forces its default image on you.

While DigitalOcean doesn't seem to support custom images, kickstarts or kernel parameters, here's how I managed to do a full kickstart install!

The snippet below can be run as a single bash script. It works by first downloading vmlinuz and initrd.img, the minimum files required to start the CentOS installer.

Next we define our kickstart file. This is optional, but it lets us automate the steps - especially the %post section, which handles the most important part: partition labeling.

After we've defined our kickstart, we boot into the CentOS installer using kexec. The $IP, $NETMASK and $GATEWAY variables are pulled from the metadata service, but they can be substituted with hand-coded values.

Once it's done, sit back and watch the install progress over the remote console. No further input is required.

yum -y install wget curl

mkdir /boot/centos  
cd /boot/centos  
wget http://mirror.centos.org/centos/6/os/x86_64/isolinux/vmlinuz  
wget http://mirror.centos.org/centos/6/os/x86_64/isolinux/initrd.img

cat <<'EOL' > /boot/centos/kickstart.ks

skipx  
text  
install  
url --url=http://mirror.centos.org/centos/6/os/x86_64  
firewall --enabled --service=ssh  
repo --name="Base" --baseurl=http://mirror.centos.org/centos/6/os/x86_64/

rootpw  --iscrypted $6$lApTqNOAYmyCrIfy$dXt9vKgMGihzZZniafkcHyMf/QzM7iSDmcLwEVcO.IewBP0EX9HVCJJrMXsv1u2Er568sma/jdPi4dcOFDvXA0  
authconfig --enableshadow --passalgo=sha512

# System keyboard
keyboard us  
# System language
lang en_US.UTF-8  
# SELinux configuration
selinux --enforcing

# System services
services --enabled="network,sshd,rsyslog,tuned,acpid"  
# System timezone
timezone Australia/Melbourne  
# Network information
network  --bootproto=dhcp --device=eth0 --onboot=on  
#network  --bootproto=dhcp --device=eth1 --onboot=on
bootloader --location=mbr --driveorder=vda --append="crashkernel=auto rhgb quiet"  
zerombr  
clearpart --all --drives=vda  
part / --fstype=ext4 --size=1000 --grow  
#part swap --size=512
reboot

%packages --nobase
@core
@server-policy
%end

%post

# DigitalOcean customization
touch /etc/digitalocean

# DO sets the kernel param root=LABEL=DOROOT
e2label /dev/vda1 DOROOT  
sed -i -e 's?^UUID=.* / .*?LABEL=DOROOT     /           ext4    defaults,relatime  1   1?' /etc/fstab

sync;

%end
EOL

# Enable swap partition if less than 800mb RAM
totalmem=$(cat /proc/meminfo | grep MemTotal | awk '{ print $2 }')  
if [ $totalmem -lt 800000 ]; then  
  sed -i 's/#part swap --size=512/part swap --size=512/g' /boot/centos/kickstart.ks 
fi


# Booting into vmlinuz
yum install -y kexec-tools

IP=$(curl http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/address)  
NETMASK=$(curl http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/netmask)  
GATEWAY=$(curl http://169.254.169.254/metadata/v1/interfaces/public/0/ipv4/gateway)

sed -i "s/network  --bootproto=dhcp --device=eth0 --onboot=on/network  --bootproto=static --device=eth0 --onboot=on --ip=$IP --netmask=$NETMASK --gateway=$GATEWAY --nameserver 8.8.8.8,8.8.4.4/g"1 /boot/centos/kickstart.ks

kexec -l /boot/centos/vmlinuz --initrd=/boot/centos/initrd.img  --append="ip=$IP netmask=$NETMASK gateway=$GATEWAY dns=8.8.8.8 ksdevice=eth0 ks=hd:vda1:/boot/centos/kickstart.ks method=http://mirror.centos.org/centos/6/os/x86_64/ lang=en_US keymap=us"  
sleep 2  
kexec -e  

After it's installed, you may face an issue with the eth0 interface not coming up. This is related to the kernel DigitalOcean boots the droplet with: you'll need to power off the instance and, in the control panel, select the latest CentOS kernel, making sure it matches the one installed on your system. You'll notice the problem when you see:
Device eth0 does not seem to be present, delaying initialization
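
A quick way to line the two up (assuming you can still reach the droplet over the web console) is to compare the kernel packages installed inside the droplet with the kernel selected in the control panel:

# kernels installed inside the droplet
rpm -q kernel
# the kernel the droplet actually booted with (the one DigitalOcean selected)
uname -r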

Now you should be good to go with a fresh CentOS install, most importantly with SELinux enabled! Package it up as a snapshot using the DigitalOcean control panel and let loose with your prebuilt custom CentOS images.

If you use Foreman like I do, there's a plugin available to add DigitalOcean as a compute resource. It still appears to have some issues, but two lines will get you going:

yum install ruby193-rubygem-foreman_digitalocean  
service foreman restart  

Your API keys are available at https://cloud.digitalocean.com/apiaccess to add under the compute resources section.

If you intend to use this as a snapshot, here are a few extra things you may consider adding to your %post section:

yum -y install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm  
yum -y install https://yum.puppetlabs.com/puppetlabs-release-el-6.noarch.rpm

yum -y install cloud-init nano screen ntp ntpdate curl wget dracut-modules-growroot acpid tuned puppet openssh-clients

yum -y update

# make sure firstboot doesn't start
echo "RUN_FIRSTBOOT=NO" > /etc/sysconfig/firstboot

# set virtual-guest as default profile for tuned
echo "virtual-guest" > /etc/tune-profiles/active-profile


# Fix hostname on boot
sed -i -e 's/\(preserve_hostname:\).*/\1 False/' /etc/cloud/cloud.cfg  
sed -i '/HOSTNAME/d' /etc/sysconfig/network  
rm /etc/hostname

# Remove all mac address references
sed -i '/HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0  
sed -i '/HOSTNAME/d' /etc/sysconfig/network-scripts/ifcfg-eth0  
sed -i '/UUID/d' /etc/sysconfig/network-scripts/ifcfg-eth0

sed -i '/HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth1  
sed -i '/HOSTNAME/d' /etc/sysconfig/network-scripts/ifcfg-eth1  
sed -i '/UUID/d' /etc/sysconfig/network-scripts/ifcfg-eth1

yum clean all  
rm -rf /etc/ssh/*key*

rm -f /root/.ssh/*  
rm -f /home/cloud-user/.ssh/*

rm -f /etc/udev/rules.d/*-persistent-*  
touch /etc/udev/rules.d/75-persistent-net-generator.rules  

This took me some time to hack together, but I'm happy to get this working as we can now add DO to our list of available hosting providers.

Finally, the EL6 version of cloud-init does not seem to support the DigitalOcean metadata datasource, while the same version of the EL7 release seems to.
I haven't dug into it further, but if you use cloud-init you should still set the datasource to DigitalOcean, even though EL6 doesn't support it. It will fall back to None and avoid the 120-second timeout cloud-init would otherwise spend querying the wrong datasource (EC2 by default).

echo 'datasource_list: [ DigitalOcean, None ]  
datasource:  
 DigitalOcean:
   retries: 5
   timeout: 10

' >> /etc/cloud/cloud.cfg

For EL6, to obtain the SSH key, something simple like this may work:

curl http://169.254.169.254/metadata/v1/public-keys > /home/cloud-user/.ssh/authorized_keys  
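
sshd is picky about the directory existing and about ownership, permissions and SELinux labels, so a slightly fuller sketch (assuming the cloud-user account from the cleanup snippets above) would be:

mkdir -p /home/cloud-user/.ssh
curl http://169.254.169.254/metadata/v1/public-keys > /home/cloud-user/.ssh/authorized_keys
chmod 700 /home/cloud-user/.ssh
chmod 600 /home/cloud-user/.ssh/authorized_keys
chown -R cloud-user:cloud-user /home/cloud-user/.ssh
restorecon -R /home/cloud-user/.ssh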

If this helped and you do end up giving DigitalOcean a try, sign up with my referral link https://www.digitalocean.com/?refcode=afbafb6012b6 - I get $25 in credit and you get $10 free just for using the link!

Happy hacking!

Roll your own CentOS 6.5 HVM AMI in less than 15 minutes

Update:

SELinux sometimes does not install properly unless you have at least 1024MB of combined memory (physical and swap).  

So after attending the AWS Summit 2014 in Melbourne, I was sold on the benefits it brings versus the hassle of racking hardware and configuring the low-level plumbing. While I find that fun and interesting, it's probably the most time consuming effort, and when you compare it to the performance you get from AWS it's hard to compete.. sure, you can get refurbished hardware, but that still doesn't beat what AWS can give you. Enough rambling..

We all know the flaws of the marketplace CentOS image, but the big killer is that it doesn't support HVM - and I want the new t2 instance types!!! Then there are the community third-party AMIs, riddled with random stuff and no telling what's been done to them.. custom repos, disabled SELinux!?! I stopped there..

Let's cut the crap, here's all you need to do:

# Create a new instance
# Just choose any CentOS 6 image with HVM and EBS support. I used  RightImage_CentOS_6.5_x64_v13.5.2_HVM_EBS (ami-45950b7f)
# Now login, RightImage seems to use root as the account

mkdir /boot/centos  
cd /boot/centos  
wget http://mirror.centos.org/centos/6/os/x86_64/isolinux/vmlinuz  
wget http://mirror.centos.org/centos/6/os/x86_64/isolinux/initrd.img

echo '  
default         0  
timeout         0  
hiddenmenu

title CentOS 6 VNC Installation  
        root (hd0,0)
        kernel /boot/centos/vmlinuz vnc vncpassword=yourvncpassword ip=dhcp xen_blkfront.sda_is_xvda=1 ksdevice=eth0 ks=https://gist.githubusercontent.com/andrewklau/9c354a43976d951bdedd/raw/266f8acc3c4a09af0cde273e6046bd9fc26ca9ea/centosami.ks method=http://mirror.centos.org/centos/6/os/x86_64/ lang=en_US keymap=us
initrd /boot/centos/initrd.img ' > /boot/grub/menu.lst

reboot  

Boom! That's it - let it go and it'll install and shut down once it's finished. Mine took about 5 minutes. Once it's done, right click the instance and press "Create Image".. that's your AMI right there. A few tips: go modify the kickstart to fit your requirements; the key things I've got in there are all commented.

You probably also noticed the vnc and vncpassword parameters in the grub config. If you want, you can follow along with the install by connecting your VNC client to ipaddress:1 aka ipaddress:5901.

I used to do these installs on my physical boxes all the time - why bother spinning up that ugly Java Dell/IBM KVM interface when VNC is colourful and lag-free! Another thing to note: if your install seems to be taking a while, your kickstart is probably borked and the installer has just hung. If you can ping your instance but VNC hasn't come up yet, it's errored at a kickstart entry. I used a local VM to test my ks file first.
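
If you want to test a kickstart in a local VM the same way, a minimal virt-install sketch (disk path and kickstart location are placeholders, not my exact invocation) could look like:

virt-install --name ks-test --ram 1024 --vcpus 1 \
  --disk path=/var/lib/libvirt/images/ks-test.img,size=10 \
  --location http://mirror.centos.org/centos/6/os/x86_64/ \
  --initrd-inject /tmp/centosami.ks \
  --extra-args "ks=file:/centosami.ks text" \
  --graphics vnc --noautoconsole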

You can also build a local VM and upload it as an AMI, but that's a painful process with working out all the APIs etc. I did it, and it took me a lot longer than 15 minutes. Either way, my new AMI just finished building, so /endrant

Controlling glusterfsd CPU outbreaks with cgroups

Some of you may know that same feeling: you add a new brick to your gluster replicated volume, which already has in excess of 1TB of data on it, and suddenly your gluster server shoots up to 500% CPU usage. What's worse is that my hosts run alongside oVirt, so while gluster hogged all the CPU my VMs started to crawl - even running simple commands like top would take 30+ seconds. Not a good feeling.

On my first attempt I limited the NIC's bandwidth to 200Mbps rather than the 2x1Gbps aggregated link, and this calmed glusterfsd down to a healthy 50%. A temporary fix, which however meant clients accessing gluster storage would be bottlenecked by that shared limit.

So off to the mailing list - a great suggestion from James/purpleidea (https://ttboj.wordpress.com/code/puppet-gluster/) on using cgroups.

The concept is simple: we limit the total CPU glusterfsd sees, so when it comes to doing the checksums for self-heals, replication etc. it won't have the same high priority that other services, such as running VMs, would have. This effectively slows down the replication rate in return for lower CPU usage.

First, make sure you have the libcgroup package installed (RHEL/CentOS).
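
On EL6 the cgconfig service used below ships in that package, so if it's not there yet:

yum -y install libcgroup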

Now you want to modify /etc/cgconfig.conf so you've got something like this (keep in mind comments MUST be at the start of the line or you may get parser errors):

mount {  
    cpuset  = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory  = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio   = /cgroup/blkio;
}
group glusterfsd {  
        cpu {
# half of what libvirt assigns individual VMs (1024) - approximately 50% cpu share
                cpu.shares="512";
        }
        cpuacct {
                cpuacct.usage="0";
        }
        memory {
# limit the max ram to 4GB and 1GB swap
                memory.limit_in_bytes="4G";
                memory.memsw.limit_in_bytes="5G";
        }
}

group glusterd {  
        cpu {
# half of what libvirt assigns individual VMs (1024) - approximately 50% cpu share
                cpu.shares="512";
        }
        cpuacct {
                cpuacct.usage="0";
        }
        memory {
# limit the max ram to 4GB and 1GB swap
                memory.limit_in_bytes="4G";
                memory.memsw.limit_in_bytes="5G";
        }
}

Now apply the changes to the running service:
service cgconfig restart

What this has done is define two cgroups (glusterfsd and glusterd). I've assigned each group a CPU share of half of what libvirt assigns a VM, along with some fixed memory limits just in case. The important one here is cpu.shares.
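
If you want to double check the value after restarting cgconfig, it's exposed directly under the mount points we defined above:

cat /cgroup/cpu/glusterfsd/cpu.shares
# should print 512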

One last thing to do is modify the services so they start up in the cgroups. You can easily do this manually, but the recommended way (according to the Red Hat docs) is to modify /etc/sysconfig/<service>:

# cat /etc/sysconfig/glusterd
# Change the glusterd service defaults here.
# See "glusterd --help" output for defaults and possible values.

#GLUSTERD_LOGFILE="/var/log/gluster/gluster.log"
#GLUSTERD_LOGLEVEL="NORMAL"

CGROUP_DAEMON="cpu:/glusterd cpuacct:/glusterd memory:/glusterd"  
# cat /etc/sysconfig/glusterfsd
# Change the glusterfsd service defaults here.
# See "glusterfsd --help" output for defaults and possible values.

#GLUSTERFSD_CONFIG="/etc/glusterfs/glusterfsd.vol"
#GLUSTERFSD_LOGFILE="/var/log/glusterfs/glusterfs.log"
#GLUSTERFSD_LOGLEVEL="NORMAL"

CGROUP_DAEMON="cpu:/glusterfsd cpuacct:/glusterfsd memory:/glusterfsd"  

Quick sum-up: we assign the gluster{d,fsd} services to the gluster{d,fsd} cgroups and define the resource controllers we want to limit them with.

Now make sure cgconfig comes on at boot:
chkconfig cgconfig on

Ideally now, you should just send the host for a reboot to make sure everything's working the way it should.

When it comes back up, you can try the command cgsnapshot -s to see what your current rules are. -s will just ignore the undefined values.

Alternatively, before you define CGROUP_DAEMON in the sysconfig files, shut down the gluster services, then define CGROUP_DAEMON and start the gluster services again - this should put them into the correct cgroups.

Note: I've only really tested this for a day - and so far I'm pretty impressed, as replication is no longer eating up my CPU and I haven't seen any performance drop in terms of read/write, since all we've done is limit CPU and memory. Bandwidth is untouched.

If you do your Google research you can also find the non-persistent method, where you modify the files under /cgroup/ and create the groups there. I recommend doing that first to find the best config values for your systems.
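
As a rough sketch of that non-persistent approach, using the /cgroup mount points from the cgconfig.conf above (the group name here is just a throwaway for testing, and everything disappears on reboot):

# create a temporary group under the cpu controller and lower its share
mkdir /cgroup/cpu/glustertest
echo 512 > /cgroup/cpu/glustertest/cpu.shares
# move the running glusterfsd processes into it
for pid in $(pidof glusterfsd); do echo $pid > /cgroup/cpu/glustertest/tasks; done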

For those interested, with my config values on a 2x quad-core server I cleaned out a brick and forced a re-replication of the 1TB, and glusterfsd happily chugged away at around 50% CPU and 200Mbps data transfer. I'm quite happy with that result; the obvious trade-off of CPU for replication rate is worth it in my scenario.

Please leave your suggestions/feedback and whether you found any possible ideal values for cgconfig.

HTH

Update (July 2014):

After a few months of using cgroups, I've removed the memory limits, as gluster isn't that memory intensive. As a commenter also noted, with a memory limit in place we sometimes hit the OOM killer, which is not great!

CPU allocation DOES affect the read/write speed, so tweaking is required! The recent 3.5 release seems to be much better with CPU usage, which makes this approach look like it's becoming obsolete. So kudos to the gluster devs!!

Deploying a highly available OpenShift Origin 3 (now M4) Infrastructure with Ansible and FreeIPA on CentOS 6.5

I've been getting quite a few questions about how I put together my OpenShift Origin environment from many jealous friends who were still manually spinning up test VMs.. I mean, come on, who wants to keep doing that :) So here goes my post on deploying OpenShift!

If you've been following the PaaS movement, OpenShift Origin M4 recently hit the mirrors and there's a damn fine list of new improvements - my favourite being the new console interface and my own commit which allows MariaDB cartridges on RHEL based nodes (previously only available on Fedora).

In this post I'm going to run through the basic concept of installing a highly available OpenShift Origin 3 deployment using my ansible script https://github.com/andrewklau/openshift-ansible. I expect you to already know what OpenShift does and is - if you don't, look around first and come back.

I'm not going to cover the FreeIPA install, as it's very simple and pretty amazing once you understand the concept (I use a 2 host replica behind keepalived).

OpenShift Infrastructure Design

Diagram Source: https://github.com/ansible/ansible-examples/tree/master/openshift

First things first, let's do some initial prep and ground work. You'll need at least 8 VMs plus a working FreeIPA deployment. These are then broken down into:

  • 2x LVS Servers (active/backup style design; we use these to load balance between the broker components and do some redirects)
  • 2x Broker Servers (these are the entry points to your PaaS environment. They will interact with all the other components on your behalf. Keep in mind these are standalone, stateless servers, which is why we can run LVS in front of them)
  • 3x Broker Support Nodes (these servers will run your PaaS datastore and amqp routing. MongoDB and ActiveMQ respectively.)
  • 1..N Node Servers (these are where the applications will be hosted and run.)
  • DNS (this is simply our FreeIPA cluster)

Ideal/Minimum Resource Allocation:

  • LVS Servers can run on your lowest spec'd host/VM, as all the routing is done at the kernel level aka SUPER FAST:
    • 1-2 Core CPU
    • 512mb-1GB RAM
    • 15GB HDD (at least)
    • 2x Gigabit Nics (optional 10 Gigabit Nics)
  • Broker Servers are your main point of entry servers. Since these are fairly stateless they can be scaled out horizontally depending on your load.
    • 2-4 Core CPU
    • 2-4GB RAM
    • 15GB HDD (at least)
    • 1x Gigabit NIC (optional 10 Gigabit NIC)
  • Broker Support Nodes play a key role in the infrastructure. They can be scaled out horizontally but keep in mind the cluster quorum requirements of ActiveMQ and MongoDB (eg. minimum 3 servers in the cluster).
    • 2-4 Core CPU
    • 2-8GB RAM (4GB Ideally)
    • 25GB HDD (minimum)
    • 1x Gigabit NIC (optional 10 Gigabit NIC)
  • Node Servers are your bread and butter. These need to be spec'd out the most to ensure high performance for the sites and services you'll be hosting, but always remember to think cloud - scaling out horizontally is better than scaling up vertically. Deploying hundreds of nodes will scale and utilize the OpenShift PaaS infrastructure a lot better than one super-powered PC (think redundancy, failover, bottlenecks etc).
    • 4-8 Core CPU
    • 4-16GB RAM
    • 50GB HDD (minimum)
    • 2x Gigabit NIC (optional 10 Gigabit NIC)

Few Things to Note:

  • All data is stored on the Nodes individually; there is no replication of this data. I'd love to see some sort of integration with glusterfs at some point in the future though. Do not confuse this with the PaaS datastore - the PaaS datastore holds information such as application routing, names etc.
  • ActiveMQ is used in collaboration with mcollective to share details between the nodes and broker servers.
  • If you're simply building a proof of concept, you can even run this with just one node, but you'll still need at least 3 Broker Support Nodes.
  • The whole infrastructure is designed to run in a private network. The only ones you'll need to port forward (if behind NAT or on AWS) are the hosts with dual NICs (eg. LVS and Node Servers). You can modify it all to be public facing but please update the firewall rules!
  • All hosts are already registered as hosts within FreeIPA with a hostname format of:
    • oo-service.lab.example.net
      • eg. oo-lvs1.lab.example.net
      • oo-node1.lab.example.net
  • My ansible script will also install nginx on the broker hosts to provide a simple catch-all URL forwarding service. The one issue I found with OpenShift's routing is that it relies on CNAMEs, which unfortunately cannot be used at the @ (apex) of your domain - eg. you would never be able to host example.net itself on OpenShift. The workaround is to point an A record at your floating LVS IP address, and nginx will redirect any * to www.*
    • Example: example.com -> LVS Floating IP Address (Nginx) -> www.example.com

I deployed all of these VMs with the help of Foreman on top of oVirt, so it made sense to start my ansible install from the Foreman host. However, you can run ansible on any host connected to your network of VMs (even your laptop).

Warning! Warning!

Before you continue - I advise you to read the script first and really make sure you can roll back (eg. re-install the OS or have snapshots etc.)

Source: https://github.com/andrewklau/openshift-ansible

Ansible is very easy to understand once you grasp the concept that each host can be assigned role(s), and each role has a set of tasks. Those tasks are then run in sequential order to complete the install.

eg. roles/nodes/tasks/main.yml

Read through all those roles and tasks just to make sure you know what's happening. The usual la-di-dah: I'm not responsible if it breaks something.

Foreman Host

# Make sure you have EPEL installed
yum -y install ansible git  
git clone https://github.com/andrewklau/openshift-ansible.git

cd openshift-ansible  
# Modify the config variables
nano group_vars/all  

I've added some extra notes below to help you modify the group_vars/all file

# Leave As Is
iface: '{{ ansible_default_ipv4.interface }}'

# This is your domain name eg. rhcloud.com
domain_name: cloud.example.net

# Leave Default
dns_port: 53  
rndc_port: 953

# Ignore this as we aren't using dns_keys for BIND updates
dns_key: "+GPQn8ufEpRLk5xgUU+W3CXdD5CKLO5hP4kCy1PRngV26V0eHnBrJF55IWw0HZme6mXJAgn7gkFMQbuQGq7tLQ=="

# Directory to store mongodb data (can leave as is)
mongodb_datadir_prefix: /data  
mongod_port:  2700  
# Set a secure mongo admin password
mongo_admin_pass: skldflk

# Set secure passwords for these services
mcollective_pass: asdasd  
admin_pass: asdasd  
amquser_pass: asdasd

# This is the LVS floating IP Address - if you aren't using NAT this should be a public address
vip: 10.0.0.4  
vip_netmask: 255.255.255.0

# Kerberos Configuration (make sure it matches your FreeIPA infrastructure)
kerb_ipa_server: ipa01.oo.example.net  
kerb_domain: oo.example.net  
kerb_realm: EXAMPLE.NET

### Remove this Uncommented Info Section ###

Node Public Configuration  
You will need to manually create the following A records on your DNS server. This is your PUBLIC ip address. 

The format is:  
{{ ansible_hostname }}-{{ public_domain_suffix }}.{{ public_domain }}

eg. oo-node1-lab.example.net

Notice how this format is slightly different to the above scheme of oo-node1.lab.example.net - this is because FreeIPA does not yet support split DNS. This is only used for the node's dynamic hostname generation.

## End Info Section ###

public_domain: example.net  
public_domain_suffix: lab

# FreeIPA DNS Configuration (dirty method)
ipa_bind_ip: 10.0.1.10  
ipa_server: ipa01.lab.example.net  
kinit_user: admin  
kinit_password: adminpassword

# Broker Hostname (used for Nginx)
broker_hostname: oo-broker.example.net  

Generate a random string and replace the contents in /roles/mongodb/files/secret:

openssl rand -base64 741  

Generate new salts for auth_salt and session_secret in /roles/broker/vars/main.yml:

openssl rand -base64 64  

eg.

auth_salt: "lk32rsjdpdgSK+BacvsrfdQi6pDW7HMen3uJFykUwOQBoCsvqNZ68po9+N8w=="  
session_secret: "skdlfj3klSDFkjsdO+Yao9VoUx69ikwfiRdph9oXplWDaQ10yWV8y0iFiCf8lTzj40M6b9NIV+wtIuLv/Y/ODjmtJ399g=="  

Regenerate new shared SSL certs for inter-server communication. Don't use FreeIPA's ipa-getcert for this:

cd roles/broker/files  
openssl genrsa -out server_priv.pem 2048  
openssl rsa -in server_priv.pem -pubout > server_pub.pem  

Modify the hosts file to match your current infrastructure. Make sure all the hostnames resolve!
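
If you haven't touched an Ansible inventory before, it's just an INI-style file mapping hostnames into groups. The group names below are only illustrative placeholders - keep the group names already defined in the repo's hosts file and just swap in your own hostnames:

nano hosts  
[brokers]
oo-broker1.lab.example.net
oo-broker2.lab.example.net

[nodes]
oo-node1.lab.example.net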

Foreman Host

# Make sure we have some fresh ssh keys available
ssh-keygen

# Now we run it!
screen  
ansible-playbook -i hosts site.yml  

The install process can take 30+ minutes depending on your internet speed and host performance.

Once it's done, we still need to do a few post-install clean ups:

Because I have a public (eth1) and private (eth0) network, this causes some complications with the install script. You'll simply need to modify /etc/openshift/node.conf.

Set the following (a sketch follows below):

  • external IP address (the public IP address)
  • external hostname (remember those external hostnames we created, eg. oo-broker01-melb.example.net)
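
A sketch of what that edit might look like - the exact key names may differ between Origin releases, so match whatever is already in your node.conf:

nano /etc/openshift/node.conf  
# the node's public (eth1) address
PUBLIC_IP="203.0.113.10"
# the external A record we created earlier, eg. oo-node1-lab.example.net
PUBLIC_HOSTNAME="oo-node1-lab.example.net"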

Configure FreeIPA Dynamic DNS Updates

Now we need to configure the FreeIPA to accept the dynamic DNS updates it'll be receiving from the broker hosts.

Head over to the DNS Section -> Domain Name -> Settings

Append grants for your broker hosts to the dynamic update section (one per broker, using its Kerberos principal and your realm), for example:

grant DNS\047oo-broker01.lab.example.net@EXAMPLE.NET wildcard * ANY; grant DNS\047oo-broker02.lab.example.net@EXAMPLE.NET wildcard * ANY;  

This can only be done through the web-ui as through command line it will only replace rather than append.

Finally tick the checkbox "Allow Dynamic Updates" and save changes.

Verification

  • OpenShift Origin comes with some nice tools to check everything is working.
    • On your Nodes run: oo-accept-node -v
      • This will help you debug any issues normally related to mcollective not running or missing packages.
    • On your Brokers run: oo-accept-broker -v
      • As above.
  • Every now and then, don't forget to make sure your ActiveMQ and MongoDB replicated cluster is still chugging along and that disk space isn't filling up too fast.

Closing Notes

Now enjoy your new highly available OpenShift Origin deployment, and look out for the cool work on OpenShift Docker support and truly highly available hosted applications.

I'm open to comments, feedback and pull requests. I know my script is very dirty and hacky, but it was my first attempt at ansible.

Because of the nature of the current OpenShift design, data is still limited to that one node (unless applications are scaled). I'd love to see some sort of gluster replicated style deployment so each node becomes a gluster server, but until then make sure you are always backing up those data directories ( /var/lib/openshift ).


How to work your way through the maze and reach the goal of the new amazing oVirt Hosted Engine with 3.4.0 Beta

Update: I'm still working through a few issues and will update this page shortly, so don't expect a perfect solution just yet.

Following on from my previous post, Deploying a semi-HA glusterized oVirt 3.3 Infrastructure, all the things I wished for (except a simpler UI) have arrived!! That is:

  • oVirt Hosted Engine (chicken and egg scenario - host the engine within the cluster)
  • semi-libgfapi (native read/write speeds for the VM on top of glusterfs storage). This came with the release of RHEL/CentOS 6.5; I say semi because unfortunately there are a few limitations, so only a workaround is in place for now.

So now let's take you through the exciting hosted-engine feature. There are two ways to do this:

  • Fresh Install (which is more ideal - I'll take you through the whole process)
  • Migrate Existing Engine (you'll need an extra host to add to your cluster)

My network configurations remain the same as before and we'll do all these on one host first:

oVirt Host

eth0 (Public Network) : 192.168.0.11
eth1 (management/gluster): bond0
eth2 (management/gluster): bond0

bond0.2 (management): 172.16.0.11
bond0.3 (gluster): 172.16.1.11

yum -y install wget screen nano
yum -y update
yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm -y
yum install http://resources.ovirt.org/releases/ovirt-release-el6-10.0.1-2.noarch.rpm -y

We'll use the nightly repo until stable comes along, as there are a few bugs still getting patched:

nano /etc/yum.repos.d/el6-ovirt.repo  
    # enable nightly repo

yum -y install ovirt-hosted-engine-setup

nano /etc/hosts  
172.16.0.10 engine engine.lab.example.net  
172.16.0.11 hv01 hv01.lab.example.net  
172.16.0.12 hv02 hv02.lab.example.net

172.16.1.11 gs01 gs01.lab.example.net  
172.16.1.12 gs02 gs02.lab.example.net

# Use your local mirror - we'll be using this later in our hosted-vm
wget http://mirror.optus.net/centos/6/isos/x86_64/CentOS-6.5-x86_64-minimal.iso

yum install -y glusterfs glusterfs-fuse glusterfs-server vdsm-gluster

# This data alignment works well with most disks, but do your research first
pvcreate --dataalignment 2560k /dev/sda3

vgcreate vg_gluster /dev/sda3  
lvcreate --extents 100%FREE -n lv_gluster1 vg_gluster

# Here's the secret sauce for ideal glusterfs xfs filesystem
mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/mapper/vg_gluster-lv_gluster1

# Add to fstab for onboot mounting
echo "/dev/mapper/vg_gluster-lv_gluster1 /data1  xfs     defaults,allocsize=4096,inode64,logbsize=256K,logbufs=8,noatime        1 2" >> /etc/fstab

mkdir -p /data1/  
mount -a  

Modify Network Nics

nano /etc/sysconfig/network-scripts/ifcfg-eth0  
DEVICE=eth0  
TYPE=Ethernet  
ONBOOT=yes  
NM_CONTROLLED=no  
BOOTPROTO=none  
IPADDR=192.168.0.11  
GATEWAY=192.168.0.1  
PREFIX=24


nano /etc/sysconfig/network-scripts/ifcfg-eth1  
DEVICE=eth1  
ONBOOT=yes  
NM_CONTROLLED=no  
BOOTPROTO=none  
MASTER=bond0  
SLAVE=yes

nano /etc/sysconfig/network-scripts/ifcfg-eth2  
DEVICE=eth2  
ONBOOT=yes  
NM_CONTROLLED=no  
BOOTPROTO=none  
MASTER=bond0  
SLAVE=yes

nano /etc/sysconfig/network-scripts/ifcfg-bond0  
DEVICE=bond0  
BOOTPROTO=none  
ONBOOT=yes  
NM_CONTROLLED=no  
BONDING_OPTS="miimon=100 mode=balance-alb"  
IPV6INIT=no  
# Enable Half-Jumbo Frames CRS only supports 4064
MTU=4000

nano /etc/sysconfig/network-scripts/ifcfg-bond0.2  
DEVICE=bond0.2  
ONBOOT=yes  
BOOTPROTO=none  
TYPE=Ethernet  
VLAN=yes  
BRIDGE=ovirtmgmt  
# Enable Half-Jumbo Frames CRS only supports 4064
MTU=4000

nano /etc/sysconfig/network-scripts/ifcfg-bond0.3  
DEVICE=bond0.3  
ONBOOT=yes  
BOOTPROTO=none  
TYPE=Ethernet  
VLAN=yes  
IPADDR=172.16.1.11  
PREFIX=24  
GATEWAY=172.16.1.1  
DEFROUTE=no  
# Enable Half-Jumbo Frames - CRS only supports 4064
MTU=4000

nano /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt  
DEVICE=ovirtmgmt  
NM_CONTROLLED=no  
ONBOOT=yes  
TYPE=Bridge  
BOOTPROTO=none  
IPADDR=172.16.0.11  
PREFIX=24  
GATEWAY=172.16.0.1  
DEFROUTE=yes  
IPV4_FAILURE_FATAL=yes  
IPV6INIT=no

# My BZ 1055129 may not have been fixed, so just to be safe we need to run hosted-engine twice to work around some vdsm issues.

hosted-engine --deploy  
# It will error here.
hosted-engine --deploy  
# Let it run first then cancel it when it prompts you for details (Ctrl-D)

# It's important to do this before restarting the network

# Restart Network
service network restart  

My Mikrotik CRS125-24G-1S-RM unfortunately only supports a max MTU of 4064, but it's still better than 1500. If you've got one, run this quick script to get your MTU updated in bulk:

# Mikrotik Bulk Update MTU for ether7-ether24
:for i from=8 to=23 step=1 do={ /interface ethernet set $i l2mtu=4064 mtu=4064 }

Test New MTU (with another host with same MTU)

ping -s 3964 172.16.0.12

Keepalived Configuration

yum install -y keepalived

cat /dev/null > /etc/keepalived/keepalived.conf  
nano /etc/keepalived/keepalived.conf

# Node1 (copy to HV01)
vrrp_instance VI_1 {  
interface bond0.3  
state MASTER  
virtual_router_id 10  
priority 100   # master 100  
virtual_ipaddress {  
172.16.1.5  
}
}

# Node2 (copy to HV02)
vrrp_instance VI_1 {  
interface bond0.3  
state BACKUP  
virtual_router_id 10  
priority 99 # master 100  
virtual_ipaddress {  
172.16.1.5  
}
}

service keepalived start  
chkconfig keepalived on

The following workaround is important, otherwise live migrations won't work!

#Work Around until libvirtd fixes the port conflict (http://review.gluster.org/#/c/6147/)
nano /etc/glusterfs/glusterd.vol  
    option base-port 50152


chkconfig glusterd on  
service glusterd start

curl https://raw.github.com/gluster/glusterfs/master/extras/group-virt.example -o /var/lib/glusterd/groups/virt

gluster volume create HOSTED-ENGINE gs01.lab.example.net:/data1/hosted-engine  
gluster volume start HOSTED-ENGINE  
gluster volume set HOSTED-ENGINE auth.allow 172.16.*.*  
gluster volume set HOSTED-ENGINE group virt  
gluster volume set HOSTED-ENGINE storage.owner-uid 36  
gluster volume set HOSTED-ENGINE storage.owner-gid 36  
gluster volume set HOSTED-ENGINE nfs.disable off  

There is currently a bug over here https://bugzilla.redhat.com/show_bug.cgi?id=1055153 where vdsm fails at weird points (and trust me I found many along with many workarounds). But all you need now is:

chown vdsm:kvm /var/log/vdsm/vdsm.log

Run It!

screen  
ovirt-hosted-engine-setup  

Follow the install procedure. You can safely leave the defaults for most prompts. When it comes to the storage, choose NFS and use the gluster share we set up:

172.16.1.5:/HOSTED-ENGINE

Don't be confused here - hosted-engine only supports NFS storage for the hosted VM itself. It's a special export domain which is only used for the engine; we'll still be able to use our glusterfs storage for our other VMs.
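
Before kicking off the setup you can sanity check that gluster's built-in NFS server is actually exporting the volume (we left nfs.disable off above); something like this from the host should list it:

showmount -e 172.16.1.5
# the export list should include /HOSTED-ENGINE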

Then don't forget the ISO file we grabbed earlier:

/root/CentOS-6.5-x86_64-minimal.iso

Once it sets up the initial configuration settings, the install process will be as follows:

  • The install will give you an IP address to VNC into; this connects you to the console of the hosted-engine VM running on your host.
  • Using the minimal CentOS 6.5 ISO we downloaded earlier, we'll follow the normal install procedure - or, if you're game, run it through a kickstart file.
  • When the install finishes let it reboot, and go back to your console session on the host to confirm your VM has installed the OS.
  • It'll give you the VNC session details again for you to go and install the ovirt-engine service in your new VM.

Hosted Engine VM CentOS Install

yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm -y  
yum install http://resources.ovirt.org/releases/ovirt-release-el6-10-2.noarch.rpm -y  
yum -y install screen wget nano  
yum -y update

nano /etc/yum.repos.d/el6-ovirt.repo  
    # enable nightly repo

yum -y install ovirt-engine

# Install some useful packages such as ovirt-guest-agent
yum -y install dhclient ntp wget logrotate ntp openssh-clients rng-tools rsync tmpwatch selinux-policy-targeted nano perl ipa-client ipa-admintools screen acpid at curl ovirt-guest-agent

yum -y update

# Run through your setup the old fashion way
# If you're migrating from your engine, read the "cut off" section first http://www.ovirt.org/Migrate_to_Hosted_Engine

engine-setup

# Spice Proxy
yum install squid  
nano /etc/squid/squid.conf  
    # http_access deny CONNECT !SSL_ports
    http_access deny CONNECT !Safe_ports

    acl spice_servers dst 172.16.0.0/24
    http_access allow spice_servers

service squid restart  
chkconfig squid on  
iptables -A INPUT -p tcp --dport 3128 -j ACCEPT

# Setup a system wide spice proxy to save public IP addresses (if you're running behind NAT)
engine-config -s SpiceProxyDefault=http://192.168.0.10:3128/

service ovirt-engine restart

# Do your other post install features here #
  • Go back to your host and confirm the engine has been installed. It'll do a quick check that the host can connect to the engine, and then it will wait for you to shut down the hosted-engine VM so the HA services can manage it.
  • Go ahead and power off the VM and your install is done!
  • Now the HA features of the hosted-engine will take care of keeping the VM alive.

Unfortunately there's still a bug https://bugzilla.redhat.com/show_bug.cgi?id=1055059 in regards to starting the VM, so you'll have to do the following in the meantime:

hosted-engine --vm-start-paused  
virsh start HostedEngine    

Now that we've got our engine set up, let's quickly get tuned running to optimize our host:

cp -r /etc/tune-profiles/virtual-host /etc/tune-profiles/rhs-virtualization  
tuned-adm profile rhs-virtualization  

And because of our gluster base-port modification, don't forget to grab my custom iptables rules https://gist.github.com/andrewklau/7623169/raw/df69416e0386828d405845692c213b82e3f98e91/ovirt-engine-vdsm-iptables and drop them into /etc/sysconfig/iptables (remove the engine rules).

Finally - we create our final gluster volumes and configure them in the engine the same way we did before.

gluster volume create VM-DATA gs01.lab.example.net:/data1/vm-data  
gluster volume start VM-DATA  
gluster volume set VM-DATA auth.allow 172.16.*.*  
gluster volume set VM-DATA group virt  
gluster volume set VM-DATA storage.owner-uid 36  
gluster volume set VM-DATA storage.owner-gid 36  
# Help to avoid split brain
gluster volume set VM-DATA cluster.quorum-type auto  
gluster volume set VM-DATA performance.cache-size 1GB


chown -R 36:36 /data1

mkdir -p /storage/iso  
gluster volume set ISO auth.allow 172.16.*.*  
gluster volume set ISO storage.owner-uid 36  
gluster volume set ISO storage.owner-gid 36  

When you go to deploy the second host, follow the steps all the way up to ovirt-hosted-engine-setup. When it prompts you for the storage connection, it'll detect that it is joining a cluster and walk you through the different steps.

Kudos to the Red Hat team for nice work and help. I hope you enjoy!

Check out this presentation, which gives a really interesting look at how they calculate the score deciding which host will run the hosted-engine - along with their "jokes" and "comics":
http://www.linux-kvm.org/wiki/images/2/26/Kvm-forum-2013-hosted-engine.pdf

Deploying a semi-HA glusterized oVirt 3.3 Infrastructure

Time has passed since I last played with oVirt; the new, ever so "amazing" OpenStack caught my attention, but although it's got the momentum it's just not there yet in terms of easy deployment (maybe in a few months). So after a few weeks of playing with OpenStack we're back to oVirt.

There are a few issues I have with oVirt, like the slow clunky interface, whereas OpenStack has the lovely simple HTML5 bootstrap design. (Although there are talks to redesign the oVirt UI).

I guess you can compare the two as the Pet and Farm Animal scenario.

oVirt -> Pets -> Sick Pet (VM) -> Heal It -> Return to Health

OpenStack -> Farm Animals -> Sick Animal (VM) -> Replace It

In theory the two platforms should work hand in hand, and that's what Red Hat's currently doing. Many of the new oVirt features take advantage of OpenStack's fast-paced development and integrate amazing new features. However, most people don't have the kind of hardware to deploy both OpenStack and oVirt/RHEV side by side (unless you've got the $$$$ to spend).

Getting a proper OpenStack infrastructure that can withstand node failures becomes expensive and tedious to configure and manage. oVirt, on the other hand, once you get the initial bits working "Just Works", and you can do it with minimal hardware.

I did this all on two CentOS 6.4 hosts.

Controller

eth0 (management/gluster): 172.16.0.11
eth1 (VM Data): n/a
eth2 (Public Network): 192.168.0.11

Compute

eth0 (management/gluster): 172.16.0.12
eth1 (VM Data): n/a

yum -y install wget screen nano  
yum -y update  
yum install http://resources.ovirt.org/releases/ovirt-release-el6-8-1.noarch.rpm -y  
yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm -y


nano /etc/hosts  
172.16.0.11 hv01.lab.example.net hv01  
172.16.0.12 hv02.lab.example.net hv02

# Don't install vdsm-gluster here because it seems to fail the install later on
yum install -y glusterfs glusterfs-fuse glusterfs-server  
mkfs.xfs -f -i size=512 /dev/mapper/vg_gluster-lv_gluster1  
echo "/dev/mapper/vg_gluster-lv_gluster1 /data1  xfs     defaults        1 2" >> /etc/fstab  
mkdir -p /data1/  
mount -a


curl https://raw.github.com/gluster/glusterfs/master/extras/group-virt.example -o /var/lib/glusterd/groups/virt

gluster volume create DATA replica 2 hv01.lab.example.net:/data1/ hv02.lab.example.net:/data1/  
gluster volume start DATA  
gluster volume set DATA auth.allow 172.16.0.*  
gluster volume set DATA group virt  
gluster volume set DATA storage.owner-uid 36  
gluster volume set DATA storage.owner-gid 36  
# Help to avoid split brain
gluster volume set DATA cluster.quorum-type auto  
gluster volume set DATA performance.cache-size 1GB


chown 36:36 /data1

mkdir -p /storage/iso  
gluster volume set STORAGE auth.allow 172.16.0.*  
gluster volume set STORAGE storage.owner-uid 36  
gluster volume set STORAGE storage.owner-gid 36

## MAKE SURE eth1 is set to "onboot=yes"
DEVICE=eth1  
TYPE=Ethernet  
ONBOOT=yes  
NM_CONTROLLED="no"  
BOOTPROTO=none  

The above is very much the same initial config I did in my Getting Started with Multi-Node OpenStack RDO Havana + Gluster Backend + Neutron VLAN post, as the two setups are very similar.

Now it's time to install the ovirt-engine. I'm looking forward to the hosted-engine solution which is being released soon which'll allow us to put the engine on a VM within the cluster!

# Engine Only
yum install -y ovirt-engine

# Follow the prompts - I found installing the All In One and putting VDSM on the engine had a few issues, so I do it later.

engine-setup

# I run everything in a NAT environment, so externally I need to use a proxy. However I still can't quite get this to work properly..

engine-config -s SpiceProxyDefault=http://proxy:8080  
service ovirt-engine restart  

Here's a quick list of steps I do to get my environment running:

  • Modify your cluster to include the Gluster Service
  • Install the new hosts (hv01 and hv02). Do one at a time, and when you're installing hv01 (the one with the engine), uncheck "configure iptables".
  • If you get an install failed, generally it refers to vdsm-gluster. I have had two cases
    • One time it was installed and the engine complained. So I yum -y remove vdsm-gluster and re-ran the install.
    • The other time it wasn't installed and the engine complained. yum -y install vdsm-gluster (Don't ask me why this happens)
  • Now you should have 2 Hosts installed.
  • Remove ovirtmgmt as a VM network, go to the hosts tab and click setup networks. Edit ovirtmgmt and press resync.
    • While you're here may as well create a few VM networks. I created 10 networks each on their own VLANs. (I used the same VLAN Switch config from my previous post Mikrotik CRS125-24G-1S-RM with OpenStack Neutron)
    • Save your network settings from the ovirt UI (this component is a little buggy as it does network tests so you don't lose connectivity. You may have to try a few times and wait quite a while)

Redundant Gluster Deployment

Now we configure a redundant gluster mount point using Keepalived. This means if one host goes offline, the other will still be able to keep serving the gluster volume, so we keep a semi-HA infrastructure (the last part being ovirt-hosted-engine, which is still in the works).

yum install -y keepalived

cat /dev/null > /etc/keepalived/keepalived.conf  
nano /etc/keepalived/keepalived.conf

# Node1 (copy this on HV01)
vrrp_instance VI_1 {  
interface ovirtmgmt  
state MASTER  
virtual_router_id 10  
priority 100   # master 100  
virtual_ipaddress {  
172.16.0.5  
}
}

# Node2 (copy this on HV02)
vrrp_instance VI_1 {  
interface ovirtmgmt  
state BACKUP  
virtual_router_id 10  
priority 99 # master 100  
virtual_ipaddress {  
172.16.0.5  
}
}

service keepalived start  
chkconfig keepalived on

# Work Around until libvirtd fixes the port conflict (http://review.gluster.org/#/c/6147/)
# Using this workaround, remember to open the ports from 50152 up to cover however many bricks you'll be using (see the iptables example after the gluster service restart below). My iptables gist file above is already updated. The oVirt host-deploy script will not apply the correct rules, so you need to do it manually!

nano /etc/glusterfs/glusterd.vol  
volume management  
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option base-port 50152
end-volume

chkconfig glusterd on  
service glusterd restart  
service glusterfsd restart  
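
For example, a rule along these lines in /etc/sysconfig/iptables on each gluster host covers the first twenty-odd bricks (widen the range if you add more):

iptables -A INPUT -p tcp --dport 50152:50172 -j ACCEPT  
service iptables save
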
  • Go to the datacenter and create a new DATA domain. I used a POSIX datacenter and mounted the volume as glusterfs. I'm eagerly looking forward to RHEL 6.5 which'll allow us to mount the gluster volumes directly on the VM for a huge performance boost

  • Create an ISO domain again use POSIX and mount it as a gluster volume. You can alternatively choose local storage or NFS too.

    • Upload an ISO from the command line (there are talks about finally allowing this to be done through the UI).
wget http://mirror.centos.org/centos/6/isos/x86_64/CentOS-6.4-x86_64-minimal.iso  
engine-iso-uploader upload -i ISO CentOS-6.4-x86_64-minimal.iso  

oVirt has so many hidden gems that 5 months later I'm still discovering them. Check out all the amazing features like:

  • Snapshots
  • VM Watchdog and HA
  • Vibrant and Active Community
  • Strong integration into OpenStack
  • Option to upgrade to the supported RHEV (which we'll be planning on doing when things take off).

If you don't think oVirt is ready just yet, check out these amazing features which I'm looking forward to:

  • libgfapi in RHEL 6.5 (to be released) which'll allow native gluster access to VM images.
  • ovirt-hosted-engine a true HA environment! The engine gets hosted as a VM within the cluster infrastructure and gets managed and brought up on a different node if it's current one fails.
  • oVirt UI redesign (YES PLEASE! I hate the old clunky UI)

Tinker, configure, play!