Controlling glusterfsd CPU outbreaks with cgroups

Some of you may know that feeling: you add a new brick to a Gluster replicated volume that already holds in excess of 1TB of data, and suddenly your Gluster server shoots up to 500% CPU usage. What's worse, my hosts run alongside oVirt, so while gluster hogged all the CPU my VMs slowed to a crawl; even running a simple command like top would take 30+ seconds. Not a good feeling.

My first attempt was to limit the NIC's bandwidth to 200Mbps rather than the full 2x1Gbps aggregated link, which calmed glusterfsd down to a healthy 50%. A temporary fix, however, which meant clients accessing gluster storage would be bottlenecked by that shared limit.

So off to the mailing list, where James/purpleidea (https://ttboj.wordpress.com/code/puppet-gluster/) made a great suggestion: use cgroups.

The concept is simple: we limit the total CPU glusterfsd sees, so when it comes to doing the checksums for self-heals, replication etc., it won't get the high priority that other services, such as running VMs, would have. This effectively slows down the replication rate in return for lower CPU usage.

First, make sure you have the libcgroup package installed (RHEL/CentOS):
yum install libcgroup

Now you want to modify /etc/cgconfig.conf so you've got something like this (keep in mind comments MUST start at the beginning of the line or you may get parser errors):

mount {  
    cpuset  = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory  = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio   = /cgroup/blkio;
}
group glusterfsd {  
        cpu {
# half of what libvirt assigns individual VMs (1024) - approximately 50% cpu share
                cpu.shares="512";
        }
        cpuacct {
                cpuacct.usage="0";
        }
        memory {
# limit the max ram to 4GB and 1GB swap
                memory.limit_in_bytes="4G";
                memory.memsw.limit_in_bytes="5G";
        }
}

group glusterd {  
        cpu {
# half of what libvirt assigns individual VMs (1024) - approximately 50% cpu share
                cpu.shares="512";
        }
        cpuacct {
                cpuacct.usage="0";
        }
        memory {
# limit the max ram to 4GB and 1GB swap
                memory.limit_in_bytes="4G";
                memory.memsw.limit_in_bytes="5G";
        }
}

Now apply the changes to the running service:
service cgconfig restart

What this has done is define two cgroups (glusterfsd and glusterd). I've assigned each group a CPU share of half what libvirt assigns a VM, along with some fixed memory limits just in case. The important one here is cpu.shares.
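To get a feel for what cpu.shares=512 means in practice: shares are relative weights, and under full contention a group gets its shares divided by the sum of all competing shares. A quick back-of-the-envelope sketch (the 1024 and 512 values match the config above; the single-busy-VM scenario is just an assumption for illustration):

```shell
# Relative weights under full CPU contention (cgroup v1 cpu.shares).
# Assumed scenario: one busy VM (1024 shares, libvirt's per-VM default)
# competing with the glusterfsd group (512 shares, as configured above).
vm_shares=1024
gluster_shares=512
total=$((vm_shares + gluster_shares))
gluster_pct=$((100 * gluster_shares / total))   # integer percent
echo "glusterfsd gets ~${gluster_pct}% of the CPU under contention"
```

That works out to roughly a third. Note that shares only bite under contention; when the VMs are idle, glusterfsd is still free to use more CPU.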

One last thing to do is modify the services so they start up in their cgroups. You can do this manually, but the recommended way (according to the Red Hat docs) is to modify /etc/sysconfig/<service>:

[root@host ~]# cat /etc/sysconfig/glusterd 
# Change the glusterd service defaults here.
# See "glusterd --help" output for defaults and possible values.

#GLUSTERD_LOGFILE="/var/log/gluster/gluster.log"
#GLUSTERD_LOGLEVEL="NORMAL"

CGROUP_DAEMON="cpu:/glusterd cpuacct:/glusterd memory:/glusterd"  
[root@host ~]# cat /etc/sysconfig/glusterfsd
# Change the glusterfsd service defaults here.
# See "glusterfsd --help" output for defaults and possible values.

#GLUSTERFSD_CONFIG="/etc/glusterfs/glusterfsd.vol"
#GLUSTERFSD_LOGFILE="/var/log/glusterfs/glusterfs.log"
#GLUSTERFSD_LOGLEVEL="NORMAL"

CGROUP_DAEMON="cpu:/glusterfsd cpuacct:/glusterfsd memory:/glusterfsd"  

Quick sum-up: We assign the gluster{d,fsd} service into the gluster{d,fsd} cgroup and define the resource groups we want to limit them to.

Now make sure cgconfig comes on at boot:
chkconfig cgconfig on

Ideally, you should now reboot the host to make sure everything's working the way it should.

When it comes back up, you can run cgsnapshot -s to see what your current rules are (-s just ignores the undefined values).

Alternatively, before you define CGROUP_DAEMON in the sysconfig files, shut down the gluster services, then define CGROUP_DAEMON and start the gluster services again; this should properly put them in the correct cgroups.
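Either way, you can confirm a daemon actually landed in the right cgroups by reading its /proc/<pid>/cgroup file (the pgrep lookup is just one way to find the PID; $$ is used below so the command runs anywhere):

```shell
# Each line of /proc/<pid>/cgroup is hierarchy-id:subsystems:/group,
# so after the restart you'd expect entries like "cpu:/glusterfsd".
pid=$$                      # on the gluster host: pid=$(pgrep -o glusterfsd)
cat /proc/$pid/cgroup
```

If the group column still shows "/" for cpu, cpuacct and memory, the service didn't pick up CGROUP_DAEMON.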

Note: I've only really tested this for a day, and so far I'm pretty impressed: replication is no longer eating up my CPU, and I haven't seen any performance drop in terms of read/write, as all we've done is limit CPU and memory. Bandwidth is untouched.

If you do your Google research you can also find the non-persistent method, where you create the groups and set values directly under /cgroup/. I recommend doing that first to find the best config values for your systems.
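For reference, a sketch of that non-persistent method, assuming the /cgroup/cpu hierarchy from the mount section above. Here cgroup_root defaults to a scratch directory so you can try the commands without root; point it at /cgroup (as root) on the real host:

```shell
# Non-persistent cgroup v1 setup: create the group and write values directly.
cgroup_root=${cgroup_root:-/tmp/cgroup-demo}   # use /cgroup on the real host
mkdir -p "$cgroup_root/cpu/glusterfsd"
echo 512 > "$cgroup_root/cpu/glusterfsd/cpu.shares"
cat "$cgroup_root/cpu/glusterfsd/cpu.shares"
# On the real host, move the running brick processes into the group:
# for pid in $(pgrep glusterfsd); do echo "$pid" > /cgroup/cpu/glusterfsd/tasks; done
```

These settings vanish on reboot, which is exactly why they're handy for experimenting before committing values to cgconfig.conf.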

For those interested: with my config values on a 2x quad-core server, I cleaned out a brick and forced a re-replication of the 1TB, and glusterfsd happily chugged away at around 50% CPU and 200Mbps data transfer. I'm quite happy with that result; the obvious trade-off of CPU for replication rate is worth it in my scenario.

Please leave your suggestions/feedback and whether you found any possible ideal values for cgconfig.

HTH

Update (July 2014):

After a few months of using cgroups, I've removed the memory limits, as gluster isn't that memory intensive. As a commenter also noted, with a memory limit we sometimes hit the OOM killer, which is not great!

CPU performance DOES affect the read/write speed, so tweaking is required! The recent 3.5 release seems to be much better with CPU usage, which may make this approach obsolete. So kudos to the gluster devs!!
