Samba performance with GFS2

I recently spent too much time troubleshooting sluggish Samba transfer speeds, only to find that GFS2 and Samba in Red Hat Enterprise Linux 5.3 don't get along well without some special treatment.

I should note that my GFS2 cluster is using Samba in a single-node configuration; that is to say, there is no concurrent access to files from multiple nodes, which keeps Samba's TDB databases consistent. CTDB is now a viable option for clustered Samba. How I came to this configuration is another story, but in brief: I was forced to use GFS2 because of HA-LVM limitations, which allow only one logical volume per volume group in a Red Hat cluster. I would much rather manage two volume groups than six partitions and six volume groups, each with a single volume.

After I had my GFS2 volumes running, I configured Samba. During some simple file copy tests I noticed that copying was incredibly slow, only 1.5 megabytes per second. The copy appeared to be "bursty": write a little, pause, write a little again, until the operation finished. After some basic troubleshooting I could find no reason why it should be so slow. I did manage to copy some files over scp at the rates one would expect with gigabit Ethernet.

It was at this point that I added a non-GFS volume as a share and performed the same file copy. Voila! Incredibly fast. But why would using scp result in a faster transfer when copying to GFS volumes? This is what led me to think about locking, and cluster locking in particular, as the root of the issue.

That's when I found Ping Pong and realized I was being throttled by gfs_controld to 100 POSIX locks per second, which is the default setting. Take note: the manpage for gfs_controld has information which may impact your network:

Heavy use of plocks can result in high network load.

Check out the gfs_controld manpage for more information.

I found that adding two lines to cluster.conf, just below the cman tag, to raise the lock limits fixed my issue.

<cman/>
<dlm plock_ownership="1" plock_rate_limit="0"/>
<gfs_controld plock_rate_limit="0"/>

Setting the lock rate limit to zero removes the limit, so locks are processed as fast as your systems can keep up. The lock ownership option optimizes performance for repeated locking by a process on a single node. These options weigh heavily on caching strategies, so they may affect memory usage as well.
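To double-check that the settings above actually landed in the file, something like the following rough Python sketch could be used. It assumes the three lines sit directly inside the top-level cluster element; the cluster name, config_version value, and the helper names plock_settings and effective_rate_limit are made up for illustration, and the fallback of 100 is simply the default mentioned earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal cluster.conf; the cluster name and config_version are
# placeholders, only the three inner lines come from the post.
SAMPLE = """<cluster name="example" config_version="2">
  <cman/>
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
</cluster>"""

DEFAULT_RATE_LIMIT = 100  # default plocks per second, as noted in the post


def plock_settings(xml_text):
    """Return the attributes of the dlm and gfs_controld elements, if present."""
    root = ET.fromstring(xml_text)
    settings = {}
    for tag in ("dlm", "gfs_controld"):
        elem = root.find(tag)  # direct children of <cluster> only
        if elem is not None:
            settings[tag] = dict(elem.attrib)
    return settings


def effective_rate_limit(xml_text, tag="gfs_controld"):
    """Return plock_rate_limit for `tag`, falling back to the default of 100."""
    attrs = plock_settings(xml_text).get(tag, {})
    return int(attrs.get("plock_rate_limit", DEFAULT_RATE_LIMIT))


if __name__ == "__main__":
    print(plock_settings(SAMPLE))
    print(effective_rate_limit(SAMPLE))
```

To check a real file, read its contents and pass the text to plock_settings instead of SAMPLE.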

As a result of my changes, Ping Pong now runs at more than five thousand locks per second and Samba performance has improved to expected levels. Note that it is also possible to run GFS2 with the lock_nolock option when running a single node to improve performance.
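For a quick sanity check of the lock rate without Ping Pong, a single-process probe along these lines may be enough. This is only a sketch and not a substitute for Ping Pong, which coordinates several processes: it repeatedly takes and releases a POSIX byte-range lock on one file and reports cycles per second. The function name measure_lock_rate and the script name in the usage message are my own.

```python
import fcntl
import os
import sys
import time


def measure_lock_rate(path, duration=1.0):
    """Return POSIX byte-range lock/unlock cycles per second on `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        cycles = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            # Take an exclusive lock on the first byte, then release it.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            cycles += 1
        elapsed = time.monotonic() - start
        return cycles / elapsed
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Pass a path on the file system to test, e.g. a file on the GFS2 mount.
    if len(sys.argv) > 1:
        rate = measure_lock_rate(sys.argv[1])
        print(f"{rate:.0f} lock/unlock cycles per second")
    else:
        print("usage: plock_rate.py <path-on-filesystem-to-test>")
```

Running it once against a file on the GFS2 mount and once against a file on a local file system should make a rate limit stand out, since each cycle counts two lock operations.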