Alternatives to GlusterFS for high availability on blob storage?

We have been using ZRS for a while but we still have the problem of a single point of failure on the blob storage. In the past I read about GlusterFS, but many people said it had issues with small files and complex directory structures... which is exactly our use case.

Recently we stumbled upon a tweet by @djay about it.

So we decided to give it a try: installation was smooth under Ubuntu 14.04 and I was able to mount a volume replicated across 2 Digital Ocean droplets following a couple of howtos.
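For reference, this is roughly what the howtos boil down to (the hostnames match the volume info below; the mount point is just an example):

$ sudo apt-get install glusterfs-server
$ sudo gluster peer probe server2      # run on server1, over the private network
$ sudo gluster volume create blobstorage replica 2 \
      server1:/var/blobstorage server2:/var/blobstorage
      # gluster asks you to append "force" if the brick is on the root partition
$ sudo gluster volume start blobstorage
$ sudo mount -t glusterfs server1:/blobstorage /mnt/blobstorage   # on each client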

Problems started when I decided to test a real environment: I created and configured the volume, and when I started copying the blob storage into it, weird things began to happen.

First, the second server became somewhat unresponsive: some commands, like a simple df -h, did not complete. I restarted the machine and came back after a few hours, only to see the CPU usage oscillating.

When I started my stack (nginx, Varnish, ZRS server and 4 Plone instances), I found that I was not able to reach Varnish on port 6081, nor the Plone instances on ports 8081-8084. I had to stop the GlusterFS volume to be able to reach them, but it seems nginx is still unable to reach Varnish and I'm getting a lot of timeout errors in the log.

This test was not on the biggest site we're running, so I guess either my configuration has problems or GlusterFS is simply not the right tool for this job:

$ sudo gluster volume info
 
Volume Name: blobstorage
Type: Replicate
Volume ID: 12345da2-40fc-4c61-a53e-123456e4dd96
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1:/var/blobstorage
Brick2: server2:/var/blobstorage
Options Reconfigured:
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
nfs.enable-ino32: on
nfs.addr-namelookup: off
nfs.disable: on
performance.cache-max-file-size: 2MB
performance.cache-refresh-timeout: 4
performance.cache-size: 256MB
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB

I've been looking for alternatives and I read about using NFS with DRBD and Pacemaker, but I'm not very enthusiastic because of the complexity of adding 2 new technologies to our current stack.

What are you guys using nowadays to deploy HA sites with Plone?

We are using DRBD for both ZODB and NFS on a machine with a failover address.

The whole setup is OpenAIS-based and uses Pacemaker as the cluster resource manager and Corosync as the cluster engine.
The NGINX, Varnish, HAProxy chain also runs redundantly on two machines using this technology.

Initially it is indeed a complex topic, but once you get it, it's worth every minute you struggled with it. Afterwards you'll have a really good and robust setup.

It replaces any systemd/supervisor/init script system for starting the services (it uses its own OCF resource scripts, kind of similar to init scripts but with more complex status reporting).
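To give a rough idea (this is not our exact configuration; resource names, device, path and address below are made up), the DRBD-backed filesystem plus the failover address side of it looks something like this in the crm shell:

primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=30s
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max=1 clone-max=2 notify=true
primitive p_fs_storage ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/srv/storage fstype=ext4
primitive p_failover_ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.0.100 cidr_netmask=24
group g_storage p_fs_storage p_failover_ip
colocation c_storage_with_drbd inf: g_storage ms_drbd_r0:Master
order o_drbd_before_storage inf: ms_drbd_r0:promote g_storage:start

Pacemaker then decides on which node the DRBD master, the mounted filesystem and the failover IP live, and moves them together when a node fails.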

We switched to just using ZRS. It does sync the blobs. Fewer moving parts
than Gluster, although Gluster worked fine for us.

@hvelarde @djay I also tried GlusterFS here; the CPU was idle, lots of memory available, no swapping, no I/O, but a high load average.

We also use ZRS. I think the setup is much simpler than the alternatives. It has supported blobs since 2009 (see https://github.com/zc/zrs/blob/master/HISTORY.txt).
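In case it helps, the server side of ZRS is basically a wrapper around the file storage in zeo.conf; a minimal sketch for the primary, assuming example paths and ports, looks like this:

%import zc.zrs

<zeo>
  address 8100
</zeo>

<zrs>
  # publish the replication stream for the secondary on port 8101
  replicate-to 8101
  <filestorage 1>
    path /srv/plone/var/filestorage/Data.fs
    blob-dir /srv/plone/var/blobstorage
  </filestorage>
</zrs>

The secondary uses replicate-from primary-host:8101 instead of replicate-to, and the blobs are replicated along with the transactions.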

@jensens I think we will have to go with something less complex because we have neither the resources nor the time to test that configuration.

@djay yes, you're right; I have used that configuration before and it is quite easy; you don't even need NFS.

@cleberjsantos are you using real or virtual servers? I think this issue could be related to that.

thanks, guys!

One more question, @djay @pbauer, do you have instances running on the ZRS slave?

In that case, how do you configure them to allow them to be used by editors? Do you set shared-blob = off? If so, what happens to the blobs on those instances?

We use a huge zeo-client-cache-size but I don't know if all the blobs are also copied to the blob-storage location or only those that are alive in the ZODB cache.

Does anyone know?

I just read the documentation of plone.recipe.zope2instance and the answer seems to be there:

blob-storage
The location of the blob zeocache, defaults to var/blobcache. If shared-blob is on it defaults to ${buildout:directory}/var/blobstorage.

So that must mean that I pay a penalty of transferring the blobs more than once over the network.
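In other words, with shared-blob = off the instance part in buildout would look something like this (addresses and sizes are just examples):

[instance]
recipe = plone.recipe.zope2instance
zeo-client = on
zeo-address = 127.0.0.1:8100
shared-blob = off
# local blob cache, filled on demand over the ZEO connection
blob-storage = ${buildout:directory}/var/blobcache
# if your recipe version supports it, you can also cap the cache:
zeo-client-blob-cache-size = 10GB

Each instance keeps its own blobcache and only pulls a blob over the network when it is requested and not already cached.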

@hvelarde Yes, I'm using virtual servers: the Digital Ocean droplets.

seems that "somebody" had changed the firewall rules on teh server blocking some ports on the private interface and that was messing around with Varnish and, probably, with GlusterFS; those rules were applied when I restarted the VM. so, part of the mystery is now solved.