Continuing the discussion from How to save and load the catalog query plan?:
Today I was trying to test if an increase in the zeo-client-cache-size
had some impact in performance on one of our sites and the results were pretty amazing.
First, I discovered that I've been wrong all these years thinking that we had configured a persistent client cache on our ZEO clients; in fact that was not the case.
I was very lucky to stumble upon the ZEO documentation on client cache; by reading it, I started looking for a file with .zec
extension and I was unable to find it; I then realized the zeo-client-client
was missing directive from our configuration.
I manually set the client
directive in the zope.conf
and started noticing some changes on the instance behavior:
2017-08-08T15:08:24 INFO ZEO.cache created temporary cache file '<fdopen>'
2017-08-08T15:08:24 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
2017-08-09T09:53:18 INFO ZEO.cache created temporary cache file '<fdopen>'
2017-08-09T09:53:19 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
2017-08-10T10:16:22 INFO ZEO.cache created temporary cache file '<fdopen>'
2017-08-10T10:16:22 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
2017-08-10T13:02:46 INFO ZEO.cache created persistent cache file '/opt/plone/buildout/parts/instance1/var/zeoclient-1.zec'
2017-08-10T13:02:46 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
The first time I restarted the instance nothing seemed to have changed and it took almost 3 minutes to warm up the cache as usual:
curl -o /dev/null http://localhost:8081/Plone
The interesting part is when I tried a second restart 10 minutes later and the warm up took only 15 seconds to finish.
The logs were showing something different:
2017-08-10T13:15:11 INFO ZEO.cache reusing persistent cache file '/opt/plone/buildout/parts/instance1/var/zeoclient-1.zec'
2017-08-10T13:15:13 INFO ZEO.ClientStorage zeostorage No verification necessary (cache_tid up-to-date '\x03\xc1Q\x8b\x007\x8f"')
The monitoring graphs were really interesting also:
As you can see, not only the time was reduced but also the CPU consumption, network transfer and disk I/O utilization.
The first restart occurred as 13:02 and everything was business as usual: high CPU and a lot of network traffic and disk I/O rate. the second restart happens at 13:15 and is almost imperceptible on network traffic and disk I/O rate.
I can say also that the instance responsiveness to requests was very high since the very beginning.
This test was done on ZEO clients running on a different machine than the ZEO server, now I need to test what happens if the ZEO clients are locally attached.