ConnectionStateError: Shouldn't load state for five.localsitemanager.registry.PersistentComponents 0x1a26bf when the connection is closed

Hello all. We're experiencing a situation where we cannot edit sites because of a ConnectionStateError on clients. Anonymous access is fine at the moment, as we currently run a 4 Zope client configuration and the other 3 appear to be OK. However, client1 is dying with this error message and a reboot is the only way to bring it back up. We have gone through the trouble of rebooting the entire server as well.

We serve approximately 10 sites out of this server. I am a bit concerned and would like to know whether we may have a RAM issue. Currently the server runs on 8GB of RAM, however VMware reports we're using 7GB at the moment.

The error I'm seeing in the logs is:

ConnectionStateError: Shouldn't load state for five.localsitemanager.registry.PersistentComponents 0x1a26bf when the connection is closed

What could cause this and where do I look to resolve?

Thank you

It could be caching related as described here:

If that is the case, don't cache persistent objects; cache some sort of derived value instead (a dictionary with some needed values, perhaps).

and here: Using plone.memoize.instance with several ZEO clients

Thank you for the quick reply. This box has run fine for over a year; it's only now doing this, which is what has me a bit perplexed. If it's caching related, how does one stop caching persistent objects? Is this a simple fix through the caching control panel?

I think there must be some change somewhere - installed something new? Added or tweaked cache settings?
I think you can assume it is unrelated to the resource use, that would look different.

plone.memoize is used from the code, and it is in that case that you have to be careful with what you are caching.
But you have no code changes in this period?

No, we haven't done a great deal of development and nothing changed on the server in that period. We aren't a heavy development shop.

Very perplexed on this one...

It could also be unrelated to caching, but that would be my first thing to check.
Try to disable all caching in the caching control panel (if enabled) and see if it makes a difference.

Usually this problem occurs when a persistent object survives a request-response cycle in the client's memory and is reused in the next cycle using a different connection from the pool.
As already stated, caching of persistent objects in a memory cache is the most common reason for this problem. But there are certainly other possible reasons.
A reason why it did not show up for a long time is the load on the site. On low load, with a load balancer in front of 4 ZEO clients, each client never used more than 1 database connection. Now if load increases and 2 connections are used, the problem shows up.
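To make that mechanism concrete, here is a toy model in plain Python. It is not ZODB code: `ToyConnection`, `ToyObject` and `ToyConnectionStateError` are invented names that only mimic the idea that an object loads its state through the connection that created it. With a single connection the leaked object happens to keep working; with a second connection it fails.

```python
class ToyConnectionStateError(Exception):
    """Toy stand-in for the real error; not the ZODB class."""


class ToyConnection:
    def __init__(self, name):
        self.name = name
        self.opened = False

    def get(self, oid):
        # The returned object stays bound to this connection.
        return ToyObject(self, oid)


class ToyObject:
    def __init__(self, conn, oid):
        self._conn = conn
        self._oid = oid
        self._state = None  # not loaded yet

    def load(self):
        if self._state is None:
            if not self._conn.opened:
                raise ToyConnectionStateError(
                    "Shouldn't load state for oid %s when connection %s is closed"
                    % (self._oid, self._conn.name)
                )
            self._state = {"oid": self._oid}
        return self._state


def handle_request(conn, cache, oid, touch):
    """One request: open the connection, use the cached object, close again."""
    conn.opened = True
    try:
        if oid not in cache:
            cache[oid] = conn.get(oid)  # the object leaks into the cache
        return cache[oid].load() if touch else None
    finally:
        conn.opened = False


def simulate(connections):
    """Run two requests; the second uses the last connection in the list."""
    cache = {}
    first, second = connections[0], connections[-1]
    handle_request(first, cache, 1, touch=False)
    try:
        handle_request(second, cache, 1, touch=True)
        return "ok"
    except ToyConnectionStateError as exc:
        return "error: %s" % exc


conn_a = ToyConnection("a")
conn_b = ToyConnection("b")
low_load = simulate([conn_a])           # same connection serves both requests
high_load = simulate([conn_a, conn_b])  # second request gets another connection
```

In this toy, `low_load` succeeds and `high_load` raises, which matches the pattern of the error only appearing once a client needs more than one connection.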

Thank you for the response. As sunew suggested, do you believe an easy first step to diagnose and at least pinpoint the problem would be to simply disable the caching via the Plone control panel?

I've noticed this problem only rears its head when many folks are logged in, editing the site. For example, we didn't have anyone logged in after business hours last night, and the problem did not show up.

We have Varnish in front of HAProxy and Plone. My setup right now has all logged-in editor traffic also going through Varnish. Would it be advisable to point all logged-in traffic to one ZEO client? Just spitballing here, but this one has me a bit perplexed. This was never an issue with our Plone 4 sites, but we had some issues with caching when we first adopted Plone 5.