Using plone.memoize.instance with several ZEO clients

Hi everybody,

we are struggling a bit with plone.memoize.instance invalidation when using several ZEO clients. If we manually invalidate a cached method (say, a vocabulary), it is only invalidated on the client where the invalidation is called; the other clients' caches are not invalidated, which can obviously lead to problems...

Right, that is because the RAMCache is stored per ZEO client, so we tried memcached instead. It works correctly for invalidating the cache on all clients, since the cache is centralized, but with memcached we have another big issue with methods returning objects (or lists of brains, for example), where we get "TypeError: Can't pickle objects in acquisition wrappers." or "TypeError: can't pickle instancemethod objects"...

So in the end we no longer use plone.memoize.instance but only plone.memoize.ram with a cache_key for all our cached methods, even the methods that we know will be cached for a long time and that we only wanted to be able to invalidate manually... For those, we use a quick check in the cache key: we store a volatile attribute (_v_my_method_name) on the portal holding "datetime.now()", and invalidation is done by updating this date...
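For illustration, the volatile-attribute cache key pattern described above might be sketched like this; `FakePortal`, `_v_my_vocabulary`, and the function names are hypothetical stand-ins, and in real code the key function would be passed to `plone.memoize.ram.cache`:

```python
from datetime import datetime

class FakePortal(object):
    """Stand-in for the Plone portal; ``_v_*`` attributes are volatile,
    i.e. kept in memory and never persisted."""

def vocabulary_cachekey(method, self, portal):
    # The volatile timestamp is part of the key: bumping it changes the
    # key, which effectively invalidates the cached entry.
    stamp = getattr(portal, '_v_my_vocabulary', None)
    if stamp is None:
        stamp = portal._v_my_vocabulary = datetime.now()
    return (method.__name__, stamp)

def invalidate_vocabulary(portal):
    # "Invalidation" is just updating the stored date.
    portal._v_my_vocabulary = datetime.now()
```

With plone.memoize, this key function would be used as `@ram.cache(vocabulary_cachekey)` on the vocabulary method.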

Anyway, I wanted to know :

  • is there a way to invalidate the RAMCache on every ZEO client at once (the entire cache or a single entry)?
  • what about using memcached with methods returning objects or a LazyMap instance (portal_catalog search result)?

Thank you for your help!

Gauthier

I have implemented something like this (but apparently I have not published it -- it is used, however, in "Products.CCSQLMethods", found on PyPI). It uses an empty persistent synchronization object associated with the cache which is changed ("obj._p_changed = True") to indicate an invalidation. On access, the cache checks whether it is still valid (comparing the current "_p_serial" value of the synchronization object with a stored value) and invalidates itself if this is not the case.
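The pattern can be sketched without ZODB specifics; here an explicit counter stands in for the synchronization object's `_p_serial` (in ZODB, committing `obj._p_changed = True` is what moves the serial), and the class names are illustrative only:

```python
class SyncToken(object):
    """Stand-in for the empty persistent synchronization object.
    In ZODB, committing ``obj._p_changed = True`` bumps ``_p_serial``;
    here an explicit counter plays that role."""
    def __init__(self):
        self.serial = 0
    def bump(self):
        # Analogue of obj._p_changed = True followed by a commit.
        self.serial += 1

class SerialCheckedCache(object):
    """A dict-like cache that empties itself when the token's serial moved."""
    def __init__(self, token):
        self.token = token
        self.seen_serial = token.serial
        self.data = {}
    def _check(self):
        if self.token.serial != self.seen_serial:
            self.data.clear()
            self.seen_serial = self.token.serial
    def get(self, key, default=None):
        self._check()
        return self.data.get(key, default)
    def set(self, key, value):
        self._check()
        self.data[key] = value
```

Since every ZEO client sees the same persistent synchronization object, bumping it invalidates the local cache on all clients the next time they access it.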

"plone.memoize.instance" could quite easily be extended to support this type of invalidation.

You have listed two problems: 1. acquisition wrappers and 2. unpicklable methods.

WRT 1, if your application does not need the acquisition context or can recreate it, then instead of caching the wrapped object, cache only the base object ("obj.aq_base").

WRT 2, you can extend the pickling capabilities (pickling uses the same base features as Python's "copy" module, and those are documented in the Python library reference). Note, however, that this might make your site more vulnerable to attacks (in case hostile users are able to present pickled data to your site).
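As a sketch of extending picklability, an object holding an unpicklable member (here a thread lock, standing in for an acquisition wrapper or bound method) can define `__reduce__` so that only its plain state is pickled; `CatalogResult` is a hypothetical wrapper, not Plone API:

```python
import pickle
import threading

class CatalogResult(object):
    """Hypothetical result wrapper; the lock itself cannot be pickled."""
    def __init__(self, items):
        self.items = items
        self._lock = threading.Lock()

    def __reduce__(self):
        # Pickle only the plain state; the lock is recreated by __init__
        # when the object is rebuilt on unpickling.
        return (CatalogResult, (self.items,))

restored = pickle.loads(pickle.dumps(CatalogResult([1, 2, 3])))
```

As Dieter notes, accepting pickled data from untrusted sources is dangerous; this only makes sense for a cache your own code writes and reads.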

big issue it is with methods returning objects

I recently found you should not RAM-cache anything that is, or contains a reference to, a persistent object. I had lots of these:

ConnectionStateError
Shouldn't load state for 0x1ddd7a when the connection is closed

I finally traced those to a ram cached view method that contained persistent objects. Disabling that cache solved the problem.

So it seems to me one should only ram.cache data structures that are decoupled from the ZODB.
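Concretely, decoupling means copying the plain fields off the brains before caching; `FakeBrain` below is a stand-in, and the field names are only examples:

```python
class FakeBrain(object):
    """Stand-in for a catalog brain; real brains carry metadata columns."""
    Title = 'Front page'
    portal_type = 'Document'
    def getURL(self):
        return 'http://example.org/front-page'

def decouple(brains):
    """Turn brains into plain dicts that are safe to RAM-cache:
    no acquisition wrappers, no ZODB connection references."""
    return [{'title': b.Title,
             'portal_type': b.portal_type,
             'url': b.getURL()} for b in brains]
```

Only the resulting list of dicts goes into the cache; the brains themselves stay bound to the request that produced them.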


Hi Dieter,

yes, I tried with aq_base, but then I sometimes had to put the acquisition back in the caller method, and so on. I do not want to adapt my code that much just to be able to use memcached; this should not be necessary, because it forces me to write extra code for it... Moreover, even after solving this, I had the problem of some classes that I could not pickle. I could write my own serializer, but that cost seems way too high compared with using a cache key with RAMCache.

To me, the option of being able to invalidate the cache on every ZEO client is the best, because you do not fight with pickles and it works out of the box.

I will look at the package you proposed and see if I can make a pull request to plone.memoize, or isolate this functionality in a separate package.

Thank you for your response,

Gauthier

@gyst,

Indeed, we had such problems too. We take care not to return persistent objects, and caching, for example, a LazyMap instance (the result of a portal_catalog search) works with the RAMCache, but no longer with memcached, as it cannot pickle that LazyMap...

Thank you,

Gauthier

"plone.memoize.instance" is not affected by this problem: it is using RAM associated with the "ZODB" cache itself and unless you do very strange things, this will not lead to "ConnectionStateError".

"plone.memoize.instance" is not affected by this problem

Yes, of course. But Gauthier was explicitly asking about RAM caching, and what I said about RAM caching was about plone.memoize.ram. Thanks for pointing out how easy it is to use slippery language.

That's basically what happens if you commit a change to a persistent object. If you don't need to invalidate frequently then maybe you should just store your cache in the ZODB.

Can you cache the output of your rendering (i.e. JSON or a rendered template) rather than the inputs?

You can use a shared cache instead--something like memcache or redis.

Here is a redis example:

import os
import cPickle
from hashlib import md5
from threading import local

import redis
from zope.interface import directlyProvides

from plone.memoize.interfaces import ICacheChooser
from plone.memoize.ram import AbstractDict, choose_cache as base_choose_cache

# One client handle per thread.
thread_local = local()


class RedisAdapter(AbstractDict):

    def __init__(self, client, globalkey=''):
        self.client = client
        self.globalkey = globalkey and '%s:' % globalkey or ''

    def _make_key(self, source):
        if isinstance(source, unicode):
            source = source.encode('utf-8')
        return md5(source).hexdigest()

    def get_key(self, key):
        return self.globalkey + self._make_key(key)

    def __getitem__(self, key):
        cached_value = self.client.get(self.get_key(key))
        if cached_value is None:
            raise KeyError(key)
        else:
            return cPickle.loads(cached_value)

    def __setitem__(self, key, value):
        try:
            cached_value = cPickle.dumps(value)
            self.client.set(self.get_key(key), cached_value)
        except cPickle.PicklingError:
            pass


def get_client(fun_name=''):
    server = os.environ.get('REDIS_SERVER', None)
    if server is None:
        return base_choose_cache(fun_name)

    client = getattr(thread_local, "client", None)
    if client is None:
        host, port = server.split(':')
        client = redis.StrictRedis(host=host, port=int(port), db=0)
        try:
            client.get('test-key')
            thread_local.client = client
        except redis.exceptions.ConnectionError:
            return base_choose_cache(fun_name)
    return RedisAdapter(client, fun_name)

directlyProvides(get_client, ICacheChooser)

Then, the ZCML registration:

<utility component=".cache.get_client" provides="plone.memoize.interfaces.ICacheChooser" />


I really recommend never storing anything persistent, i.e. catalog brains (LazyMap, ...), in the cache. If more than one thread is configured, I would consider this harmful: you may get an object from a different ZODB connection than the one currently used by the thread.

It's probably easiest to refactor the places where this happens and use memcached. At least that's what I would do.

If that's not an option, I'd use something like AMQP (using RabbitMQ) or a Celery broadcast to message all other clients to invalidate their caches.
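The broadcast idea can be sketched in-process; a real setup would replace `InvalidationBus` (a hypothetical name) with an AMQP fanout exchange or a pub/sub channel, with one subscriber per ZEO client purging its local RAM cache:

```python
class InvalidationBus(object):
    """In-process stand-in for a broadcast channel (AMQP fanout, pub/sub)."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, message):
        # Deliver the invalidation message to every subscriber.
        for callback in self.subscribers:
            callback(message)

# One local RAM cache per ZEO client; each subscribes a purge callback.
client_caches = {'client1': {'vocab': [1]}, 'client2': {'vocab': [1]}}
bus = InvalidationBus()
for cache in client_caches.values():
    bus.subscribe(lambda message, cache=cache: cache.pop(message, None))
bus.publish('vocab')   # every client drops the entry, not just one
```

The point of the broadcast is exactly the cross-client invalidation Gauthier asked for, while each client keeps caching unpicklable values locally.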

IIRC this example, and the equivalent for memcached, is not documented in plone.memoize or elsewhere. It should be...

Hi,

thank you for these opinions. I will avoid caching methods returning objects, so that I can move to memcached; meanwhile I will use plone.memoize.ram with a cache key to be sure that every client is invalidated.

Thank you,

Gauthier