Site map and mega menu generation is slow

Hello,

I'm seeking feedback, advice, and ideas about the following scenario.

I have a site with a mega menu at the top and a site map at the bottom. This means I have to construct a site map, i.e. HTML code containing a tree of <ul>/<li> elements with links for all the content in the site, down to a certain depth, let's say 3 levels.
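To make the output concrete, here is a minimal sketch of the kind of HTML generation I mean (not my actual code; node keys like 'title', 'url' and 'children' are illustrative):

def render_tree(nodes, level=1, max_depth=3):
    # Render a list of nodes as nested <ul>/<li> down to max_depth.
    # (Real code would also escape titles and add more CSS classes.)
    if level > max_depth or not nodes:
        return ''
    parts = ['<ul class="navTreeLevel%d">' % level]
    for node in nodes:
        parts.append('<li><a href="%s">%s</a>' % (node['url'], node['title']))
        parts.append(render_tree(node.get('children', []), level + 1, max_depth))
        parts.append('</li>')
    parts.append('</ul>')
    return '\n'.join(parts)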

It happens that this is slow on Plone. Here is my code: https://gist.github.com/rafaelbco/4bfc512c6b81884298d997236617b7cf The bulk of the work is done by the function plone.app.layout.navigation.navtree.buildFolderTree. I'm on Plone 4.x with Dexterity-only content types.
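For reference, the expensive call looks roughly like this (a sketch, not my gist; the root path and sort index are assumptions):

from plone.app.layout.navigation.navtree import NavtreeStrategyBase, buildFolderTree

def build_sitemap_data(context, root_path='/plone', depth=3):
    query = {
        'path': {'query': root_path, 'depth': depth},
        'sort_on': 'getObjPositionInParent',
    }
    # Returns nested dicts; each node carries a catalog brain under 'item'
    # plus a 'children' list. Practically all of the time is spent in here.
    return buildFolderTree(context, query=query, strategy=NavtreeStrategyBase())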

Besides being slow, it fills up the DB cache with a lot of objects. I don't know what kind of objects end up in the cache, but I can see the number of objects in ZMI / Control Panel. My code avoids all calls to getObject. Only catalog brains are used. Any ideas?

So I resort to caching the generated HTML. I encountered the following issues, regardless of the specifics of the caching approach:

  • I have to cache one sitemap per user.
    Reality is a bit better than that: I could cache one version per set of groups, because of the way role assignments are handled on the site. But still ...
  • I have to invalidate the entire cache every time something changes in the site.
    Every change is a potential change in the sitemap. I thought about using event handlers to figure out whether the cache must be invalidated, or to invalidate just a portion of the site map, but it seems like a lot of work and somewhat fragile.

Regarding the caching storage, I still could not decide which is best, considering a deployment with multiple Zope instances:

  • Cache in memory.
    Each instance has its own cache.
  • Cache in the ZODB, using an OOBTree annotated on the portal root (see the sketch after this list).
    All instances share the cache, which is a good thing. However it pollutes the undo history and can be slow if the cache is invalidated a lot.
  • Cache in something external and shared by all instances, like memcached.
    I have not tried it yet. The main drawback I can see is it's one more service to configure/deploy/monitor/etc.
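For the ZODB option, I have something like this in mind (just a sketch; the annotation key name is made up):

from BTrees.OOBTree import OOBTree
from plone import api
from zope.annotation.interfaces import IAnnotations

CACHE_KEY = 'my.package.sitemap_cache'  # hypothetical annotation key

def get_shared_cache():
    # Shared by every instance, but each write is a transaction that shows
    # up in the undo history of the portal root.
    annotations = IAnnotations(api.portal.get())
    if CACHE_KEY not in annotations:
        annotations[CACHE_KEY] = OOBTree()
    return annotations[CACHE_KEY]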

Any feedback is appreciated. Throw your crazy ideas. Ask questions. Ask for money. I need help! :slight_smile:


Why do you need custom code? webcourtier.dropdownmenu already does this and it seems pretty fast. We've used this to create fat menus with some Diazo. See http://www.genetics.edu.au/. Of course going to 3 levels on a largish site could be a lot more content. I'd question the value of a menu with that many levels/items. If you really need it, I'd suggest perhaps using JavaScript to lazy load the last level if it's not always visible.

If you really want to keep your custom code (which seems very complex), then I'd do some profiling and ensure that there really aren't any getObject calls happening without you noticing, or some other code that's expensive. Using the catalog only should be pretty fast.

Maybe a frontend approach: use plone.restapi to fetch the content and build the menu in the browser.

Compared to your approach, the benefits are:

  • the REST API is faster (because it does not take care of rendering anything); see the sketch right after this list
  • everything that depends on the current context (which menu item is active, what the current path is, etc.) will be handled by the browser, so your server can have a rest (shall I say a REST :slight_smile: )
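From the browser you would do the equivalent fetch; in Python it looks roughly like this (URL, credentials and query parameters are placeholders, assuming plone.restapi is installed):

import requests

resp = requests.get(
    'http://localhost:8080/Plone/@search',
    params={
        'path.depth': 3,
        'sort_on': 'getObjPositionInParent',
    },
    headers={'Accept': 'application/json'},
    auth=('admin', 'secret'),
)
items = resp.json()['items']  # flat list of {'@id': ..., 'title': ..., ...}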

On the Control_Panel page Debug Information, you will find a link Cache details. It shows you the (ZODB) cache content in the form "class -> number of cached objects". This way, you get an overview of how the cache is used.

I assume that this is a requirement of your application. Otherwise, by using the appropriate cache key, you can control for which situations you get specific versions.

That depends on what you cache. Instead of caching the complete result, you could cache components which are then fitted together to produce the complete result. Maybe this also alleviates the burden of maintaining many versions (provided some parts of the result need fewer versions than others).

This is quite difficult to keep consistent. Note that the operation invalidating part of the cache happens in a single instance. While it is not difficult to invalidate the RAM cache in this instance, it is not easy to inform the other instances that they, too, need to invalidate their caches.

In Products.CCSQLMethods, I use the ZODB cache and its invalidation protocol to ensure consistency across instances. However, you need a dedicated ZODB object for each cache you want to be invalidated individually.

I have worked in several projects using memcached. I have never been responsible for those parts of the projects but apparently, the responsible persons have been highly satisfied. It seems that memcached poses very few (if any) problems.

webcourtier.dropdownmenu uses the default Plone machinery to build the site maps, and that's what I found out to be slow. So I wrote the complex custom code, trying obsessively to make it faster, and I'm pretty satisfied with the progress. Unfortunately I haven't saved the timing comparison data to show here.

So I'm pretty confident I'm already faster than webcourtier.dropdownmenu.

Good idea. This way I can use Varnish to cache the Ajax requests.

Interesting approach.

If I understood the code in plone.restapi correctly, the search endpoint returns a flat list instead of a tree structure. That's good because it does not use Plone's buildFolderTree and friends, which cause the slowness, only the portal_catalog.

It's bad, on the other hand, because I'll have to build the tree structure myself, either in JavaScript or in an intermediate view. But then I can cache this view... Oh, what a rabbit hole of possibilities this is! :smile:
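Building the tree from the flat list shouldn't be hard, something like this (a rough sketch, assuming each item carries its URL under '@id' as plone.restapi returns it):

def build_tree(items, root_url):
    # Index nodes by URL so each item can be attached to its parent.
    nodes = {root_url: {'children': []}}
    # Visit shallow paths first so parents exist before their children.
    for item in sorted(items, key=lambda i: i['@id'].count('/')):
        node = dict(item, children=[])
        nodes[item['@id']] = node
        parent = nodes.get(item['@id'].rsplit('/', 1)[0])
        if parent is not None:  # drop orphans whose parent was filtered out
            parent['children'].append(node)
    return nodes[root_url]['children']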

Well, Plone allows for global and local roles per user, so the set of visible content items is potentially different for each user. Like I detailed, I improved on that by using the set of groups a user belongs to as a cache key.

Well, I wrote about this in the original post. Like I said, the main problem with this approach is the complexity. It's just hard to implement.

You seem to be thinking of something better (and more complex).

My approach is more naive: each instance has its own in-memory cache, which is a dict at module level. Since invalidation happens when anything changes in the site, I simulate this by using the catalog counter. I store the counter every time I write to the cache. When I want to retrieve something from the cache, I check whether the catalog counter has changed. If not, I just retrieve the value. If it has changed, I clear the cache, rebuild the sitemap and store it in the cache.

I will definitely take a look. Am I crazy, or is this approach like using ZEO/ZODB as a kind of memcached?

Thank you!

mega menus are slow indeed; we used to have one on a site and it was consuming a lot of resources (CPU and memory), so we decided to just remove it.

I found your code really complex and I don't know if it makes sense to maintain it or not, as you're not posting what order of magnitude the gains were. I would try to fix performance in Plone code instead of trying to maintain a separate code base.

anyway, there are some points you should be aware of; this is really a bad idea:

def _render_tag_close(self, output, tag):
    output.write('</')
    output.write(tag)
    output.write('>\n')

try this instead:

def _render_tag_close(self, output, tag):
    output.write('</' + tag + '>\n')

it's almost 3 times faster:

>>> timeit.timeit('output.write("<");output.write(tag);output.write(">")', setup='from StringIO import StringIO;output = StringIO();tag="ul"', number=1000000)
1.9410121440887451
>>> timeit.timeit('output.write("<"+tag+">")', setup='from StringIO import StringIO;output = StringIO();tag="ul"', number=1000000)
0.7793159484863281

using format is also a bad idea (I just discovered that):

>>> timeit.timeit('"navTree navTreeLevel{}".format(level)', setup='level = "1"', number=1000000)
0.2354569435119629
>>> timeit.timeit('"navTree navTreeLevel"+level', setup='level = "1"', number=1000000)
0.06421804428100586

I would suggest you follow @djay's advice: do some profiling at the Plone level and try to enhance things there, avoiding complex optimizations as they are hard to maintain in the long term.

also, caching in the instance seems the most viable solution to me, as it's easy and good enough: yes, one person will have to wait a couple of seconds, but the rest of the people hitting the cache will find it fast.

I would love to know how to free memory after cache invalidation, as this is something that bites us a little bit in our projects.

Good to hear I'm not alone! Other evidence this is indeed slow: webcourtier.dropdownmenu, mentioned by @djay, offers a cache option.

Good catch! I was so happy with the speedup gained by not using ZPT that I didn't bother to optimize the HTML generation further.

I agree, in a philosophical sense. However it's more productive in the short term to work on a separate code base, tied to my needs only.

I feel it's not good enough when you have many instances and the content changes a lot. That keeps the cache hit rate low. Sharing the cache between instances would provide a nice improvement, I guess.

Well, the cache is a dict, and I empty it with dict.clear. I suppose this frees the memory. Or not? Care to elaborate?

can you share an example? I mean, my previous cache key will be invalidated because I added an object to the site and that increments the catalog counter.

now, my previous cache key is useless but it's still there... or is it not?

I do not yet know the "catalog counter". Does it change for any catalog modifying operation?

It does not use ZEO/ZODB as a "kind of memcached": the cached values are stored locally in RAM, not centrally in "memcached". It uses the ZODB only for the distribution of invalidation messages. It is quite similar to your "catalog counter" method; it uses the "_p_serial" of a persistent object (somehow associated with a cache) instead. When you want to invalidate the cache, you modify the persistent object (thus globally changing its "_p_serial").
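Very roughly, the idea looks like this (not the actual Products.CCSQLMethods code; the names are illustrative):

from persistent import Persistent

class CacheMarker(Persistent):
    """A small persistent object stored once in the ZODB per cache."""
    counter = 0

    def invalidate(self):
        # Any modification gives the object a new "_p_serial" on commit;
        # ZEO propagates the invalidation to every other instance.
        self.counter += 1

# Per-instance RAM cache plus the marker serial it was built against:
_ram_cache = {}
_ram_serial = None

def cached_get(marker, key):
    global _ram_serial
    marker._p_activate()  # make sure _p_serial reflects the current state
    if marker._p_serial != _ram_serial:
        _ram_cache.clear()
        _ram_serial = marker._p_serial
    return _ram_cache.get(key)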

In a project, I work together with fans of the "angular" (Javascript) framework. Apparently, this framework has components able to build hierarchical structures (likely deepening on demand).

Here's how I do it:

from plone import api


class SitemapCache(object):
    """Per-instance, module-level cache invalidated via the catalog counter."""

    def __init__(self):
        self._last_state = None
        self._storage = {}

    def get(self, key):
        # Any change in the site bumps the catalog counter, so a changed
        # counter means every cached sitemap may be stale: drop them all.
        current_state = self.get_state()
        if current_state != self._last_state:
            self._storage.clear()
            self._last_state = current_state

        return self._storage.get(key, None)

    def set(self, key, value):
        self._storage[key] = value

    def get_state(self):
        return api.portal.get_tool('portal_catalog').getCounter()

The cache key must not include the catalog counter. It must include:

  • Parameters used to generate the site map: root item path, depth, etc.
  • The user ID or some other way of identifying which items are visible. In my case I use the set of groups the user belongs to (see the sketch after this list).
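Roughly, I build the key like this (the function name is illustrative):

from plone import api

def make_cache_key(root_path, depth):
    # The sorted tuple of group IDs stands in for "which items are visible".
    user = api.user.get_current()
    groups = tuple(sorted(g.getId() for g in api.group.get_groups(user=user)))
    return (root_path, depth, groups)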

What do you think? I don't see how it may lead to memory being wasted.

What are the advantages of the mechanism implemented in CCSQLMethods compared to just storing what needs to be cached in the ZODB?

If you really want to make the cache content persistent, you obviously can store it directly in the ZODB. For CCSQLMethods, I did not want to store the potentially large result sets in the ZODB (as persistent content).

You can use a "_v_" attribute as a kind of ZODB-based cache. However, this gives you a cache per connection (not per instance).
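A tiny sketch of that idea (the class and attribute names are made up): attributes whose names start with "_v_" are never written to the database and disappear whenever the object is deactivated, so each connection ends up with its own copy.

from persistent import Persistent

class SitemapHolder(Persistent):

    def get_cache(self):
        # "_v_" attributes are not persisted and are lost on ghostification.
        cache = getattr(self, '_v_sitemap_cache', None)
        if cache is None:
            cache = self._v_sitemap_cache = {}
        return cache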

The method used in CCSQLMethods uses the ZODB to communicate invalidations across instances but otherwise uses an instance wide cache (in RAM).

I see. Thanks for the explanation.

What I'm thinking of is using ZEO as a replacement for memcached. I posted this to the ZODB mailing list if you're curious: Redirecting to Google Groups

I'd like your feedback too, since it seems you know a lot about ZODB.

My expectation for a mega menu, from a site owner and user standpoint, is that it is not purely generated but redacted/curated, showing content that does not have to follow the content structure of the site but foremost gives a rich overview of what the selected navigation topic can provide.

An idea I've been carrying around for a while is to use Mosaic pages to create mega menus. This would allow you to create mega menus with all the freedom to build a "submenu" including static and dynamic content.

A small piece of JavaScript to load and show that Mosaic submenu item should be easy to do, and one could use an extra plone.app.caching rule if there's any performance issue.


now I understand; you're clearing the cache on a state change.

we're using plone.memoize and I think we have an issue with it: as no clearing occurs in the code of, let's say, a view memoizer, memory consumption grows to the point where instances have to be restarted.

I think this could be a potential point of enhancement.

that's exactly what we had also in our use case: a viewlet on top showing a small collective.cover object loaded using an AJAX call.

the mega menu, on the other side, just included the n most recent items in each folder, showing title, description, lead image and link.

the Plone machinery for global sections is slow, and is also a potential point of enhancement.