Performance boost at Radio Free Asia (Part 2)

I hinted at how we profiled our Plone site at RFA in my previous post: How we got a 10x Performance boost at Radio Free Asia

I figured I'd take some time and share it.

Please be advised that this work is completely my own.
I am in no way claiming that this is the perfect way to do performance profiling in Plone. It was just the fastest thing I could pull together in a time of need.
What I mean is, I didn't really consult anyone on best practices, pitfalls, or the plain 'bad science' that can occur with this method. In fact, that's why I'm here - peer review. Please comment if I went wrong somewhere.

# Let's start at the Template
Because I had no other place to start. The ZMI database cache was pegged, and the slow query log was not helping. Pages were taking 3-4 seconds to load. That's all the information I had. Standing over the sysadmin's shoulder would probably have given me little help. We knew it was the code.

I created a quick and dirty browser view that kicks off cProfile, so I can call it from templates:
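Here's a minimal sketch of the kind of view I mean (names and layout are illustrative, not our exact code - assume it ends up registered as @@profiler):

    # performance_profiler/profiler.py - a minimal sketch
    import cProfile

    from Products.Five.browser import BrowserView


    class ProfilerView(BrowserView):
        """Start/stop cProfile around whatever the template renders in between."""

        _profile = None
        _filename = 'control.pstat'

        def start(self, filename='control.pstat'):
            # remember where to dump stats, then begin collecting
            self._filename = filename
            self._profile = cProfile.Profile()
            self._profile.enable()
            return ''  # tal:define wants *some* value

        def stop(self):
            if self._profile is not None:
                self._profile.disable()
                # writes a pstats-format file into the instance's working directory
                self._profile.dump_stats(self._filename)
                self._profile = None
            return ''

Because the template grabs the view once and keeps it in a variable, start() and stop() run on the same instance.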

At the top of my main_template I put this:

<tal:x tal:define="profiler context/@@profiler;
                   dummy python:profiler.start(filename='control.pstat')">

.... the rest of my main_template ....

and at the bottom:

    <tal:y tal:define="dummy python:profiler.stop()" />
</tal:x>

I can also set the output filename in the template, so I can quickly change it to grab test data.
In hindsight, I probably could have used the context and request to create a unique filename... but... whatever....
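If you wanted that, a hypothetical variant of start() could derive the name from the context and the clock:

    import cProfile
    import time

    # hypothetical replacement for ProfilerView.start() above
    def start(self, filename=None):
        if filename is None:
            # e.g. 'front-page-1589000000.pstat'
            filename = '%s-%d.pstat' % (self.context.getId(), int(time.time()))
        self._filename = filename
        self._profile = cProfile.Profile()
        self._profile.enable()
        return ''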

I threw the Python code in a subdirectory of my main product, because I like it that way:

performance_profiler/
├── __init__.py
├── configure.zcml
└── ...
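Inside that subpackage, configure.zcml just has to register the view. Something along these lines, assuming the ProfilerView sketch above lives in profiler.py:

    <configure
        xmlns="http://namespaces.zope.org/zope"
        xmlns:browser="http://namespaces.zope.org/browser">

      <browser:page
          name="profiler"
          for="*"
          class=".profiler.ProfilerView"
          permission="zope2.View"
          />

    </configure>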

and added it to my main configure.zcml:

<include package=".performance_profiler" />

Be careful not to check this in: adding profiling code to a production website is not a good idea - especially if you're trying to speed it up.

But basically, the template now instructs the view to start and stop Python's built-in cProfile tool.

Now, restart the server and request a page - any page (that uses the macro in the main_template).

And, magically, a file called 'control.pstat' appears in your instance's directory.

You can read this with the primitive pstats module from Python's standard library, right on the command line:

$ python -m pstats control.pstat
Welcome to the profile statistics browser.
% help
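The same module can be scripted, too, if you'd rather skip the interactive prompt:

    import pstats

    # load the dump and print the 20 most expensive calls by cumulative time
    stats = pstats.Stats('control.pstat')
    stats.sort_stats('cumulative').print_stats(20)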

but I was using KCachegrind - it's pretty.

$ sudo apt-get install kcachegrind

(or yum, or build from source, whatever)

$ sudo pip install pyprof2calltree      

(converts Python profiles to calltree format)

$ pyprof2calltree -k -i control.pstat

This converts the file and launches KCachegrind on the output:

(Image is a sample from the web, and not Plone related)

And this is what clued me in that I was calling pt_render about 10 times more than I expected.

I also discovered a few more places that were taking up a significant amount of time, or being called repeatedly and could be cached.
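For the 'could be cached' cases, plone.memoize is the usual tool in Plone. A sketch of per-request caching, with made-up names (not our actual code):

    from plone.memoize.view import memoize
    from Products.CMFCore.utils import getToolByName
    from Products.Five.browser import BrowserView


    class NavView(BrowserView):
        """Hypothetical view - illustrates per-request caching only."""

        @memoize
        def nav_items(self):
            # the catalog query runs once per request, no matter how many
            # times the template calls nav_items()
            catalog = getToolByName(self.context, 'portal_catalog')
            return catalog.searchResults(portal_type='News Item')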

The point of this output is to find a piece of code or two where a one-hour refactor can significantly improve performance. Going further than that is typically a waste of time. Without discipline, you can spend weeks or months tweaking code to shave off a few more milliseconds, without much payoff. Also, as a general rule, the more you optimize code, the less readable and maintainable it becomes. There are tradeoffs.


"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." -Donald Knuth

KCachegrind also won't tell you what data is being processed, just how much time you spend in each function. Adding a set_trace() to that one function that seems to be doing too much, or being called too many times - even if it's in an egg - can really shine a light on where you went wrong,

like finding out what's really behind a <tal:x tal:condition="image">.
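(set_trace() being Python's built-in debugger - drop this into the suspect function, even one inside an egg, and the next request pauses there for inspection:)

    # temporary, obviously - remove before committing anything
    import pdb; pdb.set_trace()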

This also led me to reading about Plone query optimization:

We were spending a TON of time in <BTrees._IIBTree.intersection>, because more indexes are NOT better. Did you know that adding a path to a query can slow it down, not speed it up?
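To make that concrete, here's the shape of the thing - a hypothetical query run from a debug prompt (site being your Plone site root; the path value is made up). Every extra index contributes another candidate result set that then has to be intersected:

    from Products.CMFCore.utils import getToolByName

    catalog = getToolByName(site, 'portal_catalog')

    # one index: essentially a single BTree lookup
    results = catalog(portal_type='News Item')

    # three indexes: three candidate sets merged in BTrees._IIBTree.intersection
    results = catalog(portal_type='News Item',
                      path='/site/english/news',
                      review_state='published')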

Question: how can I avoid BTree intersections? My research says ZODB query time is logarithmic if you use a single index, but can become linear if you use more than one - and I think this is the evidence.


@flipmcf You know there's a profiler built into Zope, right? (Not sure why this isn't in the Plone manual.) It's quick and easy, but perhaps not as pretty.

And there are others, e.g.

BTW, by default query optimisations are lost on restart, so when profiling you might be seeing the unoptimised queries. We got a big performance boost by saving our query plan and having it automatically loaded on startup. This is not hard to do.
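Roughly, from memory: ZCatalog's getCatalogPlan() dumps the warmed-up plan as Python source, and at startup it imports whatever module the ZCATALOGQUERYPLAN environment variable points at. Paths and package names below are made up:

    # in a debug prompt against a warmed-up instance
    plan_source = site.portal_catalog.getCatalogPlan()
    with open('src/my.package/my/package/queryplan.py', 'w') as f:
        f.write(plan_source)

    # then, before starting the instance:
    #   export ZCATALOGQUERYPLAN=my.package.queryplan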

@djay Similar to this thread on Stack Overflow, which outlines how to load and save the query plan. Where in code could you call this on shutdown of a Plone site?

Should add the link, ahem.


Yes, you're right, and I forgot to mention that. Also, one of the Zope profilers I saw had an option to 'throw away' the first run, to remove uncached requests. I can't remember which one it was.

I did find the first run after a restart somewhat useful in our case (unfortunately), because that's where I discovered our navigation query was loading nearly every object into the Zope object cache. A refactor sped up 'first load' by 4x on just that method; cached, it got (only?) a 20% lift.

If I may have an opinion here, I really don't like the 'exclude_from_nav' index - but I think I understand why it's built that way.

We now have a new index, 'nav_item', that in theory should only have 7-8 values for 'True' - making that query very fast... well, as long as you don't intersect it with another index. (We still do.)
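For anyone curious, the indexer behind an index like that is only a few lines with plone.indexer (hypothetical logic - ours differs):

    from plone.indexer import indexer
    from zope.interface import Interface


    @indexer(Interface)
    def nav_item(obj):
        # True only for the handful of objects that actually appear in the nav
        return not getattr(obj, 'exclude_from_nav', True)

(Registered as a named adapter in ZCML, plus a BooleanIndex in the catalog.)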

That's very interesting! Thanks!