Why is Plone 4.3 faster than Plone 5.1?

Yesterday I was trying to find a way to add some performance tests to one of our add-ons and I came up with this:

I added one test to measure how long it takes to render 100 times the viewlet included in this package and I set an arbitrary value of 0.5 seconds as the limit.

To my surprise, tests are passing for Plone 4.2 and Plone 4.3 and fail in both Plone 5.0 and 5.1:

https://travis-ci.org/collective/collective.newsticker/builds/213848036

This is what I get locally on my machine:

Plone 4.3.12

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         307779 function calls (290776 primitive calls) in 0.451 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 365 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.455    0.455 test_viewlet.py:149(render)
      100    0.000    0.000    0.454    0.005 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.453    0.005 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.453    0.005 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.448    0.004 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.444    0.004 pagetemplate.py:234(__call__)
      100    0.000    0.000    0.440    0.004 talinterpreter.py:267(__call__)
  600/100    0.004    0.000    0.440    0.004 talinterpreter.py:328(interpret)
      100    0.000    0.000    0.438    0.004 talinterpreter.py:850(do_condition)
1100/1000    0.002    0.000    0.388    0.000 tales.py:691(evaluate)
...

Plone 5.1b2

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         598617 function calls (547661 primitive calls) in 0.661 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 326 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.664    0.664 test_viewlet.py:149(render)
      100    0.000    0.000    0.664    0.007 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.663    0.007 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.663    0.007 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.658    0.007 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.654    0.007 engine.py:86(__call__)
      100    0.001    0.000    0.653    0.007 pagetemplate.py:128(render)
      100    0.001    0.000    0.652    0.007 template.py:209(render)
      100    0.001    0.000    0.650    0.007 template.py:165(render)
      100    0.005    0.000    0.648    0.006 d13e2e5858eac1ff5089660f6bd49440.py:22(render)
...

As you can see, there are many more function calls in Plone 5.1b2 and main suspect here seems to be Chameleon.

UPDATE 1: Plone 5 includes Chameleon 2.25; I made a test with Chameleon 3.1 but it remained almost unchanged:

Plone 5.1b2 (Chameleon 3.1)

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         598817 function calls (547861 primitive calls) in 0.652 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 326 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.655    0.655 test_viewlet.py:149(render)
      100    0.000    0.000    0.655    0.007 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.654    0.007 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.654    0.007 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.649    0.006 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.646    0.006 engine.py:86(__call__)
      100    0.001    0.000    0.645    0.006 pagetemplate.py:128(render)
      100    0.001    0.000    0.643    0.006 template.py:229(render)
      100    0.001    0.000    0.642    0.006 template.py:165(render)
      100    0.005    0.000    0.639    0.006 b2d962c72225cfd129cfe757d1259927.py:23(render)
...

UPDATE 2: I made another test adding Chameleon 3.1 to Plone 4.3.12 and the results show that Chameleon is in fact making Plone slower:

Plone 4.3.12 (Chameleon 3.1)

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         267070 function calls (250767 primitive calls) in 0.499 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 319 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.502    0.502 test_viewlet.py:149(render)
      100    0.001    0.000    0.502    0.005 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.500    0.005 pagetemplatefile.py:120(__call__)
      100    0.002    0.000    0.500    0.005 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.493    0.005 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.487    0.005 engine.py:86(__call__)
      100    0.002    0.000    0.486    0.005 pagetemplate.py:128(render)
      100    0.002    0.000    0.483    0.005 template.py:229(render)
      100    0.002    0.000    0.481    0.005 template.py:165(render)
      100    0.007    0.000    0.479    0.005 b9624fa8db166ff12860b946044af1cd.py:23(render)
...

Can someone explain this?

1 Like

Thanks for sharing this!

One thing to note is that on Plone 4 with Chameleon 3 there are fewer function calls, which is good. Also the difference between Plone 4 with and without Chameleon isn't that much. It's 0.49ms slower per rendering. That could also be by accident.

However, that does not explain why Chameleon isn't significantly faster than without it and why Plone 5 adds so much overhead that it's significantly slower.

Did you run in debug mode?

@hvelarde can you share the full call stack? each of the calls in the stack you posted doesn't consume much time. there must be something after it, which is a bigger time consumer and makes the difference.

no, I can see debug-mode off in zope.conf.

not full, but here you have the first 100 function calls:

now I can tell you that Chameleon is not our worst problem, but all the overhead associated with the behavior-based architecture used in plone.app.contenttypes.

by the time we hit the get_items() function in the viewlet there's already a huge difference among Plone 4.3.12 and Plone 5.1b2:

0.423    0.002 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1)
0.727    0.004 __init__.py:29(get_items)  <-- Plone 5.1b2 (Chameleon 3.1)

this is bad news.

Let's not jump to conclusions. This looks worth investigating further, but looking just at the two profiles for Plone 4.3, the difference in time for get_items between the version with Chameleon and the one without it (0.423 - 0.356 = 0.067) is greater than the total difference in time between the two (0.517 - 0.491 = 0.026). And there's no reason I can see why get_items should be affected by the use of Chameleon vs PageTemplates. So I don't necessarily conclude from this particular difference that Chameleon is making things slower.

as I said above, the problem is more complex and seem to be related with the behavior-based architecture used in plone.app.contenttypes.

I stopped the music I was playing on my computer and repeated the tests for Plone 4.3.12 adding one more with plone.app.contenttypes 1.1.1 installed and here are the results:

  • Plone 4.3.12: 0.404 seconds
  • Plone 4.3.12 (Chameleon 3.1): 0.359 seconds
  • Plone 4.3.12 (Chameleon 3.1, plone.app.contenttypes 1.1.1): 0.448 seconds

here is the measure at get_items():

0.291    0.001 __init__.py:29(get_items)  <-- Plone 4.3.12
0.292    0.001 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1)
0.380    0.002 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1, plone.app.contenttypes 1.1.1)

first 100 calls:

I suggest the FW team to set an automated testing environment to review this.

anyway, the tests ran on Travis don't lie: rendering the viewlet in Plone 5.1b2 with Chameleon seems to be 50% slower than in Plone 4.3.12 without Chameleon, and that's not what we expected.

1 Like

I was able to set builds for all cases on Travis:

https://travis-ci.org/collective/collective.newsticker/builds/214088060

these are the results (the whole function stack call is there):

  • Plone 4.2: 0.432 seconds
  • Plone 4.3: 0.483 seconds
  • Plone 4.3 (five.pt): 0.441 seconds
  • Plone 4.3 (plone.app.contenttypes): 0.654 seconds
  • Plone 5.0: 0.815 seconds
  • Plone 5.1: 0.924 seconds

a couple of conclusions:

  • Chameleon seems to be innocent (at least in Plone 4.3)
  • we have managed to make Plone slower with every new release and we need to fix that
2 Likes

FWIW, here are comparable results with my branch of plone.dexterity without implicit schema attribute default value lookup: https://travis-ci.org/collective/collective.newsticker/builds/214160634

  • Plone 4.3 (plone.app.contenttypes): 0.446 seconds

      278378 function calls (259117 primitive calls) in 0.446 seconds
    
  • Plone 5.0: 0.440 seconds

      260337 function calls (241383 primitive calls) in 0.440 seconds
    
  • Plone 5.1: 0.493 seconds

      308518 function calls (280764 primitive calls) in 0.493 seconds
3 Likes

thanks! those numbers are really awesome! we're still behind Plone 4.3 in terms of performance for this specific test but I think you found the chief cause.

I think is very important that we, as core Plone developers, set up some performance tests as part or our standard PR checks to avoid situations like this in the future and improve the whole QA process.

I don't know if what I did is valid or not; I was looking for examples on how to tests performance and found very few information, so it's probably time to have some unittest layer dedicated to this.

@datakurre it's still cold and cloudy in Jyväskylä? :wink:

1 Like

@hvelarde @datakurre maybe I am misunderstanding, but how would this affect a viewlet that is only (AFAICT) traversing to catalog brains?

Sean

Misunderstanding more: why on Earth is this a good idea: https://github.com/plone/plone.app.contentlisting/blob/master/plone/app/contentlisting/catalog.py#L44 ???

if you check the viewlet code you'll see we're traversing a collection and using its results to feed the template:

https://github.com/collective/collective.newsticker/blob/2.1b1/src/collective/newsticker/browser/init.py#L29-L40

seems that something pretty unperformatic takes place behind the scenes with Dexterity-based content types and behaviors (there's a collection behavior involved in the Collection content type) and @datakurre's branch takes care of it.

check the insane number of calls to getattr in the Plone 5.1 build:

 47304/2100    0.075    0.000    0.882    0.000 {getattr}

Blame the collection and its contentlisting machinery, I think. If you were dealing with just brains, this would be fine, but somehow, plone.app.contentlisting decides that multiple calls to getObject() (versus, say, zero) is a reasonable default (the getattr calls are there, and if they are expensive in your profiling, it is because waking up objects is costly, and a design flaw in plone.app.contentlisting, in my slightly out-of-context opinion).

Sean

But that does not explain the difference between Archetypes and Dexterity.

(Also, the according to the template, it should only ask for attributes available in brain [Title, Description, getURL] and therefore it should not wake up the real objects.)

I am wondering if it might be compounding (or merely exposing) any issues Dexterity has (by actually using the content object)? We would not see this difference in performance if, really, in fact we were only accessing brain objects?

That is what confuses me -- if this was just getting attributes on brains, Dexterity vs. Archetypes should be moot?

I made some a change to the viewlet code to avoid calling the get_items function more than once using a memoizer.

now everything looks better; anyway, Dexterity-based versions is 50% slower in Plone 4.3.

I'm going to start adding this kind of checks to our packages and I think we must do the same in Plone.

thanks for your attention!

1 Like

Well, the context and collection are still different (Archetypes on Plone 4 and Dexterity on Plone 5). Exciting, if that's enough for the observed difference.

@hvelarde Thanks for the test. I believe, it proved my point about that particular feature in Dexterity. Fixing that is another story...

1 Like

Did you set CHAMELEON_CACHE? See Chameleon-Documentation.
If not the Python code is generated each time you call the page template.

no, I didn't set that variable as this was a development deployment; I though the problem was Chameleon, but now I think it's Dexterity's fault.