Why is Plone 4.3 faster than Plone 5.1?

hvelarde · March 22, 2017, 3:04pm

Yesterday I was trying to find a way to add some performance tests to one of our add-ons and I came up with this:

I added one test to measure how long it takes to render 100 times the viewlet included in this package and I set an arbitrary value of 0.5 seconds as the limit.

To my surprise, tests are passing for Plone 4.2 and Plone 4.3 and fail in both Plone 5.0 and 5.1:

https://travis-ci.org/collective/collective.newsticker/builds/213848036

This is what I get locally on my machine:

Plone 4.3.12

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         307779 function calls (290776 primitive calls) in 0.451 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 365 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.455    0.455 test_viewlet.py:149(render)
      100    0.000    0.000    0.454    0.005 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.453    0.005 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.453    0.005 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.448    0.004 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.444    0.004 pagetemplate.py:234(__call__)
      100    0.000    0.000    0.440    0.004 talinterpreter.py:267(__call__)
  600/100    0.004    0.000    0.440    0.004 talinterpreter.py:328(interpret)
      100    0.000    0.000    0.438    0.004 talinterpreter.py:850(do_condition)
1100/1000    0.002    0.000    0.388    0.000 tales.py:691(evaluate)
...

Plone 5.1b2

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         598617 function calls (547661 primitive calls) in 0.661 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 326 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.664    0.664 test_viewlet.py:149(render)
      100    0.000    0.000    0.664    0.007 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.663    0.007 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.663    0.007 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.658    0.007 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.654    0.007 engine.py:86(__call__)
      100    0.001    0.000    0.653    0.007 pagetemplate.py:128(render)
      100    0.001    0.000    0.652    0.007 template.py:209(render)
      100    0.001    0.000    0.650    0.007 template.py:165(render)
      100    0.005    0.000    0.648    0.006 d13e2e5858eac1ff5089660f6bd49440.py:22(render)
...

As you can see, there are many more function calls in Plone 5.1b2 and main suspect here seems to be Chameleon.

UPDATE 1: Plone 5 includes Chameleon 2.25; I made a test with Chameleon 3.1 but it remained almost unchanged:

Plone 5.1b2 (Chameleon 3.1)

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         598817 function calls (547861 primitive calls) in 0.652 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 326 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.655    0.655 test_viewlet.py:149(render)
      100    0.000    0.000    0.655    0.007 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.654    0.007 pagetemplatefile.py:120(__call__)
      100    0.001    0.000    0.654    0.007 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.649    0.006 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.646    0.006 engine.py:86(__call__)
      100    0.001    0.000    0.645    0.006 pagetemplate.py:128(render)
      100    0.001    0.000    0.643    0.006 template.py:229(render)
      100    0.001    0.000    0.642    0.006 template.py:165(render)
      100    0.005    0.000    0.639    0.006 b2d962c72225cfd129cfe757d1259927.py:23(render)
...

UPDATE 2: I made another test adding Chameleon 3.1 to Plone 4.3.12 and the results show that Chameleon is in fact making Plone slower:

Plone 4.3.12 (Chameleon 3.1)

*** PROFILER RESULTS ***
render (/home/hvelarde/collective/newsticker/src/collective/newsticker/tests/test_viewlet.py:149)
function called 1 times

         267070 function calls (250767 primitive calls) in 0.499 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 319 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.502    0.502 test_viewlet.py:149(render)
      100    0.001    0.000    0.502    0.005 simpleviewclass.py:43(__call__)
      100    0.000    0.000    0.500    0.005 pagetemplatefile.py:120(__call__)
      100    0.002    0.000    0.500    0.005 pagetemplatefile.py:47(__call__)
      100    0.001    0.000    0.493    0.005 pagetemplate.py:116(pt_render)
      100    0.001    0.000    0.487    0.005 engine.py:86(__call__)
      100    0.002    0.000    0.486    0.005 pagetemplate.py:128(render)
      100    0.002    0.000    0.483    0.005 template.py:229(render)
      100    0.002    0.000    0.481    0.005 template.py:165(render)
      100    0.007    0.000    0.479    0.005 b9624fa8db166ff12860b946044af1cd.py:23(render)
...

Can someone explain this?

thet · March 22, 2017, 9:26pm

Thanks for sharing this!

One thing to note is that on Plone 4 with Chameleon 3 there are fewer function calls, which is good. Also the difference between Plone 4 with and without Chameleon isn't that much. It's 0.49ms slower per rendering. That could also be by accident.

However, that does not explain why Chameleon isn't significantly faster than without it and why Plone 5 adds so much overhead that it's significantly slower.

Did you run in debug mode?

thet · March 22, 2017, 9:33pm

@hvelarde can you share the full call stack? each of the calls in the stack you posted doesn't consume much time. there must be something after it, which is a bigger time consumer and makes the difference.

hvelarde · March 23, 2017, 12:02am

no, I can see debug-mode off in zope.conf.

not full, but here you have the first 100 function calls:

Plone 4.3.12: Plone 4.3.12 - Pastebin.com
Plone 4.3.12 (Chameleon 3.1): Plone 4.3 (Chameleon 3.1) - Pastebin.com
Plone 5.1b2 (Chameleon 2.24): Plone 5.1b2 - Pastebin.com
Plone 5.1b2 (Chameleon 3.1): Plone 5.1b2 (Chameleon 3.1) - Pastebin.com

now I can tell you that Chameleon is not our worst problem, but all the overhead associated with the behavior-based architecture used in plone.app.contenttypes.

by the time we hit the get_items() function in the viewlet there's already a huge difference among Plone 4.3.12 and Plone 5.1b2:

0.423    0.002 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1)
0.727    0.004 __init__.py:29(get_items)  <-- Plone 5.1b2 (Chameleon 3.1)

this is bad news.

davisagli · March 23, 2017, 12:48am

Let's not jump to conclusions. This looks worth investigating further, but looking just at the two profiles for Plone 4.3, the difference in time for get_items between the version with Chameleon and the one without it (0.423 - 0.356 = 0.067) is greater than the total difference in time between the two (0.517 - 0.491 = 0.026). And there's no reason I can see why get_items should be affected by the use of Chameleon vs PageTemplates. So I don't necessarily conclude from this particular difference that Chameleon is making things slower.

hvelarde · March 23, 2017, 1:06am

as I said above, the problem is more complex and seem to be related with the behavior-based architecture used in plone.app.contenttypes.

I stopped the music I was playing on my computer and repeated the tests for Plone 4.3.12 adding one more with plone.app.contenttypes 1.1.1 installed and here are the results:

Plone 4.3.12: 0.404 seconds
Plone 4.3.12 (Chameleon 3.1): 0.359 seconds
Plone 4.3.12 (Chameleon 3.1, plone.app.contenttypes 1.1.1): 0.448 seconds

here is the measure at get_items():

0.291    0.001 __init__.py:29(get_items)  <-- Plone 4.3.12
0.292    0.001 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1)
0.380    0.002 __init__.py:29(get_items)  <-- Plone 4.3.12 (Chameleon 3.1, plone.app.contenttypes 1.1.1)

first 100 calls:

Plone 4.3.12: http://pastebin.com/tvKh6fFw
Plone 4.3.12 (Chameleon 3.1): http://pastebin.com/tXa1uFUG
Plone 4.3.12 (Chameleon 3.1, plone.app.contenttypes 1.1.1): http://pastebin.com/ZvzDSjyH

I suggest the FW team to set an automated testing environment to review this.

anyway, the tests ran on Travis don't lie: rendering the viewlet in Plone 5.1b2 with Chameleon seems to be 50% slower than in Plone 4.3.12 without Chameleon, and that's not what we expected.

hvelarde · March 23, 2017, 2:23am

I was able to set builds for all cases on Travis:

https://travis-ci.org/collective/collective.newsticker/builds/214088060

these are the results (the whole function stack call is there):

Plone 4.2: 0.432 seconds
Plone 4.3: 0.483 seconds
Plone 4.3 (five.pt): 0.441 seconds
Plone 4.3 (plone.app.contenttypes): 0.654 seconds
Plone 5.0: 0.815 seconds
Plone 5.1: 0.924 seconds

a couple of conclusions:

Chameleon seems to be innocent (at least in Plone 4.3)
we have managed to make Plone slower with every new release and we need to fix that

datakurre · March 23, 2017, 8:55am

FWIW, here are comparable results with my branch of plone.dexterity without implicit schema attribute default value lookup: Travis CI - Test and Deploy with Confidence

Plone 4.3 (plone.app.contenttypes): 0.446 seconds

  278378 function calls (259117 primitive calls) in 0.446 seconds

Plone 5.0: 0.440 seconds

  260337 function calls (241383 primitive calls) in 0.440 seconds

Plone 5.1: 0.493 seconds

  308518 function calls (280764 primitive calls) in 0.493 seconds

hvelarde · March 23, 2017, 3:55pm

thanks! those numbers are really awesome! we're still behind Plone 4.3 in terms of performance for this specific test but I think you found the chief cause.

I think is very important that we, as core Plone developers, set up some performance tests as part or our standard PR checks to avoid situations like this in the future and improve the whole QA process.

I don't know if what I did is valid or not; I was looking for examples on how to tests performance and found very few information, so it's probably time to have some unittest layer dedicated to this.

@datakurre it's still cold and cloudy in Jyväskylä?

seanupton · March 24, 2017, 3:41pm

@hvelarde @datakurre maybe I am misunderstanding, but how would this affect a viewlet that is only (AFAICT) traversing to catalog brains?

Sean

seanupton · March 24, 2017, 3:57pm

Misunderstanding more: why on Earth is this a good idea: plone.app.contentlisting/plone/app/contentlisting/catalog.py at master · plone/plone.app.contentlisting · GitHub ???

hvelarde · March 24, 2017, 3:59pm

if you check the viewlet code you'll see we're traversing a collection and using its results to feed the template:

https://github.com/collective/collective.newsticker/blob/2.1b1/src/collective/newsticker/browser/init.py#L29-L40

seems that something pretty unperformatic takes place behind the scenes with Dexterity-based content types and behaviors (there's a collection behavior involved in the Collection content type) and @datakurre's branch takes care of it.

check the insane number of calls to getattr in the Plone 5.1 build:

 47304/2100    0.075    0.000    0.882    0.000 {getattr}

seanupton · March 24, 2017, 4:28pm

Blame the collection and its contentlisting machinery, I think. If you were dealing with just brains, this would be fine, but somehow, plone.app.contentlisting decides that multiple calls to getObject() (versus, say, zero) is a reasonable default (the getattr calls are there, and if they are expensive in your profiling, it is because waking up objects is costly, and a design flaw in plone.app.contentlisting, in my slightly out-of-context opinion).

Sean

datakurre · March 24, 2017, 6:11pm

But that does not explain the difference between Archetypes and Dexterity.

(Also, the according to the template, it should only ask for attributes available in brain [Title, Description, getURL] and therefore it should not wake up the real objects.)

seanupton · March 24, 2017, 6:35pm

I am wondering if it might be compounding (or merely exposing) any issues Dexterity has (by actually using the content object)? We would not see this difference in performance if, really, in fact we were only accessing brain objects?

That is what confuses me -- if this was just getting attributes on brains, Dexterity vs. Archetypes should be moot?

hvelarde · March 24, 2017, 6:39pm

I made some a change to the viewlet code to avoid calling the get_items function more than once using a memoizer.

now everything looks better; anyway, Dexterity-based versions is 50% slower in Plone 4.3.

I'm going to start adding this kind of checks to our packages and I think we must do the same in Plone.

thanks for your attention!

datakurre · March 24, 2017, 7:03pm

Well, the context and collection are still different (Archetypes on Plone 4 and Dexterity on Plone 5). Exciting, if that's enough for the observed difference.

@hvelarde Thanks for the test. I believe, it proved my point about that particular feature in Dexterity. Fixing that is another story...

icemac · March 27, 2017, 11:24am

Did you set CHAMELEON_CACHE? See Chameleon-Documentation.
If not the Python code is generated each time you call the page template.

hvelarde · March 27, 2017, 1:38pm

no, I didn't set that variable as this was a development deployment; I though the problem was Chameleon, but now I think it's Dexterity's fault.