Profiling Tests in Plone found zope.interface as bottleneck

So, yes, sometimes Plone is slow. But why?

The following outputs are coming from Py-Spy.
The Own-Time is here of interest.
It is what a function takes without time of the functions it calles.
For simpler later calculations I toggled the output to aggregate by-function instead of the default by-line.

It is a buildout.coredev on the 6.0 branch as on 2020-02-18.
I let all tests run on my Thinkpad 490s w/o Robot-tests.
After a normal buildout I did pip install py-spy.
Then I started the tests in py-spy with time py-spy top ./bin/test.
And had to wait.
Following is the output.
It is test-code, so a running Plone in production has different numbers.
Also the numbers differ from run to run by 1-2%, so take it as rough estimates.
But it gives us hints.

%Own   %Total  OwnTime  TotalTime  Function (filename)
0.00%   0.00%   153.9s    153.9s   __hash__ (zope/interface/interface.py)
0.00%   0.00%   146.1s    312.6s   _lookup (zope/interface/adapter.py)
0.00%   0.00%   55.07s     1470s   run_tests (zope/testrunner/runner.py)
0.00%   0.00%   45.54s    45.62s   parse (chameleon/astutil.py)
0.00%   0.00%   45.27s    45.27s   __hash__ (zope/interface/declarations.py)
0.00%   0.00%   28.79s    121.8s   wrapper (chameleon/codegen.py)
0.00%   0.00%   27.33s    65.63s   __get__ (plone/dexterity/content.py)
0.00%   0.00%   24.58s    36.95s   decorator (plone/dexterity/schema.py)
0.00%   0.00%   24.56s    85.15s   visit (chameleon/astutil.py)
0.00%   0.00%   22.08s    55.75s   generic_visit (ast.py)
0.00%   0.00%   21.42s    41.05s   getState (ZODB/serialize.py)
0.00%   0.00%   17.27s    380.1s   queryMultiAdapter (zope/interface/adapter.py)
0.00%   0.00%   17.13s    18.87s   rewrap_in_request_container (Products/CMFPlone/patches/gtbn.py)
0.00%   0.00%   16.28s    16.42s   build (chameleon/loader.py)
0.00%   0.00%   11.54s    21.26s   __call__ (ZODB/utils.py)
0.00%   0.00%   10.99s    11.63s   get (weakref.py)
0.00%   0.00%   10.79s    26.47s   _dump (ZODB/serialize.py)
0.00%   0.00%    9.69s     9.70s   iter_fields (ast.py)
0.00%   0.00%    9.66s    34.84s   loadBefore (ZODB/DemoStorage.py)
0.00%   0.00%    9.34s    45.26s   __getattr__ (plone/dexterity/content.py)
0.00%   0.00%    9.03s    56.51s   adapter_hook (zope/component/hooks.py)
0.00%   0.00%    8.95s    25.43s   getFields (zope/schema/_bootstrapfields.py)
0.00%   0.00%    8.76s    57.94s   visit (ast.py)
0.00%   0.00%    8.75s    12.07s   get (zope/interface/interface.py)
0.00%   0.00%    8.64s     9.54s   __iter__ (plone/registry/recordsproxy.py)
0.00%   0.00%    8.57s     8.57s   _write (chameleon/astutil.py)
0.00%   0.00%    8.45s    13.06s   transformIterable (plone/app/theming/transform.py)
0.00%   0.00%    8.43s    75.38s   __init__ (chameleon/astutil.py)
0.00%   0.00%    8.23s    66.34s   visit (chameleon/codegen.py)
0.00%   0.00%    7.96s     8.98s   names (zope/interface/interface.py)
0.00%   0.00%    7.93s    15.58s   persistent_id (ZODB/serialize.py)
0.00%   0.00%    7.88s    18.05s   __getitem__ (collections/__init__.py)
0.00%   0.00%    7.69s     7.89s   _getOb (OFS/ObjectManager.py)
0.00%   0.00%    7.68s    364.0s   __getattr__ (plone/indexer/wrapper.py)
0.00%   0.00%    7.51s     8.76s   loadBefore (ZODB/MappingStorage.py)
0.00%   0.00%    6.99s    11.75s   lookup_fti (plone/dexterity/schema.py)
0.00%   0.00%    6.69s    19.16s   queryUtility (zope/interface/registry.py)
0.00%   0.00%    6.50s     6.91s   getPhysicalPath (OFS/Traversable.py)
0.00%   0.00%    6.06s     6.11s   load_persistent (ZODB/serialize.py)
0.00%   0.00%    5.77s    10.79s   _subscriptions (zope/interface/adapter.py)
0.00%   0.00%    5.69s     6.02s   store (ZODB/Connection.py)
0.00%   0.00%    5.36s     5.45s   __init__ (ZODB/_compat.py)
0.00%   0.00%    5.30s    32.83s   __iter__ (_collections_abc.py)
0.00%   0.00%    5.09s     6.51s   dedent (textwrap.py)
0.00%   0.00%    4.88s     4.88s   __cmp (zope/interface/interface.py)
0.00%   0.00%    4.80s     4.80s   normalize_lang (zope/i18n/negotiator.py)
0.00%   0.00%    4.71s    80.37s   setstate (ZODB/Connection.py)
0.00%   0.00%    4.70s    43.60s   _store_objects (ZODB/Connection.py)
0.00%   0.00%    4.66s    319.7s   _uncached_lookup (zope/interface/adapter.py)
0.00%   0.00%    4.16s    589.1s   subscribers (zope/interface/adapter.py)
0.00%   0.00%    4.07s    63.28s   __call__ (chameleon/compiler.py)
0.00%   0.00%    3.87s    48.86s   setGhostState (ZODB/serialize.py)
0.00%   0.00%    3.75s     3.77s   __contains__ (weakref.py)
0.00%   0.00%    3.75s     3.75s   records (plone/registry/registry.py)
0.00%   0.00%    3.67s     6.91s   rolesForPermissionOn (AccessControl/ImplPython.py)
0.00%   0.00%    3.66s     3.66s   _new_line (chameleon/astutil.py)

The output of time:

real	25m3,478s
user	24m36,560s
sys	0m50,579s

Let's reduce this to time in zope.interface:

%Own   %Total  OwnTime  TotalTime  Function (filename)
0.00%   0.00%   153.9s    153.9s   __hash__ (zope/interface/interface.py)
0.00%   0.00%   146.1s    312.6s   _lookup (zope/interface/adapter.py)
0.00%   0.00%   45.27s    45.27s   __hash__ (zope/interface/declarations.py)
0.00%   0.00%   17.27s    380.1s   queryMultiAdapter (zope/interface/adapter.py)
0.00%   0.00%    8.75s    12.07s   get (zope/interface/interface.py)
0.00%   0.00%    7.96s     8.98s   names (zope/interface/interface.py)
0.00%   0.00%    6.69s    19.16s   queryUtility (zope/interface/registry.py)
0.00%   0.00%    5.77s    10.79s   _subscriptions (zope/interface/adapter.py)
0.00%   0.00%    4.88s     4.88s   __cmp (zope/interface/interface.py)
0.00%   0.00%    4.66s    319.7s   _uncached_lookup (zope/interface/adapter.py)
0.00%   0.00%    4.16s    589.1s   subscribers (zope/interface/adapter.py)

This sums up to 405.24s only in zope.interface.
Or ~27.5% of the test runtime! WTF!

Now there were optimizations in zope.interface master done by me and others.
What about using zope.interface master in Plone?
It is not well tested by now with Plone and is highly experimental!
Anyway, there are no unsolved issues blocking us from running the tests.

So I added zope.interface to sources and auto-checkout.
Then I run py-spy as above, here the reduced output::

%Own   %Total  OwnTime  TotalTime  Function (filename)
0.00%   0.00%   96.23s    125.7s   _lookup (zope/interface/adapter.py)
0.00%   0.00%   36.35s    36.35s   __hash__ (zope/interface/interface.py)
0.00%   0.00%   14.64s    185.2s   queryMultiAdapter (zope/interface/adapter.py)
0.00%   0.00%    8.10s     8.58s   names (zope/interface/interface.py)
0.00%   0.00%    8.07s    11.29s   get (zope/interface/interface.py)
0.00%   0.00%    6.60s    17.36s   queryUtility (zope/interface/registry.py)
0.00%   0.00%    4.64s    132.3s   _uncached_lookup (zope/interface/adapter.py)
0.00%   0.00%    4.41s     4.41s   __cmp (zope/interface/interface.py)
0.00%   0.00%    4.22s     5.19s   _subscriptions (zope/interface/adapter.py)

real	21m20,264s
user	20m54,982s
sys	0m43,547s

This sums up to 183.26s or ~14.5% of the tests time.
Much better, in fact 2.2x faster, but not good enough.
Component architecture is a great thing, but not if it slows down the program that hard.

What about implementing _lookup in C?

I took my 20 years sleeping C knowledge out of some drawers back in my brain and learned about the Python C-API.
The zope.interface branch implement-_lookup-in-C is the outcome.
It works in Python 3.7.9 at least and is far from finished.
Probably someone with better C-Python skills could get more out of it.
Lets see how it performs (reduced output).

%Own   %Total  OwnTime  TotalTime  Function (filename)
0.00%   0.00%   63.13s    96.56s   _uncached_lookup (zope/interface/adapter.py)
0.00%   0.00%   39.80s    39.80s   __hash__ (zope/interface/interface.py)
0.00%   0.00%   15.42s    156.0s   queryMultiAdapter (zope/interface/adapter.py)
0.00%   0.00%    9.16s    13.12s   get (zope/interface/interface.py)
0.00%   0.00%    9.07s     9.67s   names (zope/interface/interface.py)
0.00%   0.00%    7.47s    17.37s   queryUtility (zope/interface/registry.py)
0.00%   0.00%    6.83s     6.83s   __cmp (zope/interface/interface.py)
0.00%   0.00%    5.10s     6.70s   _mergeOrderings (zope/interface/ro.py)
0.00%   0.00%    5.03s     6.13s   _flatten (zope/interface/ro.py)
0.00%   0.00%    5.03s     6.18s   _subscriptions (zope/interface/adapter.py)

real	23m46,821s
user	23m17,787s
sys	0m52,630s

This sum up to 166s or ~12% of the tests time.
Still a lot, but much better than in current Plone 5.2.1.

More optimizations need to be done!

11 Likes

Really good job!
Thanks for that :slight_smile:

@jensens you rock! Thank you for working on this!!!

Great work. Doesn't jenkins run performance tests? This would help.

I worked on performance tests for Plone core and Plone REST API. I could set them up again. Though, I don't have access to our jenkins.plone.org infrastructure any longer.