What is the meaning of this stack trace

habibcs · November 18, 2016, 7:37pm

What is the meaning of this stack trace?
I am getting many errors like this for all dynamic content; the stack traces usually differ, but they all end with ClientDisconnected.

Any ideas?

Using Plone 4.3.3, Python 2.7.6

Traceback (innermost last):
Module ZPublisher.Publish, line 138, in publish
Module ZPublisher.mapply, line 77, in mapply
Module ZPublisher.Publish, line 48, in call_object
Module grokcore.view.components, line 142, in __call__
Module zope.publisher.publish, line 107, in mapply

__traceback_info__: <bound method EditUserProfile.update of <hiddensitename.theme.browser.user.EditUserProfile object at 0x7f2554c46890>>

Module zope.publisher.publish, line 113, in debug_call
Module hiddensitename.theme.browser.user, line 1583, in update
Module UserDict, line 19, in __getitem__
Module ZODB.Connection, line 860, in setstate
Module ZODB.Connection, line 901, in _setstate
Module ZEO.ClientStorage, line 833, in load
Module ZEO.ClientStorage, line 88, in __getattr__
ClientDisconnected

dieter · November 19, 2016, 8:14am

While trying to "load" an object from the ZODB, the ZEO "ClientStorage" found out that the connection to the ZEO server has been lost (likely the ZEO server has been shut down or restarted, or it died).
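A quick way to see whether anything is listening at the ZEO server's address is a plain TCP connection attempt. This is a minimal sketch using only the standard library; the function name `zeo_port_open` is made up for illustration, and the host and port are assumptions that have to be taken from your own ZEO configuration. A successful connection only shows that some process accepts connections there, not that the ZEO server is healthy.

```python
import socket


def zeo_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: it does not speak the ZEO protocol, it only
    checks that something is accepting connections at that address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False
```

Calling it as `zeo_port_open("127.0.0.1", 8100)` would test a ZEO server assumed to listen on local port 8100; replace both values with the ones from your setup.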

habibcs · November 19, 2016, 8:03pm

Dear dieter,

Thanks for your meaningful response.
From your explanation, I guess that zeoserver.log should give me hints about the actual exception or error message; if not, could somebody please guide me?

Excuse me, I am new to Plone (and Python).

Could you or somebody else explain the entities involved (or point me to a document that explains the architecture with diagrams), please?

As I understand it: some entity A is loading an object (requested by the client browser) from the ZODB (which potentially holds all resources except some on the file system), and then ZEO's "ClientStorage" (another entity, or entities) finds out that the connection between the ZEO server (another entity) and either entity A or the ClientStorage has been lost.

For example, the ZEO server has been shut down, restarted, or terminated, either by itself or by somebody else (entity B).

Many thanks

dieter · November 19, 2016, 10:19pm

Plone stores its changing data in a database, the so-called "ZODB" (= "Zope Object Database"). The "ZODB" manages a net of persistent objects. When Plone must access such an object, it is (if necessary) automatically loaded from the ZODB. The real storage of the objects is delegated to so-called "Storage"s. There are many kinds of storages, among them the "FileStorage" (data is stored in a set of OS files) and the "ZEO.ClientStorage" (data is managed by a ZEO (= "Zope Enterprise Objects") server). Read more about the ZODB at https://en.wikipedia.org/wiki/Zope_Object_Database

Your Plone has been set up to use "ZEO". In such a case, the "ZODB" used by Plone delegates loads and stores of persistent objects to a "ZEO.ClientStorage". It, in turn, opens a connection to the ZEO server for communication. In your case, the "ClientStorage" detects that it has lost its connection to the ZEO server.
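The chain of delegation described above can be illustrated with a toy model. This is not the ZODB or ZEO API: all class names (`ToyZeoServer`, `ToyClientStorage`, `ToyConnection`) and the local `ClientDisconnected` exception are invented stand-ins, and real caching and connection handling are far more involved. The sketch only shows why very different requests can all end in the same error once the server is gone.

```python
class ClientDisconnected(Exception):
    """Stand-in for the exception at the bottom of the traceback."""


class ToyZeoServer:
    """Plays the role of the ZEO server process that owns the data."""

    def __init__(self, records):
        self.records = dict(records)
        self.running = True

    def load(self, oid):
        return self.records[oid]


class ToyClientStorage:
    """Plays the role of the client storage inside each Plone process."""

    def __init__(self, server):
        self.server = server

    def load(self, oid):
        if not self.server.running:
            # The connection to the server is gone: every load fails.
            raise ClientDisconnected()
        return self.server.load(oid)


class ToyConnection:
    """Plays the role of the ZODB connection: loads objects on demand."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}

    def get(self, oid):
        # Objects are only fetched from the storage when not yet in memory.
        if oid not in self.cache:
            self.cache[oid] = self.storage.load(oid)
        return self.cache[oid]
```

In this model, any code path that touches an object not yet in memory fails in the same place once `running` is false, which matches the observation that the tracebacks differ but share the same last lines.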

habibcs · November 19, 2016, 10:36pm

dieter,
Many thanks for your explanation.

I noticed that the ZEO server was not running in the background, which explains the many errors logged; so it had died or been terminated.

We have two instances and a worker (three altogether) plus a ZEO server configured.

Do these process stats look normal? (ps aux --sort -rss)
We have 6 vCPUs and 16 GB RAM (it's an Ubuntu virtual machine in a cloud).
I feel they are consuming a lot of CPU and RAM. If my guess is correct, what should I look at for possible issues or misconfiguration, and what would you recommend?

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 1152 0.1 19.5 5607716 3207380 ? Ssl Nov18 3:32 /usr/sbin/mysqld
ubuntu 12719 23.3 16.8 3238264 2768196 ? Sl 22:05 3:52 /data/WSHidden/bin/python /data/WSHidden/parts/instance2/bin/interpreter /data/WSHidden/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/run.py -C /data/WSHi
ubuntu 12705 12.4 10.8 2469920 1778172 ? Sl 22:05 2:04 /data/WSHidden/bin/python /data/WSHidden/parts/instance/bin/interpreter /data/WSHidden/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/run.py -C /data/WSHid
ubuntu 12700 1.1 1.0 829320 179052 ? Sl 22:05 0:11 /data/WSHidden/bin/python /data/WSHidden/parts/worker/bin/interpreter /data/WSHidden/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/run.py -C /data/WSHid
ubuntu 12695 7.1 0.3 524080 52928 ? Sl 22:05 1:11 /data/WSHidden/bin/python /data/WSHidden/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZEO/runzeo.py -C /data/WSHidden/parts/zeo/etc/zeo.conf
ubuntu 12693 0.0 0.0 177176 14876 ? Ssl 22:05 0:00 /data/WSHidden/bin/python ./bin/zeo -S /data/WSHidden/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZEO/zeoctl.xml -C /data/WSHidden/parts/zeo/etc/zeo.con
ubuntu 12698 0.0 0.0 148960 7440 ? Ssl 22:05 0:00 /data/WSHidden/bin/python /data/WSHidden/parts/worker/bin/interpreter /data/WSHidden/buildout-cache/eggs/zdaemon-2.0.7-py2.7.egg/zdaemon/zdrun.py -S /data/WSHidden
ubuntu 12703 0.0 0.0 148960 7440 ? Ssl 22:05 0:00 /data/WSHidden/bin/python /data/WSHidden/parts/instance/bin/interpreter /data/WSHidden/buildout-cache/eggs/zdaemon-2.0.7-py2.7.egg/zdaemon/zdrun.py -S /data/WSHidde
ubuntu 12717 0.0 0.0 148960 7440 ? Ssl 22:05 0:00 /data/WSHidden/bin/python /data/WSHidden/parts/instance2/bin/interpreter /data/WSHidden/buildout-cache/eggs/zdaemon-2.0.7-py2.7.egg/zdaemon/zdrun.py -S /data/WSHidd
www-data 1301 0.0 0.0 441016 5300 ? Sl Nov18 0:42 /usr/sbin/apache2 -k start
www-data 1300 0.0 0.0 441048 5292 ? Sl Nov18 0:42 /usr/sbin/apache2 -k start
ubuntu 12472 0.0 0.0 22164 4764 pts/1 Ss+ 21:45 0:00 -bash

dieter · November 20, 2016, 10:35am

CPU time and memory consumption are high. Whether this is to be expected depends on what your instances are doing.

The available data may indicate that you are looking at a startup situation. In this situation, Plone processes take a lot of CPU time. On the other hand, a memory consumption of 3.2 GB would be strange at startup (while it might be expected after a Plone process has run for a long time).

Much more and a much deeper analysis would be necessary to determine whether something strange happens in your installation.
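As a first step, the `ps aux` listing can be reduced to the memory figure per process. The following is a minimal sketch, assuming the standard eleven-column `ps aux` layout shown in the listing and that the RSS column is in KiB; the function name `rss_by_process` is made up for illustration.

```python
def rss_by_process(ps_output):
    """Parse `ps aux` text and return rows sorted by resident memory.

    Each row is (rss_mib, pid, user, command). Assumes the header line
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header
        # Split into at most 11 fields; COMMAND may contain spaces.
        parts = line.split(None, 10)
        if len(parts) < 11:
            continue
        user, pid, rss_kib, command = parts[0], int(parts[1]), int(parts[5]), parts[10]
        rows.append((rss_kib / 1024.0, pid, user, command))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows
```

Running this on output saved at regular intervals gives a simple record of how the resident memory of each instance develops over time.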

habibcs · November 20, 2016, 9:45pm

Thank you dieter.

It's a kind of startup situation; CPU usage does stabilize but still varies between 20 and 80%.
But within half an hour, instance1 and instance2 hold 4 GB of memory each and MySQL holds 2 GB, on a server with 16 GB in total; and the memory usage keeps increasing.

[quote="dieter, post:6, topic:3059"]
Much more and a much deeper analysis would be necessary to determine whether something strange happens in your installation.[/quote]
True; any pointers?

habibcs · November 21, 2016, 8:47am

My two instances each consistently hold at least 4 GB of memory, MySQL holds another 3 GB, and then there are some other processes; together they take up the whole memory of the server.
I am not sure where to look for leaks in the Plone instance services. I feel it's a programming/configuration problem and not Plone's default behavior; we do not have very high traffic yet.

Thanks

jensens · November 21, 2016, 9:10am

Without in-depth information about configuration of Plone it is difficult to give any advice. You may want to give us more information here.

Usually vanilla Plone uses between 200 and 300 MB per instance. Large sites should not take more than 1 GB. But this depends heavily on your configuration (mostly the configuration of caches).

Some years ago I wrote a short article, "About Instances and Threads, Performance and RAM consumption", in order to help people understand what's going on behind the scenes.

dieter · November 21, 2016, 9:20am

I have started up my Plone (4.x) development instance. It took 20 s CPU time for startup and used 130 MB of virtual memory after startup. Thus, the fact that your Plone instance used about 3.7 GB of memory after startup seems to indicate that it is quite special.

In general, it is very difficult to analyse memory use, especially memory use in long running processes (where some small leaks may over time cause a massive memory loss). In the past, I have not always succeeded with such analyses.

Analysing high memory use already after startup should however be simpler. Some non-standard component of your system should be responsible. I would proceed in the following way: start with a vanilla Plone installation and check its memory use. Successively add components to your system and recheck the memory use (I would not add one component at a time but apply some form of logarithmic search for the responsible component). This should reveal which component is responsible for the high memory consumption.
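The "logarithmic search" mentioned above can be sketched as a bisection over the list of add-on components. This is only an illustration under strong assumptions: exactly one component is responsible, components do not interact, and there is some way to measure memory for a given set of components. The callable `measure_mb` is hypothetical; in practice it would mean rebuilding the installation with only those components, starting an instance, and reading its memory use.

```python
def find_heavy_component(components, measure_mb, baseline_mb, tolerance_mb=100):
    """Bisect `components` to find the one that inflates memory use.

    measure_mb(subset) is a hypothetical callable returning the memory
    use (in MB) of a vanilla installation plus the given components.
    Returns the suspected component, or None if none stands out.
    """
    def is_heavy(subset):
        return measure_mb(subset) > baseline_mb + tolerance_mb

    candidates = list(components)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Test only the first half; the culprit is either in it or not.
        if is_heavy(half):
            candidates = half
        else:
            candidates = candidates[len(half):]

    # Confirm the last remaining candidate instead of assuming it.
    if candidates and is_heavy(candidates):
        return candidates[0]
    return None
```

With N components this needs roughly log2(N) rebuilds plus one confirmation, instead of N rebuilds when adding one component at a time.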

In one of my past installations, I had incorporated a "Java Virtual Machine" ("JVM") in my Zope process. Of course, in this setup, the process already started with high memory use as the JVM has preallocated its Java memory. I suppose that an external component like this (though it may not be a "JVM") is responsible for your high memory use after startup. But which component this is, must be analysed in your setup.

habibcs · November 21, 2016, 9:22am

Dear Jensens,
Thanks for your message. I will read your article.
You might be right that there could be issues with the cache configuration. We have also configured a caching proxy, running on another server, so that it caches most of the content for a good duration, and that has helped us with some performance issues. But I agree with you: given your experience and the figures you mention, it looks like my site should not take more than 1 or 2 GB of memory, as opposed to 8 GB.

Let me know what more information I should share here (in public), and perhaps what I can share with you directly, both for the cache configuration and for other relevant configuration that could give us more insight into the memory/performance issue.

Also let me know if you think it's better to create a new topic/thread on this matter (and suggest a name for the new topic).

Regards, Habib

dieter · November 21, 2016, 9:50am

You need not necessarily be worried that all your RAM resources are used. Modern operating systems try to use all available RAM for non-essential stuff (like buffers and caches) if essential stuff does not use it all. Thus, seeing that your RAM is used to 100 % is no reason to worry. You should start worrying when some processes start to dramatically slow down due to a large number of page faults. You can get hints towards such situations from a "top" display looking at the "%wa" information (it indicates the "waiting" percentage).
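On Linux, a figure comparable to the "%wa" value can be derived from two samples of the aggregate "cpu" line in /proc/stat. This is a rough sketch, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for illustration and the result will only approximate what "top" displays.

```python
import time


def parse_cpu_line(line):
    """Return the counters of the aggregate 'cpu' line as integers."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent waiting for I/O between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Sum user..steal only; guest time is already counted within user.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart (Linux only)."""
    def first_line():
        with open(path) as stat_file:
            return stat_file.readline()

    before = first_line()
    time.sleep(interval)
    return iowait_percent(before, first_line())
```

A persistently high value from `sample_iowait()` while free memory is low would be one hint that the machine is paging heavily.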

habibcs · November 21, 2016, 10:07am

Thanks dieter,

You are right. But I see that memory use keeps growing, from 4 GB after, say, half an hour to more than 10 GB (2 instances + MySQL) after a couple of hours (as I noticed overnight). My Plone frequently gives gateway timeouts, especially for logged-in users (where no caching is applied), and I see that free memory is down to 200 MB; it might be failing because no memory is available, but I am not sure.

Good, but unfortunately I am talking about issues with the live production system.
The company owner had it developed by some freelancers; I have just joined to provide support only, and I am not a Plone or Python engineer.

What I notice is that there are a number of Python scripts floating around. Some are run inside Plone/Zope's Python virtualenv and a couple of them outside.

So, where should I begin?

zeo.conf?
buildout.cfg?

dieter · November 21, 2016, 10:56am

Maybe the developers should get involved to explain the memory consumption (at least one of them should have some hints about what uses that much memory). I do not think that someone new to Plone/Python is in the best position to solve those issues alone.

"zeo.conf" configures the ZEO server. This server process almost never gives issues with RAM usage. Your memory consumption comes from the Plone(/Zope) processes.

Looking at "buildout.cfg" is a good starting point to determine the differences between your special Plone installation and a "standard" Plone installation. A "buildout" configuration file consists of sections and definitions in those sections. Most interesting to determine special components is the "eggs" definition (usually in the "buildout" section). In my instance, it looks like:

eggs += 
      plone.reload
      dm.zdoc
      Products.PDBDebugMode
      Products.Ploneboard
      dm.xmlsec.binding
      ulif.openoffice

This indicates: use the standard Plone "eggs" (= components) and add plone.reload, dm.zdoc, ...

Note that a "buildout" configuration often uses extension (extends definition in section buildout) to add new definitions and/or sections to base configuration files. In the example eggs += ... above, the base configuration files define to what eggs collection the ... eggs are added. To determine the special components (aka "eggs") of your installation, you may need to look at the complete "buildout" configuration structure, not only "buildout.cfg".
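To get a first overview of the "eggs" and "extends" definitions in a configuration file, the standard library's `configparser` can be used as a rough reader. This is a sketch only: it is not how buildout itself parses its files, it does not follow "extends" chains or resolve `${section:option}` references, and the function name `read_buildout_options` is made up for illustration.

```python
import configparser


def read_buildout_options(text, names=("eggs", "extends")):
    """List (section, option, values) for the given option names.

    Rough reader for buildout-style files, not buildout's own parser.
    """
    parser = configparser.ConfigParser(
        interpolation=None,  # leave ${section:option} references untouched
        delimiters=("=",),   # so values such as URLs are not split at ':'
        strict=False,        # tolerate repeated sections or options
    )
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)

    found = []
    for section in parser.sections():
        for key, value in parser.items(section):
            # "eggs +=" is read as an option named "eggs +".
            base = key.rstrip("+-").strip()
            if base in names:
                found.append((section, key, value.split()))
    return found
```

Reading each file named under "extends" in turn (for example with `open(path).read()`) and passing its text to the same function gives a picture of the complete configuration structure, one file at a time.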