Waitress queue issues

hietpasd · September 14, 2021, 4:27pm

Hi,

I am running Plone 5.2.4. I have 2 ZEO clients set up behind Varnish and Apache.

I have a very high-traffic site and I constantly get the warnings below, which keep resulting in 500 errors being thrown at users. I previously ran Plone 5.0.2 and never had an issue keeping up with traffic.

2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
2021-09-14 11:13:56,903 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
2021-09-14 11:13:56,907 WARNING [waitress.queue:116][MainThread] Task queue depth is 2
2021-09-14 11:13:56,914 WARNING [waitress.queue:116][MainThread] Task queue depth is 3
2021-09-14 11:13:56,938 WARNING [waitress.queue:116][MainThread] Task queue depth is 3
2021-09-14 11:13:56,979 WARNING [waitress.queue:116][MainThread] Task queue depth is 4
2021-09-14 11:13:57,205 WARNING [waitress.queue:116][MainThread] Task queue depth is 5
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 6
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 7
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 8

Thanks,
David

tschorr · September 15, 2021, 7:32am

That's an odd error code; I would expect a 503. What's in your Plone and Apache error logs? Also, the waitress output shows only warnings. In my experience waitress can still serve pages without errors when the queue depth is > 10.
Before taking any other measures, I would first try to identify the long-running or failing requests from the logs.
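
As a starting point for that log analysis, here is a minimal Python sketch that finds the peak waitress queue depth for each second, so the busy moments can be matched against the Plone and Apache logs. It assumes the log line format quoted in the first post. The names `QUEUE_LINE`, `peak_queue_depth_per_second` and `SAMPLE_LOG` are made up for illustration and are not part of waitress or Plone.

```python
import re
from collections import defaultdict

# Assumed format, copied from the warnings quoted in this thread:
# 2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
QUEUE_LINE = re.compile(
    r"^(?P<second>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"WARNING \[waitress\.queue:\d+\]\[[^\]]*\] "
    r"Task queue depth is (?P<depth>\d+)"
)


def peak_queue_depth_per_second(lines):
    """Map each timestamp (truncated to the second) to the highest depth seen."""
    peaks = defaultdict(int)
    for line in lines:
        match = QUEUE_LINE.match(line.strip())
        if match:  # lines in any other format are ignored
            second = match.group("second")
            peaks[second] = max(peaks[second], int(match.group("depth")))
    return dict(peaks)


# A few of the lines from the original post, used as sample input.
SAMPLE_LOG = """
2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
2021-09-14 11:13:56,979 WARNING [waitress.queue:116][MainThread] Task queue depth is 4
2021-09-14 11:13:57,205 WARNING [waitress.queue:116][MainThread] Task queue depth is 5
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 8
"""

if __name__ == "__main__":
    peaks = peak_queue_depth_per_second(SAMPLE_LOG.splitlines())
    for second, depth in sorted(peaks.items()):
        print(second, depth)
```

The seconds with the highest depth are the ones worth looking up in the access and error logs.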

tschorr · September 15, 2021, 7:51am

There's haufe.requestmonitoring (on PyPI), which can help you identify long-running requests.

jensens · September 15, 2021, 8:19am

This warning is just a warning; no request is dropped. My guess is that full queues lead to timeouts somewhere between the web server, cache, and load balancer, and those then turn into 500s.

In any case I would increase the number of threads or the number of ZEO clients. Rule of thumb:

2 or 3 threads per ZEO-client are usually fine (depends on application and server),
one ZEO client per virtual CPU core.
Monitor your requests per second, CPU load, and I/O load, and increase or decrease accordingly.
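
The rule of thumb above can be written down as a small Python sketch, just to make the arithmetic explicit. The function `suggest_layout` and its parameter names are hypothetical and are not part of Plone, ZEO, or waitress. The result is only a starting point to adjust after monitoring.

```python
def suggest_layout(vcpu_cores, threads_per_client=2):
    """Starting layout from the rule of thumb: one ZEO client per virtual
    CPU core, with 2 or 3 threads per ZEO client."""
    if vcpu_cores < 1:
        raise ValueError("vcpu_cores must be at least 1")
    if threads_per_client < 1:
        raise ValueError("threads_per_client must be at least 1")
    zeo_clients = vcpu_cores  # one ZEO client per virtual CPU core
    return {
        "zeo_clients": zeo_clients,
        "threads_per_client": threads_per_client,
        # Requests that can be worked on at once before any are queued.
        "max_concurrent_requests": zeo_clients * threads_per_client,
    }


if __name__ == "__main__":
    # Example: a server with 4 virtual cores and 3 threads per client.
    print(suggest_layout(4, threads_per_client=3))
```

For comparison, the setup in the first post has 2 ZEO clients. With 2 threads each (an assumption, since the thread count is not stated), only 4 requests could be handled at once before waitress starts queueing.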