Waitress queue issues

Hi,

I am running Plone 5.2.4 with 2 ZEO clients set up behind Varnish and Apache.

I have a very high-traffic site and I am getting the warnings below constantly. This keeps resulting in 500 errors being thrown at users. I previously ran Plone 5.0.2 and never had an issue keeping up with traffic.

2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
2021-09-14 11:13:56,903 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
2021-09-14 11:13:56,907 WARNING [waitress.queue:116][MainThread] Task queue depth is 2
2021-09-14 11:13:56,914 WARNING [waitress.queue:116][MainThread] Task queue depth is 3
2021-09-14 11:13:56,938 WARNING [waitress.queue:116][MainThread] Task queue depth is 3
2021-09-14 11:13:56,979 WARNING [waitress.queue:116][MainThread] Task queue depth is 4
2021-09-14 11:13:57,205 WARNING [waitress.queue:116][MainThread] Task queue depth is 5
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 6
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 7
2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 8

Thanks,
David

That's an odd error code. I would expect a 503. What's in your Plone and Apache error logs? Also, the waitress output shows only warnings; in my experience waitress can still serve pages without errors when the queue depth is > 10.
Before taking any other measures, I would first try to identify the long-running or failing requests from the logs.

There's haufe.requestmonitoring (on PyPI), which can help you identify long-running requests.

This is only a warning; no request is dropped. The filled-up queues are probably causing timeouts somewhere between the web server, cache, and load balancer, and those then turn into 500s.
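To see how bad the queueing actually gets, it may help to summarize the warnings instead of reading them line by line. Here is a minimal sketch that reports the peak queue depth per second. It assumes the log lines look exactly like the ones posted above; the function name `max_depth_per_second` is made up for this example and is not part of waitress or Plone.

```python
import re
from collections import defaultdict

# Assumed format, taken from the log lines in the original post:
# 2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ WARNING "
    r"\[waitress\.queue:\d+\]\[[^\]]*\] Task queue depth is (?P<depth>\d+)"
)


def max_depth_per_second(lines):
    """Return {timestamp truncated to the second: highest queue depth seen}."""
    peaks = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that are not queue-depth warnings are skipped
            ts = match.group("ts")
            peaks[ts] = max(peaks[ts], int(match.group("depth")))
    return dict(peaks)


if __name__ == "__main__":
    # A few lines copied from the post; in practice, pass an open log file.
    sample = [
        "2021-09-14 11:13:56,864 WARNING [waitress.queue:116][MainThread] Task queue depth is 1",
        "2021-09-14 11:13:56,979 WARNING [waitress.queue:116][MainThread] Task queue depth is 4",
        "2021-09-14 11:13:57,205 WARNING [waitress.queue:116][MainThread] Task queue depth is 5",
        "2021-09-14 11:13:57,206 WARNING [waitress.queue:116][MainThread] Task queue depth is 8",
    ]
    for ts, depth in sorted(max_depth_per_second(sample).items()):
        print(ts, depth)
```

Lining up the seconds with the highest peaks against the Apache and Varnish logs should show whether the 500s happen at the same moments.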

In any case I would increase the number of threads or the number of ZEO clients. Rule of thumb:

  • 2 or 3 threads per ZEO client are usually fine (depends on application and server).
  • One ZEO client per virtual CPU core.
  • Monitor your requests per second, CPU load, and I/O load, and increase or decrease accordingly.
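
As a rough illustration of the arithmetic behind that rule of thumb, here is a small sketch. The function `suggest_sizing` and its parameter names are hypothetical, invented for this example; the result is only a starting point to adjust after monitoring, not a recommendation for any particular site.

```python
def suggest_sizing(vcpu_cores, threads_per_client=2):
    """Starting point from the rule of thumb above.

    One ZEO client per virtual CPU core, with 2 or 3 threads each.
    The names here are illustrative only, not part of Plone or waitress.
    """
    if vcpu_cores < 1:
        raise ValueError("vcpu_cores must be at least 1")
    if threads_per_client < 1:
        raise ValueError("threads_per_client must be at least 1")
    zeo_clients = vcpu_cores  # one client per virtual core
    return {
        "zeo_clients": zeo_clients,
        "threads_per_client": threads_per_client,
        "total_threads": zeo_clients * threads_per_client,
    }


if __name__ == "__main__":
    # Example: a 4-core server with 3 threads per client.
    print(suggest_sizing(4, threads_per_client=3))
```

For the setup in the original post (2 ZEO clients), this would mean checking whether the server has more than 2 virtual cores available before adding clients, and otherwise trying a slightly higher thread count first.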
