Ulimit open files

I am not the sysadmin here, but I am looking for some information I can pass on to my sysadmin. I ran into the error below when upgrading a site:

2024-05-10 16:58:03,502 ERROR   [waitress:435][waitress-0] Exception while serving [site]/manage_doUpgrades
Traceback (most recent call last):
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/waitress/channel.py", line 428, in service
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/waitress/task.py", line 168, in service
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/waitress/task.py", line 434, in execute
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/paste/translogger.py", line 69, in __call__
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZPublisher/httpexceptions.py", line 30, in __call__
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZPublisher/WSGIPublisher.py", line 389, in publish_module
  File "/usr/lib64/python3.11/contextlib.py", line 144, in __exit__
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZPublisher/WSGIPublisher.py", line 239, in transaction_pubevents
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZPublisher/WSGIPublisher.py", line 61, in reraise
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZPublisher/WSGIPublisher.py", line 187, in transaction_pubevents
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_manager.py", line 257, in commit
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_manager.py", line 134, in commit
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_transaction.py", line 283, in commit
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_compat.py", line 50, in reraise
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_transaction.py", line 274, in commit
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_transaction.py", line 457, in _commitResources
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_compat.py", line 50, in reraise
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/transaction/_transaction.py", line 429, in _commitResources
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZODB/Connection.py", line 481, in tpc_begin
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZODB/mvccadapter.py", line 206, in tpc_begin
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZEO/ClientStorage.py", line 912, in tpc_begin
  File "/plone/prod/venvs/portals/lib64/python3.11/site-packages/ZEO/TransactionBuffer.py", line 38, in __init__
  File "/usr/lib64/python3.11/tempfile.py", line 657, in TemporaryFile
  File "/usr/lib64/python3.11/tempfile.py", line 650, in opener
  File "/usr/lib64/python3.11/tempfile.py", line 256, in _mkstemp_inner
OSError: [Errno 24] Too many open files: '/tmp/tmpcxeubg_o.tbuf'

The open-files ulimit has since been increased, which solves the immediate problem. However, I haven't been able to find any recommendation for what it should be on a server running X WSGI clients, Y ZEO servers and Z Plone sites. What factors contribute to the number of open files (which I know includes sockets)?
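To see the failure mode in isolation, here is a minimal sketch (Linux assumed) that lowers the soft RLIMIT_NOFILE of the current process and keeps opening temporary files with tempfile.TemporaryFile, the same call that appears at the bottom of the traceback, until the kernel refuses with errno 24 (EMFILE). The function name open_until_emfile and the demo limit of 64 are arbitrary choices for illustration. The lowered limit applies to the whole interpreter, so run it in a throwaway Python session.

```python
import errno
import resource
import tempfile


def open_until_emfile(soft_limit: int = 64) -> int:
    """Open temp files under a lowered soft limit until EMFILE.

    Returns how many files were opened before the limit was hit. The original
    limits are restored and all files are closed before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    files = []
    try:
        while True:
            try:
                # Mirrors the tempfile.TemporaryFile call in the traceback.
                files.append(tempfile.TemporaryFile(suffix=".tbuf"))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # errno 24: Too many open files
                    return len(files)
                raise
    finally:
        for f in files:
            f.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    print("opened", open_until_emfile(64), "temp files before errno 24")
```

The count printed is a little below 64, because stdin, stdout, stderr and anything else the interpreter already has open also consume descriptors under the same limit.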

It probably makes sense to add up, over every production process (each WSGI instance and each ZEO server), the output of the following, with $PID set to that process's ID:

lsof -p "$PID" | grep -P "^(\S+\s+){3}\d+\D+" | wc -l

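(The vast majority of lines lsof prints for any given process are memory-mapped .so files, listed with FD type "mem" rather than a numeric descriptor. To my understanding these do not count toward the Linux open-files limit, which is why the pattern keeps only lines whose FD column starts with a digit.)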

On top of that, I would assume you need headroom for roughly three file descriptors per publisher thread: (a) the TCP connection to waitress/WSGI, (b) additional buffered HTTP connections waiting to be served on a busy site, and (c) a BLOB file, since at any given time any publisher thread could have one open.
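The rule of thumb above can be turned into a rough per-process estimate: a measured baseline plus a fixed allowance per publisher thread. This is a sketch, not an official Plone formula; ProcessSpec, estimated_fds and all the numbers are hypothetical. The traceback also shows ZEO's ClientStorage opening a temporary .tbuf file in tpc_begin, so a thread that is committing may hold one more descriptor; raise fds_per_thread if you want to allow for that.

```python
from dataclasses import dataclass


@dataclass
class ProcessSpec:
    name: str
    baseline_fds: int       # measured at idle, e.g. number of entries in /proc/<pid>/fd
    publisher_threads: int  # waitress worker threads; 0 for a ZEO server


def estimated_fds(proc: ProcessSpec, fds_per_thread: int = 3) -> int:
    """Baseline descriptors plus a fixed allowance per publisher thread.

    The default of 3 per thread covers (a) the client TCP connection,
    (b) buffered HTTP connections waiting to be served and (c) an open BLOB file.
    """
    return proc.baseline_fds + fds_per_thread * proc.publisher_threads


if __name__ == "__main__":
    # Hypothetical numbers: replace them with your own measurements.
    procs = [
        ProcessSpec("instance1", baseline_fds=60, publisher_threads=4),
        ProcessSpec("instance2", baseline_fds=60, publisher_threads=4),
        ProcessSpec("zeoserver", baseline_fds=30, publisher_threads=0),
    ]
    for p in procs:
        print(f"{p.name}: ~{estimated_fds(p)} fds")
    print("all processes:", sum(estimated_fds(p) for p in procs))
```

With these made-up inputs each instance comes out at about 72 descriptors and the total at 174. The per-process figure is what matters against a per-process limit; the sum is only relevant if the limit turns out to be applied per user.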

Your sysadmin should be able to list /proc/${PID}/fd for each of your processes to get a per-process count. You should also look at /proc/${PID}/limits to verify what the FD limit actually is for each process, and investigate whether the limit is being applied per process or per user.
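The /proc check described above can be scripted. The sketch below (Linux only; the file name fdcount.py and the function names are hypothetical) counts the entries in /proc/<pid>/fd for each PID given on the command line and reads the "Max open files" row from /proc/<pid>/limits. It has to run as root or as the user owning the processes, otherwise the fd directory is not readable.

```python
import os
import sys
from typing import List, Optional, Tuple


def count_open_fds(pid: int) -> int:
    """Number of file descriptors currently open in process `pid`.

    Every entry in /proc/<pid>/fd is one descriptor (regular files, sockets,
    pipes, ...). Memory-mapped .so files do not appear here.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) 'Max open files' for `pid`; None means unlimited."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                soft, hard = line[len("Max open files"):].split()[:2]
                return (
                    None if soft == "unlimited" else int(soft),
                    None if hard == "unlimited" else int(hard),
                )
    raise RuntimeError(f"no 'Max open files' line for pid {pid}")


def report(pids: List[int]) -> int:
    """Print each process's usage against its soft limit; return the total."""
    total = 0
    for pid in pids:
        used = count_open_fds(pid)
        soft, _hard = nofile_limits(pid)
        total += used
        print(f"pid {pid}: {used} open fds (soft limit {soft})")
    print(f"total: {total}")
    return total


if __name__ == "__main__":
    # Usage: python3 fdcount.py PID [PID ...]
    report([int(arg) for arg in sys.argv[1:]])
```

Running it while the site is busy (for example during an upgrade step) gives a more useful peak figure than running it at idle.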

There may be some other things I am forgetting here.

I've generally run fine with limits of 1024 per process and 65536 per user, but you probably need much less than the latter (and possibly less than either).