Too Many Open Files with pas.plugins.ldap

Hi folks. I've got a Plone-5.1.5 site with 2 or 3 simultaneous users who routinely get "kicked out" of their login sessions and then can't log back in for a while. These users are backed by an LDAP server (Apache DS) via pas.plugins.ldap :slightly_frowning_face:

The error log shows either INVALID_CREDENTIALS: {'info': u'INVALID_CREDENTIALS: Bind failed: null', 'desc': u'Invalid credentials'} or SERVER_DOWN: {u'info': 'Too many open files', 'errno': 24, 'desc': u"Can't contact LDAP server"}. I'm thinking the former is a red herring, and the "Too many open files" is the actual problem. :thinking:

If I run Plone in isolation, keep lsof -p PID running against it, and hit the site with a constant curl --cookie __ac="…" http://localhost/…/folder_contents, sure enough I see the open-file count climb and climb, mostly TCP connections to my LDAP server hanging around in the ESTABLISHED state.
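For the record, here is roughly the watch loop I run, rewritten as a small Python script (the PID is a placeholder for the Zope instance process, and the "ldap"/"ESTABLISHED" match assumes lsof resolves port 389 to the service name, so adjust it for your port):

    import subprocess
    import time

    PID = "12345"  # placeholder: the Zope/Plone instance process id

    while True:
        lines = subprocess.check_output(
            ["lsof", "-p", PID], universal_newlines=True
        ).splitlines()
        total = max(len(lines) - 1, 0)  # subtract the lsof header line
        ldap_established = sum(
            1 for line in lines
            if "TCP" in line and "ESTABLISHED" in line and "ldap" in line
        )
        print("open files: %s, established LDAP connections: %s"
              % (total, ldap_established))
        time.sleep(5)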

Why isn't pas.plugins.ldap re-using or closing these connections? :angry:

Note that there is a memcached server in use, and indeed if I telnet into it and check its stats I see a nice constant number of objects. But Plone+Zope+pas.plugins.ldap insists on opening new LDAP connections :face_with_raised_eyebrow:
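For what it's worth, the same stats check can be done from Python instead of telnet. A minimal sketch using the python-memcached client, assuming memcached listens on 127.0.0.1:11211:

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])
    for server, stats in mc.get_stats():
        # 'curr_items' is the number of objects currently held in the cache
        print(server, stats)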

Versions:

  • Products.CMFPlone-5.1.5
  • pas.plugins.ldap-1.7.2
  • python_ldap-3.2.0
  • node.ext.ldap-1.0b10

I don’t see this happening in an environment where we have multiple Plone sites using pas.plugins.ldap 1.5.4 in Plone 4 and Plone 5.2, with quite a few users logged in per site. As in: I have not received any complaints about user logins for the last 2-3 years.

But the LDAP backend is MS AD (1500-2000 users).

I plan to upgrade the plugin in the 5.2 site shortly.

Maybe ApacheDS is doing something funny by not closing the connection or not allowing it to be re-used? I guess if it were an obvious bug in pas.plugins.ldap, other setups would have experienced similar issues earlier.

I’ll try to redo your tests with this setup and see what happens with the open files.

In general we increase the max open files per process anyway in our Plone stacks for the user running the services: nginx, varnish and the load balancer use a lot of FDs, and the default open-files limit of 1024 on some Linux distributions is too low.
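As a sanity check, the current limit can also be read (and, within the hard limit, raised) from inside a Python process with the stdlib resource module. Just a sketch; the real change belongs in limits.conf or the systemd unit for the instance user:

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open files limit: soft=%s hard=%s" % (soft, hard))

    # illustrative: raise the soft limit to 4096 if the hard limit allows it
    if soft < 4096 <= hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))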

Thanks @fredvd.

In the intervening time I've tried:

  • Setting up an OpenLDAP 2.4.50 installation, migrating all of the ApacheDS data into it (2000 users, 500 groups), installing identical indexes, and switching Plone to use it :x:
  • Migrating the site from Plone 5.1.5 to 5.2.1 :x:
  • Testing on an empty Plone 5.2.1 site with no local code at all :x:

ApacheDS doesn't support the memberOf attribute, so on the off chance that was the issue I enabled the memberOf overlay on OpenLDAP, checked the corresponding box in the pas.plugins.ldap settings, and it still failed :x:

All it takes to hit the errno 24 "Too many open files" limit is to have the LDAP plugin enabled and hit /Plone/folder_contents in three concurrent curl loops (with --cookie __ac="…"), and it doesn't take long either: about a minute on my Mac.
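The same load can be generated without curl; here is a rough Python equivalent using requests and three daemon threads (the URL and the __ac cookie value are placeholders for my site and session):

    import threading
    import requests

    URL = "http://localhost:8080/Plone/folder_contents"
    COOKIES = {"__ac": "PASTE-SESSION-COOKIE-HERE"}  # placeholder

    def hammer():
        while True:
            requests.get(URL, cookies=COOKIES)

    for _ in range(3):  # three concurrent request loops, like the curl version
        threading.Thread(target=hammer, daemon=True).start()

    threading.Event().wait()  # keep the main thread alive until Ctrl-C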

I'm becoming more convinced this is an issue in pas.plugins.ldap (or its dependencies, perhaps node.ext.ldap), because we previously ran Plone 4 with Products.LDAPUserFolder and friends against the same LDAP server just fine, and this is the first time the problem has shown up for us.

(I don't have access to Microsoft Active Directory so I can't compare there.)

I'll see if I can't narrow it down, but in the meantime I've got angry bosses to placate :sweat_smile:

I believe we have just run into this same problem with a site we recently upgraded to Plone 5.2.1 and switched to pas.plugins.ldap.

Have you discovered anything further since your last comment?

Hi @abosio: Sadly nothing new. I've been busy with other priorities.

My bosses just chalk it up to "Well that's Plone being Plone."

On the plus side, Plone 5.2's use of Waitress instead of ZServer makes things a lot more "peppy", so even when there are lots of concurrent transactions (like an image-heavy page with an image-heavy portlet), the issue isn't quite as bad. I still see the number of open files rise sharply during load.

We had similar problems. After replacing pylibmc with python-memcached, the "Can't contact LDAP server" error disappeared.

Would be great to have a bug report at https://github.com/collective/pas.plugins.ldap/issues with the collected knowledge from this thread - any takers?

Done: https://github.com/collective/pas.plugins.ldap/issues/106

Today I got many, many tracebacks like this:

2021-02-23 09:37:04,653 WARNING [waitress:321][MainThread] server accept() threw an exception
Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/waitress-1.4.4-py3.6.egg/waitress/server.py", line 311, in handle_accept
    v = self.accept()
  File "/opt/plone/buildout-cache/eggs/waitress-1.4.4-py3.6.egg/waitress/wasyncore.py", line 420, in accept
    conn, addr = self.socket.accept()
  File "/usr/lib/python3.6/socket.py", line 205, in accept
    fd, addr = self._accept()
OSError: [Errno 24] Too many open files

Now I have enabled memcached for LDAP queries and will wait to see what happens.

Never operate pas.plugins.ldap without caching.

But the bug is valid; it must be something with the bind handling in here.

Shouldn't ldap.ldapobject.ReconnectLDAPObject handle closing the connections?
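For reference, a minimal python-ldap sketch of the point in question (server URI, bind DN, password and search base are made up): ReconnectLDAPObject re-establishes a dropped connection and re-binds, but nothing closes the connection unless unbind_s() is called explicitly:

    import ldap
    from ldap.ldapobject import ReconnectLDAPObject

    conn = ReconnectLDAPObject("ldap://localhost:389", retry_max=3, retry_delay=2.0)
    conn.simple_bind_s("uid=admin,ou=system", "secret")  # placeholder credentials
    try:
        conn.search_s("ou=users,dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=jdoe)")
    finally:
        conn.unbind_s()  # without this the socket stays open and the FD leaks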