Hi!
I wanted to share with you all the issues I've had to face lately when dealing with intranets connected through Products.PloneLDAP to LDAP servers with a large user and group base. I work at the Barcelona Tech University and we have lots of Plone sites connected to our LDAP server, which holds 80000 users and 3000 groups. We've been using the multilayered Products.PloneLDAP, the de facto standard, and its siblings since forever.
The small sites and the public sites are mostly unaware of the problems. The issues arise on heavily used intranets. I've been tracing the queries that reach the LDAP server and, I have to admit, the results were very surprising to me. Let me explain.
Initial plugin config
My initial LDAP plugin config was to enable:
- IAuthenticationPlugin
- IGroupEnumerationPlugin
- IGroupIntrospectionPlugin
- IGroupsPlugin
- IPropertiesPlugin
- IRoleEnumerationPlugin
- IRolesPlugin
- IUserEnumerationPlugin
Sharing tab view
The sharing tab was the first stop in the debugging process. Let's say you search for a user or group: the plugin queries for it and the result is a large set of users and groups matching the search (because I've not narrowed the query enough yet, or because lots of users and groups match it). OK, the first query is normal and sane, but then the plugin tries to verify that each and every returned result is OK and performs one more query for each of them. A simple operation that could be done with a single query can escalate to hundreds of useless ones.
Then, the sharing view uses AJAX… if you type the user or group name slowly enough, each partial name triggers a new search… You can provoke a DoS on your own server yourself.
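The query amplification described for the sharing tab can be sketched with a toy, in-memory directory that simply counts the queries it receives. This is only an illustration of the "one search plus one verification per result" pattern; FakeDirectory and both search functions are hypothetical names and have nothing to do with the real Products.PloneLDAP code.

```python
class FakeDirectory:
    """In-memory stand-in for an LDAP server that counts every query."""

    def __init__(self, user_ids):
        self.user_ids = list(user_ids)
        self.queries = 0

    def search(self, text):
        """One query returning every id that contains the search text."""
        self.queries += 1
        return [uid for uid in self.user_ids if text in uid]

    def lookup(self, user_id):
        """One query fetching a single entry by id."""
        self.queries += 1
        return user_id if user_id in self.user_ids else None


def search_with_verification(directory, text):
    """Search, then re-query each hit: 1 + N round trips."""
    hits = directory.search(text)
    return [uid for uid in hits if directory.lookup(uid) is not None]


def search_trusting_results(directory, text):
    """Search once and trust what came back: 1 round trip."""
    return directory.search(text)


directory = FakeDirectory("user%05d" % i for i in range(1000))

verified = search_with_verification(directory, "user000")
queries_with_verification = directory.queries

directory.queries = 0
trusted = search_trusting_results(directory, "user000")
queries_trusting = directory.queries

print(queries_with_verification, queries_trusting)
```

Both functions return the same 100 users, but the verifying variant needs 101 round trips instead of 1, and every extra keystroke in an AJAX search box repeats that cost.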
Use of getMemberById
The omnipresent getMemberById is the most interesting part. You have to face it sooner or later (through plone.api, portal_membership or their friends) if you want to retrieve any user property, and it's used everywhere. However, every time it's invoked it triggers the whole PAS pipeline, making lots of useless queries only to retrieve a bit of user information such as display names or emails.
Groups and recursive groups (both IGroupsPlugin type)
These are by far the heaviest plugins. Recursive groups (the lambda icon) is a plugin activated by default in PAS that allows nested groups in Plone, both locally... and in every plugin that provides groups. The former, the groups plugin, is the one responsible for the feature that lets you grant permissions to an LDAP group.
So let's say we invoke getMemberById while rendering a view. This action triggers the whole PAS pipeline. When it’s the turn of the LDAP groups plugin, the legitimate query is sent to find all the groups the user is a member of. Then, for all the groups in the response, each group is validated (again) against the LDAP. Let’s say I’m assigned to 30 groups…
Then it’s the turn of the recursive groups plugin, which goes through all the groups the user is a member of and queries each and every one of them looking for more nested groups (if any). Thank God our LDAP group structure is flat… Multiply. I’ve seen more than 400 requests to LDAP for one single getMemberById. Insane.
Of course, the RAM cache will do its job and maintain a fragile feeling of “everything is ok”… but it’s only temporary. Meanwhile, the memory consumption of the Zope processes escalates quickly to 1.5 GB per day… forcing us to restart them daily.
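To get a feeling for how the numbers add up, here is a toy model of the sequence just described: one membership query, one validation per group, then one "parents" query per group for the recursive lookup. It is a sketch under those assumptions only; FakeGroupDirectory, resolve_groups and the user id "jdoe" are hypothetical, the real PAS plugins differ in their details, and the model does not try to reproduce the 400 requests measured above.

```python
class FakeGroupDirectory:
    """In-memory stand-in for LDAP groups; counts every query it receives."""

    def __init__(self, memberships, parents=None):
        self.memberships = memberships  # user id -> list of group ids
        self.parents = parents or {}    # group id -> list of parent group ids
        self.queries = 0

    def groups_of_user(self, user_id):
        self.queries += 1
        return list(self.memberships.get(user_id, []))

    def group_exists(self, group_id):
        self.queries += 1
        return True

    def parents_of_group(self, group_id):
        self.queries += 1
        return list(self.parents.get(group_id, []))


def resolve_groups(directory, user_id):
    """Mimic the sequence described above for a single user lookup."""
    # Groups plugin: one membership query, then one validation per group.
    direct = [g for g in directory.groups_of_user(user_id)
              if directory.group_exists(g)]
    # Recursive groups plugin: ask every known group for its parents.
    seen, pending = set(direct), list(direct)
    while pending:
        for parent in directory.parents_of_group(pending.pop()):
            if parent not in seen:
                seen.add(parent)
                pending.append(parent)
    return seen


# A flat structure: 30 groups, none of them nested.
directory = FakeGroupDirectory({"jdoe": ["group%02d" % i for i in range(30)]})
groups = resolve_groups(directory, "jdoe")
print(len(groups), directory.queries)
```

Even with a completely flat structure this toy needs 1 + 30 + 30 = 61 queries for one lookup, and a view that calls getMemberById several times pays that price each time the cache misses.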
Workarounds and possible solutions
After my research, I’ve concluded that I had activated more plugins than I really needed, so I deactivated the ones that are useless for my setup and left:
- IAuthenticationPlugin
- IGroupEnumerationPlugin
- IGroupIntrospectionPlugin
- IGroupsPlugin
- IUserEnumerationPlugin
I also disabled the recursive groups plugin, as I do not need it at all.
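The reduced configuration above can be set by hand in the ZMI, but it could also be scripted. The sketch below assumes the PAS plugin registry (acl_users.plugins) offers deactivatePlugin(plugin_type, plugin_id) and raises KeyError when the plugin is not active for that interface; the plugin ids "ldap-plugin" and "recursive_groups" are assumptions about the site, and FakeRegistry only exists so the example runs without Plone. In a real site the interfaces would be the classes from Products.PluggableAuthService.interfaces.plugins, not strings.

```python
def deactivate_interfaces(plugin_registry, plugin_id, interfaces):
    """Deactivate plugin_id for each interface; return the ones changed.

    plugin_registry is expected to behave like acl_users.plugins, i.e. to
    offer deactivatePlugin(plugin_type, plugin_id), raising KeyError when
    the plugin is not active for that interface.
    """
    changed = []
    for interface in interfaces:
        try:
            plugin_registry.deactivatePlugin(interface, plugin_id)
        except KeyError:
            continue  # already inactive for this interface
        changed.append(interface)
    return changed


class FakeRegistry:
    """Tiny stand-in so the sketch runs without a Plone site."""

    def __init__(self, active):
        self.active = {iface: set(ids) for iface, ids in active.items()}

    def deactivatePlugin(self, plugin_type, plugin_id):
        self.active[plugin_type].remove(plugin_id)  # KeyError if absent


registry = FakeRegistry({
    "IPropertiesPlugin": ["ldap-plugin"],
    "IRoleEnumerationPlugin": ["ldap-plugin"],
    "IRolesPlugin": [],
    "IGroupsPlugin": ["ldap-plugin", "recursive_groups"],
})
changed = deactivate_interfaces(
    registry, "ldap-plugin",
    ["IPropertiesPlugin", "IRoleEnumerationPlugin", "IRolesPlugin"])
changed += deactivate_interfaces(
    registry, "recursive_groups", ["IGroupsPlugin"])
print(changed)
```

The LDAP plugin stays active for IGroupsPlugin here; only the recursive groups plugin is removed from that interface.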
I’ve been talking with some Plonistas (Asko, Ramon) to get their point of view, and they told me that they have struggled with these same issues in the same scenarios. Asko shared with me some insights on how to fix them:
- Sharing views could be fixed to ask for LDAP details by AJAX in a way which wouldn't block Zope (Medusa/asyncore/ZServer-related magic I've blogged about before). Of course, it'd still block with HAProxy configured to allow only a fixed number of requests per instance.
- PAS could be fixed so that it'd pass through lazily evaluable iterators (the greatest blocker for this is that PAS currently sorts the results, and sorting prevents the usage of lazy iterators).
- python-ldap should be replaced with a more modern library (which preferably would support the iterator approach).
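The point about sorting and lazy iterators can be shown with plain Python generators. This is a generic sketch, not PAS or python-ldap code: lazy_results is a hypothetical generator standing in for a search that fetches entries on demand, and the fetched list records how many entries were actually pulled.

```python
from itertools import islice

fetched = []  # records every entry actually pulled from the "server"


def lazy_results(total):
    """Yield entries one at a time, as an on-demand search might."""
    for i in range(total):
        fetched.append(i)
        yield "user%05d" % i


# A consumer that only needs the first page pulls just ten entries.
first_page = list(islice(lazy_results(80000), 10))
fetched_lazily = len(fetched)

# Sorting has to see every entry before it can return the first one.
fetched.clear()
first_sorted = sorted(lazy_results(80000))[:10]
fetched_for_sort = len(fetched)

print(fetched_lazily, fetched_for_sort)
```

Both variants end up with the same ten ids, but the sorted one has to consume all 80000 entries first, which is why sorting inside the pipeline defeats any lazy iterator passed through it.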
I’ve been playing with a workaround for getMemberById myself: a parallel user property catalog based on plone.souper and repoze.catalog, kept up to date via events bound to user property modifications and user creation/deletion. The default way to deal with searching users:
from itertools import chain
from zope.component import getMultiAdapter
hunter = getMultiAdapter((portal, self.request), name='pas_search')
fulluserinfo = hunter.merge(chain(*[hunter.searchUsers(**{field: query}) for field in ['fullname', 'name']]), 'userid')
using PAS triggers the whole pipeline for each result returned… This should be rethought somehow. I do not know if the approach I’ve used is valid, but it’s a first idea.
All the default views should also be reworked to use such an alternative to getMemberById.
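The idea of a parallel user property catalog can be sketched with nothing but a dictionary. This is not the plone.souper or repoze.catalog API, just a stdlib stand-in under the assumption that three handlers get called on user creation, property modification and deletion; UserPropertyIndex, its method names and the sample users are all hypothetical.

```python
class UserPropertyIndex:
    """Local user-property store, searched without touching PAS or LDAP."""

    def __init__(self):
        self._records = {}  # user id -> dict of properties

    # Handlers meant to be wired to user lifecycle events.
    def on_user_created(self, user_id, properties):
        self._records[user_id] = dict(properties)

    def on_properties_updated(self, user_id, properties):
        self._records.setdefault(user_id, {}).update(properties)

    def on_user_deleted(self, user_id):
        self._records.pop(user_id, None)

    def get(self, user_id):
        """Local replacement for a property-only getMemberById call."""
        record = self._records.get(user_id)
        return dict(record) if record is not None else None

    def search(self, query, fields=("fullname", "name")):
        """Case-insensitive substring search over the given fields."""
        needle = query.lower()
        return sorted(
            user_id for user_id, record in self._records.items()
            if any(needle in str(record.get(field, "")).lower()
                   for field in fields))


index = UserPropertyIndex()
index.on_user_created("jdoe", {"name": "jdoe", "fullname": "Jane Doe"})
index.on_user_created("rroe", {"name": "rroe", "fullname": "Richard Roe"})
index.on_properties_updated("rroe", {"email": "rroe@example.org"})
index.on_user_deleted("jdoe")
print(index.search("roe"), index.get("rroe")["email"])
```

Display names and emails are then answered from the local store, so LDAP is only needed for authentication and group membership; the price is keeping the store in sync with every user change.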
Lately I’ve been studying pas.plugins.ldap but it seems it has a blocker issue:
that involves performance issues with large LDAPs.
Another option is to write a simple plugin on top of python-ldap that makes only the strictly necessary (and saner) use of LDAP.
What do you think? Have you ever faced these issues? If so, what are your workarounds/approaches?
I thought it was worth raising this and starting to plan fixes for some of these issues on the Plone roadmap.
Cheers,
V.
PS: Sorry if I’ve been too exhaustive in this blog-like post!