Extremely Slow to Change Workflow State

I'm using Plone 5, specifically 5.2.2, but I think this has been an issue for a few versions now.

I have imported lots of data. On one of my smaller sites I have 80k objects in total, with 3k objects under one specific folder. When the import completes, the folders are not in a published state.

When I select "Publish" to publish this folder, I have to wait a significant amount of time before Plone returns, even after going into the advanced settings and making sure that 'Include contained items' is unchecked.

For example, publishing a folder with 4,229 objects took 58.8 seconds.

I would expect this to be a fast transaction: only one object undergoes a workflow transition. But it takes forever.

Anyone also experiencing this? Any suggestions on how to speed this up?

I noticed reindexObjectSecurity being called on every subobject.

<FSControllerPythonScript at /RFA/english/multimedia/content_status_modify>
Line 50
Module Products.CMFCore.WorkflowTool, line 252, in doActionFor
Module Products.CMFCore.WorkflowTool, line 544, in _invokeWithNotification
Module Products.CMFCore.WorkflowTool, line 610, in _reindexWorkflowVariables
Module Products.CMFCore.CMFCatalogAware, line 116, in reindexObjectSecurity
Module Products.ZCatalog.CatalogBrains, line 64, in _unrestrictedGetObject

Changing the visibility of a folder may affect the visibility of its descendants. To avoid surprises, reindexObjectSecurity (or a similarly named method) is therefore probably called routinely, and it is called recursively on all descendants.

To understand what happens here, I would use profiling. If there is not yet a solution integrated with Zope, you can perform the workflow transition in an interactive session (bin/{instance|client1} debug) and use Python's cProfile and pstats modules there.
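A minimal sketch of that profiling approach, using only the standard library. The helper name `profile_call` is hypothetical; the commented debug-session lines assume an `admin` user in the Zope root user folder, a site with id `Plone`, and a placeholder folder path, so adjust them to your setup.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=25, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    # runcall profiles only this call and passes its return value through.
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Shorten file paths, sort by cumulative time, keep the top `limit` rows.
    stats.strip_dirs().sort_stats(sort_by).print_stats(limit)
    return result, buffer.getvalue()


# Possible use inside `bin/instance debug` (names and paths are assumptions):
#   from Testing.makerequest import makerequest
#   from AccessControl.SecurityManagement import newSecurityManager
#   from zope.component.hooks import setSite
#   app = makerequest(app)                          # some Plone code expects a request
#   admin = app.acl_users.getUser("admin")          # assumed Zope-root admin user
#   newSecurityManager(None, admin.__of__(app.acl_users))
#   site = app.unrestrictedTraverse("Plone")        # hypothetical site id
#   setSite(site)
#   folder = site.unrestrictedTraverse("path/to/folder")  # placeholder path
#   _, report = profile_call(site.portal_workflow.doActionFor, folder, "publish")
#   print(report)
#   import transaction; transaction.abort()         # discard the test transition
```

Sorting by cumulative time should put the calls under `_reindexWorkflowVariables` / `reindexObjectSecurity` near the top if they dominate, as the traceback above suggests.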


You used to be able to delay indexing with collective.indexing, but I think that functionality was added to Plone some versions ago. It would be great to come up with a smarter way of reindexing that doesn't touch contained items unless necessary.

CastleCMS sends long running operations like that to a celery queue. That would be a nice feature to get into Plone.

There is an old extension called Products.QueueCatalog (or similar) that allows indexing operations for some indexes to be performed offline. It's not integrated into Plone, but it should be usable without too much effort.
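To illustrate the deferred-indexing idea behind collective.indexing, a Celery queue, or QueueCatalog, here is a toy queue in plain Python. It is not the API of any of those packages: `IndexQueue`, `queue`, and `process` are hypothetical names. Requests for the same path are merged (a full reindex subsumes a partial one) and executed later in one batch.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# Callback signature: (path, index names or None for "all indexes").
Reindexer = Callable[[str, Optional[Set[str]]], None]


class IndexQueue:
    """Toy deferred-indexing queue: collects and merges reindex requests."""

    def __init__(self, reindex: Reindexer):
        self._reindex = reindex
        # path -> set of index names, or None meaning "reindex everything".
        # Plain dicts keep insertion order, so processing is FIFO per path.
        self._pending: Dict[str, Optional[Set[str]]] = {}

    def queue(self, path: str, idxs: Optional[Iterable[str]] = None) -> None:
        if path not in self._pending:
            self._pending[path] = None if idxs is None else set(idxs)
            return
        current = self._pending[path]
        if current is None or idxs is None:
            self._pending[path] = None  # a full reindex subsumes partial ones
        else:
            current.update(idxs)

    def process(self) -> int:
        """Run all pending reindexes (e.g. at commit or in a worker); return count."""
        pending, self._pending = self._pending, {}
        for path, idxs in pending.items():
            self._reindex(path, idxs)
        return len(pending)
```

Queueing doesn't make the work smaller (each affected descendant still needs reindexing once); it removes duplicate requests and lets the work run outside the user's request, which is what the Celery and QueueCatalog suggestions are about.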

For the record, you might find this link useful. It mentions Products.QueueCatalog and Products.PloneQueueCatalog:

I have meanwhile published dm.zope.profile on PyPI. It should help you to understand what the time is used for.

I'm digging this one back up, sorry.

From the docs, "How a folder's workflow state affects its content" (Plone Documentation v5.2):

Thus, putting a folder in the private state is not a guarantee of security for any of its contents. Unless, of course, all the content has been made private, as well. This can be done in bulk and in a single step, as described in Advanced Control.

And now tell me: should changing a folder's workflow state kick off an "update security" task recursively on its children?

I will take the position that the documentation is correct, and this 'feature' is actually a bug.

Anyone care to take on an opposing position? (That's an invitation for debate, not a challenge)

flipmcf via Plone Community wrote at 2022-5-13 21:03 +0000:

...
And now tell me: should changing a folder's workflow state kick off an "update security" task recursively on its children?

I will take the position that the documentation is correct, and this 'feature' is actually a bug.

Anyone care to take on an opposing position? (That's an invitation for debate, not a challenge)

I would say: "this depends on the workflow":
it is normal for Zope that an object further down the hierarchy
allows more operations than its ancestors.
In this context, it would be normal that a private folder
can contain public content.
A special workflow would be necessary to ensure that
privateness of the folder causes privateness of its content.

For the stock workflows, the workflow state affects only
the object itself, not its subobjects. Therefore, workflow security, too,
needs to be updated only for the object, not its subobjects.
Because workflow security updates can be expensive, it is good
to avoid unnecessary updates.

If you use a workflow where a state change implies state
changes of subobjects, it is up to the workflow to update
the workflow security of the subobjects.
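To make this concrete, here is a toy model in Python (a simplification, not Zope's AccessControl code): each object stores, per permission, the locally granted roles and an "acquire" flag. A hypothetical two-state workflow writes View locally with acquisition off, so a published item inside a private folder stays visible, and publishing the folder changes only items that still acquire View.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

# Per-permission setting: (roles granted locally, acquire from parent?)
Setting = Tuple[Tuple[str, ...], bool]


@dataclass
class Node:
    """Toy content object with simplified per-permission settings."""
    name: str
    parent: Optional["Node"] = None
    settings: Dict[str, Setting] = field(default_factory=dict)


def roles_for(node: Optional[Node], permission: str) -> FrozenSet[str]:
    """Effective roles: local roles, plus the parent's roles if acquisition is on."""
    if node is None:
        return frozenset()
    roles, acquire = node.settings.get(permission, ((), True))
    effective = frozenset(roles)
    if acquire:
        effective |= roles_for(node.parent, permission)
    return effective


# Hypothetical two-state workflow: each state sets View locally, acquire off.
STATES: Dict[str, Setting] = {
    "private": (("Manager", "Owner"), False),
    "published": (("Anonymous", "Manager", "Owner"), False),
}


def set_state(node: Node, state: str) -> None:
    node.settings["View"] = STATES[state]


site = Node("site")
folder = Node("folder", parent=site)
doc = Node("doc", parent=folder)      # governed by the workflow
image = Node("image", parent=folder)  # hypothetical item without a workflow
set_state(folder, "private")
set_state(doc, "published")
print(sorted(roles_for(doc, "View")))    # ['Anonymous', 'Manager', 'Owner']
set_state(folder, "published")
print(sorted(roles_for(image, "View")))  # ['Anonymous', 'Manager', 'Owner']
```

In this model only `image` depends on the folder's state, which is why a full recursive security update is only needed where children actually acquire the permission.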

I agree with everything there. It does depend on the workflow. So….

Should the default folder workflow recursively transition child objects and update security settings?

I would say no. Save the time and CPU.

-------
The "advanced" workflow page does have the checkbox to transition child objects. The documentation says to use it if you need to "make a folder and all its contents private". So it's still possible even without designing a special workflow; it's just not the happy path through the UX.

By default, permissions are acquired from parent objects, so children need to be reindexed if permissions change.
We don't really have a good way to check for that, so on a workflow or permission change we just reindex the children. At least that works (most of the time).

There's an issue for this: "@@sharing causes reindexing of a possibly large amount of content without warning and without feedback" (plone/Products.CMFPlone issue #1270 on GitHub). It refers to experimental.securityindexing, which aims to help with this.

It would be awesome if you could try out experimental.securityindexing and see whether it still works!
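As a rough sketch of what "checking for that" could look like (a hypothetical pruning strategy, not how experimental.securityindexing or CMFCore actually work): recompute each object's effective View roles and stop descending once the stored index value is unchanged, since in this simplified model descendants depend only on their ancestors. It ignores local roles from the sharing tab, which a real implementation would also have to track.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Item:
    """Toy object: local View roles, whether View is acquired, and the value
    last written to a simulated allowedRolesAndUsers index."""
    local: FrozenSet[str]
    acquire: bool = True
    children: List["Item"] = field(default_factory=list)
    indexed: FrozenSet[str] = frozenset()


def effective(item: Item, inherited: FrozenSet[str]) -> FrozenSet[str]:
    return (item.local | inherited) if item.acquire else item.local


def reindex_all(item: Item, inherited: FrozenSet[str] = frozenset()) -> int:
    """Current behaviour as described: reindex the object and every descendant."""
    item.indexed = effective(item, inherited)
    return 1 + sum(reindex_all(child, item.indexed) for child in item.children)


def reindex_changed(item: Item, inherited: FrozenSet[str] = frozenset()) -> int:
    """Hypothetical pruning: skip a subtree whose root's value did not change."""
    new = effective(item, inherited)
    if new == item.indexed:
        return 0
    item.indexed = new
    return 1 + sum(reindex_changed(child, new) for child in item.children)


# 1,000 workflow-managed children (View not acquired) plus one acquiring child.
kids = [Item(frozenset({"Anonymous"}), acquire=False) for _ in range(1000)]
image = Item(frozenset())
folder = Item(frozenset({"Owner"}), acquire=False, children=kids + [image])
print(reindex_all(folder))      # 1002: initial full index
folder.local = frozenset({"Owner", "Anonymous"})  # "publish" the folder
print(reindex_changed(folder))  # 2: the folder and the acquiring child
```

Pruning still has to compute each direct child's value (which in Plone means waking the object), but it avoids the catalog writes and skips whole subtrees below children that don't acquire.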

Roel Bruggink via Plone Community wrote at 2022-5-18 18:14 +0000:

By default permissions are acquired from parent objects, so children need to be reindexed if permissions changed.

The permissions managed by a workflow are usually not inherited
but maintained locally at the object.

Therefore, I would provide a way to control whether changing
a folder's state should update the security settings of its content.
If I have understood an earlier comment correctly, this is
already possible.


AFAIK, when (say) the View permission changes, reindexObjectSecurity is called so that portal_catalog does or doesn't return the children accordingly. That is what the allowedRolesAndUsers index is for.
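A toy illustration of why that index must stay current (loosely modeled on the idea, not the real indexer, which has extra shortcuts): the indexed value is the set of roles with View plus `user:<id>` tokens for principals whose local roles grant View, and a search only returns objects whose tokens intersect the current user's. The function names here are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Mapping


def allowed_roles_and_users(view_roles: Iterable[str],
                            local_roles: Mapping[str, Iterable[str]]) -> FrozenSet[str]:
    """Toy indexer: roles having View, plus 'user:<id>' for principals whose
    local roles include one of those roles."""
    roles = frozenset(view_roles)
    tokens = set(roles)
    for principal, principal_roles in local_roles.items():
        if roles & set(principal_roles):
            tokens.add("user:" + principal)
    return frozenset(tokens)


def visible(index: Mapping[str, FrozenSet[str]], user_tokens: Iterable[str]) -> List[str]:
    """Simulated catalog filter: a path matches if any user token is indexed."""
    wanted = set(user_tokens)
    return sorted(path for path, tokens in index.items() if tokens & wanted)


index = {
    "/site/folder": allowed_roles_and_users(["Manager", "Owner"], {"alice": ["Owner"]}),
    "/site/folder/doc": allowed_roles_and_users(["Anonymous", "Manager"], {}),
}
print(visible(index, ["Anonymous"]))  # ['/site/folder/doc']
```

If a child's effective View roles change but its entry is not rewritten, `visible` keeps returning the stale answer, which is exactly the surprise the recursive reindexObjectSecurity call guards against.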
