Extremely Slow to Change Workflow State

I'm using Plone 5, specifically 5.2.2, but I think this has been an issue for a few versions now.

I have imported lots of data. On one of my smaller sites I have 80k objects in total, with 3k objects under one specific folder. When the import completes, the folders are not in a published state.

When I select "Publish" to publish this folder, I have to wait a significant amount of time before Plone returns, even after going into the advanced settings and making sure that 'Include contained items' is unchecked.

A folder with 4,229 objects: 58.8 seconds

I would assume this would be a fast transaction - just one object is having a workflow transition. But it takes forever.

Anyone also experiencing this? Any suggestions on how to speed this up?

I noticed reindexObjectSecurity being called on every subobject.

<FSControllerPythonScript at /RFA/english/multimedia/content_status_modify>
Line 50
Module Products.CMFCore.WorkflowTool, line 252, in doActionFor
Module Products.CMFCore.WorkflowTool, line 544, in _invokeWithNotification
Module Products.CMFCore.WorkflowTool, line 610, in _reindexWorkflowVariables
Module Products.CMFCore.CMFCatalogAware, line 116, in reindexObjectSecurity
Module Products.ZCatalog.CatalogBrains, line 64, in _unrestrictedGetObject

Changing the visibility of a folder may affect the visibility of its descendants. To avoid surprises, it is therefore possible that a reindexObjectSecurity (or similarly spelled) is routinely called. This is recursively called on all descendants.

To understand what happens here, I would use profiling. If there is not yet a solution integrated with Zope, you can change the workflow in an interactive session (--> bin/{instance|client1} debug) and use Python's cProfile and pstats modules there.
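As a rough illustration of the cProfile/pstats approach, here is a minimal sketch. The helper name `profile_call` and the stand-in workload are made up so that the snippet runs outside Zope; the commented lines at the end show how it might be pointed at the workflow tool in a debug session, with the path and transition id being assumptions.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report).

    `profile_call` is a hypothetical helper, not part of Zope or Plone.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call chains float to the top.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_stand_in(n):
    # Placeholder workload so the sketch runs without a Zope instance.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_stand_in, 10000)
print(report)

# In a debug session the call might look like this instead
# (assumed path and transition id; adjust to your site):
#   folder = app.unrestrictedTraverse("/Plone/some/folder")
#   wftool = folder.portal_workflow
#   profile_call(wftool.doActionFor, folder, "publish")
```

If the time really goes into the recursive security reindex, the cumulative column should show it near the top of the report.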

You used to be able to delay indexing with collective.indexing but I think that functionality was added to Plone some versions ago. It would be great to come up with a smarter way of reindexing that didn’t touch contained items except when necessary.

CastleCMS sends long running operations like that to a celery queue. That would be a nice feature to get into Plone.

There is an ancient extension called Products.QueueCatalog (or similar) which allows operations on some indexes to be performed offline. It is not integrated into Plone, but it should be usable without too much effort.

For the record, you might find this link useful. It mentions Products.QueueCatalog and Products.PloneQueueCatalog:

I have meanwhile published dm.zope.profile on PyPI. It should help you to understand what the time is used for.

I'm digging this one back up, sorry.

Read the docs: How a folder’s workflow state affects its content — Plone Documentation v5.2

Thus, putting a folder in the private state is not a guarantee of security for any of its contents. Unless, of course, all the content has been made private, as well. This can be done in bulk and in a single step, as described in Advanced Control.

And now tell me - should changing a folder's workflow state kick off an "update security" task recursively on its children?

I will take the position that the documentation is correct, and this 'feature' is actually a bug.

Anyone care to take on an opposing position? (That's an invitation for debate, not a challenge)

flipmcf via Plone Community wrote at 2022-5-13 21:03 +0000:

...
And now tell me - should changing a folder's workflow state kick off an "update security" task recursively on its children?

I will take the position that the documentation is correct, and this 'feature' is actually a bug.

Anyone care to take on an opposing position? (That's an invitation for debate, not a challenge)

I would say: "this depends on the workflow":
it is normal for Zope that an object further down the hierarchy
allows more operations than its ancestors.
In this context, it would be normal that a private folder
can contain public content.
A special workflow would be necessary to ensure that
privateness of the folder should cause privateness of its content.

For the stock workflows, the workflow state affects only
the object itself, not its subobjects. Therefore, workflow security, too,
needs to be updated only for the object, not for its subobjects.
Because workflow security updates can be expensive, it is good
to avoid unnecessary updates.

If you use a workflow where a state change implies state
changes of subobjects, it is up to the workflow to update
the workflow security of the subobjects.

I agree with everything there. It does depend on the workflow. So….

Should the default folder workflow recursively transition child objects and update security settings?

I would say no. Save the time and cpu.

——-
The “advanced” workflow page does have the checkbox to transition child objects. The documentation says to use that if you need to “make a folder and all its contents private”. It’s still possible, even without designing a special workflow; it’s just not the happy path through the UX.

By default permissions are acquired from parent objects, so children need to be reindexed if permissions changed.
We don't really have a good way to check for that, so on workflow change or permission change we just reindex it - at least that works (most of the time).

There's an issue for this: "@@sharing causes reindexing of a possibly large amount of content without warning and without feedback" (Issue #1270, plone/Products.CMFPlone on GitHub). It refers to experimental.securityindexing, which aims to aid in this.

It would be awesome if you could try experimental.securityindexing out and see if that would still work!

Roel Bruggink via Plone Community wrote at 2022-5-18 18:14 +0000:

By default permissions are acquired from parent objects, so children need to be reindexed if permissions changed.

The permissions managed by a workflow are usually not inherited
but maintained locally at the object.

Therefore, I would provide a way to control whether changing
a folder state should update the security setting of its content.
If I have understood an earlier comment correctly, this is
already possible.

afaik when, say, the View permission is changed, reindexObjectSecurity is called so portal_catalog does/doesn't return the children. That should be the allowedRolesAndUsers index.
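To make the point about the index concrete, here is a toy model, not Plone's implementation. It assumes the catalog stores a flattened snapshot of "who may view" per object, which is roughly what allowedRolesAndUsers is described as doing above. The `Node` class, `index_security` and `search` are invented for illustration.

```python
class Node:
    """Toy content object; view_roles=None means 'acquire from parent'."""

    def __init__(self, name, parent=None, view_roles=None):
        self.name = name
        self.parent = parent
        self.view_roles = view_roles

    def effective_view_roles(self):
        # Walk up until some ancestor defines the setting (toy acquisition).
        node = self
        while node.view_roles is None and node.parent is not None:
            node = node.parent
        return frozenset(node.view_roles or ())


def index_security(catalog, node):
    # The catalog keeps a snapshot taken at indexing time.
    catalog[node.name] = node.effective_view_roles()


def search(catalog, role):
    return sorted(name for name, roles in catalog.items() if role in roles)


folder = Node("folder", view_roles={"Anonymous"})
image = Node("image", parent=folder)  # no own setting, acquires from folder

catalog = {}
for obj in (folder, image):
    index_security(catalog, obj)

# Make the folder private, but reindex only the folder itself.
folder.view_roles = {"Owner"}
index_security(catalog, folder)

live_roles = image.effective_view_roles()  # already follows the folder
stale = search(catalog, "Anonymous")       # image still listed in the index

index_security(catalog, image)             # reindex the child
fresh = search(catalog, "Anonymous")

print(sorted(live_roles), stale, fresh)
```

In this model the live permission on the image changes the moment the folder changes, but the catalog keeps answering from its old snapshot until the image itself is reindexed. Whether the real index behaves this way for every content type is an assumption here.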

I’m not feeling confident yet on this discussion. I don’t know if I’m not explaining my case well, or if there is much more at stake than I think.

I will admit that it’s risky to change the paradigm “run security checks just to be perfectly sure”.

Starting from the documentation (see link above) but not using documentation as a foundation of the argument.

First, I have a question:
True or false: The most accepted way to change an object’s permission is through a workflow transition.

Most of my analysis has this truth as its foundation. If it’s not true, I need to step back.

I think this is false, but I'm not sure. I was unaware that any indexing took place. Well, of course the object being transitioned needs to be reindexed, but not its children. At least that's the position I’m defending and the subject of this whole post.

For default content types and workflows, if a folder is made private, a direct-descendant child image will acquire the folder's permissions because the image has no workflow state property. Acquisition rule.

Other objects in that folder with workflows assigned will retain their permissions.

So, my experiment is to use debug to change a workflow state on a folder to private, reindex only that folder, (no transition events) and see if the image becomes private.

If so, then there is no reason to reindex the image.

Other objects that do have workflows are not transitioned, so the user is not changing their security, and indexing is unnecessary.
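The argument can be restated as a toy tree walk. This is a sketch of the reasoning only, under the assumption that an item with its own View setting shields everything below it from the folder's change. The `Item` class and `needs_reindex` are hypothetical, and real Plone may apply additional rules (local roles, other indexes) that this ignores.

```python
class Item:
    """Hypothetical content item.

    acquires_view=True means the item has no local View setting.
    """

    def __init__(self, name, acquires_view, children=()):
        self.name = name
        self.acquires_view = acquires_view
        self.children = list(children)


def needs_reindex(folder):
    """Yield names of descendants whose View roles depend on `folder`."""
    for child in folder.children:
        if child.acquires_view:
            yield child.name
            # Items below an acquiring child may acquire through it.
            yield from needs_reindex(child)
        # A child with its own setting blocks acquisition for its subtree
        # in this model, so the walk stops there.


tree = Item("folder", False, [
    Item("image", True),
    Item("page", False),
    Item("subfolder", False, [Item("nested-image", True)]),
    Item("gallery", True, [Item("photo", True), Item("note", False)]),
])

affected = list(needs_reindex(tree))
print(affected)
```

Under these assumptions only the acquiring items would need a security reindex; the page and the whole subfolder subtree are skipped.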

Does this make sense?

From documentation:

  • However, any published content of a private folder (or even of any of its sub-folders) will appear in the site search, even for anonymous users.

Are these two statements at odds? Is the built-in “site search” concerned about allowedRolesAndUsers?

Are there other use cases I’m missing?

More importantly, from the docs:

Thus, putting a folder in the private state is not a guarantee of security for any of its contents. Unless, of course, all the content has been made private, as well. This can be done in bulk and in a single step, as described in Advanced Control.

Those docs are incorrect. Published pages inside unpublished folders are not searchable by anonymous users - nor other users who cannot view the unpublished folder.
Try English for fun :slight_smile:

This is why the indexing happens and why it takes so long :smiley:

There may be an important distinction to make between “workflow transition to private” and “Change View Permission”.

They are closely related.

But workflow transitions encapsulate instructions to update view permissions.

If there were two UI paths: one for workflow and one for view permissions, I would think users would expect view permission changes to be recursive, but not workflow state transitions.

I still think workflow transitions should not reindex the children.

But I’m afraid it’s only my use-case, and I’m not seeing a bigger picture yet.

Also, the “indexing acquired attributes” argument seems strong. If I index an image (no workflow state) does that image’s metadata get written with an explicit view permission?

Or does a request of image metadata view permissions ‘miss’ the attribute and look at parent metadata instead?

(Acquisition hurts my head. Thanks Jim!)

Ha! Awesome!

Documentation -vs- implementation fight!

I love my job.

Implementation always wins :wink:

Write the documentation first! :innocent:

I'll bet that the docs were written first, but didn't get updated after fixing the problem.