Who uses Workflow variables and indexes them?

Today I had to reindex some new indexes in one of my customer databases.
It contains roughly 50000 objects and reindex takes 360 seconds. Why?

360 seconds (several times) are enough to install and run Py-Spy. While doing so, I found lots of expensive/slow calls to portal_workflow.getCatalogVariablesFor(object) in plone/indexer/wrapper.py.

I do not use workflow variables at all!

For a quick speedup-check I edited the plone.indexer egg and removed the lines:

and

After that I reindexed the index again. And yes, it took "only" 236 seconds.

-> 1.5 times faster. (65% of the original time).

My question: Who uses Workflow variables and indexes them with this provided feature?

I even did not know this is possible.

I guess, they are rarely used. I plan to provide a short PLIP to remove the feature of automagical access to the variables. If one want to index them an indexer function should do it. Its more explicit/less magic.

I would propose a PLIP for Plone 6 to remove the autoaccess

Additional, a reply "WTF? Workflow variables?!?!" or "I never used workflow variables" or "I had no idea about this hidden feature" (or alike) would help me to figure out too.

Is this used for the review_state?

Yes, review_state is stored along with some other variables (https://github.com/plone/Products.CMFPlone/blob/56d33f2c6b9ccb95456d54729f11c63e308de6c7/Products/CMFPlone/profiles/default/workflows/plone_workflow/definition.xml#L221) stored on workflow transition.
By default we index only review_state in portal_catalog (https://github.com/plone/Products.CMFPlone/blob/56d33f2c6b9ccb95456d54729f11c63e308de6c7/Products/CMFPlone/profiles/default/catalog.xml).
There might be applications where more or all workflow variables are indexed in a seperate catalog - that is something I haven't seen so far.

A 'quick' win could be defering self.__vars until it is actually needed (ie move it out of __init__) + making sure review_state is only indexed on workflow change, so the need to load the workflow variables is removed until actually needed.

Alternatively we could set review_state on the content object on workflow transition, and index as usual. This also need a lazy self.__vars thingy.

I think this can be done in Plone 5, so it would not need a PLIP nor would this have to wait until Plone 6 to reap the benefits of Jens his awesome py-spy adventures.

What are your thoughts?

1 Like

I do not think this helps speed wise. The code will run for every failed index name anyway. If one object does not have an indexer wrapper for a given indexed attribute, then next the workflow variable lookup happens.

So I propose deprecation now and removal for Plone 6 and provide a plone.indexer named adapter "review_state" for the review state and drop transient support for workflow variables. This elemiates the risk of attribute clash with the other workflow variables action, actor, time, comments, review_history. If one in a custom dexterity schema would use this names and add an index to the catalog, without an custom indexer, strange side effects would happen.

To get the speed benefits in Plone 5 we can introduce a environment variable, i.e. PLONE_NO_TRANSIENT_WORKFLOW_VAR_INDEXING and if it is set the feature is dropped.

1 Like

+1 a memoized property will easily replace it. but IMHO it will be called quite often looking at: plone.indexer/plone/indexer/wrapper.py at e6b33b48d68b47ae884d2ed3f349bb82b99ece93 · plone/plone.indexer · GitHub

Not so sure that is so quick.

-1, I prefer your previous idea.

+1 if we just lazy define self.__vars: I do not see how this could break things.
It might be more a problem if we move

and the end of the __getattr__ method to decrease the change of computing self.__vars.
Current status is that if you have a review_state wf variable and a review_state attribute the former is indexed.
Moving that down will do the opposite.
A bolder move would be remove completely the __vars thing and add a review_state indexer.
That will be a big breaking change: if you want to index a wf variable you have to write another indexer.

It seems a good plan!
Also I wonder why it takes so much time to call getCatalogVariablesFor.
Another thing I noted: if you have multiple queue processor (e.g. membrane) indexers are initialized multiple times.
But those are other stories.

So, I skimmed the code and assumed it checked attributes of the object before it would check workflow variables, which made an ass out of you and me - but mostly me.
I'll take exit stage left and be on my merry way :slight_smile:

1 Like