Reindexing or publishing takes ages

If you have a site with 20k items and you need to reindex the catalog, why should you have to wait 4 hours for that (with the CPU pegged at 100%)? Or if you need to publish 400 items, five coffees aren't enough to pass the time.

I think this is the biggest problem Plone has at this point, in my opinion.
There is no way to explain to a customer: yes, we will install an update and you cannot do anything in your Plone site for the next 4 hours.

Does anyone else have these problems, and how do you handle them?

1 Like

I never checked this myself, so it is partly lore, but some tasks can trigger a reindex multiple times. This is why collective.indexing helps: it not only delays the indexing operations to the end of the transaction, it also notices when it is asked twice to index the same object and then only does it once.
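
To illustrate the idea (this is a minimal sketch of the pattern, not collective.indexing's real API; `IndexQueue` and `flush` are made-up names): queue reindex requests during the transaction and collapse duplicates, so an object that is touched several times is only catalogued once at the end.

```python
class IndexQueue(object):
    """Collect reindex requests during a transaction, apply each object once."""

    def __init__(self):
        # uid -> (obj, set of index names, or None meaning "all indexes")
        self.pending = {}

    def reindex(self, obj, idxs=None):
        uid = obj.UID()
        previous = self.pending.get(uid, (obj, set()))[1]
        if idxs is None or previous is None:
            self.pending[uid] = (obj, None)          # a full reindex wins
        else:
            self.pending[uid] = (obj, previous | set(idxs))

    def flush(self):
        # Called once at the end of the transaction: each object hits the
        # catalog exactly once, however often it was queued above.
        for obj, idxs in self.pending.values():
            obj.reindexObject(idxs=sorted(idxs) if idxs else [])
        self.pending.clear()
```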

1 Like

If you have to reindex the entire catalog, do it in a separate process that won't affect the performance of the site. Also, you can split it up into smaller bits.

Reducing the number of indexes might help, too.
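
As a concrete sketch of the "separate process, smaller bits" advice above: a script run via `bin/instance run`, so the reindex happens in its own worker, committing every couple of hundred objects. The site id, portal_type, index names and batch size are placeholders.

```python
# reindex_batched.py -- run with: bin/instance run reindex_batched.py
import transaction
from zope.component.hooks import setSite

site = app.Plone        # 'app' is bound by "instance run"; adjust the site id
setSite(site)
catalog = site.portal_catalog

BATCH = 200
brains = catalog.unrestrictedSearchResults(portal_type='Document')
total = len(brains)
for i, brain in enumerate(brains, start=1):
    obj = brain.getObject()
    obj.reindexObject(idxs=['SearchableText'])   # only the indexes you actually need
    if i % BATCH == 0:
        transaction.commit()            # keep transactions (and conflicts) small
        print('reindexed %d of %d' % (i, total))
transaction.commit()
```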

At Wildcard, we are implementing, for larger sites, the ability to push all potentially long-running operations into a celery task using collective.celery. The user is told that the task could take some time and that they will be notified when it is complete. Right now, this happens with pasting, deleting and moving large numbers of items.
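
For reference, a rough sketch of what such a task can look like with collective.celery. The decorator and `.delay()` call style are as I remember them from the package README, so double-check against the current docs; the task body itself is just an illustrative reindex.

```python
# Rough sketch; verify the decorator/.delay() style against the
# collective.celery documentation for your version.
from collective.celery import task


@task()
def reindex_tree(context):
    """Reindex everything below `context`, outside the request/response cycle."""
    catalog = context.portal_catalog
    path = '/'.join(context.getPhysicalPath())
    for brain in catalog.unrestrictedSearchResults(path=path):
        brain.getObject().reindexObject()


# In the view that triggers the long-running operation:
#   reindex_tree.delay(context)
# then tell the user they will be notified when it finishes.
```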

3 Likes

Also, here are some columns you might be able to remove:

  • getId
  • Date
  • in_reply_to
  • sync_uid
  • total_comments
  • cmf_uid
  • commentators
  • in_response_to
  • meta_type

It depends on what version of Plone you are on and what indexes you use on your site.
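
If you do decide to drop columns (or indexes), the plain ZCatalog API for it is small. A sketch, to be run from an upgrade step or a bin/instance debug session, and only after you have checked that none of your add-ons rely on those columns:

```python
# Only remove columns you have verified are unused on your site.
def remove_unused_catalog_metadata(portal):
    catalog = portal.portal_catalog
    for column in ('getId', 'Date', 'in_reply_to', 'sync_uid',
                   'total_comments', 'cmf_uid', 'commentators',
                   'in_response_to', 'meta_type'):
        if column in catalog.schema():
            catalog.delColumn(column)
    # Unused indexes can go the same way:
    # if 'some_unused_index' in catalog.indexes():
    #     catalog.delIndex('some_unused_index')
```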

1 Like

In a highly active Plone site with many writes this is still a problem; conflict errors will restart the process over and over again.

1 Like

Yes, so why are these not removed from Plone completely? If there is a PLIP for this, +100 from me.
If Plone is slow, people will stop using it, and I think that is the case.

1 Like

We use a comparable async approach in Quaive / Ploneintranet to execute the reindex in a separate worker instance. Of course that's equivalent to launching the reindex from a worker ZMI.

Additionally, you can greatly reduce the occurrence of ConflictError by batching your reindex operations with intermediate commits. A commit for a 100-object reindex after 5 minutes is much more likely to go through than a commit for a 20k-object reindex after 4 hours.

If you wrap that in an upgrade handler that is smart enough to detect which objects have been reindexed already, you can restart that upgrade multiple times until you're done.
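
A minimal sketch of that pattern, assuming a hypothetical my_field metadata column you can compare against the object to detect what still needs reindexing; the batch size and portal_type are placeholders:

```python
import transaction
from plone import api


def reindex_stale_objects(setup_tool):
    catalog = api.portal.get_tool('portal_catalog')
    done = 0
    for brain in catalog.unrestrictedSearchResults(portal_type='Document'):
        obj = brain.getObject()
        if brain.my_field == getattr(obj, 'my_field', None):
            continue                      # already reindexed on an earlier run
        obj.reindexObject(idxs=['my_field'])
        done += 1
        if done % 100 == 0:
            transaction.commit()          # small commits mean fewer ConflictErrors
    transaction.commit()
```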

1 Like

These are still sticking plasters.

When setting a field to 'randomX' takes 4 hours or more, there is no way to sell this. In SQL it is done in one second: update kekjo set value = randomx.
In Plone it means fetching all the objects (takes too long), reindexing them (takes longer), writing (tra-la-la), then hitting a conflict and starting all over again.

1 Like

Have you tried a ReindexIndex('keklo') instead of reindexing all indexes for all those objects?
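
For the record, both ways of spelling that (a sketch; 'keklo' is the hypothetical index name from the posts above):

```python
def reindex_single_index(portal):
    # One index across the whole catalog (plain ZCatalog API):
    catalog = portal.portal_catalog
    catalog.reindexIndex('keklo', REQUEST=getattr(portal, 'REQUEST', None))


def reindex_single_index_on(obj):
    # Or per object, inside the loop that sets the new value:
    obj.reindexObject(idxs=['keklo'])
```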

1 Like

I did some work related to this whilst at Netsight:

Optimising the Reindexing of Local Role Security in Plone

It is mainly aimed at optimising the reindex of the allowed_roles_and_users index in the catalog after you change the local roles of a container. But it might be possible to make it a bit more generic.

One of the main issues in Plone is that if you have a path /a/b/c/d/e then when you reindex 'a' you cause b,c,d,e to also be reindexed. You then reindex b and cause c,d,e to be reindexed. You then reindex c and cause d,e to be indexed... etc. etc. So it can end up re-indexing the same object an insane number of times.
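
To put a rough number on it: for a path of depth n, the cascade does n + (n-1) + ... + 1 = n(n+1)/2 reindex operations where n would do. A tiny sketch:

```python
# Reindexing /a touches a,b,c,d,e; then b touches b,c,d,e; and so on.
def cascade_reindex_count(depth):
    return sum(range(1, depth + 1))   # == depth * (depth + 1) // 2


for depth in (5, 10, 50):
    print(depth, cascade_reindex_count(depth))
# 5 -> 15, 10 -> 55, 50 -> 1275 reindex operations instead of 5, 10, 50
```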

-Matt

2 Likes

Nice!!! Seems worth hammering (um, sorry) on this to iron out any remaining issues, then give the package a slightly less frightening name than "experimental.securityindexing" :slight_smile:

1 Like

experimental.securityindexing has already been merged into Plone, if I remember correctly.

Nice to see the active responses, btw!

1 Like