Catalog entries from other Plone object

We generally run several Plone sites in one Zope, across a small number of mounted ZODBs. Recently I noticed that some catalog entries appear to have been indexed into the wrong portal. For example, the catalog of /db01/foo would have records for /db02/bar/*. These were all valid paths, but they obviously belong to a completely different Plone site. I'm at a loss as to how this could even happen.

I wrote a diagnostic tool to be run by a cron job to hopefully isolate the case into a specific window of time should it happen again. It appears to be rare though - we have about 100 sites on this Zope and only one of them had this issue.

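from zope.component.hooks import setSite

def sites(self, root=None):
    """Walk root recursively and report catalog paths outside each Plone site."""
    if root is None:
        root = self.context
    data = []
    for site in root.objectValues('Plone Site'):
        # Make this site the active one so local utilities resolve against it.
        setSite(site)
        site_path = '/'.join(site.getPhysicalPath())
        cat = site.portal_catalog
        # Compare with a trailing slash so /db01/foo2 is not taken as part of /db01/foo.
        prefix = site_path + '/'
        bad_paths = [path for path in cat._catalog.uids
                     if not (path + '/').startswith(prefix)]
        data.append({
            'site': site_path,
            'bad_paths': bad_paths
        })
    # Recurse into plain folders (e.g. mount points) that may hold more sites.
    for folder in root.objectValues('Folder'):
        data.extend(self.sites(root=folder))
    return data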
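
One caveat when comparing catalog paths against a site path: a plain string-prefix test can misclassify sibling sites whose ids share a prefix (for example /db01/foo and /db01/foo2). The helper below is a minimal sketch of a stricter check; the name `is_outside_site` is hypothetical and not part of Plone.

```python
def is_outside_site(path, site_path):
    """Return True if path is neither the site itself nor below it.

    Both arguments are plain '/'-joined physical paths. A trailing slash is
    appended to each side so that '/db01/foo2' is not treated as being
    inside '/db01/foo'.
    """
    prefix = site_path.rstrip('/') + '/'
    return not (path + '/').startswith(prefix)


# Example paths for illustration only.
catalog_paths = ['/db01/foo/page', '/db01/foo2/page', '/db02/bar/news']
bad_paths = [p for p in catalog_paths if is_outside_site(p, '/db01/foo')]
```

With the example paths above, only the first entry counts as belonging to /db01/foo; a bare `startswith('/db01/foo')` would also have accepted the /db01/foo2 entry.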

Maybe our use case (multiple Plone sites on one Zope) is atypical. The only thing I can think of is that something goes wrong during catalog optimization, where two updates run simultaneously and mistakenly use the same portal object. Is there even an existing test layer that lets you build multiple Plone sites easily?

Modern Plone versions can access tools (such as portal_catalog) as local utilities (rather than via acquisition). Should you have requests which traverse into more than a single "site", then it might be possible that the wrong local utilities are used; in particular, it might be possible that objects are indexed in the wrong catalog.
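
To make the mechanism concrete, here is a toy model in plain Python. It is only a sketch of the idea: `Site`, `set_site`, `get_catalog` and `index_object` are hypothetical stand-ins, not Plone or Zope APIs. The assumption it illustrates is that the catalog is looked up through the "current site" rather than through the object being indexed.

```python
# Toy model only: none of these names are real Plone or Zope APIs.

class Site:
    def __init__(self, path):
        self.path = path
        self.catalog = {}  # stands in for this site's portal_catalog


_current_site = None  # stands in for the per-thread "current site" hook


def set_site(site):
    global _current_site
    _current_site = site


def get_catalog():
    # Stands in for a local-utility lookup: the result depends on the
    # current site, not on the object that is about to be indexed.
    return _current_site.catalog


def index_object(path):
    get_catalog()[path] = True


site_a = Site('/db01/foo')
site_b = Site('/db02/bar')

set_site(site_a)
index_object('/db01/foo/page')   # lands in site_a's catalog, as intended

# The same request moves on to site_b's content without switching the site.
index_object('/db02/bar/news')   # still lands in site_a's catalog

# Paths in site_a's catalog that do not live under site_a.
foreign = [p for p in site_a.catalog
           if not (p + '/').startswith(site_a.path + '/')]
```

In this model `foreign` ends up holding the /db02/bar path while `site_b.catalog` stays empty, which is the same shape as the symptom described in the question.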


Hmmm, thank you. I can see two possible cases where that might have happened. I have some code that checks for updates across multiple sites in a single transaction and sends emails. This shouldn't write to the catalog, though, only read (my understanding is that a read can cause queued indexing tasks to execute, but this should only happen within a single transaction).

My other case is I have written helper code to run specific portal_setup upgrade steps across multiple sites that all have a certain profile installed. Depending on the nature of those upgrade steps I could see this maybe being an issue - perhaps trying to sync upgrades in a single request is just a bad idea?

This is a scenario which may trigger a request caching bug in Products.ZCatalog. Due to the bug, catalog searches may return results that actually belong to a different catalog. Should this be your case, the errors would be temporary (and apparently occur non-deterministically); the catalog data is in fact correct, and the problem comes from the request cache delivering stale data.
