Purging Archetypes objects from ZODB - migration to 5.2

I moved from Archetypes to Dexterity long ago, but there are some lingering artifacts in my ZODB that I would like to purge. (I know these might not strictly need to be purged; I'll explain later why I think they might still be a problem.) I've put a lot of work into eliminating some of this content from my dbs already, but there are a couple of tricky objects that are difficult to find.

First, here is the zodbverify result for one of my dbs after I did some cleanup:

INFO:zodbverify:Done! Scanned 146823 records.
Found 280 records that could not be loaded.
Exceptions and how often they happened:
ImportError: No module named Archetypes.BaseUnit: 8
ImportError: No module named Product: 2
AttributeError: 'module' object has no attribute 'IPersistentExtra': 102
ImportError: No module named Archetypes.ReferenceEngine: 31
ImportError: No module named ATContentTypes.tool.metadata: 126
ImportError: No module named ResourceRegistries.interfaces.settings: 9
ImportError: No module named ATContentTypes.content.document: 2
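As a quick sanity check on the summary above, the per-exception counts can be tallied to confirm that they account for all 280 unloadable records and to see which missing package dominates. This is a minimal stdlib sketch; tally is a hypothetical helper, not part of zodbverify, and it assumes the summary lines keep the "Exception: message: count" shape shown here.

```python
import re
from collections import Counter

# The summary lines quoted above, copied verbatim from the zodbverify output.
SUMMARY = """\
ImportError: No module named Archetypes.BaseUnit: 8
ImportError: No module named Product: 2
AttributeError: 'module' object has no attribute 'IPersistentExtra': 102
ImportError: No module named Archetypes.ReferenceEngine: 31
ImportError: No module named ATContentTypes.tool.metadata: 126
ImportError: No module named ResourceRegistries.interfaces.settings: 9
ImportError: No module named ATContentTypes.content.document: 2
"""

# "<ExceptionType>: <message>: <count>"; the greedy message group stops at the last ": <digits>".
LINE = re.compile(r"^(?P<exc>\w+): (?P<msg>.*): (?P<count>\d+)$")


def tally(summary):
    """Return (counts per exception type, counts per missing top-level package)."""
    by_exception = Counter()
    by_package = Counter()
    for line in summary.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        count = int(match.group("count"))
        by_exception[match.group("exc")] += count
        missing = re.match(r"No module named (\S+)", match.group("msg"))
        if missing:
            # Group e.g. ATContentTypes.tool.metadata under ATContentTypes
            by_package[missing.group(1).split(".")[0]] += count
    return by_exception, by_package
```

For this output, ATContentTypes accounts for 128 of the records and Archetypes for 39, while the 102 IPersistentExtra errors are an AttributeError: the module still exists but no longer provides that name.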

This is after running a script that cleans up three areas:

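  1. Removes any object from the Zope version control repository in portal_historiesstorage that raises a POSKeyError or is broken (Removed placeholders are only logged and skipped)
  2. Removes any history id from the portal_historiesstorage shadow storage whose working copy can no longer be found via portal_historyidhandler and whose stored version cannot be retrieved, is broken, or has been removed
  3. Removes the broken IMetadataTool entry from the persistent utility registry (similar to what wildcard.fixpersistentutilities does)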

script:

import logging

import plone.api
from Products.CMFCore.interfaces import IMetadataTool
from Products.CMFEditions.ZVCStorageTool import Removed
from ZODB.POSException import POSKeyError
from ZODB.broken import PersistentBroken, BrokenModified
from zope.component import getSiteManager

logger = logging.getLogger(__name__)

def remove_bad_utilities(context=None):
    if context:
        sm = context.getSiteManager()
    else:
        sm = getSiteManager()
    subscribers = sm.utilities._subscribers[0]
    adapters = sm.utilities._adapters[0]
    if IMetadataTool in subscribers:
        logger.info('deleting subscriber: {}'.format(subscribers[IMetadataTool]))
        del subscribers[IMetadataTool]
        sm.utilities._subscribers = [subscribers]
    if IMetadataTool in adapters:
        logger.info('deleting adapter: {}'.format(adapters[IMetadataTool]))
        del adapters[IMetadataTool]
        sm.utilities._adapters = [adapters]

def remove_bad_histories():
    bad_repo_ids = set()
    tool = plone.api.portal.get_tool('portal_historiesstorage')
    for sequence in tool.zvc_repo._histories:
        for version in tool.zvc_repo[sequence]._versions:
            try:
                obj = tool.zvc_repo[sequence].getVersionById(version)._data._object.object
            except POSKeyError:
                bad_repo_ids.add(sequence)
                logger.warning('POSKey error on bad object from zvc repo: %s' % sequence)
            except AttributeError:
                if isinstance(tool.zvc_repo[sequence].getVersionById(version)._data._object, Removed):
                    logger.warning('Ignore removed object: %s' % sequence)
                else:
                    logger.warning('Unknown error: %s' % sequence)
            else:
                if hasattr(obj, 'aq_base'):
                    obj = obj.aq_base
                if isinstance(obj, PersistentBroken):
                    bad_repo_ids.add(sequence)
                    logger.warning('Removing broken object from zvc repo: %s' % obj.__module__)

    for bid in bad_repo_ids:
        del tool.zvc_repo._histories[bid]

    # remove deleted items of deprecated class objects from shadow storage
    deleted = []
    total_hids = []
    hidhandler = plone.api.portal.get_tool('portal_historyidhandler')
    for hid in tool._getShadowStorage(autoAdd=False)._storage:
        workingCopy = hidhandler.unrestrictedQueryObject(hid)
        if not workingCopy:
            try:
                tool.retrieve(hid).object.object
            except KeyError:
                logger.warning('Could not retrieve history id %s, removing from shadow storage' % hid)
                deleted.append(hid)
            except BrokenModified:
                logger.warning('Broken history id %s, removing from shadow storage' % hid)
                deleted.append(hid)
            except AttributeError:
                logger.warning('Removed object %s, removing from shadow storage' % hid)
                deleted.append(hid)
        total_hids.append(hid)

    logger.warning('Removing %d out of %d history ids in ZVC storage' % (len(deleted), len(total_hids)))
    for hid in deleted:
        del tool._getShadowStorage(autoAdd=False)._storage[hid]

So this cleans up quite a bit, and at this point the remaining objects do not seem to be accessible through the ZMI. How can I go about finding and removing them? Here is a sample of a record related to ATContentTypes that could not be loaded:

INFO:zodbverify:
Could not process unknown record '\x00\x00\x00\x00\x00\x01C\xb6':
INFO:zodbverify:'cProducts.ATContentTypes.tool.metadata\nMetadataTool\nq\x01.}q\x02(U\x12__ac_local_roles__q\x03}q\x04U\nadmin_ericq\x05]q\x06U\x05Ownerq\x07asU\x04DCMIq\x08(U\x08\x00\x00\x00\x00\x00\x01C\xfacProducts.ATContentTypes.tool.metadata\nMetadataSchema\nq\ttq\nQU\x05titleq\x0bU0Controls metadata like keywords, copyrights, etcq\x0cu.'
INFO:zodbverify:Traceback (most recent call last):
  File "/sprj/btp_zope_plone5/plone-btp-dev-02/buildouts/eggs/zodbverify-1.0.1-py2.7.egg/zodbverify/verify.py", line 58, in verify_record
    class_info = unpickler.load()
ImportError: No module named ATContentTypes.tool.metadata

The portal_metadata tool would be the obvious answer, but it has already been removed from the ZMI (on all 10 sites in this db) and the db has been packed. Something in the persistent utilities also referenced it, but that has been deleted as well. Where else might this object live?
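One way to approach "where else might this object live?" is to build a map of which record references which other records and then walk it from the root object. Building the map is assumed here and not shown; for a FileStorage it could be assembled roughly by iterating over the storage's records and passing each record's pickle to ZODB.serialize.referencesf. The sample record above already hints at why this matters: its pickle holds a persistent reference to oid '\x00\x00\x00\x00\x00\x01C\xfa' (a MetadataSchema), so one reachable tool keeps its schema records alive through every pack. A minimal sketch of the graph walk; referrers, path_to and the example map are hypothetical.

```python
from collections import deque

ROOT_OID = b"\x00" * 8  # oid of the ZODB root object


def oid(n):
    """Pack an integer into ZODB's 8-byte big-endian oid format."""
    return n.to_bytes(8, "big")


def referrers(refs, target):
    """Return the oids whose records hold a persistent reference to target."""
    return sorted(src for src, dests in refs.items() if target in dests)


def path_to(refs, target, root=ROOT_OID):
    """Breadth-first search for one shortest reference chain root -> target."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == target:
            chain = []
            while current is not None:
                chain.append(current)
                current = parents[current]
            return chain[::-1]
        for child in refs.get(current, ()):
            if child not in parents:
                parents[child] = current
                queue.append(child)
    return None  # unreachable from the root, so a packing GC should drop it


# Hypothetical reference map: root -> site -> site manager -> broken tool -> schema
refs = {
    oid(0): {oid(1)},
    oid(1): {oid(2)},
    oid(2): {oid(0x0143B6)},
    oid(0x0143B6): {oid(0x0143FA)},
}
```

If path_to finds a chain for a record that zodbverify reports, the oids along that chain identify the container (here, plausibly the site manager) that still holds it.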

I also tried using the alias_module function from plone.app.upgrade to render it harmless:

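from OFS.SimpleItem import SimpleItem
from plone.app.upgrade.utils import alias_module

try:
    from Products.ATContentTypes.tool.metadata import MetadataTool
except ImportError:
    # Make the old dotted path resolve to a harmless class when unpickling
    alias_module('Products.ATContentTypes.tool.metadata.MetadataTool', SimpleItem)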

Unfortunately this seems to cause the zcml condition "installed Products.ATContentTypes" to evaluate as True which leads to a host of other problems.

I know that this section of the migration guide says the site may still work despite these warnings. My concern is that some of these objects turned out to cause problems that were not immediately noticeable. For instance, the history storage did not appear to be an issue until I opened portal_historiesstorage or tried to edit certain pages. I'd much rather clean up the ZODB than risk leaving stealth bugs behind.

I would not do it this way: there is an additional data structure (_utility_registrations) relevant for utility management, and it, too, needs to be cleaned up. Use the official API to remove utilities that are no longer required. For this to work, some corresponding code may still need to exist; I have used fake modules to achieve this.
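Why fake modules help: a ZODB record stores its class by dotted module path, and loading fails with ImportError once that module is gone. Registering a stand-in module in sys.modules lets the record load again, so the registration API sees a real component instead of an unloadable one. The stdlib-only sketch below uses plain pickle rather than ZODB to show the mechanism; the package name legacyaddon and the helpers install_fake_module and remove_fake_module are hypothetical. It also shows a side effect that likely explains the zcml trouble with alias_module: the parent package now exists in sys.modules, so anything that merely tries to import it will succeed.

```python
import pickle
import sys
import types

# Hypothetical dotted path standing in for e.g.
# "Products.ATContentTypes.tool.metadata".
LEGACY_MODULE = "legacyaddon.tool.metadata"


class OriginalTool(object):
    """Plays the role of the add-on class whose instances were pickled."""

    def __init__(self, title):
        self.title = title


class StubTool(object):
    """Harmless stand-in that lets old pickles load once the add-on is gone."""


def install_fake_module(dotted, **classes):
    """Register an in-memory module, plus any missing parents, in sys.modules."""
    parts = dotted.split(".")
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        if name not in sys.modules:
            sys.modules[name] = types.ModuleType(name)
    module = sys.modules[dotted]
    for attr, cls in classes.items():
        # pickle records a class as (cls.__module__, cls.__qualname__)
        cls.__module__ = dotted
        cls.__qualname__ = attr
        setattr(module, attr, cls)
    return module


def remove_fake_module(dotted):
    """Drop the module and its parents again, as if the add-on were uninstalled."""
    parts = dotted.split(".")
    for i in range(len(parts), 0, -1):
        sys.modules.pop(".".join(parts[:i]), None)


def demo():
    """Pickle with the real class, load without it, then load via the stub."""
    install_fake_module(LEGACY_MODULE, MetadataTool=OriginalTool)
    data = pickle.dumps(OriginalTool("Controls metadata"))
    remove_fake_module(LEGACY_MODULE)

    error = None
    try:
        pickle.loads(data)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        error = exc

    install_fake_module(LEGACY_MODULE, MetadataTool=StubTool)
    restored = pickle.loads(data)
    # Side effect: the fake parent package is now importable too.
    parent_importable = "legacyaddon" in sys.modules
    remove_fake_module(LEGACY_MODULE)
    return error, restored, parent_importable
```

Keeping the stub registration scoped to the cleanup run (and removing it afterwards, as demo does) avoids leaving a phantom package around for the rest of the application.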

MetadataSchema derives from OFS.SimpleItem.SimpleItem. This means that it is designed as a Zope publishing object and is likely reachable via the ZMI. You could try using ZopeFind to locate those objects via their meta_type ('Metadata Schema').
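ZopeFind (a method from OFS.FindSupport) does this search for real, and its exact arguments are worth checking against the Zope version in use. As a library-free illustration of the same idea, the sketch below walks anything exposing objectItems() and filters on meta_type. find_by_meta_type is a hypothetical helper, and Folder and Item are test doubles, not Zope classes. Note that such a walk only reaches objects stored inside containers; an object held only by the site manager's registrations will not show up.

```python
def find_by_meta_type(container, meta_type, path=""):
    """Yield (path, obj) for every descendant whose meta_type matches.

    A simplified stand-in for ZopeFind: it relies only on objectItems() and
    meta_type, which OFS ObjectManager containers provide.
    """
    items = getattr(container, "objectItems", None)
    if items is None:
        return
    for name, obj in items():
        obj_path = "{}/{}".format(path, name)
        if getattr(obj, "meta_type", None) == meta_type:
            yield obj_path, obj
        for hit in find_by_meta_type(obj, meta_type, obj_path):
            yield hit


class Folder(object):
    """Hypothetical test double for an OFS folder."""

    meta_type = "Folder"

    def __init__(self, **children):
        self._children = children

    def objectItems(self):
        return sorted(self._children.items())


class Item(object):
    """Hypothetical test double for a non-folderish SimpleItem."""

    def __init__(self, meta_type):
        self.meta_type = meta_type


# Hypothetical tree; the names are illustrative only.
site = Folder(
    metadata_tool=Folder(some_schema=Item("Metadata Schema")),
    news=Folder(),
)
```

Against real Zope objects, look the attribute up on Acquisition.aq_base(obj) first; otherwise acquisition can make plain items appear to have objectItems().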

You are right about the _utility_registrations data structure in the site manager, and through testing I confirmed that all remaining instances of that metadata tool are located in getSiteManager()._utility_registrations. However, I'm not clear on how to remove this using "the official API". I assume I want to use getSiteManager().unregisterUtility?

This is a throwaway test environment, so I experimented a bit after restoring a fresh backup. First, the result of sm.registeredUtilities() contained UtilityRegistration(<PersistentComponents argylecc>, IMetadataTool, u'', broken object, None, u''). The IMetadataTool there is actually the interface Products.CMFCore.interfaces.IMetadataTool, but it appears to be what I want:

>>> sm.getUtility(provided=IMetadataTool)
<persistent broken Products.ATContentTypes.tool.metadata.MetadataTool instance '\x00\x00\x00\x00\x00\x01C\xb6'>
>>> sm.unregisterUtility(provided=IMetadataTool)
True

After committing and packing, it is no longer reported by zodbverify! So that's at least one down.
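A generalised version of the same fix selects registrations with a predicate, reviews them in a dry run, and only then removes them through unregisterUtility. This is a sketch that has not been run against a live Plone site. It reads the private _utility_registrations mapping (assumed to map (provided, name) to (component, info, factory)) but leaves all changes to the public API. unregister_matching and from_module are hypothetical helpers, and FakeRegistry is only a test double so the example runs without Zope. In a real site the call would be followed by a transaction commit and a pack, as described above.

```python
def unregister_matching(sm, predicate, dry_run=True):
    """Select (and optionally unregister) utilities whose component matches.

    Assumes ``sm`` is a zope.interface component registry such as the one
    returned by getSiteManager(); ``_utility_registrations`` is private API
    mapping (provided, name) -> (component, info, factory).
    """
    hits = [
        (provided, name, registration[0])
        for (provided, name), registration in list(sm._utility_registrations.items())
        if predicate(registration[0])
    ]
    if not dry_run:
        for provided, name, component in hits:
            # Public API: drops the entry from _utility_registrations and
            # from the underlying utilities registry in one step.
            sm.unregisterUtility(component, provided, name)
    return [(provided, name) for provided, name, _ in hits]


def from_module(prefix):
    """Match components whose class was defined under ``prefix``.

    ZODB's broken placeholders keep the original module path as their
    class's __module__, so broken leftovers match as well.
    """
    return lambda component: type(component).__module__.startswith(prefix)


class FakeRegistry(object):
    """Hypothetical test double mimicking the two members used above."""

    def __init__(self):
        self._utility_registrations = {}

    def registerUtility(self, component, provided, name=""):
        self._utility_registrations[(provided, name)] = (component, "", None)

    def unregisterUtility(self, component=None, provided=None, name=""):
        key = (provided, name)
        if self._utility_registrations.get(key, (None,))[0] is not component:
            return False
        del self._utility_registrations[key]
        return True


class BrokenMetadataTool(object):
    """Stands in for the broken MetadataTool placeholder."""

    __module__ = "Products.ATContentTypes.tool.metadata"


class HealthyTool(object):
    """Stands in for a utility that should be kept."""
```

Usage: call unregister_matching(getSiteManager(), from_module("Products.ATContentTypes")) first to review the list, then repeat with dry_run=False. Plain strings stand in for interface objects in the test double.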

Esoth via Plone Community wrote at 2019-10-11 19:40 +0000:

[...] I assume I want to use getSiteManager().unregisterUtility?

That's how I removed atreal utilities in preparation for
an upgrade from Plone 4.2 to Plone 5.2 (those utilities are not
yet Zope 4 compatible). The parameter p is a portal object.

def atreal_cleanup_utilities(p):
    """remove atreal utilities."""
    c = p._components
    # keys are (provided, name); values are (component, info, factory)
    regs = c._utility_registrations
    to_del = [r for r in regs if r[1].startswith("atreal")]
    print(to_del)
    for r in to_del:
        c.unregisterUtility(regs[r][0], r[0], r[1])

In my case, I run this cleanup step in a Plone 4.2 setup (when
the "atreal" packages were still available). You may need fake
interfaces and/or classes if the packages are no longer there
(alternatively, "wildcard.fixpersistentutilities" may help).