Split the portal_catalog from Data.fs into its own storage

Hi all! :wave:t4:

We have had this setup working on our website for more than 11 years (that's what's most frustrating tbh :confounded: ), but I can't remember how to reproduce it on a fresh installation.

I searched around and only found partial explanations :confused: so in the end I did as much as I could and wrote it down in this repo:

There is a buildout.cfg that goes with it, so one can follow the steps and see if they end up with the same problem: the split Catalog.fs does not work/is being ignored :scream:

Any ideas on what I'm doing wrong? Any pointers to documentation that I missed?

I plan to contribute that whole README to Plone's official documentation (if that might encourage someone to help me :pray:t4: :smiley: ).

I know that it is probably not a good performance strategy nowadays, as using Solr or Elasticsearch is far better, but as I'm having the problem right now and I do need to get this fixed, I'm reaching out to you all :smiley:

My company could even hire someone for a few days to sort it out if a more in-depth investigation is needed :money_with_wings: contact me privately for that.

Could it be as simple as this?

Mounting an existing Plone site

Posted by Sergey Volobuev at May 10, 2007 - 04:45

I just have spent some time trying to mount an existing Data.fs in order to access data there and would like to share my experience.

There seems to be a little trick which does not seem to be documented very well. Zope doesn't actually mount the "root" of the object tree, which would be behaviour similar to a Unix mount. What it does is mount an object within that tree which has the same name as the mount point. So if, for example, we want to access a folder called "mysites" located at the root of a ZODB we're trying to mount, we have to call the mount point "mysites" as well.

Also, in case we want to mount a Plone Site (or some other object different from Folder), there's a container-class parameter in <zodb_db> section.

See here: Multiple Plone sites per zope instance -- using separate Data.fs files for each one. - Plone CMS: Open Source Content Management

and here

Thanks, I did not come across these links, as they focus mostly on getting a complete Plone site on a different mount point, rather than using an existing portal_catalog from another mount point.

As for the not-well-documented trick: indeed, we have something similar on our production server:

    <zodb_db portal_catalog>
        mount-point /website/portal_catalog:/empty/catalog/portal_catalog
        container-class Products.CMFPlone.CatalogTool.CatalogTool
        <zeoclient>
            server ZEO_ADDRESS
            storage 2
            name catalog
            var ${buildout:directory}/parts/instance/var
        </zeoclient>
    </zodb_db>

This /website/portal_catalog:/empty/catalog/portal_catalog is what is puzzling me.
Our Plone instance is called website, and I vaguely remember creating an empty Plone instance to take the portal_catalog from, but I don't get this /empty/catalog/portal_catalog part :confused:
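
For what it's worth, the description of mount-point in Zope's zope.conf schema mentions a virtual_path:real_path form: the part before the colon is where the object appears in Zope, the part after it is where the object lives inside the mounted database. Read that way, the line above would expose /empty/catalog/portal_catalog from the catalog storage as /website/portal_catalog. A minimal sketch of that reading in plain Python (the helper names split_mount_point and names_match are made up for illustration, and the rarer virtual_path:~real_root:real_path form is not handled):

```python
def split_mount_point(spec):
    """Split a mount-point value into (path_in_zope, path_in_mounted_db)."""
    virtual_path, sep, real_path = spec.partition(":")
    if not sep:
        # No colon: the object is expected at the same path in both places.
        real_path = virtual_path
    return virtual_path, real_path


def names_match(spec):
    """Check that the mount point and the mounted object share the same id."""
    virtual_path, real_path = split_mount_point(spec)
    return virtual_path.rsplit("/", 1)[-1] == real_path.rsplit("/", 1)[-1]


spec = "/website/portal_catalog:/empty/catalog/portal_catalog"
in_zope, in_mounted_db = split_mount_point(spec)
print("visible in Zope at:", in_zope)
print("stored in mounted db at:", in_mounted_db)
print("ids match:", names_match(spec))
```

If this reading is right, the names_match check mirrors the "same name as the mount point" rule from the 2007 post quoted earlier.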

Any other ideas? :thinking:

Quick and dirty solution ...

  1. create the new empty plone site

  2. stop zeo and client

  3. copy Data.fs over Catalog.fs

cp var/filestorage/Data.fs var/filestorage/Catalog.fs

  4. start zeo

  5. start instance in debug

Check that the Plone site and the catalog are using the same db connection:

% bin/instance debug
>>> app.Plone._p_jar
<Connection at 7fc646501610>
>>> app.Plone.portal_catalog._p_jar
<Connection at 7fc646501610>

Remove the portal_catalog and add the zodbmountpoint.

>>> app.Plone.manage_delObjects('portal_catalog')
>>> from Products.ZODBMountPoint.MountedObject import manage_addMounts
>>> manage_addMounts(app.Plone, ('/Plone/portal_catalog', ))

Now the Plone site and the catalog are using a different db connection.

>>> app.Plone.portal_catalog._p_jar
<Connection at 7fc64652d6a0>
>>> app.Plone._p_jar
<Connection at 7fc646501610>
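
Comparing connection addresses by eye is easy to get wrong, so here is a small helper, as a sketch only, that asks each object's connection for its database name instead. It relies on ZODB's Connection.db() method and the database_name attribute of the DB object; the function name database_name_of is made up for illustration.

```python
def database_name_of(obj):
    """Return the name of the ZODB database that holds a persistent object.

    Returns None for objects that are not stored in any database yet
    (their _p_jar is missing or None).
    """
    jar = getattr(obj, "_p_jar", None)
    if jar is None:
        return None
    # Connection.db() returns the DB the connection belongs to.
    return jar.db().database_name
```

In the debug prompt one would call database_name_of(app.Plone) and database_name_of(app.Plone.portal_catalog). After the mount they should report two different names, which as far as I know correspond to the zodb_db section names in zope.conf.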

Commit the transaction:

>>> import transaction
>>> transaction.commit()

p.s. I don't know the reason, but after starting the instance I don't see the portal_catalog in the ZMI.

To fix that, just open this url http://localhost:8080/manage_addProduct/ZODBMountPoint/addMountsForm and create the ZODB mount point again via the ZMI.

This is not a "good" solution, but IMHO it works.

Thanks for the reply. Indeed, I can see that they no longer use the same connection, but something must still be off:

If you look at the file sizes of Data.fs and Catalog.fs, one grows and the other stays the same: a plain Plone site is around 6 MB, and after adding lots of content I make Data.fs grow up to 40 MB. No matter how much reindexing I do, or how many clear and rebuild buttons I click, Catalog.fs always remains at 6 MB :confused:

Search does work, so somehow, although the portal_catalog is mounted and using a different connection, the data is still being stored in the main Data.fs? :thinking:
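
To make that size comparison repeatable, something like the following could be run before and after a clear and rebuild. It is only a sketch using the standard library; the function names are made up, and the paths in the usage note are the ones from the cp command earlier in the thread.

```python
import os


def storage_sizes(paths):
    """Return {path: size in bytes} for every path that exists."""
    return {p: os.path.getsize(p) for p in paths if os.path.exists(p)}


def growth(before, after):
    """Return {path: bytes grown} for paths present in both snapshots."""
    return {p: after[p] - before[p] for p in before if p in after}
```

Usage: take before = storage_sizes(["var/filestorage/Data.fs", "var/filestorage/Catalog.fs"]), trigger the rebuild, take a second snapshot the same way, and look at growth(before, after). If only the Data.fs entry is non-zero, the catalog writes are not reaching Catalog.fs.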

First
The only reason to put the portal_catalog on its own mount point is optimized caches. There are no other advantages. Conflicts still happen. Backup/restore gets much more complex. Is it worth the effort?
With RelStorage and its caching optimizations, this advantage mostly vanishes.

Second
Why would one have 2 Plone sites in one ZODB? It used to be a thing 15+ years back. It adds more problems than it solves.
Firing up a dedicated instance makes much more sense. The memory overhead of a new instance is just 10-16 MB. -> Caches are optimized per instance.

Third
Cross-database references are tricky, and so we had several bugs (in the past, all solved AFAIK). I know some companies that had to fight those, with some 1000€ of consulting costs added because of them. I hope this is all in the past and there is nothing left in ZODB from Zope 5.

My conclusion

  • Optimize everything else first.
  • Use a fast RelStorage (PostgreSQL-based), configured history-free if you do not need ZODB undo, which is pointless anyway on large sites with frequent edits.
  • Ensure short transaction times (check custom code/add-ons so they do not slow down your site).
  • Profiling, profiling, profiling (repoze.profile/py-spy). Often there is some easy way to speed up a site. A vanilla Plone is not slow; if yours is slow, it is mostly add-ons, custom code/templates, ...
  • Use a drop-in replacement for slow indexes (ZCTextIndex), such as collective.solr or collective.elastic.plone.
  • Do your front-end caching homework using Varnish or paid edge-caching SaaS providers.
  • ... and some more, depending on your use case ...

Thanks @jensens for your answer, but you missed the whole point, sorry :sweat_smile:

My problem is that we already have this setup in production, but I cannot re-create it from scratch.

Somehow, 10+ years ago, we managed to create such a setup (a Plone instance with a split portal_catalog) and it works as expected: data is stored in Catalog.fs and the portal_catalog ZMI works as well, but I cannot reproduce it anymore.

The underlying problem is that I am trying to fix zodbverify errors. Data.fs is all fine, but no matter what I try on Catalog.fs, the errors are still there :confused: Clearing the catalog and packing did not help either.

So my next idea is to re-create it from scratch, so that I can attach a brand new database to our production and get past these zodbverify errors.

Could we move to another solution altogether? Yes, sure, but as you mention in your first point, our setup is complex enough that I want to keep the same good old configuration we have until we decide to make the extra effort to do something else (RelStorage?).

I found the original documentation we used 10 years ago :tada:

I will see now if this document still holds true on Plone 5.2.8 :crossed_fingers:t4: wish me luck :smiley:

Unfortunately it still does not work :person_shrugging:t4: maybe it has to do with Plone 4 -> Plone 5? no idea...

Instead I'm using a different approach: silencing the errors (by adding aliases) and seeing if that gets me where I want :tada: wish me luck :crossed_fingers:t4:

So the mystery remains :ghost: :person_shrugging:t4:

While I got the database from py2 to py3, re-creating Catalog.fs from scratch still does not seem to work :person_facepalming:t4:

Anyway at least I can move to python 3 without having to find a solution for the Catalog.fs at the same time.

A clear and rebuild of the catalog is mandatory as well, otherwise all searches return nothing :sweat_smile:

Are you sure about that? I think you are forgetting the size of the loaded code, which would be way more than that.