Metadata column "metadata_version" for conditional metadata refresh

I have the following idea:

  • In my policy package (which pulls in GS profiles of several extension packages), I define a catalog metadata column, say, metadata_version (which is completely unrelated to the version in metadata.xml).
  • The adapter to feed this column simply returns a fixed integer number, e.g. taken from the policy package's setupdata module.
  • Whenever one or more of my packages changes or adds metadata column definitions, I'd increase my .setupdata.METADATA_VERSION constant by one, and thus the value applied to the metadata_version brain attribute while reindexing.
  • Now, when reindexing objects, I'd be able to tell easily whether the catalog brain has the most recent metadata columns already (e.g. because it has been selectively reindexed in some upgrade step) or not.
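The idea above could be sketched roughly as follows. This is plain Python, not the real catalog API: the Brain class, the metadata_version function and the is_current helper are hypothetical stand-ins, and None is assumed for a column the brain does not have yet.

```python
# Hypothetical stand-in for the constant in the policy package's setupdata module.
METADATA_VERSION = 3


def metadata_version(obj):
    """Feed the metadata_version column: ignore the object, return the constant."""
    return METADATA_VERSION


class Brain:
    """Toy replacement for a catalog brain: metadata columns become attributes."""

    def __init__(self, **columns):
        self.__dict__.update(columns)


def is_current(brain):
    """Tell whether the brain was written with the current column definitions."""
    # Assumption: a brain indexed before the column existed has no usable value.
    return getattr(brain, "metadata_version", None) == METADATA_VERSION


fresh = Brain(Title="A", metadata_version=metadata_version(object()))
stale = Brain(Title="B", metadata_version=2)
ancient = Brain(Title="C")
```

Here only fresh would count as current; stale and ancient would need their metadata refreshed.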

Good idea?
Old idea?
Any remarks?
Thank you!

The standard catalog methods only let you specify whether or not to include metadata when reindexing. You cannot select individual metadata columns for inclusion.

However, this should not be a big problem: unlike indexing, metadata usually does not need expensive processing to determine the corresponding value; in addition, all metadata for a single object is managed by the same persistent object. As a result, this metadata is always read and written as a whole. Thus, updating individual metadata columns would not give you much benefit.
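A rough model of why a per-column update buys little, assuming the metadata of one object is kept as a single tuple keyed by record id. The names schema, data and update_column are illustrative only, not the catalog's API.

```python
# Assumed layout: column name -> position in the record tuple.
schema = {"Title": 0, "review_state": 1}

# Assumed layout: record id -> one tuple holding all metadata of that object.
data = {42: ("Old title", "private")}


def update_column(data, schema, rid, name, value):
    """Change one column; the whole record is still rebuilt and stored again."""
    record = list(data[rid])
    record[schema[name]] = value
    data[rid] = tuple(record)


update_column(data, schema, 42, "review_state", "published")
```

Even though only review_state changes, the complete tuple for record 42 is replaced.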

Yes, I know.

My idea is not at all about individual updates of metadata columns; I'd simply add one piece of information: the overall version number of the metadata.

Here is my usage scenario:
I have a site which can be viewed via different hostnames, and the visible contents of some folders vary slightly depending on the hostname.
Some objects are visible under all hostnames, some only under a subset.
Now suppose I change something about the metadata which is used by the public views of those folders; in my upgrade step, it is perfectly sufficient to update the metadata of the objects listed there.
Without the metadata_version information, I'd

  • seek all objects for the first hostname,
  • update their metadata,
  • and then seek the objects for the next hostname and update their metadata again, although most of them have already been updated.

With the metadata_version at hand, I'd be able to skip the previously-updated objects easily.
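The skipping could look roughly like this. It is a toy simulation under stated assumptions: FakeCatalog, refresh_metadata, upgrade and the example hostnames and paths are all made up for illustration and are not part of any real package.

```python
# Hypothetical current version; the stored records below still carry version 1.
METADATA_VERSION = 2


class FakeCatalog:
    """Toy catalog: maps a path to its metadata and counts refreshes."""

    def __init__(self, paths):
        self.records = {path: {"metadata_version": 1} for path in paths}
        self.refreshes = 0

    def refresh_metadata(self, path):
        # Stands in for the (possibly expensive) metadata recalculation.
        self.records[path]["metadata_version"] = METADATA_VERSION
        self.refreshes += 1


def upgrade(catalog, visible_by_host):
    """Refresh metadata per hostname, skipping objects that are already current."""
    for host, paths in visible_by_host.items():
        for path in paths:
            if catalog.records[path].get("metadata_version") == METADATA_VERSION:
                continue  # already refreshed while handling an earlier hostname
            catalog.refresh_metadata(path)


visible_by_host = {
    "a.example.org": ["/news/1", "/news/2"],
    "b.example.org": ["/news/2", "/news/3"],
}
catalog = FakeCatalog(["/news/1", "/news/2", "/news/3"])
upgrade(catalog, visible_by_host)
```

In this example /news/2 is visible under both hostnames but is refreshed only once, so the upgrade performs three refreshes instead of four.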

Perhaps it turns out the metadata calculation effort really doesn't matter.
No problem: stop using it, and delete that column.
But perhaps it does make a difference; it depends, I'd say.

I changed the topic title; "selective" metadata refresh was in fact misleading.
It is really not about individual metadata columns but about an easy method to tell whether an object has the most current metadata configuration or not.

FYI, I have created a little package: collective.metadataversion; comments, contributions, help welcome.
