Indexing: How to index 'contained items'

I have a content type "Company" which contains items (in subfolders) of type 'Ship'.
I want to index 'how many ships' the company has.

What is the best approach for this?

This is not ChatGPT :wink:

What is your current idea and why doesn't that work?

I have 'Folderish Content type A', which contains 'Folder A1' and 'Folder A2'. Both folders contain several items of the 'Ship' content type.

So for 'Folderish Content type A' I want to index 'count of ships'.
I am wondering what is the best approach.

I could somehow update the parent (A) every time changes are made in one of the 'Folder A…' subfolders, maybe with an event handler. This does not 'feel right'.

If A1 were 'aware' of changes made in the subfolder, things would be better(?). Then I could have an event handler on A1 instead, or maybe an indexer would be enough.

Also, I am not sure how the different approaches would handle published/unpublished content (at least for folders, since the items inside them might be published).

Also: I don't really need a field for this; maybe that matters (so not 'every parent folder' needs to be (re)indexed).

Make a combined query by path and portal_type.
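A minimal sketch of such a combined query. It assumes the ship type is called "Ship" (the thread's own code uses "skip") and that catalog is the portal_catalog tool, passed in as a parameter so the function can be tried outside Plone; in a Plone site you would pass api.portal.get_tool("portal_catalog"). Keep in mind that portal_catalog only returns what the current user is allowed to see, so with a typical workflow anonymous visitors will not get unpublished ships counted.

```python
def ship_count_from_catalog(company, catalog, portal_type="Ship"):
    """Count catalogued items of `portal_type` anywhere below `company`."""
    # getPhysicalPath() returns a tuple such as ("", "plone", "acme");
    # joining it with "/" gives the catalog path "/plone/acme".
    company_path = "/".join(company.getPhysicalPath())
    # A path query matches the whole subtree by default, so ships inside
    # nested subfolders are included, not only direct children.
    brains = catalog(portal_type=portal_type, path=company_path)
    return len(brains)
```

In Plone, plone.api.content.find(context=company, portal_type="Ship") builds the same path restriction from the context object for you.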

The indexer itself is quite easy, but I am not sure how to get it called.

In other words, if I put this on 'Folder A' (type) it works when I save it (but not if I add more items to it):

from plone import api

def antallIndexer(obj):
    # count "skip" (ship) items anywhere below this object
    return len(api.content.find(context=obj, portal_type="skip"))

UPDATE: By 'not if I add more items to it' I mean items added to the subfolder. If I had only the 'Folderish Content type' with items directly in it, an indexer would be enough, but not when I have folders within folders.

One possible approach could be to add the Collection behavior to your Folderish content type.

The Collection behavior includes the necessary fields, among them query.

There you could (as suggested by @zopyx) set a query by path and portal_type in the query field and read the number of results it returns. Note that the behavior's item_count field is the batch size (how many items are shown per page), not the number of matching items.

Modify your view/template to show that number and you are done.

I am not sure if I get it. If my folder structure is

ContainerTypeA
    FolderB
        ContentTypeC (1)
        ContentTypeC (2)

I can do the indexing on ContainerTypeA (that is already working) when I edit ContainerTypeA.
If I put the ContentTypeC items directly inside ContainerTypeA, the index is updated when, for example, another item is added.

But when the items are inside FolderB, the index is not updated when they are added. Maybe I can add an indexer on 'Folder' that walks up to the parent ContainerTypeA (?)

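Update: Maybe something like this is 'good enough':

from Acquisition import aq_inner, aq_parent
from plone import api
from plone.app.contenttypes.interfaces import IFolder
from plone.indexer import indexer

@indexer(IFolder)
def antallIndexer(obj):
    container = aq_inner(obj)
    while container.portal_type == "Folder":  # walk up through plain Folders
        container = aq_parent(container)
    if container.portal_type != "myfolderishcontenttype":
        raise AttributeError("antall")  # plone.indexer: no value, skip
    # Note: this value is stored in the Folder's own catalog entry
    return len(api.content.find(context=container, portal_type="skip"))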

Why do you want to use an index? If the only reason is to benefit from the reindexing that events trigger, this can become complicated if your content type (Ship) is in different places. You must then make sure the events reach your "main" folder (Company?).

Why not simply let the Company folder query how many Ships there are? If performance is not critical and there are not thousands of Ships to count, this is the easiest way and also less prone to failure. And you'll avoid an event-handling nightmare.

Please read the answers.

As stated earlier, you query by portal_type='Ship' and path='/plone/path/to/company'.
There is zero need for a custom indexer or a custom index.

If the numbers are low, calling the catalog multiple times to get the ship info is not a problem. The indexer above works only when the top folder is saved, not when a new ship is added. So the indexer would have to run a query to count the ships with len(plone.api.content.find(self, portal_type="skip")), and then you would have to reindex the top folder every time something changes below it.

Let's instead look at it from the general object-database angle.

So the problem is: when I add a ship in a subfolder, how do I reindex the top folder? You have the IObjectAddedEvent, for example. But what if the indexed value depends on ship values (maybe you also want to count green ships, and so on)? Then you need the modified event. What about removing a ship? The removed event, and so on. This can get tricky over time, and there's no real gain in doing it.

There is no ready-to-use solution in Plone. This kind of query is what SPARQL is good at, but it requires a different model for indexing content. If Plone content can be serialized as triples (it can), a SPARQL Python library on top of a suitable database could handle very complex queries. Catalog queries could be implemented as simpler SPARQL queries (the bigger problem then is how data is represented: dates, integers, missing values).

So the only sustainable solution in the Plone world right now is to query the catalog on the fly and count the results, iterating over them if you need more detailed information.
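As an illustration of iterating over the results for more detail, here is a small sketch that counts ships per workflow state (which also touches on the published/unpublished question). It assumes catalog is callable like portal_catalog and that review_state is available as a metadata column, as it is in a standard Plone catalog; the "Ship" type name is an assumption.

```python
from collections import Counter


def ships_by_state(catalog, company_path, portal_type="Ship"):
    """Count ships below `company_path`, grouped by workflow state."""
    brains = catalog(portal_type=portal_type, path=company_path)
    # review_state is read from the brain's metadata, so no object has to
    # be loaded with getObject() just to look at its state.
    return Counter(brain.review_state for brain in brains)
```

Reading metadata from brains keeps this cheap; calling getObject() on every brain would load each ship from the database.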

Whatever you decide to do, do not use a catalog query to return the value of a catalog index :wink:
You might have unwanted surprises there!

Almost true: the indexer works when adding items (ships) directly into it, but not into its subfolders.
It looks like it is possible to put an indexer on 'Folder' and let it update its parent's index; I am not sure if that is 'OK', or if an event handler is better.


PS: I will use this with collective.collectionfilter, so I need an index.
I will also have a 'from'-'to' filter, so it is possible to show Companies with 5-20 ships.
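Behind a 'from'-'to' filter the catalog needs a range query. The sketch below shows what such a query looks like on the catalog side only (collectionfilter builds its own queries). It assumes an index named antall, as in the thread, holding integers, and a portal type "Company". If the values are indexed as strings instead, the comparison is lexicographic and, for example, "5" sorts after "20".

```python
def companies_in_range(catalog, low, high, index_name="antall",
                       portal_type="Company"):
    """Find companies whose ship-count index value lies in [low, high]."""
    # ZCatalog range query: "min:max" includes both bounds.
    range_query = {index_name: {"query": (low, high), "range": "min:max"}}
    return catalog(portal_type=portal_type, **range_query)
```

For "Companies with 5-20 ships" this would be called as companies_in_range(portal_catalog, 5, 20).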

I think I will need it for collectionfilter; otherwise I would probably have to add a field to 'Company' and make it hidden (?)

Thanks

I wonder why I answer questions here?!

I think he means you just need a new entry in the catalog indexes/metadata, which indexes an object attribute or method. So there is no need to create a custom indexer, just an index.

Note: if you want to prevent parent values from being indexed for child objects (through acquisition), you can use this strategy: http://www.derstappen-it.de/tech-blog/plone-prevent-unwanted-indexing-of-child-objects-thru-zope-acquisition

Note to @all: chapter "21. Views III: A Talk List" of "Mastering Plone 5 development" (Plone Training 2023 documentation) mentions creating an @indexer but does not explain it later... @stevepiercy @pbauer

I thought without an indexer one could only index fields. Is that wrong?

PS: I have been using a custom indexer for int fields with collective.collectionfilter (it only works when they are strings).

Another approach is to register an IObjectMovedEvent subscriber for those folderish content types that are allowed to contain "Ships", and then to have the subscriber re-count the "Ships" and update your ship count whenever a "Ship" has been added or removed.

IObjectMovedEvent (a super-type of IObjectAddedEvent and IObjectRemovedEvent) is fired when an object is added to, removed from, renamed in, or moved between containers.

Your subscriber could store the count in some place, e.g. in a field "ship_count" of your "Company" folder.

The "Company" folder (or folders, if more than one) could be looked up programmatically from the subscriber instead of being hard-wired in it.
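A sketch of such a subscriber, under several assumptions. It is registered for the ship type, because the event is fired for the object being added, removed or moved; IShip is a hypothetical interface, e.g. registered in ZCML as subscriber for=".interfaces.IShip zope.lifecycleevent.interfaces.IObjectMovedEvent" handler=".subscribers.ship_moved". The company type is assumed to be "Company" with a ship_count field and a catalog index of the same name. Parents are followed via __parent__ so the sketch runs standalone; in Plone you would use Acquisition.aq_parent. It recounts by walking the containment tree instead of querying the catalog, so it does not depend on whether the ship has already been cataloged when the subscriber runs.

```python
COMPANY_TYPE = "Company"  # hypothetical portal_type of the company container
SHIP_TYPE = "Ship"        # the thread's own site uses "skip"


def parent_of(obj):
    # Standalone stand-in; in Plone use Acquisition.aq_parent(aq_inner(obj)).
    return getattr(obj, "__parent__", None)


def enclosing_company(container):
    """Return the nearest Company at or above `container`, or None."""
    while container is not None:
        if getattr(container, "portal_type", None) == COMPANY_TYPE:
            return container
        container = parent_of(container)
    return None


def count_ships_below(container):
    """Recount ships by walking the containment tree (no catalog query)."""
    total = 0
    for child in container.objectValues():
        if getattr(child, "portal_type", None) == SHIP_TYPE:
            total += 1
        elif getattr(child, "isPrincipiaFolderish", False):
            total += count_ships_below(child)  # descend into subfolders
    return total


def ship_moved(ship, event):
    """Subscriber for (IShip, IObjectMovedEvent).

    Added: oldParent is None. Removed: newParent is None.
    Renamed or moved: both are set (and may be the same container).
    """
    companies = []
    for parent in (event.oldParent, event.newParent):
        company = enclosing_company(parent)
        if company is not None and company not in companies:
            companies.append(company)
    for company in companies:
        company.ship_count = count_ships_below(company)
        company.reindexObject(idxs=["ship_count"])
```

Moving a ship from one company to another updates both counts, because both oldParent and newParent are looked at.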

Would you please create an issue and assign it to @pbauer and @ksuess, or make a PR? They are the primary maintainers of that training. I am not familiar enough with the topic to contribute. Thank you!

A bit off topic, but it looks like collectionfilter does not work without a field with the exact same name as the index.