How to get all "Subjects" used in a folder (efficiently)?

So we have a site using lineage, but the problem is present with multilingual sites as well: I want a vocabulary of all subjects (aka tags) used under a specific path.

Now it's possible to get all items with a catalog query and aggregate the Subjects from the results, but that is far from efficient.
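For comparison, the naive aggregation can be sketched like this. It assumes each catalog brain exposes `Subject` as a metadata column (it is one in a default Plone setup) and that you pass in the results of something like `portal_catalog.unrestrictedSearchResults(path=folder_path)`. The helper name `subjects_from_brains` is hypothetical, and simple stand-in objects replace real brains. The cost is that every brain under the path has to be visited.

```python
from types import SimpleNamespace


def subjects_from_brains(brains):
    """Collect the distinct subjects of all given catalog brains.

    Hypothetical helper: `brains` is any iterable of objects with a
    `Subject` attribute holding a tuple of strings, as catalog brains
    have when Subject is a metadata column.
    """
    subjects = set()
    for brain in brains:
        # Subject may be missing or empty on some brains; treat it as empty
        subjects.update(getattr(brain, "Subject", None) or ())
    return subjects


# Stand-in for portal_catalog.unrestrictedSearchResults(path=folder_path)
brains = [
    SimpleNamespace(Subject=("plone", "zope")),
    SimpleNamespace(Subject=("plone",)),
    SimpleNamespace(Subject=()),
]
print(sorted(subjects_from_brains(brains)))  # ['plone', 'zope']
```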

Any other ideas? If not, I need some good caching...

This product builds a tag portlet from user-entered configuration and has an option to build a tag cloud for a given section of a site. Perhaps you can reuse its code, which also caches the data.

Edit: I forgot the link: https://github.com/collective/qi.portlet.TagClouds

If waking up all the objects under the folder (which I assume is a deeply nested structure) is out of the question, the solution is to use another storage (e.g. a separate catalog, Solr, a persistent object or even the folder's annotations) to store the data.

Depending on your needs and choices, you may run into several problems managing edits and removals.
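One way to cope with edits and removals in such a separate storage is to keep a reference count per subject instead of a plain set, so a subject only disappears once no item under the folder uses it any more. This is a plain-Python sketch with hypothetical names; in Plone you would have to persist the counts (for example in a persistent mapping stored in the folder's annotations) and call `update()` from event subscribers for added, modified and removed content, which is not shown here.

```python
from collections import Counter


class SubjectUsage:
    """Hypothetical per-folder store counting how many items use each subject."""

    def __init__(self):
        self._counts = Counter()

    def update(self, old_subjects=(), new_subjects=()):
        """Record that one item's subjects changed from old to new.

        Use old_subjects=() for a newly added item and new_subjects=()
        for a removed one.
        """
        old, new = set(old_subjects), set(new_subjects)
        for subject in old - new:
            self._counts[subject] -= 1
            if self._counts[subject] <= 0:
                # no item uses this subject any more
                del self._counts[subject]
        for subject in new - old:
            self._counts[subject] += 1

    def subjects(self):
        """Return the set of subjects currently in use."""
        return set(self._counts)


usage = SubjectUsage()
usage.update(new_subjects=("plone", "zope"))  # item A added
usage.update(new_subjects=("plone",))         # item B added
usage.update(old_subjects=("plone", "zope"), new_subjects=("plone",))  # A edited
print(sorted(usage.subjects()))  # ['plone']
```

Counting (rather than just adding and removing names) is what makes removals safe: deleting item A must not drop "plone" while item B still uses it.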

If a separate storage is not viable, the other solution is caching, which effectively means storing your data in RAM.
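A minimal caching sketch, assuming the expensive lookup is wrapped in a function of the folder path. The cache key also includes a counter that changes whenever the catalog changes, so stale entries are simply never hit again. On reasonably recent ZCatalog versions `portal_catalog.getCounter()` could serve as that counter (an assumption worth checking on your version). `make_cached_lookup` and `compute_subjects` are hypothetical names, and `functools.lru_cache` stands in for a Plone-side cache such as `plone.memoize.ram`.

```python
from functools import lru_cache


def make_cached_lookup(compute_subjects, maxsize=128):
    """Wrap an expensive compute_subjects(path) function in a RAM cache.

    The returned function takes (path, counter). `counter` is only part of
    the cache key: when the catalog changes, the counter changes, the key
    changes, and the subjects are recomputed.
    """

    @lru_cache(maxsize=maxsize)
    def cached(path, counter):
        # frozenset: cached values are shared between callers, so keep them immutable
        return frozenset(compute_subjects(path))

    return cached


calls = []


def compute_subjects(path):
    # Stand-in for the expensive catalog-based computation
    calls.append(path)
    return {"plone", "zope"}


lookup = make_cached_lookup(compute_subjects)
lookup("/site/folder", 1)
lookup("/site/folder", 1)  # served from the cache
lookup("/site/folder", 2)  # counter changed: recomputed
print(len(calls))  # 2
```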

I think a good way to get all the subjects under the path could be:

  1. get the unique values of the Subject index (portal_catalog._catalog.indexes['Subject']);
  2. search for each subject within your path and check whether any brains are returned.

Untested code:

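# portal_catalog and folder_path must already be defined
subject_index = portal_catalog._catalog.indexes['Subject']
subjects = [
    subject
    for subject in subject_index.uniqueValues()
    # keep a subject only if at least one item under folder_path uses it
    if portal_catalog.unrestrictedSearchResults(Subject=subject, path=folder_path)
]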

This way you never have to iterate over the content objects or their brains, although you will stress portal_catalog with lots of queries (one per subject in the index).


For the record: I came up with a solution operating directly on the indexes, using their IISets/IITreeSets for direct intersections; see https://gist.github.com/jensens/ba4945d46d0b7fa00a414c5ae18a1f82

This is the fastest way I found.
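The idea of intersecting index data directly can be modelled in plain Python. This is not the gist's code: it assumes you can obtain, for each subject, the set of document ids that carry it (roughly what the Subject index keeps internally) and the set of document ids under the folder (from the path index). Python `set`s stand in for IISet/IITreeSet, for which `BTrees.IIBTree.intersection` plays the role of `&`; the function name is hypothetical.

```python
def subjects_under_path(subject_index, path_docids):
    """Return the subjects used by at least one document under a path.

    subject_index: mapping of subject -> set of document ids using it
                   (a stand-in for the Subject index's internal data).
    path_docids:   set of document ids located under the folder.
    No objects or brains are loaded; only id sets are intersected.
    """
    return {
        subject
        for subject, docids in subject_index.items()
        # a non-empty intersection means the subject occurs under the path
        if docids & path_docids
    }


subject_index = {
    "plone": {1, 2, 5},
    "zope": {3},
    "python": {2, 4},
}
path_docids = {1, 2}  # e.g. the documents under /site/folder
print(sorted(subjects_under_path(subject_index, path_docids)))  # ['plone', 'python']
```

Computing the path's id set once and reusing it for every subject is what makes this cheaper than running one full catalog query per subject.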
