Automatically strip spaces from keywords and case-insensitive search


We've got some custom content types with the plone.categorization behavior. When a user adds new keywords to such a content type, a keyword can end up with a space before or after it, which typically happens with copy and paste. These spaces are not stripped when the content is saved and end up in the catalog as e.g. " research" instead of "research".
Where can I override that behavior so that all keywords are checked and stripped before they are saved on the context and indexed?

And as a second question: how can I use plone.api.content.find(Subject=(...)) to search for content independent of the case of the keyword? Subject='research' should return the same results as Subject='Research'.

Thank you.

  1. This is clearly a bug or a misfeature. Keywords must be stripped, either at the UI level before processing and/or in the backend.
  2. Keywords aka "Subject" tags are case-sensitive. Perhaps a custom indexer that normalizes the keywords would solve the problem.

It seems this happens if our customer copies multiple keywords at once into the field. If you write a; b;c into the field, it shows up as one tag, but after saving it is automatically split into the tuple ('a', ' b', 'c'). Our customer uses this feature because he sometimes has lots of keywords and does not want to enter them one by one.

I think the best idea would be to add a feature to support pasting multiple keywords at once. The frontend (JavaScript?) should then recognize a keyword containing ; and split it up automatically.
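The splitting itself is simple wherever it ends up living. As a rough sketch in Python (a backend equivalent of what the frontend would do, not existing Plone code; the name `split_keywords` is made up for illustration), pasted input could be split on ; and stripped in one step:

```python
def split_keywords(raw, separator=";"):
    """Split a pasted string such as "a; b;c" into clean keywords.

    Hypothetical helper: strips surrounding whitespace, drops empty
    entries and duplicates, and keeps the original order.
    """
    keywords = []
    for part in raw.split(separator):
        keyword = part.strip()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return tuple(keywords)


print(split_keywords("a; b;c"))  # ('a', 'b', 'c')
```

With this, the input from the example above would end up as ('a', 'b', 'c') rather than ('a', ' b', 'c').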

You can also create keywords with trailing spaces by typing, for example, "mykeyword " and hitting Enter. After saving, the spaces are still there.

Probably this is a bug used as a feature.

I often have my own Subject indexer, in my case one that combines different fields (there are publicly displayed Subjects and functional ones used only for collections). Here is an example from a project. Modified along these lines, it could return stripped and lower-cased values for the index.

# from utils:
from Acquisition import aq_base


def keyword_index_items(obj, attr, behavior=None):
    # Unwrap acquisition so only the object's own attributes are read.
    obj = aq_base(obj)
    behavior = behavior(obj, None) if behavior else obj
    if not behavior:
        raise AttributeError(u"Not a keyword-indexable type.")
    # keep order, no set here!
    return getattr(behavior, attr, []) or []


# ITagging is a behavior providing a tags field for public display
def tags_subject_indexer(obj):
    result = set(utils.keyword_index_items(obj, "tags", ITagging))
    subject = obj.Subject()
    if subject:
        # Assumed reconstruction: merge the regular Subject keywords in.
        result.update(subject)
    if not result:
        # AttributeError signals that there is no value to index.
        raise AttributeError("empty tags")
    return tuple(result)
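To make the "stripped and lower-cased" idea concrete, here is a minimal sketch of the normalization step on its own. The function names are invented for illustration and nothing here is Plone API. An indexer like the one above could pass its result through `normalized_subjects` before returning it.

```python
def normalize_keyword(value):
    """Strip surrounding whitespace and lower-case a single keyword."""
    return value.strip().lower()


def normalized_subjects(values):
    """Return normalized keywords as a tuple, without empties or duplicates."""
    result = []
    for value in values or ():
        keyword = normalize_keyword(value)
        if keyword and keyword not in result:
            result.append(keyword)
    return tuple(result)


print(normalized_subjects([" Research", "research ", "Plone"]))
# ('research', 'plone')
```

The query side has to apply the same normalization, for example plone.api.content.find(Subject=normalize_keyword('Research')), otherwise the lower-cased index values will not match. Lower-casing the Subject index itself would probably also change how existing keywords are offered in the editing widget, so a separate index (say, a hypothetical `subject_lower`) may be the safer route.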



Looks interesting. I have never created my own indexer or index type for the catalog.
However, in my case the Plone site is already in production and I am unsure whether I should change that now.

I think the best way for me would be to check the Subject field before saving, using some event that fires between save and commit, or by somehow intercepting the form submit request. I could then write a script that iterates through all content with the plone.categorization behavior and cleans up the existing keywords.
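A rough sketch of what such a cleanup could look like, untested against a real site. It assumes the content object offers Subject(), setSubject() and reindexObject(), as the indexer example above suggests for Subject(); the `FakeContent` class is only a stand-in so the snippet runs without Plone, and the function names are hypothetical.

```python
def clean_subject(obj):
    """Strip whitespace from the keywords on obj.

    Returns True if the stored keywords were changed, False otherwise.
    """
    original = tuple(obj.Subject())
    cleaned = []
    for keyword in original:
        keyword = keyword.strip()
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    cleaned = tuple(cleaned)
    if cleaned == original:
        return False
    obj.setSubject(cleaned)
    return True


def strip_subject_on_modify(obj, event=None):
    """Handler with an (object, event) signature, as an event subscriber would use."""
    if clean_subject(obj):
        # Only the Subject index needs refreshing after the cleanup.
        obj.reindexObject(idxs=["Subject"])


class FakeContent:
    """Stand-in for a content object, for demonstration only."""

    def __init__(self, subject):
        self._subject = tuple(subject)
        self.reindexed = []

    def Subject(self):
        return self._subject

    def setSubject(self, value):
        self._subject = tuple(value)

    def reindexObject(self, idxs=None):
        self.reindexed.append(idxs)


doc = FakeContent([" research", "b ", "research"])
strip_subject_on_modify(doc)
print(doc.Subject())  # ('research', 'b')
```

In a real site the handler would presumably be registered as a subscriber for an object-modified event, and the one-off script could loop over the results of plone.api.content.find(...), call getObject() on each brain and apply clean_subject the same way. Both of those are assumptions about the setup, not something verified here.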