We've got some custom content types with the behavior plone.categorization. When a user adds new keywords to such a content type, a keyword can end up with a space before or after it, typically when the user copies and pastes. These spaces are not stripped when the content is saved and end up in the catalog as e.g. " research" instead of "research".
Where can I override that behavior so that all keywords are checked and stripped before saving and indexing them to the context?
And as a second question: how can I use plone.api.content.find(Subject=(...)) to search for content independent of the case of the keyword? Subject='research' should return the same results as Subject='Research'.
It seems this happens if our customer copies multiple keywords at once into the field. If you write a; b;c into the field, it shows up as one tag, but after saving it is automatically split into the tuple ('a', ' b', 'c'). Our customer uses this feature because he sometimes has lots of keywords and does not want to type them into the field one by one.
I often have my own Subject indexer, in my case to combine different fields (some Subjects are displayed publicly, others are functional and only used for collections). Here is an example from a project. Modified along these lines, it could return stripped and lower-cased values for the index.
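# from utils:
from Acquisition import aq_base


def keyword_index_items(obj, attr, behavior=None):
    obj = aq_base(obj)
    # adapt to the behavior if one is given, otherwise read from the object itself
    behavior = behavior(obj, None) if behavior else obj
    if not behavior:
        raise AttributeError(u"Not a keyword-indexable type.")
    # keep order, no set here!
    # (the default values were lost in the post; an empty tuple is assumed)
    return getattr(behavior, attr, ()) or ()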
# ITagging is a behavior providing a tags field for public display
result = set(utils.keyword_index_items(obj, "tags", ITagging))
subject = obj.Subject()
if not result:
    raise AttributeError("empty tags")
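The "stripped and lower-cased" variant mentioned above could look roughly like the sketch below. It is plain Python with no Plone imports. The name normalized_index_values is made up for illustration, and it assumes the indexer would pass its collected keywords through this function before returning them.

```python
def normalized_index_values(values):
    """Return stripped, lower-cased keywords; order kept, duplicates dropped."""
    seen = set()
    result = []
    for value in values or ():
        cleaned = value.strip().lower()
        # skip empty strings and keywords that differ only by case or spaces
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return tuple(result)
```

If the index stores lower-cased values, the search term has to be normalized the same way before querying, e.g. plone.api.content.find(Subject=term.strip().lower()). The keywords stored on the object keep their original case; only the indexed values change.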
Looks interesting. I have never created my own indexer or index type for the catalog.
However, in my case the Plone site is already in production and I am unsure whether I should change that now.
I think the best way for me would be to check the Subject field before saving, using some event that fires between save and commit, or by somehow intercepting the form submit request. Then I can write a script that iterates through all content with the plone.categorization behavior and cleans up the keywords.
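A minimal sketch of that idea, under a few assumptions: the handler would be registered as a subscriber for IObjectAddedEvent and IObjectModifiedEvent from zope.lifecycleevent.interfaces, and the content object offers Subject(), setSubject() and reindexObject() as Dexterity content normally does. The function names clean_keywords and strip_subject_handler are hypothetical, and the registration itself (ZCML) is not shown.

```python
def clean_keywords(values):
    """Strip whitespace, drop empties and duplicates; keep order and case."""
    cleaned = []
    for value in values or ():
        value = value.strip()
        if value and value not in cleaned:
            cleaned.append(value)
    return tuple(cleaned)


def strip_subject_handler(obj, event=None):
    """Subscriber sketch: rewrite the keywords only if cleaning changes them."""
    current = tuple(obj.Subject())
    cleaned = clean_keywords(current)
    if cleaned == current:
        return False
    obj.setSubject(cleaned)
    # Subscriber order is not guaranteed, so refresh the index explicitly.
    obj.reindexObject(idxs=["Subject"])
    return True
```

The same function could serve as the one-off cleanup: loop over the brains returned by plone.api.content.find(), call strip_subject_handler(brain.getObject()) on each, and it does nothing for objects whose keywords are already clean.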