Categorisation text to keyword in multiple languages

My (migrated from html) site now has a field called 'shiptype'.

The content of the field is typically 'Dampskip, stykkgods steam ship, general cargo' (or I can get it from the html, where it is a
before 'steam ship'). The first two are Norwegian and the last two are English.

I have a list of 'new categories from my customer', so I want to save this with the keywords 'Steamship->Cargo->General Cargo' (and then translate them depending on whether the user is Norwegian or English).
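A minimal sketch of that mapping, assuming the English half of the field is used as the lookup key. The names `CATEGORY_PATHS`, `LABELS_NO`, `category_path` and `translate` are made up for illustration, and the table only holds the one example from this post. The Norwegian labels are the two words that appear in the example field; anything without a label falls back to English.

```python
# Hypothetical lookup table: English source text -> category path.
CATEGORY_PATHS = {
    "steam ship, general cargo": ("Steamship", "Cargo", "General Cargo"),
}

# Norwegian labels taken from the example field; a keyword without
# an entry here is shown in English.
LABELS_NO = {
    "Steamship": "Dampskip",
    "General Cargo": "Stykkgods",
}


def category_path(english_text):
    """Return the category path for the English part of 'shiptype'."""
    # Normalise case and whitespace so small variations still match.
    key = " ".join(english_text.lower().split())
    return CATEGORY_PATHS.get(key, ())


def translate(path, language):
    """Translate each keyword in the path for display."""
    if language == "no":
        return tuple(LABELS_NO.get(word, word) for word in path)
    return tuple(path)


path = category_path("Steam ship,  general cargo")
print(" -> ".join(translate(path, "no")))
```

Storing the English keywords and translating only at display time keeps a single set of values in the catalog, whichever language the user has.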

Questions:

  1. Is Products.PloneKeywordManager 'the way to go'?
  2. How can I set the keywords with a script? (In the example above, I will need to add the keyword 'Cargo' even if it is not in the original.)

PS: Sail ships (etc.) will also have the category 'Cargo', but I probably will never need to 'show/search for them both at the same time' (so Cargo can be a sub-category of Steamship).

In my case I am importing either PDF documents or medical photography/radiography, and I need to set keywords on the data. Some of it has been done by hand, but in most cases I need to check whether an entry from a list of keywords is in the searchable text and, depending on which fields the word is in, set it as a keyword.
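Roughly, that check can be sketched like this. It is not my actual import script, just a minimal illustration: the function name `keywords_for`, the field names and the sample data are all hypothetical.

```python
import re


def keywords_for(fields, candidates, trusted_fields):
    """Return the candidates found as whole words in any trusted field.

    fields: mapping of field name -> text
    candidates: iterable of keyword strings to look for
    trusted_fields: names of the fields in which a match counts
    """
    found = []
    for keyword in candidates:
        # Whole-word, case-insensitive match.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for name in trusted_fields:
            if pattern.search(fields.get(name, "")):
                found.append(keyword)
                break  # one hit is enough for this keyword
    return found


# Hypothetical sample record.
record = {
    "title": "Radiographs, abdominal",
    "description": "Follow-up of an earlier thorax study",
}
print(keywords_for(record, ["abdominal", "thorax", "pelvis"], ["title"]))
```

Here only a match in the title counts, so a word that is merely mentioned in the description does not become a keyword.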

Products.PloneKeywordManager is great, but if this is a huge or messy data set you will want to script/automate the process.

For this sort of scripting, I have found plone.restapi the best tool for my purposes - it exposes everything I need, without doing anything risky.
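As a rough sketch, setting keywords through plone.restapi is a PATCH request against the content item's URL with a JSON body. This assumes the keywords live in the standard `subjects` field and that basic authentication is enabled; the URL, the credentials and the helper name `build_keyword_patch` are placeholders. The example only builds the request, it does not send it.

```python
import base64
import json
import urllib.request


def build_keyword_patch(content_url, keywords, user, password):
    """Build (but do not send) a PATCH request that sets the keywords."""
    body = json.dumps({"subjects": list(keywords)}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        content_url,
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_keyword_patch(
    "http://localhost:8080/Plone/ships/example",  # hypothetical URL
    ["Steamship", "Cargo", "General Cargo"],
    "admin",   # placeholder credentials
    "secret",
)
# Passing the request to urllib.request.urlopen() would send it.
```

Note that a PATCH like this replaces the whole keyword list, so to keep existing keywords you would read the item first and send the merged list.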

HTH

Do you have 'subcategories'?
I am not sure how 'intuitive' it will be for users to change it later if they are 'not grouped'.

Maybe an event handler on save could add 'Steamship', 'Cargo' and 'General Cargo' automatically when 'Steamship General Cargo' is selected.
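A minimal sketch of such a handler, assuming the keywords are stored as a tuple on `obj.subject` and that the function is registered as a subscriber for the object-modified event (the registration itself is not shown). The `PATHS` table and both function names are hypothetical.

```python
# Hypothetical hierarchy: selectable leaf category -> keywords to add.
PATHS = {
    "Steamship General Cargo": ("Steamship", "Cargo", "General Cargo"),
}


def expand_keywords(selected):
    """Return the selected keywords plus the path of any known leaf."""
    result = []
    for keyword in selected:
        for item in (keyword, *PATHS.get(keyword, ())):
            if item not in result:  # keep order, skip duplicates
                result.append(item)
    return result


def add_parent_keywords(obj, event=None):
    """Subscriber-style handler: rewrite obj.subject on save."""
    obj.subject = tuple(expand_keywords(obj.subject or ()))
```

Because duplicates are skipped, running the handler on every save does not keep growing the keyword list.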

PS: Are there any alternatives to Products.PloneKeywordManager? (From what I remember, its UI is a bit 'confusing'.) I probably don't need the users to be able to add available keywords; the categories will probably 'almost never change'.

I think you both need taxonomies: collective.taxonomy (on PyPI)

The UI got an overhaul "recently" (2021, which is recent for one of the oldest living add-ons)!
Plone 6 support has also been released, since Dec.22.
Check it out: "PloneKeywordManager maintenance - #4 by flipmcf", or just install it.

The sub-category question is interesting.

In the PDF library use case the documents are regulations, which have associated structured metadata (Year, Country, Category, sub-Category). In that case, I have added these as dexterity fields, manually added an index for each, and I use plone.restapi to manage the data, which originates in an external database.

In the veterinary medicine use case, the index data was presented in a partially hierarchical manner ("radiographs, abdominal", "radiographs, thorax" etc.), and there we used the individual words as keywords, as e.g. "abdominal" will appear in other contexts.

In the former case, a taxonomy would have been the ideal tool - I'd like to play with that. The veterinary index data is messier - a subset of it is in hierarchical form, but not enough of it to give the user sufficient value to warrant changing our approach.

I have found the keyword manager tricky in the past - but that was a while ago, and the site in question had many keywords with spaces and hyphens, which seemed to be an issue. Most recently it has worked well for me, but that was for a trivial task so YMMV.

My customer 'had to think about that', so I am not sure yet.

Thanks, I guess that is what I need.

Currently, I am waiting for feedback from my customer about whether:

Motorship -> Cargo -> General Cargo
Sailship -> Cargo -> General Cargo

should be filterable (etc.) together or separately for 'Cargo'.

Thinking about it, I could maybe just make a custom indexer and change that if they change their mind… (?)
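Something like this could be the core of that indexer, as a sketch only. The function name and the `share_subcategories` switch are hypothetical; in Plone the function would presumably be wrapped with the `indexer` decorator from plone.indexer and registered as an adapter, which is left out here so the logic can be tried on its own.

```python
def category_index_values(path, share_subcategories=True):
    """Return the values to index for one category path.

    share_subcategories=True:  'Cargo' is one value for all ship types.
    share_subcategories=False: each level is prefixed by its parents,
                               so 'Cargo' stays separate per ship type.
    """
    if share_subcategories:
        return list(path)
    return [" / ".join(path[: depth + 1]) for depth in range(len(path))]


motor = ("Motorship", "Cargo", "General Cargo")
sail = ("Sailship", "Cargo", "General Cargo")

print(category_index_values(motor))
print(category_index_values(sail, share_subcategories=False))
```

If the customer changes their mind, only the switch changes, followed by a reindex; the stored category paths stay as they are.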