Search by language without being a multilingual site

djay · May 5, 2021, 9:33am

We built a site that has a publications folder where many of the resources are translated. The site itself is not translated and they don't want a UI where users select the language globally, i.e., they just want a custom search folder that lets users filter by various things, including which languages a resource is available in.

We did this without p.a.m because it didn't seem suited. I'm curious now whether p.a.m could have been used to achieve the same results.

btw, the site is Resource Search — MHCS

espenmn · May 6, 2021, 2:30pm

Slightly off topic, but your solution looks practical from the user's side. For example, when searching for something in 'Oromo', I would also see the other languages (Cue cards — MHCS). That is useful, since you see other languages spoken in the same country (Amharic, Tigrinja, etc.). I am not sure how to do that with PAM (without manually linking related content).

Interesting images, the husband is 'in the big city', while the wife is watching TV (?)

h2o · May 7, 2021, 1:24am

A client recently asked about this as well. We are using a third-party service for translation, so PAM is disabled; we just added additional languages in the control panel and added a language index.

jensens · May 7, 2021, 9:56am

I would add a language index and make it available as a search criterion for collections. Then use collectionfilter to let users filter by their language.
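As a rough illustration of what such a language index does for a filter UI, here is a plain-Python stand-in. It is not the ZCatalog or collectionfilter API; the class `LanguageIndex` and its method names are made up for this sketch. It maps each language code to the objects available in it and produces the per-language counts a filter widget would typically show.

```python
from collections import defaultdict


class LanguageIndex:
    """Toy keyword-style index (hypothetical): language code -> set of object ids."""

    def __init__(self):
        self._by_language = defaultdict(set)

    def index(self, oid, languages):
        # One object can be listed under several languages.
        for lang in languages:
            self._by_language[lang].add(oid)

    def query(self, lang):
        # Return a copy so callers cannot mutate the index.
        return set(self._by_language.get(lang, ()))

    def facet_counts(self):
        # Counts per language, as a filter UI might display them.
        return {lang: len(oids) for lang, oids in self._by_language.items()}


index = LanguageIndex()
index.index(1, ["en", "am", "om"])
index.index(2, ["en"])
index.index(3, ["am", "ti"])

amharic_resources = index.query("am")
counts = index.facet_counts()
```

In a real site the catalog index would play the role of `LanguageIndex`, fed from the field that stores the available languages.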

Something different, and probably not what you are looking for: another approach is to use PAM but enhance search with fallbacks. For that we wrote GitHub - plone/plone.app.multilingualindexes: Indexes optimized to query multilingual content made with plone.app.multilingual

djay · May 7, 2021, 10:34am

When you say language index, do you mean a special kind of index, or just what we've done already: have a field to specify the language and index that?

I looked at it but I'm still unsure if I can use it.
I guess what I'm asking is: can I install PAM, have only one language and no language selector for the site, but still use the PAM UI to translate some specific content, and then have the indexes work in a way that lets me use collectionfilter to search for a given language?

The solution I have now works OK, but it uses some content rules to copy data between objects, which isn't great. Also, it uses collective.fieldcollapsing, which makes the paging inexact. Finally, the work to translate is a little error-prone: you have to create a subobject for each translation, and it's hard to tell whether you have done all the languages or created duplicates of a language.
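The "did I do all the languages, or duplicate one?" check could be automated with something like the following. This is only a sketch under assumptions: `check_translations` is a made-up helper, and it assumes you can collect the language code of each translation subobject into a list.

```python
from collections import Counter


def check_translations(translation_languages, required_languages):
    """Return (missing, duplicates) for a resource's translation subobjects.

    translation_languages: language code of each existing subobject.
    required_languages: language codes the resource is expected to cover.
    """
    counts = Counter(translation_languages)
    missing = sorted(set(required_languages) - set(counts))
    duplicates = sorted(lang for lang, n in counts.items() if n > 1)
    return missing, duplicates


# Hypothetical resource: Amharic was added twice, Oromo was forgotten.
missing, duplicates = check_translations(["en", "am", "am"], ["en", "am", "om"])
```

The result could be shown as a warning on the parent object's view or in an editor-only report.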

tmassman · May 7, 2021, 11:04am

So you don’t want to have distinct objects for the translations? If so, maybe GitHub - propertyshelf/ps.zope.i18nfield: A zope.schema field for inline translations. is something for you. It hasn’t been updated in a while, but it also works with Plone and is used in at least two Plone 5 projects, e.g. pkan_dcatapde/src/pkan/dcatapde/content/dct_language.py at master · BB-Open/pkan_dcatapde · GitHub. Also, it was mentioned during World Plone Day by University of Dresden.

djay · May 7, 2021, 11:09am

Thanks. I'll have a look at it.
The current system is mainly PDFs and videos, so if it handles blobs then it could work. Does it also allow searching in any language?

tmassman · May 7, 2021, 11:34am

I think you could extend it to support namedfile fields as well; currently it only supports TextLine and Text. Searching is then done in Plone using your custom indices. There is an index included for z3c.index which we used for a Zope 3 application.

jensens · May 10, 2021, 8:10am

Yes, just a FieldIndex.

I don't think it is what you are looking for; I just dropped it here in case someone with similar problems, but with PAM in use, reads this.

djay · May 10, 2021, 9:03am

@tmassman it's a shame it doesn't have more documentation. I see it has its own index, but I think that just allows you to search by the languages available?
I'm unsure what happens in the case of text indexing. I assume it's not keeping a text index per language, and if it isn't, doesn't the relevancy get mucked up when all the languages of one object are combined in a single text index?
One thing we were aiming for was to retain relevancy ranking both when searching across languages and when a certain language is selected.

espenmn · May 10, 2021, 10:05am

These are just some thoughts, so maybe completely irrelevant:

Would something like this be useful?

A control panel where one sets all the available languages. All these languages get their own index (if there are more fields, maybe more indexes for each language).
For a content type: a field for each of these languages. Maybe the fields could be a 'predefined' data grid field with one line for each language.
Search 'within' each language is just a search in the corresponding index. If nothing is found: search in (some of) the other indexes and show 'nothing found for your language, but we found these'.
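The search-then-fallback step in the last point could look roughly like this. It is a plain-Python sketch, not Plone catalog code: the `indexes` structure (language -> term -> set of object ids) and the function name `search_with_fallback` are assumptions made for illustration.

```python
def search_with_fallback(indexes, term, preferred):
    """Search the preferred language first, then the other languages.

    indexes: {language: {term: set of object ids}} (hypothetical layout).
    Returns (hits_by_language, used_fallback).
    """
    hits = indexes.get(preferred, {}).get(term, set())
    if hits:
        return {preferred: set(hits)}, False
    # Nothing found in the preferred language: collect hits elsewhere.
    fallback = {
        lang: set(index[term])
        for lang, index in indexes.items()
        if lang != preferred and index.get(term)
    }
    return fallback, True


indexes = {
    "en": {"covid": {1, 2}},
    "am": {"covid": {3}},
    "om": {},
}
direct = search_with_fallback(indexes, "covid", "en")
fallen_back = search_with_fallback(indexes, "covid", "om")
```

The boolean lets the template decide whether to show the 'nothing found for your language, but we found these' message.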

Unfortunately, Datagridfield and files might not work properly. Not sure if there is any 'workaround'.

tmassman · May 10, 2021, 10:55am

I know, I know, shame on me. At that time we were forced to cut all expenses for that project. That included docs and testing. Today I don’t care anymore what customers say about that... Lessons learned.

The index we used for the Zope 3 app contains BTrees for all the languages and is able to search the correct language (e.g. the user-selected language). I don’t think there is something similar in Plone yet. The “magic” happens here: ps.zope.i18nfield/src/ps/zope/i18nfield/index.py at master · propertyshelf/ps.zope.i18nfield · GitHub

    def doIndex(self, oid, value):
        """Index a value by its object id."""
        if isinstance(value, I18NDict):
            for lang in utils.available_languages():
                lang_val = value.get_for_language(lang)
                if lang_val is None:
                    continue
                index = self.get_or_add_index(lang)
                index.doIndex(oid, lang_val)

It iterates over all available languages in the system and indexes the corresponding content for that field in a given language in the matching sub-index. Searching then happens again in the sub-index for the given language.
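To make the index-and-search round trip concrete, here is a self-contained toy version of the same pattern. It is not the ps.zope.i18nfield implementation: a plain dict stands in for I18NDict, ordinary dicts and sets stand in for BTrees, and `LanguageSplitIndex` is a made-up name.

```python
class LanguageSplitIndex:
    """Toy index keeping one word -> object ids sub-index per language."""

    def __init__(self, languages):
        self.languages = list(languages)
        self._sub = {}

    def get_or_add_index(self, lang):
        # Create the sub-index for a language on first use.
        return self._sub.setdefault(lang, {})

    def do_index(self, oid, value):
        # value is assumed to be {language: text}; anything else is ignored.
        if not isinstance(value, dict):
            return
        for lang in self.languages:
            lang_val = value.get(lang)
            if lang_val is None:
                continue
            sub = self.get_or_add_index(lang)
            for word in lang_val.lower().split():
                sub.setdefault(word, set()).add(oid)

    def search(self, lang, word):
        # Only the sub-index of the requested language is consulted.
        return set(self._sub.get(lang, {}).get(word.lower(), ()))


idx = LanguageSplitIndex(["en", "de"])
idx.do_index(1, {"en": "Cue cards", "de": "Stichwortkarten"})
idx.do_index(2, {"en": "Video cards"})
english_hits = idx.search("en", "cards")
german_hits = idx.search("de", "cards")
```

Because each language has its own sub-index, an English word never matches through the German text of the same object.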

jensens · May 11, 2021, 8:45am

I do not think it's a good idea. It would bloat the catalog and increase indexing time, and it does not scale well.

An idea I had more than 10 years ago, before PAM was a thing, but which was never realized, is to store "micro"-objects with the pure data attached to the main objects (the visible ones), e.g. an OFS.Folder containing SimpleItems. The index contains the micro-objects with their language; for display, some magic in the catalog always needs to return the main (content) object. An accessor on the main object, much like attribute access in Dexterity already, then fetches the language version from the matching micro-object based on the currently negotiated language or, if that is not available, from the configured fallback (ideally chained). On a mutator call, the micro-object is created or updated with the new field information.
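A very reduced sketch of the accessor/mutator part of this idea, in plain Python. Nothing here is Zope or Dexterity API: `MainObject`, `get_field`, `set_field` and the fallback mapping are hypothetical, and a dict per language stands in for a micro-object.

```python
class MainObject:
    """Visible content object holding one micro-object (a dict) per language."""

    def __init__(self, fallbacks):
        self._micro = {}
        # Chained fallbacks per language, e.g. {"de-at": ["de", "en"]}.
        self._fallbacks = fallbacks

    def set_field(self, lang, name, value):
        # Mutator: create or update the micro-object for this language.
        self._micro.setdefault(lang, {})[name] = value

    def get_field(self, lang, name):
        # Accessor: try the negotiated language, then its fallback chain.
        for candidate in [lang, *self._fallbacks.get(lang, [])]:
            micro = self._micro.get(candidate)
            if micro is not None and name in micro:
                return micro[name]
        raise AttributeError(name)


obj = MainObject({"de-at": ["de", "en"], "de": ["en"]})
obj.set_field("en", "title", "Cue cards")
before = obj.get_field("de-at", "title")   # falls back to English
obj.set_field("de", "title", "Stichwortkarten")
after = obj.get_field("de-at", "title")    # now served from German
```

In the real design the negotiated language would come from the request rather than being passed in explicitly.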

Different languages need different URLs (because of SEO etc.), so a traverser, a different domain, or a postfix is needed per language: foo.com, foo.com/++de++/some/path, foo.com/++en++/some/path (the ++ can be skipped, it is here just to visualize), foo.com/some/path/de, foo.com/some/path/en.
In any case, the some/path part (the path of ids) is not easy to translate.
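For illustration, pulling the language out of such URLs could be sketched like this. It is plain Python, not a Zope traverser; `split_language` is a made-up helper that handles the `++lang++` prefix and the trailing-segment variants from the examples above.

```python
def split_language(path, languages):
    """Return (language or None, path without the language marker)."""
    parts = [p for p in path.split("/") if p]
    if parts:
        first = parts[0]
        # Prefix form: /++de++/some/path
        if first.startswith("++") and first.endswith("++") and first[2:-2] in languages:
            return first[2:-2], "/" + "/".join(parts[1:])
        # Postfix form: /some/path/de
        if parts[-1] in languages:
            return parts[-1], "/" + "/".join(parts[:-1])
    return None, "/" + "/".join(parts)


languages = {"de", "en"}
prefixed = split_language("/++de++/some/path", languages)
postfixed = split_language("/some/path/en", languages)
plain = split_language("/some/path", languages)
```

Note that the postfix form is ambiguous if a content id happens to equal a language code, which is one argument for an explicit marker such as `++de++`.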