Hello everyone,
I started implementing Elasticsearch 8 support for collective.elasticsearch. I called it 6.x branch.
Currently, it still supports Plone 5.2 and Plone 6 and Elasticsearch 7/8. I also merged some older bug fixes and features into this branch.
It’s the backbone for all search/list/query-related requests in our Plone-based ecosystem.
So I do have a genuine interest in this package, and I’m happy to maintain it. Thanks @ericof and others for putting in the effort so far ![]()
Would it be possible to get maintainer access on pypi for the package? My pypi username is “maethu“.
I would like to get some releases out there! You can ping me if there is an open issue or PR you would like to see resolved or merged.
To the AI part:
I have a POC of a RAG Q&A, basically using the current architecture and hooking up an external/local embedding model and an LLM (using openai lib).
If you run collective.elasticsearch with Redis, the indexing is handled by a worker and does not bother your Plone instance much anymore. The same happens with the embeddings.
+------------------+ +------------------+ +------------------+
| Plone CMS | | Redis/RQ | | Elasticsearch |
| | | Worker | | |
| - REST API |---->| - Embedding |---->| - Chunks Index |
| - Subscribers | | Generation | | - kNN + BM25 |
+------------------+ +------------------+ +------------------+
|
v
+------------------+
| Ollama/Mistral |
| - Embeddings |
| - LLM Chat |
+------------------+
With ES 8, you can store and query vectors, which makes it possible to use it as a store for all vectorized data. The SearchableText is already a plain-text version of whatever needs to be indexed, and it is already in a good format for the embedding service. ES 8 can extract text from nearly everything.
Example request:
curl --location 'http://localhost:8080/Plone/@rag-ask'
--header 'Accept: application/json'
--header 'Content-Type: application/json'
--data ' {
"question": "What is the address of the tax administration?",
"top_k": 5
}'
Response:
{
"@id": "http://localhost:8080/Plone/@rag-ask",
"answer": "The address of the tax office is: \n\Somestreet 100\n12345 City\n+00 00 00 00 00",
"question": "What is the address of the tax administration?",
"sources": [
{
"chunk_index": 0,
"path": "/Plone/to/a/contact/in/plone",
"score": 32.188305,
"title": "Address of tax administration building"
},
{
"chunk_index": 3,
"path": "/Plone/path/to/news",
"score": 20.34761,
"title": "New tax administration building address"
},
....
]
}
At the last Plone conference, there was a lot of discussion about Plone being late to the game. Maybe this is a step forward toward having a “plugin” that provides at least an entry point, without an external vector DB.
I’m wondering if there are others out there who would be interested in pursuing this potential approach to AI integration in Plone.
Cheers,
Mathias