Feedback Request: Plone + MariaDB Integration Idea for a Hackathon

Hi everyone,

I'm a developer participating in the MariaDB Python Hackathon. The goal of the event is to improve how MariaDB works with other open-source projects, and the organizers specifically mentioned Plone as a great candidate for an integration project.

I'm relatively new to the Plone ecosystem but I'm keen to build something that could be genuinely useful to the community.

From my initial research, I understand that Plone primarily uses the ZODB for data storage. It seems incredibly powerful and tightly integrated for Plone's internal operations. My core question is about what happens when you need to get data out of Plone.

My Understanding of the "Problem"

It seems like it could be challenging to use external, SQL-based tools for reporting and data analysis (like Metabase, Power BI, Tableau, etc.) directly on a Plone site's content. Because these tools speak SQL, the data in the ZODB isn't readily accessible to them, potentially creating a "data silo."

My Proposed Idea for the Hackathon

I'm thinking of creating a Plone add-on that acts as a bridge. The idea is an "exporter/sync" tool that would allow a site administrator to map Plone content types to tables in a MariaDB database. You could then run a process to push the content from Plone into MariaDB, making it instantly available for any standard reporting tool to query.

Am I on the right track?

This is where I would be incredibly grateful for your expertise.

  1. First, is my main assumption correct? Do the vast majority of Plone sites rely on ZODB as their primary storage?

  2. Is the reporting/analytics issue a real-world problem you or your clients have faced? Is getting data out of Plone a common pain point?

  3. Does an add-on like the one I described sound useful? Or am I missing a key point about the Plone architecture that makes this a bad idea?

  4. Are there any existing solutions for this that I should be aware of?

My goal is to use the hackathon to contribute something of value. Any feedback, corrections, or even completely different ideas would be hugely appreciated!

Thank you for your time.

Primary pivot here is whether this is “one time” or ongoing. Presuming latter, the best choice is to:

  1. Use Plone REST API endpoints
  2. Lean into the JSON you get, it is very rich; keeping it for consumption elsewhere is good
  3. That means, fundamentally, you want things like a JSON type column in RDBMS that you can later attach indexes to.
  4. If you need to have this be event driven, write event subscribers in Plone to make HTTP webhooks to other systems to poke them about that event.

IMHO, push is way more painful than pull. Write notifications for content lifecycle events in System B, have Plone make “heads up” requests to those endpoints, and fetch the content over Plone REST API. Making Plone mirror directly to your custom RDBMS schemas feels like a route to frustration; use an intermediary system.

All Plone sites do. That ZODB may be stored in FileStorage, or in RelStorage (which is not an ORM, to be clear, it is a transaction and pickle storage in RDBMS of your choice).

The “way of integration” generally is Plone REST API, and incidentally things like plone.exportimport for one-time exports.

1 Like

The Plone is tied to the ZODB, and there is no way to replace the ZODB with a RDBMS natively.

The ZODB can be used:

  • standalone
  • ZEO
  • Relstorage

For RelStorage, most people use Postgres afaik. But Relstorage only replaces the underlaying storage layer of the ZODB (instead of using FileStorage). The ZODB uses Python pickles both for Filestorage, ZEO and Relstorage and there is no way around this dilemma. It is just not feasible to build replace the ZODB (with pickles) with a native RDBMS integration or a NoSQL integration - tried a couple of times, always greatly failed due to the complexity of Zope and how the ZODB works.

Just forget it :slight_smile:

3 Likes

Thanks, @seanupton The "pull over push" model using the REST API is a much better way to think about this. That's super helpful guidance, I really appreciate you taking the time to explain it.

Thanks @zopyx , that's exactly the kind of direct feedback I was hoping for. Knowing that replacing ZODB is a dead end is a huge help—you just saved me a ton of time and frustration. Thank you!

Thank you all for the fantastic feedback. My main takeaway is to use the official REST API for any integration.

With that in mind, would a more meaningful way to integrate MariaDB be an external service that pulls content from the Plone REST API (triggered by a webhook) to create a read-only copy in MariaDB?

The sole purpose would be for analytics and reporting, leaving the live Plone site untouched. Does that sound like a more sensible and useful approach or does it sound like I am just forcing mariadb just to create something for hackathon?

My feeling is “probably” – There is a reason why previous projects, like contentmirror/wiki/Introduction.wiki at 833b478ea845034f70ec82255ba527432bc9f701 · kapilt/contentmirror · GitHub failed to achieve traction. But maybe the situation is different these days.I think Plone RestAPI provides any tools you might need to extract the data, implementations will vary.

1 Like

Thanks for the feedback, especially the history of contentmirror . That’s a great reality check.

For this hackathon, I'm still keen on creating a project that connects Plone with MariaDB.

Knowing the pitfalls of past projects, my question is there any ongoing project or any current area where I can research and find a problem where I can integrate MariaDB in some meaningful way? Like an enhanced logging system, a read-only data source, or something like that

We are using Plone connected to MariaDB with a usecase where we trail (create, update, delete) events in Plone with a full copy to SQL using “AfterSaveHooks”.

We mirror dexterity objects into json in MariaDB (with added metadata columns UID, zodbpath, user and when which are standard plone metadata). Both UID and zodbpath are unique for a ZODB object

In Plone we then visualize “who did what when” based on Read from the same jsons and metadata.

We have 1.000.000+ objects from Plone/ZODB we send to MariaDB this way.

We have ideas how this can be generalized into a community approach for both trail and reporting and are happy to take a meeting with you if you are interested.

When I use the word Trail is “same” as you call Logging.

1 Like

This is exactly the kind of meaningful project I was hoping to find for the hackathon.

Just to give you some context, I'm new to the Plone ecosystem specifically, but I have a strong development background and I'm a quick learner. Your project to create a generalized "trailing" system sounds like the perfect opportunity for me to dive in and contribute.

I am interested to know more about the ideas and work on it.