Dependence on RelStorage

Someone brought a potential issue to my attention last year: our dependence on RelStorage may be a risk.

Remember: I'm not a deep technologist, so I am probably getting some of this wrong.

When we deploy Plone, we usually think of the easiest deployment first, then options of increasing complexity (and greater performance):

  • a single ZODB file, Data.fs, and zinstance process
  • multiple ZEO clients talking through a ZEO server process reading & writing a Data.fs
  • RelStorage reading & writing Python pickles to a PostgreSQL database (theoretically can perform way better since you can have lots of RelStorage processes reading & writing to the database, and you can replicate the database and get geographical distribution for failover or faster local reads)

If we look at who's been committing to RelStorage, it has been mostly one person since about 2015: Jason Madden.

The contributors dashboard shows that he started contributing in 2015, with a sort of handoff from Shane Hathaway.

OK, OK, what's the risk?

Almost every high-performance deployment of Plone uses RelStorage. RelStorage is maintained by one person who isn't a member of the Plone community (though technically he is a member of the Zope Foundation, according to his GitHub profile).

If he were to stop maintaining RelStorage, we would have a crisis on our hands.

What to do about this risk?

I'm not sure. I just wanted to point it out, because no one was talking about it (at least, not within my hearing).

Another Gripe

There may be nothing to do about this particular gripe, which is that it seems wasteful, not to mention deeply opaque, to be using a SQL database to store Python pickles.
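To make the opacity concrete, here is a minimal sketch using only the standard library. It assumes a toy single-table layout (the `object_state` table, its columns, and the `page` dict are illustrative, not RelStorage's actual DDL or ZODB's real record format) and uses SQLite in place of PostgreSQL. The point is only that SQL can fetch the row but cannot filter on anything inside the pickle.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one opaque pickle per object id.
conn.execute("CREATE TABLE object_state (zoid INTEGER PRIMARY KEY, state BLOB)")

# Illustrative content; a real ZODB record is more involved than a plain dict.
page = {"title": "Annual report", "review_state": "published"}
conn.execute("INSERT INTO object_state VALUES (?, ?)", (1, pickle.dumps(page)))

# SQL can return the bytes, but a WHERE clause cannot look inside them.
# Filtering on review_state has to happen in Python after unpickling.
rows = conn.execute("SELECT zoid, state FROM object_state").fetchall()
published = [
    zoid for zoid, state in rows
    if pickle.loads(state)["review_state"] == "published"
]
```

From the database's point of view the `state` column is just bytes, so the usual SQL tooling (ad hoc queries, reporting, column-level indexes) has nothing to work with.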

At least one high-performance deployment I know of does NOT use RelStorage, but if I told you which one, I'd have to :skull_and_crossbones::skull_and_crossbones::skull_and_crossbones::skull_and_crossbones: you

:wink:

RelStorage is probably near feature complete. I bet if the Plone Foundation reached out to Jason, he might have some insight into what would need to be done to get it feature complete. The tech stack is well trodden. No innovation is being done.


A few comments:

  • The table schema in RelStorage is relatively simple and easy enough to understand if you know the relationships between transactions, object ids, etc. It really isn’t more complicated than FileStorage, not that anyone is that concerned in general use with the internals of either.
  • Debugging weirdness is easier with FileStorage handy vs. RelStorage
  • Shipping RelStorage from prod to staging/dev is more work, and often involves zodbconvert round trips, but that is not unreasonable if you get benefits from using RelStorage.
  • ZEO has its own risks, and has to have a copy of your code (things can drift, and this can get messy), and ZEO isn’t going to provide you robust security over the network unless you use stunnel or similar.
  • RelStorage is better for scaling out and network resilience (e.g. system maintenance, taking a server node down, backups, etc); Plone’s performance profile often requires scaling out.
  • On the subject of opacity, ZODB is deeply opaque, and Plone stores (IMHO) way too much state in all sorts of places within objects we persist; that’s an entirely different subject, but I think looking at the lower-level storage is not really the place to improve on opacity — the complexity of that storage fades into the background, unlike leaky abstractions higher in the stack.
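As a rough illustration of the first point in the list above, the relationship between transactions and object ids can be sketched in a few lines. This is a deliberately simplified, hypothetical schema in SQLite (the `txn` and `object_state` names and columns are assumptions for the example, not RelStorage's real tables), showing that transaction-level questions are answerable in plain SQL even though each state is an opaque blob.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified tables: one row per transaction,
-- and one row per (object id, transaction id) revision.
CREATE TABLE txn (
    tid INTEGER PRIMARY KEY,
    username TEXT,
    description TEXT
);
CREATE TABLE object_state (
    zoid INTEGER NOT NULL,
    tid INTEGER NOT NULL REFERENCES txn(tid),
    state BLOB,
    PRIMARY KEY (zoid, tid)
);
""")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [(100, "editor", "create pages"), (101, "editor", "edit page 1")],
)
conn.executemany(
    "INSERT INTO object_state VALUES (?, ?, ?)",
    [(1, 100, b"v1"), (2, 100, b"v1"), (1, 101, b"v2")],
)


def current_revisions(conn):
    """Return (zoid, newest tid) for every object."""
    return conn.execute(
        "SELECT zoid, MAX(tid) FROM object_state GROUP BY zoid ORDER BY zoid"
    ).fetchall()


def objects_changed_in(conn, tid):
    """Return the object ids written by one transaction."""
    rows = conn.execute(
        "SELECT zoid FROM object_state WHERE tid = ? ORDER BY zoid", (tid,)
    )
    return [zoid for (zoid,) in rows]


latest = current_revisions(conn)
changed = objects_changed_in(conn, 101)
```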

On balance, I think I prefer RelStorage to ZEO for large-ish deployments, and have used both for them. The entire ZODB ecosystem is all pretty settled stuff in maintenance mode, unless you really want new things (I can think of a few cases here around BLOBs, but not much else).


The problem is not storing Python pickles, which the ZODB has done for a long time. The core problem of Plone and Zope is actually the usage of the ZODB at all. Not switching to an RDBMS or a document store is one of the core architectural mistakes that Plone made (at least 15 years ago).

Is it possible for Plone to switch to using a SQL database? How does the Django ORM work? Could something like it be written for Plone or Zope?

That would basically mean reimplementing the entire Plone backend to make a clearer separation between application logic and a data access layer. Fortunately @robgietema has already done that for us: https://nickcms.org/


yep, but Rob's track record... the jury is still out :joy:

Short answer: impossible. The ZODB and Plone depend heavily on the magic happening in the persistence layer in Zope, and in the Persistence base class in particular.
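A toy sketch may help show what kind of "magic" is meant. The class below is a stand-in written for this example, not the real persistent base class: it only imitates the idea that attribute assignment is intercepted to mark an object as changed (the `_p_changed` flag name follows ZODB convention; everything else is an assumption).

```python
class ToyPersistent:
    """Toy stand-in for a persistent base class; not the real implementation."""

    def __init__(self):
        object.__setattr__(self, "_p_changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Any ordinary attribute assignment marks the object as dirty.
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


class Page(ToyPersistent):
    def __init__(self, title):
        super().__init__()
        self.title = title
        self.tags = []
        self._p_changed = False  # pretend the object was just saved


page = Page("Home")

page.title = "Welcome"
changed_after_assignment = page._p_changed

page._p_changed = False
page.tags.append("news")  # in-place mutation never reaches __setattr__
changed_after_mutation = page._p_changed
```

Application code written against this style of persistence never calls "save" explicitly, which is why swapping in an ORM with explicit queries and writes would touch nearly every layer above the storage.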

Interesting... because nickcms implements a compatible API, I was able to use Ploa with the demo.

@davisagli


To put it ironically: first replace the frontend with a non-Python framework, then replace the backend with a non-Python framework too. That could be the end of Plone (classic and non-classic).

With all due respect: what will happen when future React frontend programmers ask themselves why they should maintain a backend in a framework other than Node.js?


You could fall back on the idea that Plone is a set of features, which is not the same as the language they happen to be implemented in.

I'm afraid that the word Plone is nowadays kind of an elastic concept with very different meanings (backend aka Classic, Volto etc.).

One of the definitions of Plone is (from an architectural point of view) based on the REST API. From this perspective I agree that Plone is a set of features (the entry points of the REST API). Starting from the REST API, one can implement Volto, Nick etc. and not care about the language used.

But (as of today) the REST API hides and ignores lots of features of the powerful ecosystem behind it (Plone in its traditional pre-Volto meaning) that is written (almost) entirely in Python.

I'm not sure whether all the features of Plone (backend and down to the ZODB) could be implemented in another language.

Maybe not all features of current Plone are important or useful enough to worry about.

When a potential customer is evaluating Plone, they almost never care how the database storage is implemented (because the decision makers are usually not techies, sadly), until they find out it's not a SQL database (and, no, RelStorage does not really count), and then it becomes an issue.

The last time I looked at Plone vs Django CMS or Wagtail, Plone was still way beyond those in terms of content editing and management and workflow. But Django CMS and Wagtail have the benefit of reassuring potential buyers because they use SQL databases.

With this in mind, “the CMS Plone” would be dead and “Plone” would only stand for a protocol to be implemented between client and backend. “Plone - The Protocol”…sounds cool :upside_down_face:

For certain use cases where you only need the Plone content and its metadata for the public part of the site (which is basically our standard use case in the remaining Plone projects):

  • we export the Plone content and metadata as JSON
  • we export related secondary content and metadata as JSON (we have some publishing projects with XML and related formats on external storages)
  • we generate Pydantic models directly out of the various JSON models using datamodel-code-generator
  • we generate related Pydantic models automatically for `sqlmodel` for storing the data and content in Postgres (as native column types, not as JSON)
  • we import the JSON into Postgres
  • the backend is being implemented as a slick FastAPI application
  • the backend API is pretty generic and small
  • the frontend can be implemented with any technology you want
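The export-and-import part of the steps above can be sketched roughly as follows. This is a stand-in using only the standard library: a `dataclass` plays the role of the generated Pydantic/`sqlmodel` model, SQLite plays the role of Postgres, and the `Document` fields and the `EXPORT` sample are invented for the example, not taken from a real Plone export.

```python
import json
import sqlite3
from dataclasses import astuple, dataclass

# Invented sample of exported content; a real export has many more fields.
EXPORT = """
[
  {"uid": "a1", "title": "Home", "review_state": "published"},
  {"uid": "b2", "title": "Draft page", "review_state": "private"}
]
"""


@dataclass
class Document:
    """Stand-in for a generated model; field names are assumptions."""
    uid: str
    title: str
    review_state: str


def load(conn, raw):
    """Validate the JSON against the model and store it in native columns."""
    docs = [Document(**item) for item in json.loads(raw)]
    conn.execute(
        "CREATE TABLE document ("
        "uid TEXT PRIMARY KEY, title TEXT NOT NULL, review_state TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO document VALUES (?, ?, ?)", [astuple(d) for d in docs]
    )
    return len(docs)


conn = sqlite3.connect(":memory:")
count = load(conn, EXPORT)

# Because fields are real columns, filtering is a plain SQL query.
published_titles = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM document WHERE review_state = 'published'"
    )
]
```

An API layer on top of this only needs to run such queries and serialize the rows, which is one reason the backend can stay small and generic.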

I implemented a proof of concept for one of our larger projects where we want to decouple the public part of the application completely from Plone. After two or three days of vibe coding, we have a PoC that makes the Plone content and secondary content available through FastAPI without touching the Plone backend.


Sorry, but this has to be explained. How can SQL databases, without deep knowledge of the tables, be more informative than object documentation? Also consider that it is very easy to write bad queries nowadays, and differences in SQL vendor support can be problematic.

Look at the `__dict__` of some Plone object. You will see a wild mixture of object metadata and data related to the ZODB, not to mention data hidden in various annotations. This is completely opaque in comparison with a well-defined RDBMS model. I'm not saying that a SQL model cannot be complex, but at least we can query any data in a “reasonable” way. Ever tried to search for certain values in persistent objects hidden in annotations, or in some field outside the object's metadata scope and schema definition?
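For comparison, this is roughly what such a search looks like without a query language: a recursive walk over attributes and containers. The sketch is a toy, not Plone code; the `Item` class, the `annotations` attribute, and the key names are made up for illustration.

```python
def find_value(obj, needle, path="obj", seen=None):
    """Yield the paths of all attributes, keys, or items equal to needle."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # guard against reference cycles
        return
    seen.add(id(obj))

    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    for child_path, value in children:
        if value == needle:
            yield child_path
        yield from find_value(value, needle, child_path, seen)


class Item:
    """Made-up content object with state tucked away in nested dicts."""


item = Item()
item.title = "Home"
item.annotations = {"my.addon.settings": {"owner_email": "a@example.org"}}

paths = list(find_value(item, "a@example.org"))
```

This has to be repeated for every object in the database, which is the practical difference from a single `WHERE` clause over a known column.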

You cannot query a SQL source without deep knowledge of the application. An object data model is documented directly in the code, while with SQL you need to describe the model outside the application, for every application feature. That's why applications based on SQL omit the data model and document only the API calls.

But I agree that having some tool to query the data can help with finding out where the data is and how it works, and with data extraction. Most applications do not need a complex data model; you can use MongoDB, Redis or similar. Elasticsearch and Solr do not use SQL, and their whole purpose is searching data. It depends on the application.

For Plone, which is built on an object model, it would be problematic to use a "relational database" that is aimed at structured data and permanent recording of operations. For paychecks, I would use SQL. For a CMS, a relational database can be the wrong data model; ORMs exist exactly because of this.

Do those customers really specifically ask for a "SQL database"?
Do they really understand the difference between object-relational and relational mappings?
What do they expect from a "SQL database"?
Do they really want to make SQL-queries on the complex object model of Plone/Zope/Dexterity? Really? What do they expect to find?
What do they actually answer when they are offered a specialized object-relational database?

I'd suggest thinking carefully about that. Because decision makers are not techies, we change our technology? How about explaining why they need a specific technology?

And... if they are not techies, why don't they like it when they find out that there is a PostgreSQL database? What are their arguments against "RelStorage"?