We at der Freitag (www.freitag.de, German only; it is a weekly newspaper) built functionality that lets users follow others and get notifications whenever certain actions happen (a new article from someone you follow, comments made on your articles, etc.).
We built it on top of zc.relation and it is showing its limits. Data point: around 1 million notifications are generated per week.
It has reached a point where we really need to rewrite it from scratch, so rather than making the same mistakes over again, we are reaching out to you all to see if you have some tips for us, or if in the meantime (we created this back in 2012!) some add-ons or external tools have become available which we could integrate.
So far I've looked at:
collective.notifications, which stores the notifications in a central plone.registry setting. That might work for low-frequency notifications, but not at our 1M scale
starzel.votable_behavior, which uses persistent dictionaries and lists, which again do not scale well in our usage range
CastleCMS, which (maybe my search-fu was bad) has nothing regarding following and/or starring content. Prove me wrong, please!
quaive/plone.intranet, which does have a network tool to store relations between users and between users and objects. But again, I am not sure it scales enough for the 1M use case, and the last changes to that file are from 2016
That was my research so far as for publicly available add-ons. Did I miss any? Any suggestion on the implementation?
If we are about to rewrite it from scratch, we are thinking about creating an external service with a REST API (GraphQL if we want to be cooler) and then integrating Plone with it. But if someone convinces us otherwise, we might as well do it within Plone and, if we decouple it enough, make it generic enough that we can share it with the wider Plone community.
Interested parties who want this feature, by all means, reach out to us! We have the feature already but, as said, it does not perform well enough, so if we team up, we can come up with a proper solution that works not only for us and you, but for the wider Plone community as well.
We have the resources to design and code a solution, so if you want to be involved, all the better.
Look at @arangodb, where you can store documents, key-value pairs and relations as a graph. Mass data that changes frequently is just an improper usage pattern for the ZODB. We use Arango in Plone 4 for a "Gefahrstoffkataster" (hazardous substances register) for the University of Saarbrücken.
Thanks! From what I see, it stores nothing in the database but rather uses the request itself to pass along who will get the notification.
I see that it is Archetypes only and meant to be triggered manually, with the user deciding who to send the notification to.
We are using 5.1 (for a few weeks now), only Dexterity, and our use case is tied to some generic options the user has (to decide how she wants to get the notifications), and the receivers are decided automatically by the network of who follows whom.
Would adding all these changes to ftw.notification.base be something you would accept as a merge request, or is it far beyond your use case?
As I said in the first message, we would be more than happy to collaborate on rewriting our current code. If that rewrite is not from scratch and only for us, but already benefits someone else, so much the better for everyone.
I'm not sure whether ifttt.com would ban us if we started sending them 1M requests per week. For our online editors it might help (especially the social media possibilities), but for the notifications use case, IMHO, sending 1M requests to an external server is probably overkill.
Thanks for the tip though! We have been thinking of using it every now and then at der Freitag, but not for the notifications
If you want to keep everything in the ZODB, I would look at souper.plone as a base. It will serve a 5-10 million record case (storage-wise) if used and configured correctly, probably more.
Otherwise, a classical RDBMS like PostgreSQL certainly works beyond the 100 million+ case.
I don't know ArangoDB, but given its graph capabilities (and the need to use them), that sounds like a good alternative.
When you say 1M per week, what is your actual bottleneck? If your model is that users follow pages and an update on a page leads to an email being sent, then the bottleneck is perhaps storing and loading X email addresses plus the processing time of sending those emails. Bulk sending is easily solved with something like SendGrid. You can break it into batches and use a queue solution like p.a.async to send 1k to SendGrid at a time. It will do the mail merge for names etc. for you too. Not hard.
Most of the tips here seem to be about storing the watch data... but is that really that much of a problem? Loading 1M small records in a dict, souper, or any of these solutions is not that bad, especially if you do it in a worker thread, e.g. using p.a.async, so it doesn't blow your cache. And it seems that if people are watching pages, then not everyone is watching the same page, so the loads would be much smaller. Storing in Redis or Elastic etc. is also possible; I'm just not sure I see the point.
So the last bottleneck is writes. But you don't say how often users update their watch preferences, just the send rate, which is presumably read-only. For 1M sends a week I'm guessing you might only get 10k writes a week. If nothing forces them into the same time period, then not many happen at the same time. If you use a BTree, you reduce the chance of a conflict a lot, since two writes would have to land in the same bucket to conflict. If you use AJAX, you reduce the chance further, since the duration of the request is shorter. If you still have a problem, you can use p.a.async to send all the writes into a queue so they are serialized, since you likely don't have to tell the user it was saved. Or you can use HAProxy to send all those writes to a single instance if you really need to tell the user it was saved.
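To illustrate the "send all the writes into a queue so they are serialized" idea, here is a minimal sketch using only the Python standard library. It is not p.a.async; the `follows` dict, the job tuples and `run_demo` are made-up names standing in for whatever store and task system you actually use.

```python
import queue
import threading

# Hypothetical in-memory store: userid -> set of followed author ids.
follows = {}
write_queue = queue.Queue()


def writer():
    """Apply queued follow/unfollow requests one at a time, in order."""
    while True:
        job = write_queue.get()
        if job is None:  # sentinel: stop the worker
            write_queue.task_done()
            break
        action, userid, author = job
        targets = follows.setdefault(userid, set())
        if action == "follow":
            targets.add(author)
        else:
            targets.discard(author)
        write_queue.task_done()


def run_demo():
    worker = threading.Thread(target=writer, daemon=True)
    worker.start()
    # Request handlers only enqueue; they never touch the store directly,
    # so two concurrent requests cannot conflict on the same record.
    write_queue.put(("follow", "alice", "editor-in-chief"))
    write_queue.put(("follow", "alice", "bob"))
    write_queue.put(("unfollow", "alice", "bob"))
    write_queue.put(None)
    worker.join()
    return follows
```

The trade-off is the one mentioned above: the request returns before the write is applied, so you cannot tell the user with certainty that it was saved.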
The activity is definitely much more advanced than the ftw.notification.base. I see that you are storing all the activity on PostgreSQL, that was also our train of thought until @zopyx mentioned arangodb...
Thanks for sharing this code, we will evaluate it.
Thanks, but I don't think it makes sense if we are going to generate 1M requests to ifttt.com just to have them come back to us for further processing. That's plain async jobs, just done in a fancier way through a website.
If we ever allow users to configure themselves what they want to do with the notifications, that might be a good idea, i.e. expose (through the REST API) the notifications.
Our current use case is to either not store the notification at all, send an email right away, store it locally (in the ZODB), or store it locally and send it as a digest notification once per day.
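Roughly, that per-user choice could be modelled like this. It is only a sketch of the four options as I read them; `Delivery`, `route` and the three lists are hypothetical names, not our actual code.

```python
from enum import Enum


class Delivery(Enum):
    DISCARD = "discard"      # do not store the notification at all
    EMAIL_NOW = "email_now"  # send an email right away
    STORE = "store"          # keep it locally for on-site display
    DIGEST = "digest"        # keep it locally and add it to the daily digest


def route(notification, preference, outbox, stored, digest_queue):
    """Put the notification wherever the user's preference says it belongs."""
    if preference is Delivery.EMAIL_NOW:
        outbox.append(notification)
    elif preference is Delivery.STORE:
        stored.append(notification)
    elif preference is Delivery.DIGEST:
        stored.append(notification)
        digest_queue.append(notification)
    # Delivery.DISCARD: nothing to do
```

Here `outbox`, `stored` and `digest_queue` are plain lists standing in for the mailer, the storage backend and the daily digest job.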
Wow, thanks for the thoughtful answer, quite a lot of questions
The worst bottleneck right now, which is about to be fixed, is that although we already use async tasks (collective.taskqueue, thanks @datakurre!!), generating the notifications for all the users that follow the main author of the newspaper issue takes a lot of time (that author has 30k+ followers). We are about to split those 30k followers into small batches.
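The batching itself is simple; a minimal sketch follows, where the batch size of 500 and the `user-N` ids are arbitrary assumptions and each chunk would become one async task.

```python
from itertools import islice


def batches(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Made-up follower ids; in practice these come from the relation storage.
follower_ids = [f"user-{n}" for n in range(30_000)]

# One entry per async task to enqueue.
jobs = list(batches(follower_ids, 500))
```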
The other major problem is that a cold start of the front page, without any cache, literally takes minutes due to loading the relation catalog (we are using zc.relation as our backend to store the relations/notifications).
So, while we already have enough async tasks to split the load, the system keeps getting bigger and bigger, we are planning to add a few more features, and it doesn't feel like zc.relation is a good place to store all of that.
Out of the 1M notifications that I mentioned, when I once looked at the zc.relation storage, I saw that there were 600k+ stored there, so 1M is the current upper limit, and around 600k~700k is what is actually being stored.
As we are a weekly newspaper, on Wednesdays the newspaper articles are produced and all the notifications for all the articles are generated (i.e. ~40 articles x 30k followers = 1M+ notifications within a single day already), of which around 60~70% are stored in the ZODB.
So, the storage is indeed one of the main problems (or we just don't know how to tune zc.relation well enough).
There’s a pretty good chance I’m missing something obvious but if you used IFTTT you would be giving your users more control over what they want to do with the notifications. Send it to their phone, queue it in their reader apps, turn on a blue bulb at their desk, speak it on their smart home speaker, etc
If you’re thinking of sending out emails right away, Mailgun could work as a way to offload sending those...
One complexity in our notification handling is that quite a lot of those notifications are not meant to be seen right away, but only after a delay of some hours or days, so they need to be stored somewhere until they can be shown to the user.
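In an RDBMS this boils down to a "visible from" timestamp on each row. A minimal sketch with sqlite3 as a stand-in for whatever database ends up being used; the table, column and function names are made up for illustration.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE notification ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " message TEXT NOT NULL,"
        " visible_at INTEGER NOT NULL)"  # Unix timestamp
    )
    # Index matching the only query we run: per recipient, by time.
    db.execute(
        "CREATE INDEX notification_recipient_visible"
        " ON notification (recipient, visible_at)"
    )
    return db


def add(db, recipient, message, visible_at):
    db.execute(
        "INSERT INTO notification (recipient, message, visible_at)"
        " VALUES (?, ?, ?)",
        (recipient, message, visible_at),
    )


def visible(db, recipient, now):
    """Return the messages this recipient is allowed to see at time `now`."""
    rows = db.execute(
        "SELECT message FROM notification"
        " WHERE recipient = ? AND visible_at <= ?"
        " ORDER BY visible_at",
        (recipient, now),
    )
    return [message for (message,) in rows]
```

A page load then only touches the rows of the current user, instead of loading a whole catalog.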
Eventually, we could provide a private feed endpoint that users could then hook up to IFTTT.
Related question, as you are quite prompt to answer: since you are pointing to IFTTT but do not mention anything regarding CastleCMS, should I understand that there is nothing of the sort in it?
I was trying to answer your question, not sell you CastleCMS. It has an internal subscription feature for people to indicate they want to receive messages on specific topics, but the emails are prepared and sent manually, not automatically based on (say) changes of content. For automatic notification of content changes, you’d use Plone content rules.
Maybe you could use a queue like Redis to offload those notifications (that’s one component of CastleCMS that I can see applying for your use case)
I would seriously look at SendGrid or similar. We integrated it into Singing and Dancing, but integrating it into other parts of Plone is not hard. It can use an SMTP interface, which means you can send one email, just attach some extra headers, and have it turn into individual emails to 100k addresses with a single email send, all sent in a minute or so.
Plus it helps with deliverability and costs next to nothing.
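To make the "extra headers" part concrete: as far as I remember, SendGrid's SMTP API reads a JSON document from an `X-SMTPAPI` header, with a `to` list of recipients and a `sub` mapping for per-recipient substitutions. Treat those key names, the `-name-` placeholder and any recipient limits as assumptions to check against the current SendGrid docs. The sketch only builds the message and does not send anything; `build_bulk_message` is a made-up helper.

```python
import json
from email.message import EmailMessage


def build_bulk_message(sender, recipients, subject, body):
    """Build one message for many recipients.

    `recipients` is a list of (email, name) pairs. The body may contain
    the placeholder "-name-", to be replaced per recipient by the provider.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sender  # placeholder; the real recipients go in the header below
    msg["Subject"] = subject
    # Assumed header layout, see the note above.
    msg["X-SMTPAPI"] = json.dumps({
        "to": [email for email, _ in recipients],
        "sub": {"-name-": [name for _, name in recipients]},
    })
    msg.set_content(body)
    return msg
```

For example, `build_bulk_message("news@example.org", [("a@example.org", "Ann")], "New article", "Hi -name-, ...")` gives a single message that could be handed to `smtplib` or to the existing Plone mail host.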
Sounds like a bug that can be fixed, or avoided by using another notification tool. Not sure why a catalog is needed for relations; it seems overkill. A page load should just require loading the watch status for the current user based on userid, so it only loads a few parts of a BTree. You can even minimise that lookup by changing the code to use AJAX, so the page itself can be cached in Varnish and the indicator showing whether the user is watching the page is loaded in another request... or don't show that indicator at all. Just show notification settings on another page.
create a Python script that turns the special email address into a list of all the email addresses
hook this script into multimail and use it to batch send to SendGrid, and let SendGrid do the mail merge with names into the alert email
It worked so well that we didn't use async in the end. Of course, if you used async, then the loading of all the email addresses would be isolated to a single instance, which would help with memory management. But we don't send often.
So effectively you could replace your current alert system with a content rule or similar that sends a single email for a single content update or other event, and then use multimail to turn that into a single email to SendGrid with a header listing the email addresses of the 60k relevant people.
So at the end we decided to build an external solution based on Django and the Django REST framework.
We thought about just using SQLAlchemy to interact directly with the database (from within Plone), but then thought that it might be much better to have it as a separate service (from Plone) so that it could eventually be queried directly from client browsers, or even from other backends: we have www.freitag.de (Plone) and digital.freitag.de (Django), so being able to query that data from both servers/clients would be a plus.
If anyone is interested in this project, let us know. So far we haven't done much, and as we have to ask our bosses whether it would be OK to open source it, we are keeping it in our private repositories for now. If there is any interest, that would be a big plus for us in convincing our bosses to open source it.