How to run a full HTML validation after publishing an item?

hvelarde · June 22, 2017, 4:17pm

Continuing the discussion from GSoC Summer with Plone / Improving AMP support:

As @kakshay21 mentioned previously, we're working on collective.behavior.amp, an add-on to add support for Google's Accelerated Mobile Pages.

We have the current version in production on one site and we're seeing many pages reporting validation errors, like this one.

After reviewing these errors we came to the conclusion that most of them are related to problems, not in the code, but in the process of creating the content.

We think validating these pages and giving feedback to the person creating the content could be a good idea and, in our case, there are different options available, like a command-line validator, a web-based one and even an embedded validator that returns the status on the browser's console (by adding #development=1 at the end of the URL like this).
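For the embedded validator, the URL can also be built programmatically. This is a minimal sketch, assuming the only requirement is the `#development=1` fragment mentioned above; the function name `amp_dev_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def amp_dev_url(url):
    """Return `url` with its fragment set to development=1."""
    parts = urlsplit(url)
    # Any existing fragment is replaced; query string and path are kept.
    return urlunsplit(parts._replace(fragment="development=1"))


print(amp_dev_url("https://example.org/news/item/@@amp"))
```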

The main problem I see is that we need to feed those validators with the whole HTML output of the page and I'm not sure how this can be done.

I remember @vangheem and @tkimnguyen mentioned that Castle CMS has a similar feature for quality assurance.

How do you handle a use case like this?

tkimnguyen · June 24, 2017, 1:35pm

Indeed it does. You can see the quality check view at castle.cms/castle/cms/browser/content.py at ff02ed4d25d3d689348644d9d595ce16eb7c0db8 · castlecms/castle.cms · GitHub

It is invoked when someone wants to transition the item by clicking on the state left toolbar button.

It checks various things. If any tests fail, it shows the list of tests and their pass/fail indicator.

A manager can require that all tests pass before the item can be published.

hvelarde · June 26, 2017, 11:17am

thanks! I think this will not work in our case; according to the code you pointed out above, you're making some validations but they fall short of what we need (also, I'm curious about why you didn't run your validation using the canonical way on workflow transitions: using guards).

@kakshay21, while I was answering this I found an alternative solution using Cloudflare's AMP linter API as described below:

create an event subscriber and bind it to the publish transition and object modification events
check if the object has the AMP behavior applied and/or if it's published
send the AMP page to the endpoint like this (not using curl, but using the requests module as stated in Cloudflare's documentation):

curl https://amp.cloudflare.com/q/www.cartacapital.com.br/internacional/jean-luc-melenchon-o-esquerdista-que-sacode-a-campanha-presidencial-francesa/@@amp

the service will respond with something like this:

{
    "version":"1496670637476",
    "source":"http://www.cartacapital.com.br/internacional/jean-luc-melenchon-o-esquerdista-que-sacode-a-campanha-presidencial-francesa/@@amp",
    "valid":true
}

now we know if the AMP code is valid and we can show a warning message to the editor in case it's not.
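The steps above can be sketched with the standard library only. The endpoint pattern and the `valid` key are taken from the curl example and the response shown above; everything else (function names, the `fetch` parameter, the 10-second timeout, using `urllib` instead of `requests`) is an assumption for illustration, and the Plone event subscriber wiring is left out.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen

LINTER_ENDPOINT = "https://amp.cloudflare.com/q/"  # from the curl example above


def linter_url(page_url):
    """Build the linter URL: endpoint followed by host and path, no scheme."""
    parts = urlsplit(page_url)
    return LINTER_ENDPOINT + parts.netloc + parts.path


def is_valid_amp(response_text):
    """Return True only if the JSON response reports "valid": true."""
    try:
        data = json.loads(response_text)
    except ValueError:
        return False  # an unparseable answer is treated as "not validated"
    return isinstance(data, dict) and data.get("valid") is True


def check_page(page_url, fetch=None):
    """Query the linter; `fetch` is injectable so the logic can run offline."""
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=10) as response:
                return response.read().decode("utf-8")
    return is_valid_amp(fetch(linter_url(page_url)))
```

An event subscriber could call `check_page` with the URL of the item's `@@amp` view and show the warning message when it returns `False`.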

tkimnguyen · June 26, 2017, 6:21pm

Well excuuuuse us for not guessing what you would need 1-2 years before you thought of it

Probably because we didn't want to trigger the transition just to validate what was there.

tkimnguyen · June 26, 2017, 6:21pm

...but more seriously, I pointed out the mechanism more as a framework for what you could add to.

hvelarde · June 26, 2017, 8:36pm

sure, AFAIK you're currently including the following:

content should not link to non-published content
text headers should be ordered

that's better than nothing but let's agree it's not much

it could be nice to find a checklist of additional things that could be tested; title and description length, for instance, could be easily checked, but the way you implemented the validator makes it difficult to extend and not interesting for plain Plone users.
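A length check of that kind could look like the sketch below. The limits (60 and 160 characters) and the function name are assumptions for illustration, not values taken from Castle CMS or Plone.

```python
def check_lengths(title, description, max_title=60, max_description=160):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    if not title.strip():
        problems.append("title is empty")
    elif len(title) > max_title:
        problems.append("title is longer than %d characters" % max_title)
    if len(description) > max_description:
        problems.append(
            "description is longer than %d characters" % max_description
        )
    return problems
```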

a Plone add-on could be a better idea.

kakshay21 · June 26, 2017, 8:51pm

seems achievable to me.
I don't know about Cloudflare, but relying on another provider may become an expensive deal if Cloudflare chooses to increase the price of its plans.
wouldn't it?

hvelarde · June 26, 2017, 8:59pm

I don't think so; Google is behind both AMP and Cloudflare.

besides that, we can't solve a problem we don't have yet.

djay · June 27, 2017, 3:31am

@tkimnguyen looks very nice. Does it have a plugin system? It would be a nice place to put in some WCAG tests like table headers, image titles etc.
Also, if it were a separate plugin, we could all improve it.

tkimnguyen · June 27, 2017, 2:28pm

https://github.com/castlecms/castle.cms is open to all and accepting PRs

hvelarde · June 27, 2017, 3:15pm

I personally have no interest in using Castle CMS; a Plone add-on will always be a better idea as it can be used on Plone and on any other distribution/fork of it.

Quality Assurance is a difficult topic and you have to take care of it on a granular basis; AMP validation is just an example.

yesterday I stumbled upon the need to check the lead image size against a maximum size (8MB), or Facebook will ignore it. I'm going to implement that in the relevant add-on.
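The size check itself is a one-line comparison. This is a sketch under assumptions: 8MB is read here as 8 × 1024 × 1024 bytes, and the function name is hypothetical; how the image bytes are obtained from the content item is not shown.

```python
MAX_LEAD_IMAGE_BYTES = 8 * 1024 * 1024  # 8MB read as 8 MiB (assumption)


def lead_image_too_big(image_data, limit=MAX_LEAD_IMAGE_BYTES):
    """Return True if the raw image bytes exceed the limit."""
    return len(image_data) > limit
```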

having metadata included in Plone doesn't change that: it will always be easier and faster to do those things in add-ons.

djay · June 28, 2017, 6:10am

@tkimnguyen not to put it as bluntly as Hector, but he has a point. The "all-in" nature of both Quaive and Castle prevents them from being used by integrators. It just takes one thing that you don't want or one thing you need to add and you can't use it or you have to hack it. Don't get me wrong, I totally understand the reasons why... it's much cheaper and faster to develop that way but it's just such a shame to see good open source not used. Some of the features in Castle are badly needed in Plone. Is the only option reimplementation?

@hvelarde would you consider making your new plugin modular so different checks on save could be added via plugins? Perhaps content rules could be made to prevent a save on failure so that it can be used to install checks?
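One way to picture such a modular design is a simple registry of check functions. This is only a sketch under assumptions: the registry, the decorator and the two checks are hypothetical, a plain dict stands in for the content item, and a real Plone add-on might use the Zope Component Architecture for registration instead.

```python
CHECKS = []  # hypothetical registry of check functions


def register_check(func):
    """Decorator that adds a check function to the registry."""
    CHECKS.append(func)
    return func


@register_check
def has_title(item):
    # Each check returns None on success or a failure message.
    return None if item.get("title") else "missing title"


@register_check
def has_description(item):
    return None if item.get("description") else "missing description"


def run_checks(item):
    """Run every registered check and collect the failure messages."""
    return [msg for msg in (check(item) for check in CHECKS) if msg]
```

Other packages could then contribute checks by decorating their own functions, and the save or publish step would be blocked whenever `run_checks` returns a non-empty list.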

hvelarde · June 28, 2017, 1:06pm

I think this is only true at the beginning of the implementation; at the end it will cost you more to maintain a fork because you have an even smaller team to deal with updates and fixes.

I understand why those projects made those choices, as I've been following this for quite a long time. I think the Quaive approach is more justified than the Castle one; I read this yesterday on IRC (quote from @davisagli):

I concluded that I’m not comfortable using Castle for projects at this time since it uses a mix of forked Plone core packages without even any indication of where the forked repository lives

at the end this is all about how we criticize and how we deal with criticism; I understand @vangheem was tired of trying to keep development going on, but I also feel he was not being very open to criticism, constructive or not, so he decided to go on his own with this distribution/fork; now we all have to live with the consequences of that and we all still need to be open to more criticism.

as I mentioned on another thread, I prefer to have a heated discussion going on for 6 months than having to deal with bad decisions and frustration for over 2 years.

I think the quality assurance plugin is indeed a great idea, but the design of such a solution using a plugin architecture is beyond my knowledge; let's talk about this in a new thread.

what I had in mind was only to add that check to sc.social.like, the plugin we use to deal with all social media integration.

djay · June 29, 2017, 2:52am

I agree with this. But I also understand why they have a different brand and why anyone in their right mind tries to avoid adding anything UX related to Plone core.

I can completely understand the decision to avoid “design by who shouts loudest”. Castle was a way forward with Plone UX where no other way was working. I’ve tried every way I can think of to get reasoned debate in the open on how to improve Plone UX. I have to say none of it has worked.
If Wildcard can find a way to get the core-ish UX features from Castle into base Plone (without having to deal with crap from everyone with an opinion) then Plone would be better for it.
If they can do it in a way that doesn’t disrupt their Castle distribution, that is their brand, then they would be better off since then they only have to maintain the non-core things like elastic search integration, image handling etc.
If we could be supportive rather than make presumptions about the personality of people who actually do things then maybe we can all rise together.

tkimnguyen · June 29, 2017, 3:11pm

@hvelarde if you look later in the IRC discussion you'll see that I explained the origins of those eggs and David (my words) no longer has that concern.

hvelarde · June 29, 2017, 4:32pm

yes, I saw it; that's why I said that my rant had a good effect