Varnish and login page

yurj · January 12, 2022, 2:48pm

If you use Varnish, you should not cache /login because of the came_from parameter. Just add this:

    # Don't cache the login page: its HTML embeds a per-visitor came_from value
    if (req.url ~ "/login/?($|\?)") {
        return (pass);
    }

to vcl_recv.
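To see which URLs the rule above actually catches, here is a quick check of the same regular expression in Python. VCL's `~` operator is an unanchored PCRE match; for a pattern this simple, Python's `re.search` should behave the same way (an assumption worth re-checking against your Varnish version). The sample URLs are made up.

```python
import re

# Same pattern as in the vcl_recv rule. VCL's "~" is an unanchored search,
# which corresponds to re.search (not re.match) in Python.
LOGIN_RULE = re.compile(r"/login/?($|\?)")


def is_passed(url: str) -> bool:
    """Return True if the VCL rule would return (pass) for this req.url."""
    return LOGIN_RULE.search(url) is not None


if __name__ == "__main__":
    for url in ["/login", "/login/", "/login?came_from=/news",
                "/news/login", "/login_form", "/loginfoo"]:
        print(f"{url:28} pass={is_passed(url)}")
```

Note that /login_form is not matched by this pattern, which matters because the same problem is reported for login_form later in the thread.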

@fredvd, is this worth including in plone.recipe.varnish?

fredvd · January 12, 2022, 3:02pm

I would assume/expect the login page to have the correct caching headers set by plone.app.caching when it is served from Plone, so that the page doesn't end up in the cache. If it doesn't have these headers, it's a bug we should fix in the default caching profiles in plone.app.caching; it shouldn't be another exception/condition in Varnish...

Maybe the change from login_form to login was never added here?

yurj · January 12, 2022, 3:18pm

I don't know. It also happens with login_form, so I think something is missing. Where should this be set?

fredvd · January 12, 2022, 4:09pm

I just tested this on a customer Plone 5.2 site where I still have access to the old 4.3 version as well, both with the default plone.recipe.varnish setup (6.0.0b3 & 6.0.0b1) and Varnish 6.0.x LTS. Varnish first does a cache MISS on a new came_from variation in the URL, and a reload of the same URL returns a cache HIT. So the URL parameters are taken into account correctly and new variations are sent to the backend. The came_from is always filled in.

Did you modify any of the VCL? Which Varnish version are you using?

fredvd · January 12, 2022, 4:10pm

In the caching control panel there is a tab where you can add 'non content' views to a caching class (like /sitemap), but I don't think it is relevant at the moment for the issue you have.

yurj · January 13, 2022, 8:14am

The came_from parameter is hidden in the POST form, inside the HTML output of the login page:

    <input id="came_from" name="came_from"
           value="<the actual came_from>"
           class="hidden-widget" type="hidden">

To clarify: it is the login page itself that gets cached with the parameter inside; the /login URL does not carry any URL or POST parameter. The parameter is set in the view (login.py), so the first render bakes in the first visitor's came_from. On the second request the page is served from the cache, because nobody told Varnish not to cache it.

To reproduce: go to a page and click "login"; the hidden input field is filled in correctly, and the login page is now cached. Then go to another page and click "login" again: you'll see the cached login page, with the former came_from value in the hidden input field.
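The failure mode described above can be reproduced with a toy model: a cache keyed only on the URL, sitting in front of a view whose output depends on the Referer header. This is a hypothetical simulation in plain Python, not Varnish or Plone code; `render_login` and `UrlOnlyCache` are made-up names.

```python
from typing import Callable, Dict


def render_login(referer: str) -> str:
    """Hypothetical stand-in for the login view: came_from comes from the Referer."""
    return f'<input id="came_from" name="came_from" value="{referer}" type="hidden">'


class UrlOnlyCache:
    """Toy proxy cache whose cache key is the request URL and nothing else."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend
        self.store: Dict[str, str] = {}
        self.backend_calls = 0

    def get(self, url: str, referer: str) -> str:
        if url not in self.store:  # cache MISS: ask the backend
            self.backend_calls += 1
            self.store[url] = self.backend(referer)
        return self.store[url]  # cache HIT: the Referer is ignored


cache = UrlOnlyCache(render_login)
first = cache.get("/login", referer="https://example.org/page-a")
second = cache.get("/login", referer="https://example.org/page-b")
print(second)  # still contains page-a: the stale came_from
```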

> In the caching control panel there is a tab where you can add 'non content' views to a caching class (like /sitemap), but I don't think it is relevant at the moment for the issue you have.

I think a rule for the login page should be available in Plone, also because the login page could contain SSO URLs with came_from as a link parameter. Caching is not possible because the HTML output of the login page depends on HTTP_REFERER:

From github.com, plone/Products.CMFPlone/blob/83137764e3e7e4fe60d03c36dfc6ba9c7c543324/Products/CMFPlone/browser/login/login.py#L116-L116:

    came_from = self.request.get('HTTP_REFERER', None)

Even with a special rule for Varnish, the result would be a cached login page for every HTTP_REFERER, which I don't think is useful.
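Extending the same kind of toy model: if the cache key also included the Referer (roughly what a `Vary: Referer` response header asks a shared cache to do), every visitor would get the right came_from, but there would be one cached copy per referring page, which is the trade-off described above. Again a hypothetical sketch; `VaryOnRefererCache` is an illustrative name, not a Varnish feature.

```python
from typing import Dict, Tuple


def render_login(referer: str) -> str:
    """Hypothetical login view: the hidden came_from field echoes the Referer."""
    return f'<input name="came_from" value="{referer}" type="hidden">'


class VaryOnRefererCache:
    """Toy cache whose key is (URL, Referer), similar in spirit to Vary: Referer."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], str] = {}

    def get(self, url: str, referer: str) -> str:
        key = (url, referer)
        if key not in self.store:  # one backend render per distinct referrer
            self.store[key] = render_login(referer)
        return self.store[key]


cache = VaryOnRefererCache()
for page in ["/news", "/about", "/news", "/events"]:
    cache.get("/login", referer="https://example.org" + page)

print(len(cache.store))  # 3 entries: one per distinct referring page
```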

But I'm not an expert on this, so my solution was to exclude the login page from caching using VCL.

fredvd · January 13, 2022, 10:58am

You are correct that there would be a cached login for every REFERER. But that is exactly how Varnish should work if it sits before Plone.

I'm still responding to your first suggestion: whether it would be valuable to exempt /login from any caching as a general rule in the default varnish.vcl that plone.recipe.varnish generates. I think it's not:

We are still talking about GET requests (by default Varnish doesn't cache responses to incoming POSTs), and the URL parameters are part of the cache key that gets calculated to store several variants of the page. Another variable is, for example, the Language header in the HTTP request, if that can vary for the same request URL. As long as it is deterministic:

    HTTP request (url=bla, x=1, y=2, z=3) --> (Varnish) --> (always the same cached page as result)

it's not a problem.
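A loose model of that determinism: Varnish's built-in vcl_hash hashes req.url (which includes the query string) together with the Host header (or server.ip when there is no Host), and the request headers named in a response's Vary header then select among variants. The Python below folds both into one key for simplicity; `cache_key` is a hypothetical helper, not a Varnish API.

```python
import hashlib
from typing import Dict, Iterable


def cache_key(host: str, url: str, headers: Dict[str, str],
              vary: Iterable[str] = ()) -> str:
    """Loose model of a proxy cache key: Host + full URL (query string
    included), plus the values of any request headers named in Vary."""
    parts = [host, url]
    for name in vary:
        parts.append(f"{name.lower()}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


a = cache_key("example.org", "/login?came_from=/a", {})
b = cache_key("example.org", "/login?came_from=/b", {})
a_again = cache_key("example.org", "/login?came_from=/a", {})
print(a == a_again, a == b)  # True False: same request, same key; new params, new key

# A language header only splits the cache if the response varies on it.
en = cache_key("example.org", "/news", {"Accept-Language": "en"}, vary=["Accept-Language"])
nl = cache_key("example.org", "/news", {"Accept-Language": "nl"}, vary=["Accept-Language"])
```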

The primary reason for Varnish to cache anything is to decrease the load on the application server backend, which has to calculate the page dynamically. When you add this rule, every login attempt will double the login request load on the backend, because the GET of the login form now reaches Plone as well as the POST. But most login pages will be referred from only a few pages on your website. It's not like 1000 users will log in from 1000 different pages on most Plone sites.

Even if the login functions as a paywall on a news/research site, only the current 20-30 top articles will cause about 30 variants of the login page to be cached by Varnish for, say, an hour. That is still a marginal memory load for Varnish, and Varnish is much better equipped to serve those static variants than Plone is to dynamically generate the login page for every /login GET.
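The arithmetic behind this argument can be sketched as a rough estimate. Without caching, every GET of the login page reaches the backend; with caching, each variant is rendered at most once per TTL window. The traffic figures below are hypothetical, and the model assumes no early eviction.

```python
import math


def backend_renders(requests_per_hour: int, variants: int,
                    ttl_minutes: float, hours: float = 1.0,
                    cached: bool = True) -> int:
    """Approximate number of login-page renders the backend has to do.

    Without caching, every GET reaches the backend. With caching, each
    variant is rendered at most once per TTL window (assumes no early
    eviction from the cache)."""
    total_requests = int(requests_per_hour * hours)
    if not cached:
        return total_requests
    windows = math.ceil(hours * 60 / ttl_minutes)
    return min(total_requests, variants * windows)


# Hypothetical paywall site: 10,000 login GETs/hour spread over 30 article variants.
print(backend_renders(10_000, 30, ttl_minutes=60))
print(backend_renders(10_000, 30, ttl_minutes=60, cached=False))
```

With these made-up numbers the backend renders 30 pages per hour with the cache versus 10,000 without it; a shorter TTL of 20 minutes would raise the cached figure to 90.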

Caching on GET URL parameters is highly useful: without it, every click on 'next 10 news items' on your News overview page (which is probably a collection) would cause a batching calculation on the backend server, even though the items 11-20 view can be a very popular page that is the same for everybody. Also, all those variants of the Collection batches get evicted from the cache after 60, 30 or 20 minutes, depending on how 'fresh' you want the news overview to be.

Varnish operates as a black box under a number of rules to achieve a high cache HIT ratio, using a certain amount of RAM and feedback from the cache headers on responses served by the backend. If you as a developer are sure that you can further optimise the HIT ratio by (micro-)tuning the cacheability of what comes back from the backend, because those pages would use up most of the RAM available to Varnish, that's a very good reason. But considering the above, my estimate is that it is not needed as a default rule for Plone sites.

If you do want to add it to a project, the best place to do so is in Plone: set the cache headers on the responses using plone.app.caching. At the bottom of the caching operations page in the caching control panel you can associate different view methods with content categories that fall under a certain ruleset. There is a bit of abstraction/indirection there, though.

    item -> content class (content item view) -> rule mapping (Moderate Caching) -> set of caching headers on response.
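The indirection in that chain can be modeled as three lookups: view name to ruleset, ruleset to caching operation, operation to response headers. This is a hypothetical data-structure sketch, not plone.app.caching code; the ruleset and operation names are loosely modeled on plone.app.caching's defaults, the "no-proxy-cache" ruleset for login is invented, and all header values are placeholders to check against what your version actually sends.

```python
from typing import Dict, Optional

# Hypothetical configuration loosely modeled on the plone.app.caching control
# panel; names and header values are illustrative assumptions.
VIEW_TO_RULESET: Dict[str, str] = {
    "document_view": "plone.content.itemView",
    "RSS": "plone.content.feed",
    "login": "no-proxy-cache",  # hypothetical ruleset for the login view
}

RULESET_TO_OPERATION: Dict[str, str] = {
    "plone.content.itemView": "moderateCaching",
    "plone.content.feed": "moderateCaching",
    "no-proxy-cache": "noCaching",
}

OPERATION_TO_HEADERS: Dict[str, Dict[str, str]] = {
    # Placeholder values, not the real plone.app.caching output.
    "moderateCaching": {"Cache-Control": "max-age=0, s-maxage=3600, must-revalidate"},
    "noCaching": {"Cache-Control": "max-age=0, must-revalidate, private"},
}


def response_headers(view_name: str) -> Dict[str, str]:
    """Resolve view -> ruleset -> operation -> headers; unmapped views get none."""
    ruleset: Optional[str] = VIEW_TO_RULESET.get(view_name)
    if ruleset is None:
        return {}
    operation = RULESET_TO_OPERATION.get(ruleset)
    return dict(OPERATION_TO_HEADERS.get(operation, {})) if operation else {}


print(response_headers("login"))
```

The `private` directive in the placeholder login headers is there because, per HTTP caching semantics, shared caches such as Varnish should not store responses marked private.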

You can still make use of the VCL generated by plone.recipe.varnish by adding your proposed login rule to the vcl_recv parameter in the configuration; it will be appended to what the recipe already generates for Varnish's vcl_recv function.

yurj · January 13, 2022, 11:19am

I agree, but currently there's no way to generate a GET to login with came_from as a parameter (maybe with JavaScript). So when you put Varnish in front of Plone, the login page gets cached and the came_from functionality will lead to the last cached came_from URL, not the actual one.

So, I agree it is wrong to bypass the cache, but I don't know any other solution right now. I think this caching rule for the Plone login should be available, or the came_from functionality should be converted to JavaScript (maybe @MrTango has an opinion on this?).

I'll try to play with plone.app.caching. What do you mean by item -> content class (content item view) -> rule mapping (Moderate Caching) -> set of caching headers on response.? I was thinking of activating moderate caching on "Content feed" and adding login as a template; is that reasonable?

fredvd · January 13, 2022, 11:46am

If you hit a page/URL that is private and you are not logged in, Plone will redirect you to the login form with came_from as a URL parameter.
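When came_from travels in the query string, as in that redirect, each referring page produces a distinct URL and therefore a distinct cache entry, so a URL-keyed cache stays correct. A minimal sketch of building such a link with the standard library; `login_url` is a hypothetical helper and the site URLs are made up. It only illustrates the URL shape that a redirect (or the JavaScript approach suggested earlier) would produce, not how Plone itself builds its login link.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def login_url(portal_url: str, current_url: str) -> str:
    """Build a login link that carries came_from as a query parameter."""
    query = urlencode({"came_from": current_url})  # percent-encodes ':' and '/'
    return f"{portal_url.rstrip('/')}/login?{query}"


a = login_url("https://example.org", "https://example.org/news/item-1")
b = login_url("https://example.org", "https://example.org/about")
print(a)
# Distinct pages give distinct URLs, hence distinct cache keys.
print(a != b, parse_qs(urlsplit(a).query)["came_from"])
```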

yurj · January 13, 2022, 2:01pm

Yes, but came_from is also designed to return you to the same page (URL) where you clicked "login".

Suppose you're an editor. You go to a page, click login, and then edit it. Otherwise you have to log in, search for the page again, and then edit it.

I've opened an issue on CMFPlone. Thank you for explaining the Varnish part, and for the discussion in general.

GitHub issue in plone/Products.CMFPlone, opened by yurj on 13 Jan 2022 (02:41PM UTC), label "01 type: bug":

login browser view should not be cached by proxy cache because of came_from

For a full discussion: https://community.plone.org/t/varnish-and-login-page/146…92/6

The login browser view's HTML output contains the parameter `came_from` as a hidden input. When you click on login on a public view, the `came_from` value in the hidden form field is set to the URL where you clicked "login". The idea is to log in and go back to the same point you were at before.

If you put a proxy cache (e.g. Varnish) in front of Plone and click on "login", the `login` page is cached, and so is the `came_from` input value. The next time you click on "login", you get the cached version (because the URL is the same), which after a successful login redirects you to the cached, wrong `came_from` URL instead of the actual, correct one.

When you just click on "login" on a Plone page, the [login.py](https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/browser/login/login.py) browser view populates the `came_from` value from the HTTP referrer (the URL you came from when clicking on "login"). This is different from hitting a private view and getting redirected to login: there, came_from is in the URL, so the cache entry is different and everything works correctly.

Use case: suppose you're an editor. You go to a page, click login, and then edit it. This greatly simplifies the editor experience. If login did not honor `came_from`, the editor would log in, be redirected to the home page, and then have to search for the page again to edit it.

This is Plone 6.0.0a2, but the same problem applies to every Plone version that supports `came_from` from the referrer.

fredvd · January 13, 2022, 4:36pm

> I've opened an issue on CMFPlone. Thank you for explaining the Varnish part, and for the discussion in general.

You're welcome. Be wary of anyone who says that caching is simple or that there is a 'quick fix' for a caching problem. I have only tried to explain what I know; let's continue on the GH issue and ask other community members if they can reproduce your issue.

Every time I have to dive into a Varnish caching issue, it takes at least several hours to first fully understand the problem domain again, then to find the issue, and afterwards to check possible edge cases I probably forgot about. Double the time if you also have to inspect the varnishlog, debug VCL, or deal with browser caching bugs/irregularities, or if load balancers are involved as well.

yurj · January 13, 2022, 4:51pm

I agree. Products.statusmessages is a cache-friendly way to pass messages (and as a bonus, its cookie is already allowed in plone.recipe.varnish). A similar approach would be a good way to handle this kind of issue in Plone.