Alternatives to health-checking zope instances via icp in the load balancer

i'd like to use zope's icp-server for health checks in haproxy (to save ram compared to an additional zserver thread that is only accessed by haproxy).
it seems that the icp-server does not work as expected (see below). i'd like to start a discussion or get some pointers on how others do health checks on the zope instances in their setups.

motivation:

  • zserver-threads = 1
    (to save RAM; a second thread would only load additional objects into its cache (zodb-cache-size))

  • haproxy:
    server instance1 127.0.0.1:8080 check inter 5s fastinter 2s downinter 2s fall 1 rise 2 maxconn 1

if you've got a long-running request (longer than check interval + downinter * fall) on the zope instance that occupies the only available thread, haproxy marks the instance as down

instead of setting zserver-threads = 2 (so the second thread can handle the health checks) i thought using
zope's icp server would be an elegant solution (as i can be sure it won't load a lot of objects into the cache; a second thread would use RAM that is better spent elsewhere)

setup:

plone.recipe.zope2instance allows activating it via
icp-address = 8090

configure haproxy to use the icp port:

server  instance1 127.0.0.1:8080 check port 8090 inter 5s fastinter 2s downinter 2s fall 1 rise 2 maxconn 1

however, the zope instance is never recognized as up.
reason: zserver does not seem to be listening on the icp port:

on startup zope reports the server as running

2017-11-28T15:27:15 INFO ZServer HTTP server started at Tue Nov 28 15:27:15 2017
        Hostname: 0.0.0.0
        Port: 8080
------
2017-11-28T15:27:15 INFO ZServer ICP server started
        Address: 0.0.0.0
        Port: 8090

but while telnet can connect to the http port, the icp port refuses the connection

$ telnet 127.0.0.1 8080
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

$ telnet 127.0.0.1 8090
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused

if others can confirm this problem i'll write a ticket for zope2

icp doesn't work - what else to do?

i guess as long as i only check http://127.0.0.1:8080 with haproxy (zope root)
the number of objects loaded into the cache will be quite low (compared to checking a path within the plone site)

on the other hand a check on / does not really tell whether the plone site can already be served by this zope instance.
therefore one would need to check /Plone/@@some_view
(option httpchk /Plone/@@some_view)

what do you use for health checks on your setups?

  • icp server (how/why does it work for you?)

  • zope with zserver-threads = haproxy.maxconn + 1
    (/Plone, or root / for the check?)

    or custom view that does not - or does - involve portal_catalog?

  • custom scripts outside of zope - as HAProxy version 1.7.14 - Configuration Manual suggests:

Using the "port" parameter, it becomes possible to use a different port to
send health-checks. On some servers, it may be desirable to dedicate a port
to a specific component able to perform complex tests which are more suitable
to health-checks than the application. It is common to run a simple script in
inetd for instance. This parameter is ignored if the "check" parameter is not
set. See also the "addr" parameter.

I have a pending blog post on this topic and I think it would be interesting for you; just let me see if I can finish it this week.

BTW, I had never heard of the ICP support before, but I think you're misunderstanding it (or maybe I don't understand it at all): AFAIK, ICP is an acronym for Internet Cache Protocol, a UDP-based protocol used for coordinating web caches that seems to be supported by Squid.

correct me if I'm wrong, but I think this can't be used as a health check port for HAProxy, nor for other load balancers.

I'm going to try to divide my rationale into a couple of blog posts: one explaining why using the standard HTTP port in Zope is a very bad idea, and then how I successfully solved that using HAProxy and some additional configuration on a couple of sites.

I'm looking into how to plug systemd sdnotify into ZServer. No promises on schedules.


https://www.freedesktop.org/software/systemd/man/systemd-notify.html
https://www.freedesktop.org/software/systemd/man/systemd.unit.html


that would be nice to solve a different issue: instance start/stop/restart.

as promised, the first post on this issue:


@hvelarde @frisi It's great when someone is already ahead of where you were starting to think :slight_smile: Does your health check, hector, take into account broken DB connections?

How come? I'm explicitly going after that for the watchdog functionality in systemd sd_notify.

https://www.freedesktop.org/software/systemd/man/sd_notify.html

no, unfortunately I never have time to dig deeper into anything; creating your own probes is quite easy and that opens a whole new world of possibilities.

over the years the guys working with Zope developed many interesting solutions that are now almost forgotten because they are not well documented; that is sad and we need to change it.

what kind of problems are you trying to solve? I've never been aware of broken DB connections.

I was thinking of something to make interprocess communication easier; I had this idea but I was never able to implement it because my scripting knowledge is pretty basic:

thanks for your clarification @Rotonen.
i'm not (yet) much into systemd - so please excuse possibly stupid questions

by skimming over sd_notify i see that this way we could start our zope instance with systemd and it would be able to tell systemd when it has started up completely

what i'm asking myself is

A) (how) does that fit together with supervisord?
are you aiming to replace supervisord and start up the complete stack (zeo, zope instances, haproxy, varnish, ...) using systemd?

B) how can this be hooked up with haproxy?

i guess we'll need a script in inetd that asks systemd for the status and is accessible at a certain port for varnish

C) does/can zserver also take care of "warming up" the zope instance (similar to the script @smcmahon provides in the ansible playbook: ansible-playbook/roles/restart_script/templates/restart_clients.sh.j2 at master · plone/ansible-playbook · GitHub)?

this question might seem unrelated but what i'm thinking of is that zope might respond to http://localhost:8080/ immediately but will take a couple of seconds/minutes (depending on the size of the plone site) to respond to http://localhost:8080/Plone (because this involves loading a lot of objects from the zodb)

thanks for your thoughts and input @hvelarde

you are right, icp is listening on udp only (tried with netcat -u localhost port) and therefore won't help a lot for haproxy health checks out of the box.
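
for reference, the same check in python (quick untested sketch; port 8090 is the icp port from my example above). on linux a closed udp port shows up as "connection refused" on the following recv, while a bound port that doesn't answer our bogus datagram just runs into the timeout:

#!/usr/bin/env python3
"""check whether anything is bound to a udp port (roughly what netcat -u does)."""
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2)
sock.connect(('127.0.0.1', 8090))
try:
    sock.send(b'ping')   # not a valid icp message, just a probe datagram
    sock.recv(1024)      # zope won't answer this, so a timeout is expected
    print('got an answer - something is listening')
except ConnectionRefusedError:
    print('connection refused - nothing is bound to udp/8090')
except socket.timeout:
    print('no answer - a listener is probably bound but stays silent')
finally:
    sock.close()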

luckily, in your blog post you came up with a smart alternative that achieves the aim of not needing a zserver thread for the health checks: five.z2monitor

i had already considered using it to port the Products.ZNagios and munin.zope based monitoring checks but forgot/overlooked that it can also be used for health checks.
eagerly waiting for your 2nd blog post hector so we can discuss/compare haproxy setups (please link it in this topic, too)

i saw that you are also trying to mark instances as down and up when restarting them with memmon (https://github.com/Supervisor/superlance/issues/102).
i started to use memmon less and less because restarts tend to happen when the instances are needed most (= under heavy load).
as the longest interval is hourly (still very often), i started to wrap memmon in a script that is run by a nightly cronjob.

now i'm considering completely replacing memmon with https://github.com/plone/ansible-playbook/blob/master/roles/restart_script/templates/restart_if_hot.py.j2 as i need to define a cronjob anyway and this script can also handle haproxy up/down and warm up the zope instance:

background: on bigger sites it might take minutes for a zope instance to be ready to serve content. haproxy will see it as up (zope responds to localhost:8080/ in the health check, and your five.z2monitor probe will also report "ok") but the first visitor's request to localhost:8080/Plone/ will take seconds/minutes to be served.

currently i need to do the following to restart a project's zope instances after an update/hotfix w/o downtime:

  • bin/supervisorctl restart instance1
  • ssh-tunnel to instance1 and call /Plone there to see if it's ready
  • repeat for instance2, 3, 4,...

i liked the idea of @smcmahon's scripts that also warm up the instances after a restart by visiting a configurable set of urls
(see https://github.com/plone/ansible-playbook/blob/master/roles/restart_script/templates/restart_clients.sh.j2)
i planned to port this to python using the requests library and to install and configure it in my buildouts via zc.recipe.egg
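
something along these lines is what i have in mind (untested sketch; urls, timeouts and retry counts are just examples):

#!/usr/bin/env python3
"""warm up a zope/plone instance after a restart by visiting a configurable
set of urls (rough python take on the idea in restart_clients.sh.j2)."""
import time

import requests

WARMUP_URLS = [
    'http://localhost:8080/Plone',
    'http://localhost:8080/Plone/news',
]
TIMEOUT = 120   # seconds per request - the first hit after a restart can be slow
RETRIES = 30    # how often to retry until the instance answers at all
WAIT = 5        # seconds between retries


def wait_until_warm(url):
    """block until the url answers with 200 (or give up after RETRIES tries)."""
    for _ in range(RETRIES):
        try:
            if requests.get(url, timeout=TIMEOUT).status_code == 200:
                return True
        except requests.RequestException:
            pass
        time.sleep(WAIT)
    return False


if __name__ == '__main__':
    for url in WARMUP_URLS:
        print('{}: {}'.format(url, 'warmed up' if wait_until_warm(url) else 'gave up'))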

iiuc this is where @Rotonen tries to improve things with sdnotify too, right?

of course - it would be great if all this were done out of the box with supervisor/memmon (and maybe systemd - i'm just not into the topic yet)

maybe we can agree on some best practices and join/coordinate efforts here instead of everyone doing his/her own thing.

  • use five.z2monitor for health checks. add the "OK" probe to this package or create another one (collective.haproxycheck?)
    does the probe need to be smarter (e.g. take care of warming up the instance)?

  • use memmon (and add before/after restart scripts to supervisor) or restart_if_hot
    (currently i'm in favor of restart_if_hot - see above)

  • start up instances (all at once) with supervisor and have a graceful_restart_instance(s) script
    (still needed to automatically restart multiple instances w/o downtime, even if warming up is done in the health checks - see the sketch after this list)
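
for the last point, such a graceful restart could look roughly like this (untested python sketch; backend/server names, ports and urls are just examples, and the haproxy stats socket needs to be configured with "level admin"):

#!/usr/bin/env python3
"""rolling restart sketch: take each instance out of the haproxy backend via
the admin socket, restart it with supervisorctl, warm it up, put it back."""
import socket
import subprocess

import requests

HAPROXY_SOCKET = '/run/haproxy/admin.sock'
BACKEND = 'plone'
INSTANCES = ['instance1', 'instance2', 'instance3']
WARMUP_URL = 'http://localhost:808{}/Plone'   # 8081, 8082, ... in this example


def haproxy_command(command):
    """send a single command to the haproxy stats/admin socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(HAPROXY_SOCKET)
    sock.sendall(command.encode() + b'\n')
    sock.close()


for number, name in enumerate(INSTANCES, start=1):
    haproxy_command('disable server {}/{}'.format(BACKEND, name))
    try:
        subprocess.check_call(['bin/supervisorctl', 'restart', name])
        # warm up: the first request to the plone site fills the zodb cache
        requests.get(WARMUP_URL.format(number), timeout=600)
    finally:
        # put the instance back into rotation even if the warm-up failed
        haproxy_command('enable server {}/{}'.format(BACKEND, name))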

@smcmahon as you obviously dealt with similar problems and came up with your ansible scripts you might have some useful tips or comments to share here.

@jensens as you used squid years ago (snowsprints :wink: ) and also do high-performance setups: maybe you can also share your knowledge on load balancing (icp-server?) and managing multiple instances in this thread

Correct, and that is one of the bugbears I am chasing. It'd be nice to bring very many ZEO clients up in a controlled way without completely trashing the system.

For resource limits I'd go with systemd slices, which are just a convenience wrapper over cgroups v2.

https://www.freedesktop.org/software/systemd/man/systemd.slice.html

Those use the OOM killer under the hood, but the rule for when to fire is something you can set up in an arbitrarily complex tree of settings.

You can also set things like CPU shares and IO limits and networking limits.

https://www.kernel.org/doc/Documentation/cgroup-v2.txt

Probably via the socket activated services mechanism.

https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html

The socket activated services mechanism is a replacement for inetd.

You can fire the "I am ready" notify whenever. This should definitely be considered as a config option when diving into implementing this in ZServer (or hooked into and handled on the side of whatever does the warmup).

Likewise, that is the point of sd_notify - you can tell things when you are actually ready.

In a perfect world this would all feed into some service autodiscovery fabric which then registers the ZEO clients for HAProxy when they are actually ready to serve traffic (and likewise unregisters them upon failure). Not many hosting stacks are there yet.

I faced the same issues as you and I (almost) solved all of them; BTW, warming up a Plone instance is not that hard: just visit the front page.

this is the script I use when I have to restart my instances while doing maintenance:

[buildout]
parts =
    …
    restart

[restart]
recipe = collective.recipe.template
input = inline:
    #!/bin/bash
    # rolling restart: take each instance out of the load balancer,
    # restart it, warm it up and put it back in
    for i in {1..3}
    do
        # mark the backend server as down via the HAProxy admin socket
        echo "disable server plone/instance$i" | socat /run/haproxy/admin.sock stdio
        ${buildout:bin-directory}/supervisorctl restart app:instance$i
        # warm up the freshly restarted instance by requesting the site root
        curl -o /dev/null http://localhost:808$i/Plone
        # put the backend server back into rotation
        echo "enable server plone/instance$i" | socat /run/haproxy/admin.sock stdio
        sleep 2m
    done
    ${buildout:bin-directory}/supervisorctl status
    varnishadm backend.list
output = ${buildout:bin-directory}/restart
mode = 755

I'll give you more details in the next blog post; stay tuned :wink:

That sounds interesting. Could you explain all the things this makes possible? Ping if you need help with ZServer and asyncore.

This does not actually enable all that much in the immediate scope:

  1. The systemd service becomes flagged running only when the service is actually ready to handle traffic
  2. Enables one to use the watchdog functionality of systemd.service

https://www.freedesktop.org/software/systemd/man/systemd.service.html
https://www.freedesktop.org/software/systemd/man/sd_notify.html

Having those in place will allow Plone to behave better in systemd environments and take the first baby steps towards certain kinds of orchestration schemes one can build with systemd.

One thing I already mentioned is the ability to build a dependency graph and have it resolve the way you want when you bring up the system. This can solve resource contention issues on systems where all the units previously considered themselves successfully launched immediately and did not block each other from starting before being ready.

One example scenario:

  1. We use relstorage on PostgreSQL
  2. ZEO server should not start before PostgreSQL is successfully started
  3. The first ZEO client should not start before the ZEO server is successfully started
  4. Further ZEO clients should not start before the previous ZEO client is up

And as said, in a perfect world the web front stack is also somehow aware of this and does not send traffic to the ZEO clients before they've registered themselves as being appropriately up.

We're having an internal hackathon on 2017-12-22 .. 2017-12-23 where I was going to have a go at that.

I think I only need to:

  1. Add an extras_require to setup.py for the feature
  2. Guard for that at runtime
  3. Fire the notify when ZServer is ready to handle traffic
  4. Fire the watchdog ok every main loop

So one packaging change and two places where the code is used.
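
For reference, the notification part itself is tiny: the protocol is just datagrams sent to the AF_UNIX socket systemd hands over in $NOTIFY_SOCKET. A rough sketch of the guarded helper (not the actual ZServer patch; the function name is made up):

"""Sketch of the sd_notify plumbing; see the sd_notify man page for the
message format. This is not the actual ZServer patch."""
import os
import socket


def sd_notify(message):
    """Send a single notification such as b'READY=1' or b'WATCHDOG=1'.
    Does nothing when not started by systemd (no NOTIFY_SOCKET set)."""
    target = os.environ.get('NOTIFY_SOCKET')
    if not target:
        return
    if target.startswith('@'):
        # abstract namespace socket: the leading '@' stands for a NUL byte
        target = '\0' + target[1:]
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(message, target)
    finally:
        sock.close()


# step 3: once ZServer has bound its sockets and is ready to handle traffic
sd_notify(b'READY=1')

# step 4: in the main loop, at least twice per configured WatchdogSec
sd_notify(b'WATCHDOG=1')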


We are usually running two or more ZEO clients and HaProxy as load balancer.
In order to let HaProxy proactively know which clients are online, I've written a supervisor eventlistener ( https://github.com/4teamwork/supervisor-haproxy ) which manages the HaProxy backend states through a HaProxy stats socket.

When using it, I recommend:

  • configuring SIGHUP as the supervisor stop signal, letting the ZEO client finish pending requests before shutting down
  • configuring the supervisor startsecs large enough that the instance is actually ready to handle requests once startsecs have passed.

It happens if the instance restarts and can't connect to the zeo server, for example due to a network problem or because the zeo is also restarting. We have also seen it in ZRS scenarios where failover hasn't worked properly. It would be nice to have the ok response indicate that every configured database is properly connected. I guess the downside is that if haproxy marked all the instances as down during a ZRS failover, you would get 503 errors. But I think you can get around that with frontend queueing in haproxy.

If you want to get data out of zope without occupying one of the main zserver threads used for serving normal requests, take a look at zc.monitor / five.z2monitor. These packages were started by Zope Corp quite some years ago, are used by a number of monitoring add-ons for zope, and run outside the normal threads.

I recently found out about collective.monitor (https://pypi.python.org/pypi/collective.monitor) which bundles a lot of these. You could add your own plugins here or use existing statistics.
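
To give a rough idea of what a custom probe looks like (untested sketch from memory; double-check the zc.monitor / five.z2monitor docs for the exact interface to register it against): a probe is just a function that gets the monitor connection as its first argument and writes its answer to it, registered as a named utility via ZCML. A trivial "is this instance alive" probe could then be as small as this, and the load balancer's check can simply connect to the monitor port, send the probe name and match the returned string:

# probes.py - minimal liveness probe following the zc.monitor plugin
# convention (function taking the monitor connection as first argument);
# the registration as a named utility happens in ZCML and is not shown here.
def ok(connection):
    """Answer 'OK' on the monitor port, e.g. for a load balancer check."""
    connection.write('OK\n')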

thanks @fredvd. hector already pointed out five.z2monitor, but mentioning collective.monitor was very helpful. good to know @bsuttor (kudos!) already did the work of porting/integrating the munin.zope and Products.ZNagios checks to five.z2monitor probes :wink:

@Rotonen thanks for your detailed explanations. i now have a much better idea of what this would enable together with systemd. the downside is that zserver does not yet support it and systemd is not available on all systems. so supervisor and pure python scripts still have their place and are the lower-hanging fruit for the time being.

@jone thanks for pointing to supervisor-haproxy. iiuc this solves @hvelarde's question about "Add hooks for memmon to run a command before and after restarting a process" (see ticket).
the downside compared to the restart_clients.sh script linked above is that you can only wait for a pre-configured amount of time instead of waiting as long as it takes to complete the warm-up requests. but this could be added to the package too - what do you think?

Systemd won the Linux init wars. The people who are on something where this is not the default can also manage themselves (*BSD, commercial Unixen, NixOS, Gentoo - am I missing anything major [within the scope of production-oriented server OS / distro projects]?).

This is a part of the rationale for going after being a good systemd citizen as a project.

On the flip side this is also why it'd be an optional extra.