I have to track down performance problems of a Plone site.
There are some potential performance bottlenecks on our side, which are discussed in this great blog post: "How we got a 10x Performance boost at Radio Free Asia".
But I also suspect that our server setup contains some bottlenecks (we use a mixture of Docker, ZFS, NFS, QEMU VMs, etc.).
Previously we were using munin to monitor our servers.
While working on a new monitoring solution I tried out Prometheus:
The results are promising so far. There is a node_exporter project that exposes a lot of useful system metrics for Prometheus to collect. Prometheus was also specifically built with cloud setups in mind, so Docker integration is available as well.
But there is a lot more work to do: tracking performance of individual services like Varnish, PostgreSQL/RelStorage, haproxy, ... down to some concrete Plone performance metrics.
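To make "concrete Plone performance metrics" a bit more tangible, here is a minimal sketch of what I have in mind: accumulate per-view request timings and render them in the Prometheus text exposition format. It uses only the standard library. The class `RequestTimer`, the metric name `plone_request_duration_seconds` and the `view` label are names I made up for illustration, not something Plone or Prometheus ships, and the wiring into Plone (where the timings come from, how the text is served over HTTP) is left out.

```python
from collections import defaultdict


class RequestTimer:
    """Accumulates request counts and total durations per view name.

    Hypothetical helper for illustration; not part of Plone or Prometheus.
    """

    # Assumed metric name, chosen only for this example.
    METRIC = "plone_request_duration_seconds"

    def __init__(self):
        self.count = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def observe(self, view, seconds):
        """Record one request to `view` that took `seconds` seconds."""
        self.count[view] += 1
        self.total_seconds[view] += seconds

    def render(self):
        """Return all recorded values in Prometheus text exposition format."""
        lines = [
            "# HELP %s Time spent rendering a view." % self.METRIC,
            "# TYPE %s summary" % self.METRIC,
        ]
        for view in sorted(self.count):
            lines.append(
                '%s_sum{view="%s"} %r'
                % (self.METRIC, view, self.total_seconds[view])
            )
            lines.append(
                '%s_count{view="%s"} %d'
                % (self.METRIC, view, self.count[view])
            )
        return "\n".join(lines) + "\n"


timer = RequestTimer()
timer.observe("folder_listing", 0.25)
timer.observe("folder_listing", 0.5)
print(timer.render())
```

Served at a `/metrics` URL, output like this could be scraped by Prometheus in the same way as the node_exporter metrics. In a real setup the official Python client library would probably be the better choice than hand-rolling the format.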
So what are other folks using to monitor their server environments and specifically Plone performance?