Multiple Zope instances == slow buildout

Hi, sorry for the late response.
Here is a minimal example that I have validated, at least superficially:

So, if I understood correctly, I can now refer to environment variables using the $() format in my buildout.cfg file; for instance, I could do something like:

[zeoserver]
recipe = plone.recipe.zeoserver[zrs]
zeo-address = 8100
replicate-from = $(ZRS_MASTER_SRV):5000
read-only = true

instead of:

[zeoserver]
recipe = plone.recipe.zeoserver[zrs]
zeo-address = 8100
replicate-from = ${settings:zrs-master}:5000
read-only = true

right?

If that is true, this is super useful!
/me goes to test this environment variable thing.

@do3cc
I really like the use of bootstrap.sh; it removes the old python bootstrap.py dance while hiding the new virtualenv, requirements.txt, bin/buildout dance.

I noticed that the command for launching the instance wasn't correct. Using $(PWD) and $(INSTANCE_PORT) fails in bash because $(...) is command substitution there, and neither PWD nor INSTANCE_PORT is a valid command.

It should be:

CLIENT_HOME=$(pwd)/instance_${INSTANCE_PORT} ./bin/instance fg

or, if you want to use only environment variables:

CLIENT_HOME=${PWD}/instance_${INSTANCE_PORT} ./bin/instance fg
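To make the difference concrete, here is a small sketch of the three forms side by side. The port value 8080 is only an example, and the variable names from_variable, from_command and status are made up for illustration:

```shell
# INSTANCE_PORT is an example value; any port works.
export INSTANCE_PORT=8080

# ${PWD} is parameter expansion: it reads the PWD variable kept by the shell.
from_variable="${PWD}/instance_${INSTANCE_PORT}"

# $(pwd) is command substitution: it runs pwd and captures its output.
from_command="$(pwd)/instance_${INSTANCE_PORT}"

echo "$from_variable"
echo "$from_command"

# $(INSTANCE_PORT) tries to run a command called INSTANCE_PORT.
# No such command exists, so the substitution fails and the exit
# status of the failed lookup is recorded in status.
status=0
broken="$(INSTANCE_PORT 2>/dev/null)" || status=$?
echo "exit status of \$(INSTANCE_PORT): $status"
```

Both echo lines print the same path, which is why either corrected command works for CLIENT_HOME.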

I updated the README.

That could be, I tried with zsh. Thanks for the fix!

Yes.
This is a slightly simplified instance script we are running:

[instance]
blob-storage= $(BLOB_STORAGE)
client-home= $(CLIENT_HOME)
debug-mode= off
deprecation-warnings= off
eggs= ftw.raven
   munin.plone
   munin.zope
   psycopg2
   python-memcached
   RelStorage
   relstorage-packer
   ...
environment-vars= zope_i18n_compile_mo_files true
   PYTHON_EGG_CACHE $(CLIENT_HOME)/.python-eggs
   MEMCACHED_PDB $(MEMCACHED_CONN_PDB)
   MEMCACHED_MEMOIZE $(MEMCACHED_CONN_MEMOIZE)
   PDB_HOSTNAME  $(PDB_HOSTNAME)
   PDB_USERNAME  $(PDB_USERNAME)
   PDB_PASSWORD  $(PDB_PASSWORD)
   PDB_DATABASE  $(PDB_DATABASE)
   SOLR  $(SOLR_CONN)
   RAVEN_DSN $(RAVEN_DSN)
   RAVEN_TAGS {"deployment": "$(RAVEN_DEPLOYMENT)"}
event-log= disable
http-address= 0.0.0.0:$(INSTANCE_PORT)
http-fast-listen= off
http-force-connection-close= on
lock-file= $(CLIENT_HOME)/instance_$(INSTANCE_PORT).lock
pid-file= $(CLIENT_HOME)/instance_$(INSTANCE_PORT).pid
recipe= plone.recipe.zope2instance
rel-storage= type postgresql
    dsn $(RELSTORAGE_DSN)
    keep-history false
    shared-blob-dir true
    blob-dir $(BLOB_STORAGE)
    blob-cache-size 512mb
    cache-local-mb 0
    cache-prefix zodb_xxx_website
    cache-module-name memcache
    cache-servers $(MEMCACHED_CONN_ZODB)
    commit-lock-timeout 600
shared-blob= yes
user= admin:admin
verbose-security= off
z2-log= disable
zcml= munin.plone
    munin.zope
zeo-client= off
zodb-cache-size= 20000
zope-conf-additional= <zodb_db empty>
    # A database to create the catalog
    mount-point /empty
    cache-size 0
    <relstorage>
        keep-history false
        shared-blob-dir true
        blob-dir $(BLOB_STORAGE)
        blob-cache-size 512mb
        <postgresql>
            dsn $(RELSTORAGE_DSN)
        </postgresql>
    </relstorage>
    </zodb_db>
    <zodb_db catalog>
        # Catalog database
        mount-point /Plone/portal_catalog:/empty/Catalog/portal_catalog
        cache-size 500000
        <relstorage>
            keep-history false
            shared-blob-dir true
            blob-dir $(BLOB_STORAGE)
            blob-cache-size 512mb
            cache-local-mb 0
            cache-prefix zodb_xxx_websitecatalog
            cache-module-name memcache
            cache-servers $(MEMCACHED_CONN_ZODB_CATALOG)
            commit-lock-timeout 600
            <postgresql>
                dsn $(RELSTORAGE_DSN_CATALOG)
            </postgresql>
        </relstorage>
    </zodb_db>
    <product-config xxx.search>
        ploneinstance_name Plone
        exec-prefix $(INSTANCE_BINDIR)
        solr.host $(SOLR_HOST)
        solr.port $(SOLR_PORT)
    </product-config>
    %import collective.zamqp
    <amqp-broker-connection>
        connection_id super
        hostname $(RABBITMQ_HOST)
        port $(RABBITMQ_PORT)
        username $(RABBITMQ_USERNAME)
        password $(RABBITMQ_PASSWORD)
        heartbeat 120
        prefetch_count $(RABBITMQ_PFCOUNT)
    </amqp-broker-connection>
    <amqp-broker-connection>
        connection_id conn_query
        hostname $(RABBITMQ_HOST)
        port $(RABBITMQ_PORT)
        username $(RABBITMQ_USERNAME)
        password $(RABBITMQ_PASSWORD)
        heartbeat 120
        prefetch_count $(RABBITMQ_PFCOUNT)
    </amqp-broker-connection>
    <amqp-broker-connection>
        connection_id conn_structure
        hostname $(RABBITMQ_HOST)
        port $(RABBITMQ_PORT)
        username $(RABBITMQ_USERNAME)
        password $(RABBITMQ_PASSWORD)
        heartbeat 120
        prefetch_count $(RABBITMQ_PFCOUNT)
    </amqp-broker-connection>
    <amqp-broker-connection>
        connection_id conn_properties
        hostname $(RABBITMQ_HOST)
        port $(RABBITMQ_PORT)
        username $(RABBITMQ_USERNAME)
        password $(RABBITMQ_PASSWORD)
        heartbeat 120
        prefetch_count $(RABBITMQ_PFCOUNT)
    </amqp-broker-connection>
    <amqp-broker-connection>
        connection_id conn_media
        hostname $(RABBITMQ_HOST)
        port $(RABBITMQ_PORT)
        username $(RABBITMQ_USERNAME)
        password $(RABBITMQ_PASSWORD)
        heartbeat 120
        prefetch_count $(RABBITMQ_PFCOUNT)
    </amqp-broker-connection>
    <eventlog>
      level INFO
      <syslog>
        address 127.0.0.1:514
        facility local3
        format %(asctime)s ZopeApp-Server zope[%(process)s]: $(INSTANCE_PORT) [%(levelname)s] %(name)s | %(message)s
        dateformat %b %d %H:%M:%S
        level INFO
      </syslog>
    </eventlog>
    <product-config munin.zope>
        secret $(MUNIN_SECRET)
    </product-config>
zserver-threads= ${plone:threads}

I now remember one issue, though.

The generated zope.conf file still contains one path reference that needs to be avoided.
Since I don't run buildout on production but do a pip install dance, I can fix the generated zope.conf file with awk:

awk 'NR == 1 { print "%define INSTANCEHOME $(CLIENT_HOME)"; next } { print }' parts/instance/etc/zope.conf > build/etc/instance.xml

This replaces the first line defining a full path with

%define INSTANCEHOME $(CLIENT_HOME)
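For anyone who wants to try the first-line replacement without a real buildout, here is a minimal sketch on a throwaway file. The two-line sample zope.conf and its /srv/... path are invented for illustration; a real generated file is much longer:

```shell
# Work in a temporary directory so nothing real is touched.
tmpdir="$(mktemp -d)"

# A made-up stand-in for the generated zope.conf; only the first line matters.
cat > "$tmpdir/zope.conf" <<'EOF'
%define INSTANCEHOME /srv/build/parts/instance
instancehome $INSTANCEHOME
EOF

# Print the replacement for line 1, pass every other line through unchanged.
awk 'NR == 1 { print "%define INSTANCEHOME $(CLIENT_HOME)"; next } { print }' \
    "$tmpdir/zope.conf" > "$tmpdir/instance.conf"

first_line="$(head -n 1 "$tmpdir/instance.conf")"
echo "$first_line"
```

The single quotes around the awk program matter: they keep the shell from treating $(CLIENT_HOME) as a command substitution, so the text ends up in the file literally.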

This could probably be fixed with a new feature in the instance recipe.

I am fascinated by your pip-based non-buildout approach to production deployments. Can you say more, maybe in another thread?

It is just a shell script on CI that extracts the package requirements from a buildout-generated instance binary and then downloads everything as wheels.
It is pretty specific and I need to use a custom version of some package because of bad assumptions.
I tried to get my deployment to a single executable pex file, but it failed due to the same bad assumptions.
I have no plans to spend more time on this.
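To give a rough idea of the extraction step, here is a sketch, not the actual script. It assumes the generated bin/instance lists egg paths of the form name-version-pyX.Y[-platform].egg in its sys.path block; the sample file, its paths and its versions are invented for illustration:

```shell
tmpdir="$(mktemp -d)"

# A made-up excerpt of a buildout-generated script.
cat > "$tmpdir/instance" <<'EOF'
#!/usr/bin/python
import sys
sys.path[0:0] = [
  '/srv/buildout/eggs/plone.recipe.zope2instance-4.4.0-py2.7.egg',
  '/srv/buildout/eggs/psycopg2-2.7.1-py2.7-linux-x86_64.egg',
  '/srv/buildout/src/my.policy',
  ]
EOF

# Turn every egg path into a "name==version" pin. Lines that are not
# egg paths (such as develop checkouts under src/) are skipped.
sed -n -E 's|.*/([^/-]+)-([^/-]+)-py[0-9.]+[^/]*\.egg.*|\1==\2|p' \
    "$tmpdir/instance" > "$tmpdir/requirements.txt"

cat "$tmpdir/requirements.txt"

# The pins could then be turned into wheels, for example with:
#   pip wheel --requirement "$tmpdir/requirements.txt" --wheel-dir wheelhouse
```

Develop eggs do not show up in the result, so they would have to be packaged separately.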

@do3cc
What was the motivation: faster deployment, more reliable deployment?
Would you recommend that others look at this approach?

If I were to work on a new Plone project in our company again, I would not spend time on getting this to work.

My pain points were:

  • small differences between prod and int buildouts where buildout on prod failed but not on int.
  • Buildout failing on production because of temporary network problems.
  • slow buildout because I built many instances.

To solve this today, I would build a single docker image with one instance only. This instance would be configurable via env variables for both int and production.

If there were enough time, I would write a custom supervisor program that starts an instance, warms up its cache by calling a few URLs, and only then registers the instance in the load balancer. traefik allows adding new instances dynamically. haproxy allows putting instances into maintenance so that no traffic is directed to them. We currently run this warm-up process from within Ansible, and we also bring the instances up and down with Ansible.
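A minimal sketch of the warm-up part of such a supervisor could look like this. The function names fetch and warm_up, the URL paths and the 120 second timeout are all assumptions for illustration; registering with the load balancer is left out because it depends on the balancer in use:

```shell
# fetch requests one URL and succeeds only on a successful HTTP response.
# It assumes curl is installed; --fail makes HTTP errors (4xx/5xx) count as failures.
fetch() {
    curl --silent --fail --output /dev/null --max-time 120 "$1"
}

# warm_up BASE_URL PATH...
# Requests every path once and stops at the first failure, so a
# non-zero exit status means the instance is not ready for traffic.
warm_up() {
    base_url="$1"
    shift
    for path in "$@"; do
        fetch "${base_url}${path}" || return 1
    done
    return 0
}

# Hypothetical usage, after the instance has been started:
#   warm_up "http://127.0.0.1:${INSTANCE_PORT}" /Plone /Plone/news \
#       && echo "warm, safe to register in the load balancer"
```

Keeping fetch separate makes it easy to swap in another client or to stub it out when testing the loop.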

Finally solved in zc.recipe.egg 2.0.4 :grin:

I use Varnish. I know I can issue commands to mark instances as healthy or sick, effectively adding or removing them from the load balancer. However, in my current setup this is difficult, for various reasons. So I started writing collective.statusview (GitHub: collective/collective.statusview), a Plone add-on that provides a configurable status view. It would handle the warmup as well.

Do you think this might work? What do you think of this approach?

I tried this myself in a rewrite of pretaweb.healthcheck. I documented my experiences in the pull request, up to the point where I gave up. I linked the pull request in your ticket.
I think it is necessary to do warm up if your site has some traffic. I remember how I paled when I once did an update on server 1, waiting for it to be available before starting to update server 2, just to suddenly see how haproxy slowly considered each instance of server 1 as down again, because they all took ages to answer the first incoming requests.

I want to thank you for this. We were having a weird issue at one customer who has some old Debian 7 (Wheezy) servers running Plone 4.3 instances: every time we had to run buildout, the develop process took more than one hour due to a hard-to-debug (at least for me) I/O bottleneck.

We never had time to dig into the problem (and I can't find where I reported it back in the day).

In the beginning I thought it was an issue with the old Python 2.7.3, but last week I tried the new zc.buildout release 2.9.4 and I was shocked when the buildout took less than 2 minutes.

That was awesome!

Let's see if buildout.coredev works with the new stuff: https://github.com/plone/buildout.coredev/pull/373
