Load testing a Plone site

Does anyone have experience load testing a Plone site? What tools do you use, and how?
I need to test whether some websites can handle a temporary increase in visits.

I recall that some years ago I ran some tests with siege, but I'm having problems using it against Plone sites and I don't know why.

For example, if I siege a non-Plone website I get the following output:

> siege https://www.wordpress.com -t 1M
HTTP/1.1 200     0.22 secs:   20463 bytes ==> GET  https://wordpress.com/
HTTP/1.1 301     0.20 secs:     162 bytes ==> GET  https://www.wordpress.com
HTTP/1.1 200     0.09 secs:   40259 bytes ==> GET  https://s1.wp.com/home.logged-out/page-jan-2019/js/bundle.js?v=1572517516
HTTP/1.1 200     0.10 secs:   63090 bytes ==> GET  https://s1.wp.com/wp-content/themes/h4/landing/marketing/pages/_common/components/testimonials/media/ann-morgan.png
HTTP/1.1 200     0.08 secs:   40259 bytes ==> GET  https://s1.wp.com/home.logged-out/page-jan-2019/js/bundle.js?v=1572517516
Lifting the server siege...
Transactions:		       12038 hits
Availability:		      100.00 %
Elapsed time:		       59.94 secs
Data transferred:	      252.38 MB
Response time:		        0.12 secs
Transaction rate:	      200.83 trans/sec
Throughput:		        4.21 MB/sec
Concurrency:		       24.29
Successful transactions:       12044
Failed transactions:	           0
Longest transaction:	        0.59
Shortest transaction:	        0.04

But on every Plone website I tested, I get something like this:

> siege https://www.plone.org -t 30S
HTTP/1.1 200     0.11 secs:   14478 bytes ==> GET  https://www.plone.org:443/
HTTP/1.1 200     0.11 secs:   14478 bytes ==> GET  https://www.plone.org:443/
HTTP/1.1 200     0.14 secs:   14478 bytes ==> GET  https://www.plone.org:443/
HTTP/1.1 200     0.10 secs:   14478 bytes ==> GET  https://www.plone.org:443/
Lifting the server siege...
Transactions:		           0 hits
Availability:		        0.00 %
Elapsed time:		       29.54 secs
Data transferred:	       72.74 MB
Response time:		        0.00 secs
Transaction rate:	        0.00 trans/sec
Throughput:		        2.46 MB/sec
Concurrency:		       24.23
Successful transactions:        5268
Failed transactions:	           0
Longest transaction:	        0.85
Shortest transaction:	        0.08

The strange things are Transactions and Availability both at 0, and the GET list showing the SSL port explicitly (with no related resources like images or styles being fetched).

By the way, this is the current siege configuration:

> siege -C
Mozilla/5.0 (apple-x86_64-darwin19.0.0) Siege/4.0.4
Edit the resource file to change the settings.
version:                        4.0.4
verbose:                        true
color:                          true
quiet:                          false
debug:                          false
protocol:                       HTTP/1.0
HTML parser:                    enabled
get method:                     HEAD
connection:                     close
concurrent users:               25
time to run:                    n/a
repetitions:                    n/a
socket timeout:                 30
cache enabled:                  false
accept-encoding:                gzip, deflate
delay:                          0.000 sec
internet simulation:            false
benchmark mode:                 false
failures until abort:           1024
named URL:                      none
URLs file:                      /usr/local/Cellar/siege/4.0.4_2/etc/urls.txt
thread limit:                   255
logging:                        true
log file:                       /foo/var/siege.log
resource file:                  /foo/.siege/siege.conf
timestamped output:             false
comma separated output:         false
allow redirects:                true
allow zero byte data:           true
allow chunked encoding:         true
upload unique files:            true
- ad.doubleclick.net
- pagead2.googlesyndication.com
- ads.pubsqrd.com
- ib.adnxs.com

I never used it myself, but I have read that ab (Apache Benchmark) gets used for performance tests.

The question is not easy to answer. Usually you are running behind a proxy and don't need to worry much about performance (at least for anonymous visitors). I am usually more concerned about underlying code (our own or third-party) that runs longer than expected; blocked worker threads are usually the bigger issue. At least in Python 3 / Plone 5.2 land with Waitress, you will see warnings in the log about the request queue. In general, I don't perform performance tests even for larger sites. A cache plus some ZEO clients usually does the job... I look into performance details when someone complains... which is usually never the case :crazy_face:

Yes, you're right... usually I don't either, because, as you said, there is Varnish plus the Plone cache between Plone and anonymous users. And I agree that performance fine-tuning needs to be done at the code level.

There will be an important election round next weekend, and my customer is going to publish a link to the results on a Plone page. I just want to see how many connections from anonymous users my infrastructure can handle before running into performance/availability issues, so I can tell them that everything will be fine with a possible increase in visitors that night.

What I don't understand is why Plone sites respond that way to a siege "attack", so that I can't get a benchmark :thinking:
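As a rough sanity check of "how many concurrent anonymous requests can this handle", the same kind of measurement can be sketched with just the Python standard library. The local test server below is only a stand-in so the example is self-contained; in practice you would point the URL at the real site:

```python
import concurrent.futures
import http.server
import threading
import time
import urllib.request

# A trivial local server as a stand-in target; replace BASE_URL with the
# real site's URL (hypothetical placeholder) when testing for real.
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler
)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE_URL = f"http://127.0.0.1:{server.server_address[1]}/"

def fetch(url):
    """GET the URL and return (status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
        return resp.status, time.monotonic() - start

# 25 concurrent "users" issuing 100 requests in total.
with concurrent.futures.ThreadPoolExecutor(max_workers=25) as pool:
    results = list(pool.map(fetch, [BASE_URL] * 100))

ok = sum(1 for status, _ in results if status == 200)
print(f"{ok}/100 succeeded, slowest {max(t for _, t in results):.2f}s")
server.shutdown()
```

This measures raw request throughput only; unlike siege it does not parse HTML or fetch linked resources.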

Thanks @jugmac00, I'll give it a look.

In our company we used to use Apache JMeter for complex tests and Apache ab for simple tests. Now we use only Locust. Since it is written in Python, it was quite easy to integrate with Plone.
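A minimal Locust scenario looks something like the sketch below (the paths and weights are hypothetical placeholders; it assumes `pip install locust` and is run with `locust -f locustfile.py --host https://www.example.org`):

```python
# locustfile.py -- a minimal sketch of simulated anonymous visitors.
from locust import HttpUser, task, between

class AnonymousVisitor(HttpUser):
    # Each simulated user pauses 1-5 seconds between requests.
    wait_time = between(1, 5)

    @task(3)
    def front_page(self):
        # Weighted 3:1 -- most visitors hit the front page.
        self.client.get("/")

    @task(1)
    def results_page(self):
        # Hypothetical path; replace with pages your visitors actually hit.
        self.client.get("/news")
```

Locust then provides a web UI where you ramp up the number of simulated users and watch response times and failure rates live.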


It's been a while since we had to do load testing ourselves, but we have been through the process of dealing with high-load Plone sites. My advice is to first look carefully at what your anonymous users do now and what they might do differently on the big day.
If there are signup processes, complex custom search pages, etc. that could be popular, then you need a tool that can play back a log including those interactions. If parts of this are slow, think about what you can offload. For example, we got caught out by a registration process that handed out unique IDs and generated PDF certificates. In retrospect we should have done the calculations up front and forced the client to redesign so that everything happened asynchronously and got sent back via email. Also make sure you have no code that makes external API calls; make this async as well, or use c.futures. I've seen that bring down the largest Plone sites.
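The "offload slow work instead of blocking the worker thread" idea can be sketched with the standard library's `concurrent.futures` (the slow external call here is a simulated stand-in, not a real API):

```python
import concurrent.futures
import time

def slow_external_call(order_id):
    # Stand-in for a slow third-party API call or PDF generation
    # (hypothetical); sleeps instead of doing real work.
    time.sleep(0.2)
    return f"certificate-{order_id}"

# A shared executor: the web worker thread only schedules the job and
# returns immediately, instead of blocking for the full call duration.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

start = time.monotonic()
futures = [executor.submit(slow_external_call, i) for i in range(8)]

# In a real view you would NOT wait here -- the result would be delivered
# later (e.g. by email). We wait only to demonstrate the overlap.
results = [f.result() for f in concurrent.futures.as_completed(futures)]
elapsed = time.monotonic() - start
print(f"8 calls finished in {elapsed:.2f}s instead of ~1.6s sequentially")
```

The eight 0.2-second calls overlap across four workers, so the batch finishes in roughly 0.4 seconds rather than 1.6.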
If you are worried about just a lot of general browsing, then the easiest fix is to change your caching rules so there is a 1-minute or 5-minute caching rule on every single page, at least until the big day is over. It works great with Varnish, and even better with CloudFront.
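The blanket short-TTL rule above boils down to one response header. A minimal sketch of what the backend would emit (helper name and default TTL are illustrative, not a Plone API):

```python
import email.utils
import time

def short_ttl_headers(ttl_seconds=300):
    """Headers telling Varnish/CDNs to serve a cached copy for ttl_seconds
    before going back to the backend. Hypothetical helper for illustration."""
    return {
        "Cache-Control": f"public, max-age={ttl_seconds}",
        "Expires": email.utils.formatdate(time.time() + ttl_seconds, usegmt=True),
    }

print(short_ttl_headers(300))
```

With default Varnish behavior, `max-age` is enough to make every page cacheable for the chosen window, which caps backend load regardless of how many anonymous visitors arrive.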
None of this will help if your common paths include CPU-bound tasks like search. The final trick you can pull here is to use HAProxy or similar to separate that traffic into a different backend queue. This will at least prevent such slow requests from slowing down the rest of the site.
