Hi all,
A few days ago we migrated our website from a ZEO Data.fs to RelStorage (with PostgreSQL). The migration went fine, but now we are noticing that the website is almost 2x slower than it used to be.
The configuration looks like this in zope.conf:
<relstorage>
  keep-history false
  blob-dir var/blobstorage
  blob-cache-size 2gb
  cache-local-mb 200
  cache-local-dir var/relstorage-cache
  <postgresql>
    dsn dbname='datafs' user='relstorage' host='aa.bb.cc.dd' password='xxx'
  </postgresql>
</relstorage>
I have an assorted list of findings but no conclusions at all so far (that's why I'm here).
py-spy
I used to inspect the website's performance with `py-spy`, and when I started doing the same, now with RelStorage in the mix, I get the following warning from `py-spy`:
py-spy> 36.69s behind in sampling, results may be inaccurate. Try reducing the sampling rate
And interestingly enough, trying a request with `py-spy` tracing takes ages (minutes); stopping `py-spy` makes the pages load much faster. It looks like `py-spy` is somehow blocking the process?
If I have enough patience to wait for a request to finish, the flamegraph, oh surprise, shows that the `RelStorage` code paths are the ones that take the most samples...
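One thing that might help with both the warning and the slowdown (a sketch, not verified on this setup): `py-spy` can sample at a lower rate than the default 100 Hz, and on Linux `py-spy record` accepts `--nonblocking` so it collects samples without pausing the target process, at the cost of some accuracy:

```
# sample 10x/sec instead of the default 100, without pausing the process;
# <zope-pid> is the PID of the Zope instance
py-spy record --rate 10 --nonblocking --pid <zope-pid> -o profile.svg
```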
Cython?
Then I thought that, given that `RelStorage` uses Cython to optimize its code, we might not be benefiting from that and could be getting the plain Python version instead... but it turns out `zc.buildout`/`pip` installs a wheel file:
Fetching RelStorage 4.0.0a1 from: https://files.pythonhosted.org/packages/9d/26/a49cb7df2be4697ac0b67a7892e06cef8a87b133cf95e4c48e7191faebc7/RelStorage-4.0.0a1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl#sha256=b4f2d4e967c02bfba6370aec38ad4cec86dc89daaa24cc12491a1c9f1a4d03a5
Here I was testing the alpha release to check whether a wheel is indeed downloaded; in production we are using `RelStorage==3.5.0` with `psycopg2-binary = 2.9.8`, and the PostgreSQL server is at version 15.4.
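To double-check that the installed wheel actually ships compiled extensions (and that the pure-Python fallback is not in play), a quick helper of my own (not part of RelStorage) that lists the `.so`/`.pyd` files inside an installed package:

```python
import importlib.util
import pathlib

def compiled_extensions(package_name):
    """Return the compiled extension files (.so/.pyd) shipped inside a
    package's directory. An empty list for a package that is supposed to
    have C speedups suggests the pure-Python fallback is installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return []
    pkg_dir = pathlib.Path(spec.origin).parent
    return (sorted(p.name for p in pkg_dir.rglob("*.so"))
            + sorted(p.name for p in pkg_dir.rglob("*.pyd")))

# run with the Zope instance's Python, e.g.:
#   compiled_extensions("relstorage")
print(compiled_extensions("json"))  # stdlib json ships no compiled extensions
```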
Persistent caching
In the configuration we set `cache-local-dir var/relstorage-cache` so that the cache is persisted across restarts. Well, it turns out that's not happening for us:
$ ls -lh var/relstorage-cache/
-rw-r--r-- 1 s-derfreitag service 16K Nov 10 14:58 relstorage-cache2-datafs.sqlite3
16 KB?!?! Am I missing anything to get the persistent caching to work?
I filed an upstream report for this.
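For what it's worth, here is a small generic sketch (plain SQLite inspection, no assumptions about RelStorage's schema) that I can use to confirm the cache file really contains no rows:

```python
import sqlite3

def summarize_sqlite(path):
    """Return {table_name: row_count} for a SQLite file, to check whether
    anything was ever written into the persistent cache database."""
    con = sqlite3.connect(path)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        return {t: con.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        con.close()

# e.g. summarize_sqlite("var/relstorage-cache/relstorage-cache2-datafs.sqlite3")
```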
Client Caching Tracing
I had a look at the tracing functionality and it is working; a 90%+ hit rate does not sound that bad:
loads hits inv(h) writes hitrate
Nov 10 14:58-58 0 0 0 0 n/a
Nov 10 14:58:59 ==================== Restart ====================
Nov 10 14:58-59 14948 6 0 14942 0.0%
Nov 10 15:00-14 587030 341392 0 245641 58.2%
Nov 10 15:15-29 566485 553804 0 12875 97.8%
Nov 10 15:30-44 557521 518122 0 39837 92.9%
Nov 10 15:45-59 514945 506393 0 8772 98.3%
Nov 10 16:00-14 426124 385536 0 40673 90.5%
Nov 10 16:15-29 430301 422432 0 7900 98.2%
Nov 10 16:30-44 554803 523956 0 30850 94.4%
Nov 10 16:45-59 207864 202747 0 5126 97.5%
Nov 10 17:00-14 574693 530792 0 43904 92.4%
Nov 10 17:15-29 946277 924465 0 21922 97.7%
Nov 10 17:30-44 456647 440533 0 16114 96.5%
Nov 10 17:45-48 183321 173959 0 9365 94.9%
Read 6,518,881 trace records (221,640,602 bytes) in 28.7 seconds
Versions: 0 records used a version
First time: Fri Nov 10 14:58:59 2023
Last time: Fri Nov 10 17:48:47 2023
Duration: 10,188 seconds
Data recs: 6,022,058 (92.4%), average size 442 bytes
Hit rate: 91.7% (load hits / loads)
Count Code Function (action)
1 00 _setup_trace (initialization)
496,822 20 load (miss)
5,524,137 22 load (hit)
497,921 52 store (current, non-version)
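(The overall hit rate reported at the end is just load hits over total loads; recomputing it from the counts in the table above:)

```python
hits = 5_524_137    # "load (hit)" records from the trace stats
misses = 496_822    # "load (miss)" records
loads = hits + misses
print(f"hit rate: {hits / loads:.1%}")  # matches the reported 91.7%
```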
So the RAM cache does work but the persistent one does not, and the site is still noticeably slower. We are already using the history-free storage because we read that it's faster than the history-preserving one.
I'm a bit clueless; does any kind soul have any input or ideas to try?