Slow query with created index

I noticed that a catalog query on the "created" index is very slow; below is what I benchmarked.

from DateTime import DateTime
from plone import api

# (inside a view/helper method)
_DT = DateTime()
catalog = api.portal.get_tool(name="portal_catalog")
_query = {
    "object_provides": IPresent.__identifier__,
    # everything created up to now
    "created": {"query": _DT, "range": "max"},
}
brains = catalog.searchResults(**_query)
return brains

The above code takes about 40 seconds on the first cold Plone load; the second try takes about 1 second.

The below code takes about 10 seconds on the first cold Plone load; the second try takes about 0.004 seconds.

from DateTime import DateTime
from plone import api

# (inside a view/helper method)
_DT = DateTime()
catalog = api.portal.get_tool(name="portal_catalog")
_query = {
    "object_provides": IPresent.__identifier__,
}
brains = catalog.searchResults(**_query)
# filter on "created" in Python instead of in the catalog query
items = []
for brain in brains:
    if brain.created <= _DT:
        items.append(brain)
return items

There are about 55,000 items in the created index, but only 5 IPresent objects in the database.
Using Plone 4.3.18.

I never thought a query on a large DateIndex would be this slow...

In a standard setup, datetime-related indexes are extremely broad. As you likely know, an index maps an indexed value to the set of objects indexed under this value. For datetime indexes, those sets typically contain only a single element. As a consequence, the typical range queries for those indexes are effectively huge "or" queries where each term requires at least one ZODB load. A ZODB load may need access to the filesystem (when the object is not cached), and this can take several ms.
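
As an illustration, here is a deliberately simplified sketch (not the actual Products.PluginIndexes code) of what a "max" range query boils down to:

from BTrees.IIBTree import multiunion
from BTrees.OOBTree import OOBTree

index = OOBTree()  # indexed value -> persistent set of document ids

def range_max(index, max_value):
    # a {"range": "max"} query is effectively a big "or" over every
    # distinct indexed value <= max_value; each docid set is a separate
    # persistent object, so each step may cost one ZODB load
    return multiunion([docids for _value, docids in index.items(max=max_value)])

With second-resolution timestamps, 55,000 objects can mean up to 55,000 distinct keys, and on a cold cache up to 55,000 loads.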

There are specialized index types for datetimes ("DateTimeIndex", "DateRangeIndex") to optimize this. They have a configuration parameter "resolution" which allows you to specify the datetime resolution. A coarser resolution makes the index less broad and thereby more efficient (the "or" queries become smaller). Often you do not need nanosecond resolution; day resolution is sufficient.
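
For intuition, a simplified rounding sketch (the index's real conversion code differs):

from DateTime import DateTime

SECONDS_PER_DAY = 24 * 60 * 60

def day_key(dt):
    # round a DateTime down to day resolution before indexing
    return int(dt.timeTime()) // SECONDS_PER_DAY

At day resolution, 55,000 second-resolution values collapse into at most one key per day, so the "or" over keys up to the query value stays small.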

I suggest you check which index type is used for your created index. If necessary, change it to a datetime index, set its resolution to the value you need, and retry your benchmark.
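
For example (generic ZCatalog API, adapt to your setup):

from plone import api

catalog = api.portal.get_tool(name="portal_catalog")
index = catalog._catalog.getIndex("created")
print(index.meta_type)  # the current index type

# to rebuild with a different type (check its resolution options first):
# catalog.delIndex("created")
# catalog.addIndex("created", "DateIndex")
# catalog.manage_reindexIndex(ids=["created"])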

If this still does not satisfy your speed requirements, note that ZCatalog is built on top of the ZODB, a very simple and general object database. Unlike other database systems, it has (almost) no knowledge about the objects it manages. The advantage: it can store (almost) any object without configuration; the disadvantage: it must use a general and expensive storage format (Python pickles). A further consequence: it cannot support filtering (querying) at the database level; this must be done at the application level. Thus, if query speed is necessary in large installations, you must integrate a specialized search engine (e.g. Solr, Elasticsearch, ...).
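
For illustration only, here is how such a filter would run inside Elasticsearch; the index name "plone" and the mirrored field names are assumptions, and in practice you would use an integration package (e.g. collective.solr, collective.elasticsearch) rather than raw queries:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
result = es.search(
    index="plone",  # hypothetical index mirroring the catalog data
    query={
        "bool": {
            "filter": [
                {"term": {"object_provides": "my.package.interfaces.IPresent"}},
                {"range": {"created": {"lte": "now"}}},
            ]
        }
    },
)
print(result["hits"]["total"])

Here the range filter is evaluated inside the search engine, so only the matching hits travel back to the application.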

Two things are at play on a cold start:
  1. On a cold Plone, the catalog needs to load many objects from the database.
  2. On a cold Plone, the catalog query plan is empty, so the first query is not optimized (see the sketch below).
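
For (2), newer ZCatalog versions can dump the learned query plan so it can be preloaded on the next start; roughly like this (from memory, verify against your ZCatalog version):

from plone import api

catalog = api.portal.get_tool(name="portal_catalog")
# after the expensive queries have run once, dump the learned plan:
print(catalog.getCatalogPlan())
# save the output as a module and point the ZCATALOGQUERYPLAN
# environment variable at it before starting the instance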

How long does it take for the second and the following queries?

To help with (1), if the database connection is "slow" (read: not as fast as local storage), a persistent on-disk database cache can help to bridge restarts after updates etc.
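
For example, with a ZEO client the on-disk cache becomes persistent once it gets a stable name; a sketch of the relevant zope.conf section (server address, paths and size are placeholders):

<zeoclient>
  server 127.0.0.1:8100
  storage 1
  name zeostorage
  # a stable client name makes the cache file persistent across restarts
  client instance1
  var /path/to/instance/var
  cache-size 512MB
</zeoclient>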

"Above code"
Cold start: 31.4403 sec
2nd try: 0.9816 sec
3rd try: 1.2044 sec
4th try: 0.9805 sec
5th try: 0.9821 sec

"Below code"
Cold start: 10.3342 sec
2nd try: 0.0057 sec
3rd try: 0.0053 sec
4th try: 0.0042 sec
5th try: 0.0048 sec

The benchmark was done on my local laptop, all on local storage.

Indeed, way too slow. I am curious what Py-Spy or a profiler would detect...
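
Something like this should show where the time goes:

# record a flame graph of the Zope process during the slow query
# (replace 12345 with the instance PID)
py-spy record -o profile.svg --pid 12345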