How to go faster when importing new content with the Plone api

I've created an importer that iterates over a JSON file and uses plone.api
to create objects.
Any tips on how I can make it run faster?

It's something like this:

def json_create_result(_result, target_folder, request_id):
    # ... <snip a bit of preparatory code> ...
    obj = api.content.create(
        type="Custom Content",
        title=request_id,
        first_name=_result["patient_first_name"],
        last_name=_result["patient_last_name"],
        gender=_result["gender"],
        container=target_folder,
    )
    return obj

I'm getting about 10,000 objects per hour, which feels slow (and not fun for 100,000 objects).

Two hints:

  1. Prepare and pass the id. If you don't pass an id, plone.api will rename/move the object after creating it, which is very slow because it re-triggers many events.
  2. Disable versioning during the import. You can re-enable it afterwards. (See the sketch below for both.)
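
Roughly like this (a minimal sketch based on the snippet above; the lowercased request_id and the run_import() call are just illustrative stand-ins, and the versioning part assumes the CMFEditions portal_repository tool):

from plone import api

# Hint 1: pass an explicit id so plone.api does not rename/move the
# object after creation (that rename re-triggers many events).
obj = api.content.create(
    type="Custom Content",
    id=request_id.lower(),  # assumes request_id already yields a safe, unique id
    title=request_id,
    container=target_folder,
)

# Hint 2: switch versioning off for the imported type before the run
# and restore the original setting afterwards.
repository = api.portal.get_tool("portal_repository")
old_types = repository.getVersionableContentTypes()
repository.setVersionableContentTypes(
    [t for t in old_types if t != "Custom Content"]
)
try:
    run_import()  # hypothetical: whatever loops over the JSON file
finally:
    repository.setVersionableContentTypes(old_types)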

Thanks @pbauer,
Would using two instances and splitting the source file between them have any impact?

Splitting may help as long as the two instances don't write into the same container and don't do any portal_catalog-related work, since that will conflict at some point.
More time might be gained by disabling indexing entirely until the import is done (see the sketch below).
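
For example, something along these lines (a rough sketch of the monkey-patching approach that collective.noindexing packages up; run_import() is again a hypothetical stand-in, and the catalog is rebuilt once at the end for everything that was skipped):

from plone import api
from Products.CMFCore.CMFCatalogAware import CatalogAware

def _noop(self, *args, **kwargs):
    # Skip catalog work entirely while the import runs.
    pass

_originals = (
    CatalogAware.indexObject,
    CatalogAware.reindexObject,
    CatalogAware.unindexObject,
)
CatalogAware.indexObject = _noop
CatalogAware.reindexObject = _noop
CatalogAware.unindexObject = _noop
try:
    run_import()  # hypothetical import loop
finally:
    (CatalogAware.indexObject,
     CatalogAware.reindexObject,
     CatalogAware.unindexObject) = _originals

# One catalog rebuild at the end instead of one reindex per object.
api.portal.get_tool("portal_catalog").clearFindAndRebuild()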

Please note: if your importer does 10k/hour and you have 100k objects to import, you can run it overnight and be done.

Agreed.
That throughput is of course a "guesstimate". According to the log file the import has created all the new objects, but it has been "stuck" for at least 3 hours.
The instance I'm running the import with is constantly at 104.6% CPU and 57.3% memory. I'm assuming that the content is being indexed at this point :man_shrugging:t5: :crossed_fingers:t5:
Not excited by the idea of starting this again :frowning: