Populating a new site via API - how can I generate a slug from title?

I’ve been running a Python script to populate my new site, but it’s not idempotent.

I’m extracting data from a WordPress site and converting it to JSON to upload to Plone, and that’s going well. But I’m also adding a bunch of media files from Google Drive, and many of them end up with a short name other than the one I expect, so if I run the migration twice, it creates two objects.

I can assume that every new title I add is unique (or, more precisely, I don’t care if I lose the odd one!), so I am converting the title to a slug, checking for the existence of a Plone object with that slug, and adding it if it doesn’t exist. However, Plone’s rules for converting a title to a slug are not obvious. Every time I code a new rule, something else breaks it, and I end up with something posted twice when I rerun the migration.

So, I need to know how I can either:

  • search the site for an EXACT title with the REST API (when I’ve tried searching I get numerous matches on content rather than title), or
  • reproduce exactly how Plone generates a slug from a title, so that I can do the same

Your post contains a lot of information, but misses details. I’ll try to give some general feedback first, to see if that helps.

so that if I run the migration twice, it creates two objects.

  1. If you try to create an object in the site content tree and the id (which is part of the URL) already exists, the id normalizer will kick in and create a non-conflicting name.

  2. If you create an object without explicitly setting/passing the ID, the id will be calculated from the Title. If that ID already exists, see 1.

Searching: if you don’t specify which index to search on, the restapi search endpoint will likely use the SearchableText index, which is a full-text index over multiple fields from every content item. If you specify the id index, you can search on the object id instead. The restapi search endpoint documentation is not a full spec of the catalog; it links to the older backend Plone 5 documentation, which is still valid:
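As a rough illustration of searching on the id index rather than SearchableText, here is a minimal sketch using only the Python standard library. The site URL and the helper names `build_id_search_url` and `find_by_id` are made up for the example, and it assumes the site allows the request (add an Authorization header if yours does not).

```python
import json
import urllib.parse
import urllib.request


def build_id_search_url(site_url: str, object_id: str) -> str:
    """Return an @search URL that queries the catalog's id index only."""
    query = urllib.parse.urlencode({"id": object_id})
    return f"{site_url.rstrip('/')}/@search?{query}"


def find_by_id(site_url: str, object_id: str) -> list:
    """Run the search against a live site and return the matching items.

    The Accept header asks plone.restapi for JSON instead of an HTML page.
    """
    request = urllib.request.Request(
        build_id_search_url(site_url, object_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["items"]


if __name__ == "__main__":
    # Hypothetical local site; adjust to your own URL.
    print(build_id_search_url("http://localhost:8080/Plone", "my-page"))
```

Because the query names an index explicitly, matches on body text should no longer show up in the results.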


I tried explicitly setting the ID. Am I using the right field: “@id”? Should it be set to the URL, which is what is returned in a query, or should it just be the short-name?

“@id” seemed to be ignored when I supplied the full URL.

As for searching, I can’t search on the ID, because I don’t necessarily know what Plone is going to set it to. That’s why I needed to search just the titles. Looking at portal_catalog, I see a Title index, but it’s not browsable like sortable_title (which has the same problem as id: I don’t know the rules used in the conversion!), so I’ll just have to experiment to see if I can get something from that.

You need to set it in the `id` field.

To do the searches, you also need to use the `id` field.


OK, there's no documentation I could find about putting an id in a POST. So, is it the slug or the full URL?

I can’t search on the ‘id’ field: my whole point here is that I have no way of knowing what the ID will be (I suppose I could save the id after successfully posting). I’d rather just slugify the title before saving it, but that wasn’t working for me because I used ‘@id’.

The `id` field should contain just the slug, not the full URL.
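Putting that together, a rerunnable import might look like the sketch below. It is only an illustration under some assumptions: `slugify` is a homemade rule, not Plone's own normalizer (which does not matter here, since the id is passed explicitly), and the helper names, the default `Document` type, and the caller-supplied auth headers are all invented for the example. File and Image content would need their file fields added to the payload.

```python
import json
import re
import unicodedata
import urllib.error
import urllib.request


def slugify(title: str) -> str:
    """Homemade, deterministic slug rule (NOT Plone's normalizer)."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumerics into one hyphen.
    # Note: a title with no ASCII letters or digits yields an empty slug.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def build_payload(title: str, portal_type: str = "Document") -> dict:
    """Body for the POST: `id` holds the slug only, never the full URL."""
    return {"@type": portal_type, "id": slugify(title), "title": title}


def exists(url: str, headers: dict) -> bool:
    """Return True if a GET on `url` returns an object living at that URL."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/json", **headers}
    )
    try:
        with urllib.request.urlopen(request) as response:
            found = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
    # Guard against getting back an object from elsewhere in the site.
    return found.get("@id", "").rstrip("/") == url.rstrip("/")


def create_if_missing(container_url: str, title: str, headers: dict,
                      portal_type: str = "Document") -> bool:
    """POST the item unless its slug already exists; True if created."""
    payload = build_payload(title, portal_type)
    target = f"{container_url.rstrip('/')}/{payload['id']}"
    if exists(target, headers):
        return False
    request = urllib.request.Request(
        container_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            **headers,
        },
        method="POST",
    )
    with urllib.request.urlopen(request):
        return True
```

Since the script picks the id itself, the existence check and the POST always agree on the name, so rerunning the migration skips anything already created.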

I just tried it, and that’s working perfectly. Thank you!