Making an index of all nouns

espenmn · August 14, 2019, 9:51am

Someone asked if it was possible to make a 'keyword index' of some long texts. I have them saved as Markdown, if that matters.

So, basically, what I want is to get every noun, and every word that starts with a capital letter, that is used more than 3 times. (Unfortunately, the text is in Norwegian, not German: we don't use capital letters for nouns.)

Any suggestions on how to do this? Are there any Python libs, or could it maybe be done with a portal transform?

If it were possible to link to the words, it would be even better.
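For the capitalized-word half of this, a rough sketch with only the standard library might look like the following. It is an illustration under assumptions, not a finished indexer: the function name `capitalised_index` and the `min_count` parameter are made up here, "more than 3 times" is read as at least 4 occurrences, and sentence-initial words are counted too because they also start with a capital. Line numbers stand in for link targets.

```python
import re
from collections import Counter, defaultdict

# Letters only (no digits or underscores); Unicode-aware, so æ, ø and å match.
WORD_RE = re.compile(r"[^\W\d_]+")


def capitalised_index(text, min_count=4):
    """Map each capitalised word seen at least min_count times to its line numbers.

    Hypothetical helper for illustration; min_count=4 means "more than 3 times".
    """
    counts = Counter()
    lines_seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(line):
            if word[0].isupper():
                counts[word] += 1
                # Record each line once, in order of first appearance.
                if lineno not in lines_seen[word]:
                    lines_seen[word].append(lineno)
    return {w: lines_seen[w] for w in sorted(counts) if counts[w] >= min_count}
```

The recorded line numbers could then be turned into anchors or links in whatever way the rendered Markdown allows.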

zopyx · August 14, 2019, 11:12am

Use POS (part-of-speech) tagging. The most common solutions for Python nowadays are NLTK and spaCy.

spaCy is clearly the more modern option, but you need a language-specific model (not sure about support for Norwegian).
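A minimal sketch of the POS-tagging approach, under some assumptions: the helper names `frequent_nouns` and `tag_with_spacy` are invented for illustration, a Norwegian Bokmål model such as `nb_core_news_sm` is assumed to be installed, and nouns are identified by the coarse tags `NOUN` and `PROPN`. The counting part is kept separate from spaCy so it can be tried with hand-tagged tokens.

```python
from collections import Counter


def frequent_nouns(tagged_tokens, min_count=4):
    """Count lemmas tagged NOUN or PROPN; keep those seen at least min_count times.

    tagged_tokens is an iterable of (lemma, pos) pairs. Common nouns are
    lowercased so sentence-initial capitals do not split the count.
    """
    counts = Counter(
        lemma.lower() if pos == "NOUN" else lemma
        for lemma, pos in tagged_tokens
        if pos in {"NOUN", "PROPN"}
    )
    return {lemma: n for lemma, n in counts.items() if n >= min_count}


def tag_with_spacy(text, model="nb_core_news_sm"):
    """Yield (lemma, coarse POS tag) pairs; assumes spaCy and the model are installed."""
    import spacy  # imported lazily so frequent_nouns works without spaCy

    nlp = spacy.load(model)
    for token in nlp(text):
        if token.is_alpha:
            yield token.lemma_, token.pos_
```

With the model available, `frequent_nouns(tag_with_spacy(text))` would give the nouns used more than 3 times. Tagging quality on a given text is something to check by hand.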

Or perhaps something more lightweight:

espenmn · August 14, 2019, 12:05pm

Thanks.

Norwegian comes in two 'languages'; it looks like there is support for one of them (the most used one). I will check whether that is enough here.