ParseError when search term contains parentheses in ZCatalog search

I get errors like this:

... my code ...
Module Products.ZCatalog.ZCatalog, line 625, in searchResults
Module Products.ZCatalog.Catalog, line 1091, in searchResults
Module Products.ZCatalog.Catalog, line 634, in search
Module Products.ZCatalog.Catalog, line 564, in _search_index
Module Products.ZCTextIndex.ZCTextIndex, line 209, in query_index
Module Products.ZCTextIndex.QueryParser, line 154, in parseQuery
Module Products.ZCTextIndex.QueryParser, line 174, in _require
Products.ZCTextIndex.ParseTree.ParseError: Token <Token:EOF> required, '(' found

when a user enters a search term containing parentheses, i.e. ( or ).

Is there any way to enable searching with parentheses? Do I have to escape them manually, or is there a helper function somewhere?

Background

A user searched for a company name like Foo automotive (bar) S.R.L.

Thanks for any tips!

P.S.: Moving away from ZCatalog is not an option, now or in the foreseeable future.

ZCTextIndex is just dumb when dealing with such queries (and falls far short of what people expect from search engines like Google). Long story short: when you work with ZCTextIndex, make sure your query is clean and, basically, contains only proper search terms.
In TextIndexNG3 I would deal with such queries through a custom query parser. ZCTextIndex is outdated technology, two decades old... For serious searching in Plone, use Elastic or Solr.
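Cleaning the query before it reaches ZCTextIndex could look roughly like the sketch below. It assumes that parentheses (grouping) and double quotes (phrases) are the characters the query parser treats as syntax; `clean_zctext_query` is a hypothetical helper, not part of ZCTextIndex.

```python
import re

# Assumption: these are the characters ZCTextIndex's query parser treats
# as syntax (parentheses for grouping, double quotes for phrases).
_QUERY_SYNTAX = re.compile(r'[()"]')


def clean_zctext_query(query: str) -> str:
    """Replace query-syntax characters with spaces and collapse whitespace."""
    without_syntax = _QUERY_SYNTAX.sub(" ", query)
    # split() without arguments drops empty strings, so runs of spaces collapse.
    return " ".join(without_syntax.split())


print(clean_zctext_query("Foo automotive (bar) S.R.L."))  # Foo automotive bar S.R.L.
```

This deliberately leaves the `*` wildcard and the rest of the text alone. The parser also treats words like AND, OR and NOT as operators, so a query ending in a dangling "and" might still fail; this sketch does not handle that.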

Hi @zopyx,

We are actually using TextIndexNG3 and have the same issue:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 162, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 371, in publish_module
  Module ZPublisher.WSGIPublisher, line 274, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 63, in call_object
  Module senaite.app.listing.view, line 215, in __call__
  Module senaite.app.listing.ajax, line 111, in handle_subpath
  Module senaite.core.decorators, line 20, in decorator
  Module senaite.app.listing.decorators, line 63, in wrapper
  Module senaite.app.listing.decorators, line 50, in wrapper
  Module senaite.app.listing.decorators, line 100, in wrapper
  Module senaite.app.listing.ajax, line 532, in ajax_folderitems
  Module senaite.app.listing.decorators, line 88, in wrapper
  Module senaite.app.listing.ajax, line 315, in get_folderitems
  Module senaite.app.listing.view, line 884, in folderitems
  Module senaite.app.listing.view, line 721, in _fetch_brains
  Module senaite.app.listing.view, line 764, in search
  Module senaite.app.listing.view, line 714, in ng3_index_search
  Module Products.CMFPlone.CatalogTool, line 463, in searchResults
  Module Products.ZCatalog.ZCatalog, line 625, in searchResults
  Module Products.ZCatalog.Catalog, line 1091, in searchResults
  Module Products.ZCatalog.Catalog, line 634, in search
  Module Products.ZCatalog.Catalog, line 569, in _search_index
  Module Products.TextIndexNG3.TextIndexNG3, line 158, in _apply_index
  Module zopyx.txng3.core.index, line 344, in search
  Module zopyx.txng3.core.parsers.english, line 92, in parse
QueryParserError: Unable to parse query: *oel (hydraulik; gelb* 

Your last answer left me unsure: is this something that should work out of the box, something that requires us to implement a custom query parser, or should we simply clean the query string beforehand?

Also, are there any plans to port this add-on to Python 3, or would you rather recommend dropping it and going with Elastic or Solr?

Thanks
Ramon

Hi @jugmac00,

Once upon a time I also struggled with this issue and simply stripped these characters from the search term:

However, back then I thought it only happened when the search string started with one of these characters, which is why I used lstrip.
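To illustrate why `lstrip` only covers the leading case, here is a small sketch using the character set quoted later in the thread. `lstrip_query` and `strip_everywhere` are hypothetical names for illustration; `str.lstrip`, `str.maketrans` and `str.translate` are standard Python.

```python
# Character set from the lstrip call discussed in this thread.
STRIP_CHARS = "*.!$%&/()=#-+:'´^"


def lstrip_query(query: str) -> str:
    # str.lstrip treats its argument as a *set* of characters and removes
    # them only from the start of the string, stopping at the first
    # character that is not in the set.
    return query.lstrip(STRIP_CHARS)


def strip_everywhere(query: str) -> str:
    # The third argument of str.maketrans maps each listed character to
    # None, so translate() deletes it wherever it occurs.
    return query.translate(str.maketrans("", "", STRIP_CHARS))


print(lstrip_query("*oel (hydraulik; gelb*"))      # oel (hydraulik; gelb*
print(strip_everywhere("*oel (hydraulik; gelb*"))  # oel hydraulik; gelb
```

With a query like the TextIndexNG3 one above, `lstrip` leaves the `(` in the middle untouched, so the parser still fails. Stripping everywhere fixes that, but it also removes the `*` wildcards and the dots in names like S.R.L., which changes what the user asked for.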

Thanks!

As stated in the original post, I cannot move to another search technology (time- and effort-wise). I already have some search term sanitization in place, which I will try to improve a bit.

Thanks again for the feedback.

I updated my search term sanitization and now it works as expected.
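A complementary, defensive option is to try the raw query first and retry once with a cleaned query only when the parser rejects it. The sketch below is not the sanitization used in this thread: `safe_search` and `fake_search` are hypothetical, the whitelist is an assumption, and only the `ParseError` import path is taken from the traceback above. For TextIndexNG3 you would catch its `QueryParserError` instead.

```python
import re

try:
    # Exception class shown in the ZCTextIndex traceback in this thread.
    from Products.ZCTextIndex.ParseTree import ParseError
except ImportError:
    # Stand-in so the sketch also runs outside a Zope environment.
    class ParseError(Exception):
        pass

# Assumed fallback whitelist: keep letters, digits, whitespace and the
# * wildcard; replace everything else (including "_") with a space.
_NOT_ALLOWED = re.compile(r"[^\w\s*]|_")


def safe_search(search, query, **kw):
    """Call search(query); on ParseError retry once with a cleaned query."""
    try:
        return search(query, **kw)
    except ParseError:
        cleaned = " ".join(_NOT_ALLOWED.sub(" ", query).split())
        if not cleaned or cleaned == query:
            # Nothing usable left, or cleaning changed nothing: give up.
            return []
        return search(cleaned, **kw)


def fake_search(query):
    """Hypothetical stand-in for a catalog call that rejects parentheses."""
    if "(" in query or ")" in query:
        raise ParseError("Token <Token:EOF> required, '(' found")
    return [query]


print(safe_search(fake_search, "Foo automotive (bar) S.R.L."))
```

In a real site, `search` would wrap the actual catalog call. Keeping the raw query on the first attempt means well-formed queries keep their operators, and only broken ones get flattened.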

@ramonski Could you elaborate on why you exclude so many characters?

qs = q.lstrip("*.!$%&/()=#-+:'´^")

Are they all "bad"?

I only experienced problems with parentheses.

That said, I did not try to crash the search with all of these characters.

The list of characters is random. You basically want to strip off everything that is not a digit or a "standard" character. This step is called normalization, and it should be applied in the same way at indexing time (to the terms being indexed) and at query time (to the query terms).
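A minimal sketch of that idea, assuming a toy in-memory search rather than a real catalog: the same `normalize` function (a hypothetical name) runs on the document text and on the query. It keeps only Unicode letters and digits and lowercases them, so parentheses and other punctuation simply act as separators. In ZCTextIndex the index side is handled by the index's own lexicon configuration, which this sketch does not model.

```python
import re
import unicodedata

# Letters and digits only (\w minus the underscore); everything else,
# such as parentheses, punctuation and symbols, acts as a separator.
_TOKEN = re.compile(r"[^\W_]+")


def normalize(text: str) -> list[str]:
    """Turn raw text into a list of normalized terms."""
    # NFKC folds compatibility characters; casefold() is an aggressive lower().
    text = unicodedata.normalize("NFKC", text).casefold()
    return _TOKEN.findall(text)


def matches(document: str, query: str) -> bool:
    """Toy AND search: every normalized query term must occur in the document."""
    doc_terms = set(normalize(document))     # indexing side
    query_terms = normalize(query)           # query side, same function
    return bool(query_terms) and all(t in doc_terms for t in query_terms)


print(normalize("Foo automotive (bar) S.R.L."))
# ['foo', 'automotive', 'bar', 's', 'r', 'l']
```

Because both sides go through the same function, a query like "automotive (bar" matches without any special escaping. Note that this drops the `*` wildcard; if you need globbing, you would have to let `*` through on the query side only.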

As @zopyx said, I chose the characters more or less randomly, just because I wanted to remove some common characters that make no sense when querying the text catalog. But there are indeed better ways to do the normalization.
