Wednesday, February 27, 2008

Why can you not use a Wildcard as the first character in a Lucene search?

Keywords:
apache lucene wildcard search index "why not" leading first

Problem:
The Lucene query parser syntax documentation for the latest release contains this line: "Note: You cannot use a * or ? symbol as the first character of a search."

Why not? Sure, it may be inefficient, but what if you're willing to wear that? Is it technically impossible, is it something users of the library have to work around, or will it eventually be implemented in Lucene?

Solution:
Curious about why this limitation exists, I found notes in the Lucene Wiki indicating that a leading wildcard is in fact possible.

It seems the Query Syntax guide shipped with the Lucene release simply needs updating.

// Let the parser accept * or ? as the first character of a term
queryParser.setAllowLeadingWildcard(true);


Notes:
The Wiki says this feature is available as of Lucene 2.1, but there seems to have been a bug in that release. I've tested it with Lucene 2.3.1 and it works fine.
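
On the "it may be inefficient" point: here is a rough sketch of why a leading wildcard costs more than a trailing one. It is not Lucene code. It assumes only that index terms are kept in sorted order, and models that with a plain TreeSet. The class and method names (LeadingWildcardSketch, prefixMatch, suffixMatch) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of a sorted term dictionary; hypothetical names, not the Lucene API.
class LeadingWildcardSketch {

    // Trailing wildcard ("luc*"): jump straight to the prefix in the sorted set
    // and stop at the first term that no longer starts with it.
    static List<String> prefixMatch(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(term);
        }
        return hits;
    }

    // Leading wildcard ("*ene"): sort order is no help, so every term is checked.
    static List<String> suffixMatch(TreeSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "apache", "index", "lucene", "lucid", "scene", "search"));
        System.out.println(prefixMatch(terms, "luc")); // [lucene, lucid]
        System.out.println(suffixMatch(terms, "ene")); // [lucene, scene]
    }
}
```

With six terms the difference is invisible, but on an index with millions of distinct terms the second loop is the cost you are agreeing to wear when you turn leading wildcards on.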