The Zorba XQuery engine implements the XQuery and XPath Full Text 1.0 specification that, among other things, adds the ability to use stemming for text-matching via the stemming option. For example, the query:
returns true because $x contains "Improvement", which has the same stem as "improve".
The initial implementation of the stemming option uses the Snowball stemmers and therefore can stem words in the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, and Turkish.
Using the Zorba C++ API, you can provide your own stemmer by deriving from two classes: Stemmer and StemmerProvider.
The Stemmer class is:
For details about the ptr type, the destroy() function, and why the destructor is protected, see the Memory Management document.
To implement a Stemmer, you need to implement the stem() function, where:
word | The word to be stemmed. |
lang | The language of the word. |
result | The stemmed word goes here. |
Note that result should always be set to something. If your stemmer doesn't know how to stem the given word, you should set result to word. You also need to implement the properties() function and set the identifying URI of your stemmer.
A very simple stemmer that stems the word "foobar" to "foo" can be implemented as:
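As a minimal, self-contained sketch of such a stemmer: the Lang enum, the Stemmer base class, and the URI below are simplified stand-ins written for this example and are assumptions, not Zorba's actual declarations, so the real header may differ in its types and signatures. The sketch only illustrates the shape described above: stem() always sets result, properties() reports an identifying URI, and instances are released through destroy().

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for Zorba's language type (assumption, not the real type).
enum class Lang { unknown, en, de };

// Stand-in for the Stemmer interface described in the text (assumption).
class Stemmer {
public:
  // Deleter that calls destroy() rather than delete, so the smart pointer
  // works even though the destructor is protected.
  struct Destroyer {
    void operator()(Stemmer const *s) const { if (s) s->destroy(); }
  };
  typedef std::unique_ptr<Stemmer const, Destroyer> ptr;

  struct Properties { char const *uri; };

  virtual void destroy() const = 0;
  virtual void properties(Properties *result) const = 0;
  virtual void stem(std::string const &word, Lang lang,
                    std::string *result) const = 0;
protected:
  virtual ~Stemmer() {}
};

// Stems "foobar" to "foo"; every other word is returned unchanged.
class FoobarStemmer : public Stemmer {
public:
  void destroy() const override { delete this; }

  void properties(Properties *result) const override {
    assert(result != nullptr);
    result->uri = "http://example.com/foobar-stemmer";  // hypothetical URI
  }

  void stem(std::string const &word, Lang /*lang*/,
            std::string *result) const override {
    assert(result != nullptr);
    if (word == "foobar")
      *result = "foo";
    else
      *result = word;  // unknown word: result is still set, to the input
  }
};

// Small helper showing how a caller might invoke stem().
inline std::string stemWith(Stemmer const &s, std::string const &word) {
  std::string result;
  s.stem(word, Lang::en, &result);
  return result;
}
```

With these stand-ins, stemWith() applied to a FoobarStemmer maps "foobar" to "foo" and leaves any other word as it was.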
A real stemmer would either use a stemming algorithm or a dictionary look-up to stem many words, of course. Although not used in this simple example, lang can be used to allow a single stemmer instance to stem words in more than one language.
In addition to a Stemmer, you must also implement a StemmerProvider that, given a language, provides a Stemmer for that language:
The getStemmer() function should return true only if it can provide a Stemmer for the given language, and false otherwise. If the Stemmer::ptr argument is null, the caller wants to check only whether the provider can provide a stemmer for the given language and doesn't want a Stemmer instance created or returned.
A simple StemmerProvider for our simple stemmer can be implemented as:
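As a minimal, self-contained sketch of such a provider: the Lang enum and the trimmed-down Stemmer and StemmerProvider base classes are stand-ins written for this example and are assumptions, not Zorba's actual declarations. The choice to serve only English is also an assumption made for illustration. The point of the sketch is the getStemmer() contract: report whether the language is supported, and create a stemmer only when the caller passes a non-null pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for Zorba's language type (assumption, not the real type).
enum class Lang { unknown, en, de };

// Trimmed stand-in for the Stemmer interface (assumption).
class Stemmer {
public:
  struct Destroyer {
    void operator()(Stemmer const *s) const { if (s) s->destroy(); }
  };
  typedef std::unique_ptr<Stemmer const, Destroyer> ptr;

  virtual void destroy() const = 0;
  virtual void stem(std::string const &word, Lang lang,
                    std::string *result) const = 0;
protected:
  virtual ~Stemmer() {}
};

// Stand-in for the StemmerProvider interface (assumption).
class StemmerProvider {
public:
  virtual ~StemmerProvider() {}
  virtual bool getStemmer(Lang lang, Stemmer::ptr *s = nullptr) const = 0;
};

// The toy stemmer: "foobar" becomes "foo", anything else is unchanged.
class FoobarStemmer : public Stemmer {
public:
  void destroy() const override { delete this; }
  void stem(std::string const &word, Lang /*lang*/,
            std::string *result) const override {
    assert(result != nullptr);
    if (word == "foobar")
      *result = "foo";
    else
      *result = word;
  }
};

// Provides a FoobarStemmer for English only (illustrative choice).
class FoobarStemmerProvider : public StemmerProvider {
public:
  bool getStemmer(Lang lang, Stemmer::ptr *s = nullptr) const override {
    if (lang != Lang::en)
      return false;              // unsupported language: nothing is created
    if (s)
      s->reset(new FoobarStemmer);  // caller wants an actual instance
    return true;                 // a null s means "support check only"
  }
};
```

Calling getStemmer() with only a language acts as a capability query, while passing the address of a Stemmer::ptr hands ownership of a new stemmer to the caller, which releases it through destroy() when the pointer goes out of scope.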
To enable your stemmer to be used, you need to register it with the XmlDataManager: