The help system uses the Lucene search engine, which indexes token streams (streams of words). Analyzers create tokens from a character stream: they examine text content and supply the tokens that go into the index. A text stream can be tokenized in many ways. A trivial analyzer might split the stream at white space, while another might also filter tokens according to the application's needs. Because documentation is mostly human-readable text, the analyzers used by the help system should perform language- and grammar-aware tokenization and normalization of the indexed text. For some languages, search quality improves significantly when stop-word removal and stemming are performed on the indexed text.
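The difference between trivial white-space tokenization and token filtering can be sketched in plain Java. This is a minimal illustration and does not use the Lucene API; the class name TokenizationSketch, its methods, and the stop-word list are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TokenizationSketch {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "is");

    /** Trivial analysis: split the character stream at white space. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Filtering step: drop tokens that appear in the stop-word list. */
    static List<String> withoutStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = whitespaceTokens("the index of a help document");
        System.out.println(raw);                   // [the, index, of, a, help, document]
        System.out.println(withoutStopWords(raw)); // [index, help, document]
    }
}
```

A real analyzer would typically chain several such steps (tokenizing, lowercasing, stop-word removal, stemming) and tailor each one to the target language.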
The analyzer contributed to this extension point will override the one provided by the Eclipse help system for a given locale.
<!ELEMENT extension (analyzer*)>
<!ATTLIST extension
point CDATA #REQUIRED
id    CDATA #IMPLIED
name  CDATA #IMPLIED>
<!ELEMENT analyzer EMPTY>
<!ATTLIST analyzer
locale CDATA #REQUIRED
class  CDATA #REQUIRED>
<extension id="com.xyx.XYZ" point="org.eclipse.help.base.luceneAnalyzer">
    <analyzer locale="ll_CC" class="com.xyz.ll_CCAnalyzer"/>
</extension>
The value of the class attribute must be the name of a class that extends org.apache.lucene.analysis.Analyzer. It is recommended that this analyzer perform lowercase filtering for languages where making search case-insensitive can increase the number of search hits.
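The effect of lowercase filtering on search hits can be sketched as follows. This plain-Java illustration does not use the Lucene API; the class LowercaseSketch and its methods are hypothetical, and Locale.ROOT stands in for whatever locale-specific case rules a real analyzer would apply.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class LowercaseSketch {
    /** Builds a set of index terms, optionally lowercasing each token. */
    static Set<String> index(String text, boolean lowercase) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return terms;
    }

    /** Looks up a query term, applying the same normalization as the index. */
    static boolean matches(Set<String> terms, String query, boolean lowercase) {
        return terms.contains(lowercase ? query.toLowerCase(Locale.ROOT) : query);
    }

    public static void main(String[] args) {
        String doc = "Configuring the Lucene Analyzer";
        // Without lowercasing, "analyzer" does not match the indexed "Analyzer".
        System.out.println(matches(index(doc, false), "analyzer", false)); // false
        // With lowercasing on both sides, the query finds the term.
        System.out.println(matches(index(doc, true), "analyzer", true));   // true
    }
}
```

The same normalization has to be applied to both the indexed text and the query; lowercasing only one side would reduce hits rather than increase them.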
Copyright (c) 2000, 2005 IBM Corporation and others.
All rights reserved. This program and the accompanying materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at http://www.eclipse.org/legal/epl-v10.html