Provides a framework for rule-based text scanning and uses the framework to provide rule-driven default implementations of
RuleBasedScanner is a document-based scanner controlled by IRule objects. When evaluated, an IRule always returns an IToken. The package provides a set of rules, of which PatternRule is the most important. PatternRule defines a pattern-configurable rule.
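The relationship between scanner, rules, and tokens described above can be sketched in plain Java. This is a minimal model, not the JFace implementation: MiniRule, MiniPatternRule, MiniScanner, and MiniScannerDemo are hypothetical names, tokens are reduced to strings, and the sketch assumes that rules are tried in order, that the first matching rule wins, and that an unterminated pattern runs to the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rule; NOT the JFace IRule interface.
interface MiniRule {
    /** Returns the number of characters matched at offset, or 0 if the rule does not apply. */
    int match(String text, int offset);

    /** Name of the token reported on a match (tokens are plain strings in this sketch). */
    String tokenName();
}

// Hypothetical pattern-configurable rule: a start sequence, an end sequence, and a token.
final class MiniPatternRule implements MiniRule {
    private final String start;
    private final String end;
    private final String token;

    MiniPatternRule(String start, String end, String token) {
        this.start = start;
        this.end = end;
        this.token = token;
    }

    @Override
    public int match(String text, int offset) {
        if (!text.startsWith(start, offset)) {
            return 0;
        }
        int close = text.indexOf(end, offset + start.length());
        // Assumption of this sketch: an unterminated pattern consumes the rest of the input.
        return close < 0 ? text.length() - offset : close + end.length() - offset;
    }

    @Override
    public String tokenName() {
        return token;
    }
}

// Hypothetical scanner "programmed" with an ordered list of rules.
final class MiniScanner {
    private final List<? extends MiniRule> rules;

    MiniScanner(List<? extends MiniRule> rules) {
        this.rules = rules;
    }

    /** Returns "token:lexeme" entries covering the whole input. */
    List<String> scan(String text) {
        List<String> out = new ArrayList<>();
        int offset = 0;
        while (offset < text.length()) {
            int length = 0;
            String token = "default";
            for (MiniRule rule : rules) {
                length = rule.match(text, offset);
                if (length > 0) {
                    token = rule.tokenName();
                    break; // first matching rule wins
                }
            }
            if (length == 0) {
                length = 1; // no rule matched: emit one character as a default token
            }
            out.add(token + ":" + text.substring(offset, offset + length));
            offset += length;
        }
        return out;
    }
}

final class MiniScannerDemo {
    public static void main(String[] args) {
        MiniScanner scanner = new MiniScanner(List.of(
                new MiniPatternRule("/*", "*/", "comment"),
                new MiniPatternRule("\"", "\"", "string")));
        System.out.println(scanner.scan("a\"b\"")); // prints [default:a, string:"b"]
    }
}
```

Keeping each rule responsible only for recognizing its own pattern lets the scanner stay generic: changing the language being scanned means changing the rule list, not the scanning loop.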
Interface Summary

Interface               Description
ICharacterScanner       Defines the interface of a character scanner used by rules.
IPartitionTokenScanner  A partition token scanner returns tokens that represent partitions.
IPredicateRule          Defines the interface for a rule used in the scanning of text for the purpose of document partitioning or text styling.
IRule                   Defines the interface for a rule used in the scanning of text for the purpose of document partitioning or text styling.
IToken                  A token to be returned by a rule.
ITokenScanner           A token scanner scans a range of a document and reports about the token it finds.
IWhitespaceDetector     Defines the interface by which WhitespaceRule determines whether a given character is to be considered whitespace in the current context.
IWordDetector           Defines the interface by which WordRule determines whether a given character is valid as part of a word in the current context.
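The detector interfaces separate the question "which characters form a word?" from the rule that consumes the word. The following sketch illustrates that split in plain Java. WordDetectorSketch, WordRuleSketch, and JavaIdentifierDetector are hypothetical names, the two method names are assumptions of this sketch rather than a statement of the package's signatures, and tokens are again reduced to strings.

```java
import java.util.Map;

// Hypothetical detector; the method names are assumptions of this sketch.
interface WordDetectorSketch {
    boolean isWordStart(char c);

    boolean isWordPart(char c);
}

// Detector that treats Java identifiers as words.
final class JavaIdentifierDetector implements WordDetectorSketch {
    @Override
    public boolean isWordStart(char c) {
        return Character.isJavaIdentifierStart(c);
    }

    @Override
    public boolean isWordPart(char c) {
        return Character.isJavaIdentifierPart(c);
    }
}

// Hypothetical word rule: reads one word using the detector, then looks up its token.
final class WordRuleSketch {
    private final WordDetectorSketch detector;
    private final Map<String, String> tokensByWord;
    private final String defaultToken;

    WordRuleSketch(WordDetectorSketch detector, Map<String, String> tokensByWord, String defaultToken) {
        this.detector = detector;
        this.tokensByWord = tokensByWord;
        this.defaultToken = defaultToken;
    }

    /** Returns the token name for the word starting at offset, or null if no word starts there. */
    String evaluate(String text, int offset) {
        if (offset >= text.length() || !detector.isWordStart(text.charAt(offset))) {
            return null;
        }
        int end = offset + 1;
        while (end < text.length() && detector.isWordPart(text.charAt(end))) {
            end++;
        }
        return tokensByWord.getOrDefault(text.substring(offset, end), defaultToken);
    }

    public static void main(String[] args) {
        WordRuleSketch rule = new WordRuleSketch(
                new JavaIdentifierDetector(), Map.of("class", "keyword"), "identifier");
        System.out.println(rule.evaluate("class Foo", 0)); // prints keyword
        System.out.println(rule.evaluate("class Foo", 6)); // prints identifier
    }
}
```

Because the detector is passed in, the same rule can serve languages with different identifier conventions by swapping only the detector.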
Class Summary

Class                      Description
BufferedRuleBasedScanner   A buffered rule based scanner.
DefaultDamagerRepairer     A standard implementation of a syntax driven presentation damager and presentation repairer.
DefaultPartitioner         Deprecated. As of 3.1, replaced by
EndOfLineRule              A specific configuration of a single line rule whereby the pattern begins with a specific sequence but is only ended by a line delimiter.
FastPartitioner            A standard implementation of a document partitioner.
MultiLineRule              A rule for detecting patterns which begin with a given sequence and may end with a given sequence thereby spanning multiple lines.
NumberRule                 An implementation of IRule detecting a numerical value.
PatternRule                Standard implementation of
RuleBasedPartitionScanner  Scanner that exclusively uses predicate rules.
RuleBasedScanner           A generic scanner which can be "programmed" with a sequence of rules.
SingleLineRule             A specific configuration of pattern rule whereby the pattern begins with a specific sequence and may end with a specific sequence, but will not span more than a single line.
Token                      Standard implementation of
WhitespaceRule             An implementation of IRule capable of detecting whitespace.
WordPatternRule            A specific single line rule which stipulates that the start and end sequence occur within a single word, as defined by a word detector.
WordRule                   An implementation of IRule capable of detecting words.
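The single line, end-of-line, and multi-line rules listed above differ only in where a pattern is allowed to stop. The sketch below models that difference with one hypothetical helper, LineRuleSketch.matchLength, which is not part of the package. It assumes '\n' as the only line delimiter, an end sequence that contains no line delimiter, and that an unterminated multi-line pattern runs to the end of the input.

```java
// Hypothetical helper for illustration; NOT a class of this package.
final class LineRuleSketch {
    /**
     * Returns the length of the pattern starting at offset, or 0 if text does not
     * begin with start there.
     *
     * end == null models an end-of-line rule (only a line delimiter ends the pattern).
     * multiLine == false models a single line rule (the pattern never crosses '\n').
     * multiLine == true models a multi-line rule (the end sequence may be on a later line).
     */
    static int matchLength(String text, int offset, String start, String end, boolean multiLine) {
        if (!text.startsWith(start, offset)) {
            return 0;
        }
        int from = offset + start.length();
        int eol = text.indexOf('\n', from);
        int lineEnd = eol < 0 ? text.length() : eol;
        int close = end == null ? -1 : text.indexOf(end, from);
        if (close >= 0 && (multiLine || close < lineEnd)) {
            // End sequence found within the allowed range: include it in the match.
            return close + end.length() - offset;
        }
        // No usable end sequence: stop at the line end, or at the end of input for multi-line.
        return (multiLine ? text.length() : lineEnd) - offset;
    }

    public static void main(String[] args) {
        // End-of-line rule: "//" up to, but not including, the line delimiter.
        System.out.println(matchLength("// note\nint x;", 0, "//", null, false)); // prints 7
        // Multi-line rule: "/*" ... "*/" across a line break.
        System.out.println(matchLength("/* a\nb */x", 0, "/*", "*/", true)); // prints 9
    }
}
```

Calling the helper with the same start and end sequences but multiLine set to false stops the match at the first line delimiter, which is the single line behavior described in the table.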