There are various types of structured text. Each type should be handled by a specific type handler. A number of standard type handlers are supplied with this package.
Introduction to Structured Text
Bidirectional text offers interesting challenges to presentation systems. For plain text, the Unicode Bidirectional Algorithm (UBA) generally specifies satisfactorily how to reorder bidirectional text for display. This algorithm is implemented in many presentation and operating systems, such as Java/Swing, Windows, and Linux.
However, not all bidirectional text is plain text. Some text is structured to follow a given syntax, and that syntax should be reflected in the display order. The general algorithm, which has no awareness of these special cases, often gives incorrect results when displaying such structured text.
The general idea in handling structured text in this package is to add directional formatting characters at proper locations in the text to supplement the standard algorithm, so that the final result is correctly displayed using the UBA.
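The effect of a single added mark can be observed with the JDK's own UBA implementation, java.text.Bidi. The sketch below is only an illustration and does not use any class of this package; the class name LrmEffectDemo and the sample e-mail address are hypothetical. Under the UBA rules, the '@' between two Hebrew words resolves to an odd (right-to-left) level, so the two parts of the address appear swapped; once an LRM precedes the '@', the separator resolves to the even base level and the parts keep their syntactic order.

```java
import java.text.Bidi;

/** Illustration only: how one LRM changes the level the UBA assigns to a separator. */
final class LrmEffectDemo {
    static final char LRM = '\u200E'; // Left-to-Right Mark, U+200E

    /** Returns the embedding level java.text.Bidi resolves for the character at index. */
    static int levelAt(String text, int index) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return bidi.getLevelAt(index);
    }

    /** Hypothetical e-mail address with a Hebrew local part and a Hebrew domain label. */
    static String leanAddress() {
        return "\u05D0\u05D1\u05D2@\u05D3\u05D4\u05D5.com";
    }

    /** The same address with an LRM inserted just before the '@' separator. */
    static String markedAddress() {
        String lean = leanAddress();
        int at = lean.indexOf('@');
        return lean.substring(0, at) + LRM + lean.substring(at);
    }

    public static void main(String[] args) {
        String lean = leanAddress();
        String marked = markedAddress();
        System.out.println("level of '@' without LRM: " + levelAt(lean, lean.indexOf('@')));
        System.out.println("level of '@' with LRM:    " + levelAt(marked, marked.indexOf('@')));
    }
}
```

An odd level means the character is laid out right to left; an even level means left to right.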
A class which handles structured text is thus essentially a transformation engine: it receives text without directional formatting characters as input and produces the same text with added directional formatting characters as output. Ideally, it adds only the minimum number of characters needed to ensure correct display for the type of structured text involved.
In this package, text without directional formatting characters is called lean text while the text with added directional formatting characters is called full text.
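The relation between lean and full text can be sketched with a toy converter. This is not the algorithm used by the handlers in this package: the class LeanFullSketch and its methods are hypothetical, and the sketch naively adds an LRM before every separator, which is usually more than the minimum a real handler would add.

```java
/** Toy lean-to-full and full-to-lean conversion; an assumption-laden sketch, not the package's code. */
final class LeanFullSketch {
    static final char LRM = '\u200E'; // Left-to-Right Mark
    static final char RLM = '\u200F'; // Right-to-Left Mark
    static final char LRE = '\u202A'; // Left-to-Right Embedding
    static final char RLE = '\u202B'; // Right-to-Left Embedding
    static final char PDF = '\u202C'; // Pop Directional Formatting

    /** Lean to full: insert an LRM before every character listed in separators. */
    static String toFull(String lean, String separators) {
        StringBuilder full = new StringBuilder(lean.length() + 8);
        for (int i = 0; i < lean.length(); i++) {
            char c = lean.charAt(i);
            if (separators.indexOf(c) >= 0) {
                full.append(LRM);
            }
            full.append(c);
        }
        return full.toString();
    }

    /** Full to lean: drop every LRM, RLM, LRE, RLE and PDF character. */
    static String toLean(String full) {
        StringBuilder lean = new StringBuilder(full.length());
        for (int i = 0; i < full.length(); i++) {
            char c = full.charAt(i);
            if (c != LRM && c != RLM && c != LRE && c != RLE && c != PDF) {
                lean.append(c);
            }
        }
        return lean.toString();
    }
}
```

Converting a lean string to full text and back returns the original lean string, provided the lean string itself contains no directional formatting characters.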
Standard type handlers are supplied for the following types of structured text:
- comma delimited list
- e-mail address
- directory and file path
- Java code
- regular expression
- SQL statements
- compound name (xxx_yy_zzzz)
For each of these types, an identifier is defined in StructuredTextTypeHandlerFactory.
These identifiers can be passed as arguments to some methods of StructuredTextProcessor to specify the type of handler to apply.
The classes included in this package are intended for users who need to process structured text in the most straightforward manner, when the following conditions are satisfied:
- There exists an appropriate handler for the type of the structured text.
- There is no need to specify non-default conditions related to the environment.
- The only operations needed are to transform lean text into full text or vice versa.
- There is no interdependence between the processing of a given string and the processing of preceding or succeeding strings.
When their needs go beyond the conditions above, users can use classes in the org.eclipse.equinox.bidi.advanced package.
Developers who want to write new handlers for types of structured text not currently supported can use components of the package org.eclipse.equinox.bidi.custom. The source code of packages org.eclipse.equinox.bidi.* can serve as an example of how to develop processors for currently unsupported types of structured text.
However, users wishing to process the currently supported types of structured text typically don't need to deal with the org.eclipse.equinox.bidi.custom package.
Abbreviations used in the documentation of this package
- UBA: Unicode Bidirectional Algorithm
- GUI: Graphical User Interface
- UI: User Interface
- LTR: Left to Right
- RTL: Right to Left
- LRM: Left-to-Right Mark
- RLM: Right-to-Left Mark
- LRE: Left-to-Right Embedding
- RLE: Right-to-Left Embedding
- PDF: Pop Directional Formatting
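For reference, the five directional formatting characters named above have fixed Unicode code points, and the JDK reports their bidirectional class through Character.getDirectionality. The sketch below is illustrative only; the class BidiControlChars and its methods are hypothetical names, not part of this package.

```java
/** Code points of the directional formatting characters, with their JDK bidi class. */
final class BidiControlChars {
    static final char LRM = '\u200E'; // Left-to-Right Mark, U+200E
    static final char RLM = '\u200F'; // Right-to-Left Mark, U+200F
    static final char LRE = '\u202A'; // Left-to-Right Embedding, U+202A
    static final char RLE = '\u202B'; // Right-to-Left Embedding, U+202B
    static final char PDF = '\u202C'; // Pop Directional Formatting, U+202C

    /** Maps the JDK directionality constant of c to a short label. */
    static String bidiClass(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING:
                return "LRE";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING:
                return "RLE";
            case Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT:
                return "PDF";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        char[] controls = {LRM, RLM, LRE, RLE, PDF};
        for (char c : controls) {
            System.out.println(String.format("U+%04X", (int) c) + " has bidi class " + bidiClass(c));
        }
    }
}
```

LRM and RLM behave as invisible strong characters (classes L and R), while LRE, RLE and PDF open and close an embedding.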
The proposed solution makes extensive use of the LRM, RLM, LRE, RLE and PDF directional controls, which are invisible but affect the way bidi text is displayed. The following related key points merit special attention:
- Implementations of the UBA on various platforms (e.g., Windows and Linux) are very similar but nevertheless have known differences. Those differences are minor and will not have a visible effect in most cases. However, there might be cases in which the same bidi text looks different on two platforms. These differences will surface in Java applications when they use the platform visual components for their UI (e.g., AWT, SWT).
- It is assumed that the presentation engine supports LRE, RLE and PDF directional formatting characters.
- Because some presentation engines are not strictly conformant to the UBA, the implementation of structured text in this package adds LRM or RLM characters in association with LRE, RLE or PDF in cases where this would not be needed if the presentation engine were fully conformant to the UBA. Such added marks will not have harmful effects on conformant presentation engines and will help less conformant engines to achieve the desired presentation.
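The last point can be pictured with a small sketch that pairs each embedding control with a mark of the same direction. The exact placement used by this package is not stated here, so the layout below (embedding, mark, text, PDF, mark) is an assumption made for illustration, and the class EmbeddingWithMarks is a hypothetical name.

```java
/** Sketch: wrap text in an embedding and reinforce it with marks (assumed layout). */
final class EmbeddingWithMarks {
    static final char LRM = '\u200E'; // Left-to-Right Mark
    static final char RLM = '\u200F'; // Right-to-Left Mark
    static final char LRE = '\u202A'; // Left-to-Right Embedding
    static final char RLE = '\u202B'; // Right-to-Left Embedding
    static final char PDF = '\u202C'; // Pop Directional Formatting

    /** Returns embedding + mark + text + PDF + mark, all in the requested direction. */
    static String wrap(String text, boolean rtl) {
        char embed = rtl ? RLE : LRE;
        char mark = rtl ? RLM : LRM;
        return new StringBuilder(text.length() + 4)
                .append(embed).append(mark)
                .append(text)
                .append(PDF).append(mark)
                .toString();
    }

    /** Removes the two leading and two trailing controls added by wrap, if present. */
    static String unwrap(String full) {
        int n = full.length();
        boolean wrapped = n >= 4
                && (full.charAt(0) == LRE || full.charAt(0) == RLE)
                && full.charAt(n - 2) == PDF;
        return wrapped ? full.substring(2, n - 2) : full;
    }
}
```

On a conformant engine the marks are redundant, since the embedding alone sets the direction; on a less conformant one they give the surrounding text a strong character to resolve against.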