Package net.loomchild.segment.srx.legacy
Class FastTextIterator
java.lang.Object
net.loomchild.segment.AbstractTextIterator
net.loomchild.segment.srx.legacy.FastTextIterator
- All Implemented Interfaces:
Iterator<String>, TextIterator
Represents a fast text iterator that splits text according to SRX rules.
-
Field Summary
Fields
Modifier and Type        Field
private ReaderMatcher    breakingMatcher
private int              endPosition
private MergedPattern    mergedPattern
private String           segment
private int              startPosition
private CharSequence     text
-
Constructor Summary
Constructors
Constructor
    Description
FastTextIterator(SrxDocument document, String languageCode, Reader reader)
    Creates a streaming text iterator with no additional parameters.
FastTextIterator(SrxDocument document, String languageCode, Reader reader, Map<String, Object> parameterMap)
    Creates a streaming text iterator that obtains language rules from the given document using the given language code.
FastTextIterator(SrxDocument document, String languageCode, CharSequence text)
    Creates a text iterator with no additional parameters.
FastTextIterator(SrxDocument document, String languageCode, CharSequence text, Map<String, Object> parameterMap)
    Creates a text iterator that obtains language rules from the given document using the given language code.
-
Method Summary
Methods inherited from class net.loomchild.segment.AbstractTextIterator
remove, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator
forEachRemaining
-
Field Details
-
text
-
segment
-
mergedPattern
-
breakingMatcher
-
startPosition
private int startPosition
-
endPosition
private int endPosition
-
-
Constructor Details
-
FastTextIterator
public FastTextIterator(SrxDocument document, String languageCode, CharSequence text, Map<String, Object> parameterMap)
Creates a text iterator that obtains language rules from the given document using the given language code. To retrieve the language rules, it calls SrxDocument.getLanguageRuleList(String). Supported parameters: SrxTextIterator.MAX_LOOKBEHIND_CONSTRUCT_LENGTH_PARAMETER.
- Parameters:
document - document containing language rules
languageCode - language code to select the rules
text -
parameterMap - additional segmentation parameters
-
FastTextIterator
public FastTextIterator(SrxDocument document, String languageCode, CharSequence text)
Creates a text iterator with no additional parameters.
- Parameters:
document - document containing language rules
languageCode - language code to select the rules
text -
- See Also:
-
FastTextIterator
public FastTextIterator(SrxDocument document, String languageCode, Reader reader, Map<String, Object> parameterMap)
Creates a streaming text iterator that obtains language rules from the given document using the given language code. To retrieve the language rules, it calls SrxDocument.getLanguageRuleList(String). To handle streams, it uses ReaderCharSequence, so not all possible regular expressions are accepted. See ReaderCharSequence for details. Supported parameters: SrxTextIterator.BUFFER_LENGTH_PARAMETER, SrxTextIterator.MAX_LOOKBEHIND_CONSTRUCT_LENGTH_PARAMETER.
- Parameters:
document - document containing language rules
languageCode - language code to select the rules
reader - reader from which text will be read
parameterMap - additional segmentation parameters
-
FastTextIterator
public FastTextIterator(SrxDocument document, String languageCode, Reader reader)
Creates a streaming text iterator with no additional parameters.
- Parameters:
document - document containing language rules
languageCode - language code to select the rules
reader - reader from which text will be read
- See Also:
-
-
Method Details
-
next
public String next()
- Returns:
- the next segment in the text, or null if the end of the text has been reached.
-
hasNext
public boolean hasNext()
- Returns:
- true if there are more segments
-
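The hasNext() and next() contract described above can be illustrated with a small, self-contained sketch. It does not use the segment library: StandInIterator is a hypothetical stand-in that breaks after sentence-ending punctuation with a plain regular expression instead of applying SRX rules, and the class and method names are assumptions made for illustration only. What it is meant to show is the documented behaviour that next() returns null, rather than throwing, once the end of the text has been reached.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentIterationSketch {

    /**
     * Hypothetical stand-in for FastTextIterator. It breaks on whitespace
     * that follows '.', '!' or '?' and is NOT an SRX implementation.
     */
    static final class StandInIterator implements Iterator<String> {
        private static final Pattern BREAK = Pattern.compile("(?<=[.!?])\\s+");

        private final CharSequence text;
        private final Matcher breakingMatcher;
        private int startPosition = 0;

        StandInIterator(CharSequence text) {
            this.text = text;
            this.breakingMatcher = BREAK.matcher(text);
        }

        @Override
        public boolean hasNext() {
            // True while some part of the text has not been returned yet.
            return startPosition < text.length();
        }

        @Override
        public String next() {
            // Mirrors the documented contract: null at the end of the text
            // (java.util.Iterator would normally throw NoSuchElementException).
            if (!hasNext()) {
                return null;
            }
            // The segment ends after the next break, or at the end of the text.
            int endPosition = breakingMatcher.find() ? breakingMatcher.end() : text.length();
            String segment = text.subSequence(startPosition, endPosition).toString();
            startPosition = endPosition;
            return segment;
        }
    }

    /** Drains any segment iterator into a list using hasNext() and next(). */
    static List<String> collect(Iterator<String> iterator) {
        List<String> segments = new ArrayList<>();
        while (iterator.hasNext()) {
            segments.add(iterator.next());
        }
        return segments;
    }

    public static void main(String[] args) {
        Iterator<String> iterator = new StandInIterator("Hello world. How are you? Fine.");
        for (String segment : collect(iterator)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

With the real class, the collect helper should work unchanged, since FastTextIterator implements Iterator<String>; only the construction of the iterator would differ, using one of the constructors listed above.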