Package morfologik.stemming
Class TrimInfixAndSuffixEncoder
java.lang.Object
morfologik.stemming.TrimInfixAndSuffixEncoder
- All Implemented Interfaces:
ISequenceEncoder
Encodes dst relative to src by trimming whatever non-equal suffix and infix src and dst have. The output code is (bytes):
{X}{L}{K}{suffix}
where src's infix at position (X - 'A') and of length (L - 'A') should be removed, then (K - 'A') bytes should be trimmed from the end, and then the suffix should be appended to the resulting byte sequence.
Examples:
src: ayz      dst: abc    encoded: AACbc
src: aillent  dst: aller  encoded: BBCr
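The decoding steps described above can be traced with a small standalone sketch. It does not use the Morfologik API: the class name TrimInfixAndSuffixSketch and its decode helpers are hypothetical, operate on plain byte arrays instead of ByteBuffer, and ignore any special handling the real encoder may apply to REMOVE_EVERYTHING.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the {X}{L}{K}{suffix} layout; not the library's code.
final class TrimInfixAndSuffixSketch {

    /** Applies an {X}{L}{K}{suffix} code to src and returns the decoded bytes. */
    public static byte[] decode(byte[] src, byte[] encoded) {
        int infixStart = (encoded[0] & 0xFF) - 'A';  // X - 'A'
        int infixLength = (encoded[1] & 0xFF) - 'A'; // L - 'A'
        int trimFromEnd = (encoded[2] & 0xFF) - 'A'; // K - 'A'

        // Step 1: remove the infix from src.
        byte[] withoutInfix = new byte[src.length - infixLength];
        System.arraycopy(src, 0, withoutInfix, 0, infixStart);
        System.arraycopy(src, infixStart + infixLength,
                         withoutInfix, infixStart,
                         src.length - infixStart - infixLength);

        // Step 2: trim K - 'A' bytes from the end.
        int kept = withoutInfix.length - trimFromEnd;

        // Step 3: append the suffix (everything after the three control bytes).
        int suffixLength = encoded.length - 3;
        byte[] out = Arrays.copyOf(withoutInfix, kept + suffixLength);
        System.arraycopy(encoded, 3, out, kept, suffixLength);
        return out;
    }

    /** Convenience overload for ASCII strings, used only for the demo below. */
    public static String decode(String src, String encoded) {
        byte[] result = decode(src.getBytes(StandardCharsets.US_ASCII),
                               encoded.getBytes(StandardCharsets.US_ASCII));
        return new String(result, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("ayz", "AACbc"));    // abc
        System.out.println(decode("aillent", "BBCr")); // aller
    }
}
```

For "aillent" and "BBCr", the sketch removes one byte at position 1 ("allent"), trims two bytes from the end ("alle"), and appends "r".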
-
Field Summary
Fields
Modifier and Type         Field              Description
private static final int  REMOVE_EVERYTHING  Maximum encodable single-byte code.
private ByteBuffer        scratch
-
Constructor Summary
Constructors
TrimInfixAndSuffixEncoder()
-
Method Summary
Modifier and Type  Method                                                           Description
ByteBuffer         decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
ByteBuffer         encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
int                prefixBytes()                                                    The number of encoded form's prefix bytes that should be ignored (needed for separator lookup).
String             toString()
-
Field Details
-
REMOVE_EVERYTHING
private static final int REMOVE_EVERYTHING
Maximum encodable single-byte code.
-
scratch
private ByteBuffer scratch
-
-
Constructor Details
-
TrimInfixAndSuffixEncoder
public TrimInfixAndSuffixEncoder()
-
-
Method Details
-
encode
public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
Description copied from interface: ISequenceEncoder
- Specified by:
encode in interface ISequenceEncoder
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
target - The target byte sequence to encode relative to source.
- Returns:
The ByteBuffer with the encoded target.
-
prefixBytes
public int prefixBytes()
Description copied from interface: ISequenceEncoder
The number of prefix bytes in the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- Specified by:
prefixBytes in interface ISequenceEncoder
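Why a separator lookup would need to skip prefix bytes can be illustrated with a standalone sketch. Everything in it is an assumption made for illustration: the helper indexOfSeparator, the '_' separator, the "verb" tag, and the prefix count of 3, which is inferred here from the {X}{L}{K} layout rather than stated on this page. The idea is that a control byte may happen to equal the separator byte, so a left-to-right scan starting at offset 0 could stop too early.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Morfologik API.
final class SeparatorLookupSketch {

    /** Returns the index of the first separator at or after prefixBytes, or -1 if none. */
    public static int indexOfSeparator(byte[] entry, int prefixBytes, byte separator) {
        for (int i = prefixBytes; i < entry.length; i++) {
            if (entry[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed entry layout: code "B_Cr", then the separator '_', then the tag "verb".
        // The second control byte is '_' and collides with the separator.
        byte[] entry = "B_Cr_verb".getBytes(StandardCharsets.US_ASCII);

        // Scanning from offset 0 stops on the control byte.
        System.out.println(indexOfSeparator(entry, 0, (byte) '_')); // 1

        // Skipping the three assumed control bytes finds the real separator.
        System.out.println(indexOfSeparator(entry, 3, (byte) '_')); // 4
    }
}
```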
-
decode
public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
Description copied from interface: ISequenceEncoder
- Specified by:
decode in interface ISequenceEncoder
- Parameters:
reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
source - The source byte sequence.
encoded - The previously encoded byte sequence.
- Returns:
The ByteBuffer with the decoded target.
-
toString
-