Package edu.berkeley.nlp.lm
Class StringWordIndexer
java.lang.Object
edu.berkeley.nlp.lm.StringWordIndexer
- All Implemented Interfaces:
WordIndexer<String>, Serializable
Implementation of a WordIndexer in which words are represented as strings.
- Author:
- adampauls
-
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.WordIndexer
WordIndexer.StaticMethods
-
Constructor Summary
Constructors
StringWordIndexer()
-
Method Summary
Modifier and Type  Method                                Description
String             getEndSymbol()                        Returns the end symbol (usually something like </s>).
int                getIndexPossiblyUnk(String word)      Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
int                getOrAddIndex(String word)            Gets the index for a word, adding it if necessary.
int                getOrAddIndexFromString(String word)
String             getStartSymbol()                      Returns the start symbol (usually something like <s>).
String             getUnkSymbol()                        Returns the unk symbol (usually something like <unk>).
String             getWord(int index)                    Gets the word object for an index.
int                numWords()                            Number of words that have been added so far.
void               setEndSymbol(String sym)
void               setStartSymbol(String sym)
void               setUnkSymbol(String sym)
void               trimAndLock()                         Informs the implementation that no more words can be added to the vocabulary.
-
Constructor Details
-
StringWordIndexer
public StringWordIndexer()
-
-
Method Details
-
getOrAddIndex
public int getOrAddIndex(String word)
Description copied from interface: WordIndexer
Gets the index for a word, adding it if necessary.
- Specified by:
getOrAddIndex in interface WordIndexer<String>
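The get-or-add contract can be illustrated with a small stand-alone sketch. GetOrAddSketch is a hypothetical class written for this example, not the library's implementation, and the consecutive numbering from 0 is an assumption of the sketch rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's StringWordIndexer.
final class GetOrAddSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    // Returns the existing index for a word, or assigns the next free one.
    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        int index = words.size(); // assumption: indices are consecutive from 0
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    // Inverse mapping: index back to the word object.
    String getWord(int index) {
        return words.get(index);
    }

    // Number of distinct words added so far.
    int numWords() {
        return words.size();
    }
}
```

In this sketch a repeated call with the same word returns the same index, and getWord inverts getOrAddIndex.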
-
getWord
public String getWord(int index)
Description copied from interface: WordIndexer
Gets the word object for an index.
- Specified by:
getWord in interface WordIndexer<String>
-
numWords
public int numWords()
Description copied from interface: WordIndexer
Number of words that have been added so far.
- Specified by:
numWords in interface WordIndexer<String>
-
getStartSymbol
public String getStartSymbol()
Description copied from interface: WordIndexer
Returns the start symbol (usually something like <s>).
- Specified by:
getStartSymbol in interface WordIndexer<String>
-
getEndSymbol
public String getEndSymbol()
Description copied from interface: WordIndexer
Returns the end symbol (usually something like </s>).
- Specified by:
getEndSymbol in interface WordIndexer<String>
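As a rough illustration of what the start and end symbols are typically for, the sketch below wraps a tokenized sentence in boundary markers before it would be indexed. PaddingSketch and its constants are hypothetical, and the literal values <s> and </s> are assumed from the "something like" wording above; an actual indexer uses whatever setStartSymbol and setEndSymbol were given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the library.
final class PaddingSketch {
    // Assumed symbol values; the documentation only says "something like" these.
    static final String START = "<s>";
    static final String END = "</s>";

    // Wraps a tokenized sentence in the start and end symbols.
    static List<String> pad(List<String> tokens) {
        List<String> padded = new ArrayList<>(tokens.size() + 2);
        padded.add(START);
        padded.addAll(tokens);
        padded.add(END);
        return padded;
    }
}
```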
-
getUnkSymbol
public String getUnkSymbol()
Description copied from interface: WordIndexer
Returns the unk symbol (usually something like <unk>).
- Specified by:
getUnkSymbol in interface WordIndexer<String>
-
getOrAddIndexFromString
public int getOrAddIndexFromString(String word)
- Specified by:
getOrAddIndexFromString in interface WordIndexer<String>
-
setStartSymbol
public void setStartSymbol(String sym)
- Specified by:
setStartSymbol in interface WordIndexer<String>
-
setEndSymbol
public void setEndSymbol(String sym)
- Specified by:
setEndSymbol in interface WordIndexer<String>
-
setUnkSymbol
public void setUnkSymbol(String sym)
- Specified by:
setUnkSymbol in interface WordIndexer<String>
-
trimAndLock
public void trimAndLock()
Description copied from interface: WordIndexer
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
- Specified by:
trimAndLock in interface WordIndexer<String>
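The locking contract might look like the following stand-alone sketch. LockingSketch is a hypothetical class, not the library's code. The choice of IllegalStateException is an assumption, since the description only says an error should be triggered, and allowing lookups of already-known words after locking is likewise a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's StringWordIndexer.
final class LockingSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private boolean locked = false;

    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing; // known words remain retrievable after locking
        }
        if (locked) {
            // Assumed error type; the contract only asks for "an error".
            throw new IllegalStateException("vocabulary is locked, cannot add: " + word);
        }
        int index = indexOf.size();
        indexOf.put(word, index);
        return index;
    }

    // After this call, any attempt to add a new word fails.
    void trimAndLock() {
        locked = true;
    }
}
```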
-
getIndexPossiblyUnk
public int getIndexPossiblyUnk(String word)
Description copied from interface: WordIndexer
Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- Specified by:
getIndexPossiblyUnk in interface WordIndexer<String>
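A read-only lookup with an unknown-word fallback can be sketched as below. UnkLookupSketch is a hypothetical class for illustration, and it assumes the unk symbol has itself been given an index in the vocabulary; how the library behaves when no unk symbol has been set is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's StringWordIndexer.
final class UnkLookupSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final int unkIndex;

    UnkLookupSketch(String unkSymbol, String... vocabulary) {
        for (String word : vocabulary) {
            indexOf.putIfAbsent(word, indexOf.size());
        }
        // Assumption: the unk symbol occupies a regular slot in the vocabulary.
        indexOf.putIfAbsent(unkSymbol, indexOf.size());
        unkIndex = indexOf.get(unkSymbol);
    }

    // Never modifies the vocabulary; unseen words map to the unk index.
    int getIndexPossiblyUnk(String word) {
        return indexOf.getOrDefault(word, unkIndex);
    }

    int numWords() {
        return indexOf.size();
    }
}
```

Unlike getOrAddIndex, a lookup of an unseen word here leaves numWords() unchanged.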