Package edu.berkeley.nlp.lm
Interface WordIndexer<W>
- Type Parameters:
W
- A type representing words in the language. Can be a String, or something more complex if needed.
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
StringWordIndexer
Enumerates words in the vocabulary of a language model. Stores a two-way
mapping between integers and words.
- Author:
- adampauls
-
Nested Class Summary
Nested Classes -
Method Summary
Modifier and Type, Method: Description
- W getEndSymbol(): Returns the end symbol (usually something like </s>).
- int getIndexPossiblyUnk(W word): Should never add to the vocabulary, and should return the index of getUnkSymbol() if the word is not in the vocabulary.
- int getOrAddIndex(W word): Gets the index for a word, adding it if necessary.
- int getOrAddIndexFromString(String word)
- W getStartSymbol(): Returns the start symbol (usually something like <s>).
- W getUnkSymbol(): Returns the unk symbol (usually something like <unk>).
- W getWord(int index): Gets the word object for an index.
- int numWords(): Number of words that have been added so far.
- void setEndSymbol(W sym)
- void setStartSymbol(W sym)
- void setUnkSymbol(W sym)
- void trimAndLock(): Informs the implementation that no more words can be added to the vocabulary.
-
Method Details
-
getOrAddIndex
int getOrAddIndex(W word)
Gets the index for a word, adding it if necessary.
- Parameters:
word
-
getOrAddIndexFromString
-
getIndexPossiblyUnk
int getIndexPossiblyUnk(W word)
Should never add to the vocabulary, and should return the index of getUnkSymbol() if the word is not in the vocabulary.
- Parameters:
word
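The difference between getOrAddIndex and getIndexPossiblyUnk can be illustrated with a minimal sketch. The class ToyIndexer below is hypothetical and is not part of edu.berkeley.nlp.lm; it assumes String words, a HashMap plus an ArrayList for the two-way mapping, and that setUnkSymbol adds the unk symbol to the vocabulary. A real implementation such as StringWordIndexer may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the library's StringWordIndexer.
class ToyIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final List<String> indexToWord = new ArrayList<>();
    private int unkIndex = -1; // -1 until setUnkSymbol is called

    // Assumption: registering the unk symbol also adds it to the vocabulary.
    void setUnkSymbol(String sym) {
        unkIndex = getOrAddIndex(sym);
    }

    // Adds the word if it is new, then returns its index.
    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }

    // Never adds; falls back to the index of the unk symbol.
    int getIndexPossiblyUnk(String word) {
        Integer existing = wordToIndex.get(word);
        return existing != null ? existing : unkIndex;
    }

    // Reverse direction of the mapping: index to word.
    String getWord(int index) {
        return indexToWord.get(index);
    }

    int numWords() {
        return indexToWord.size();
    }

    public static void main(String[] args) {
        ToyIndexer indexer = new ToyIndexer();
        indexer.setUnkSymbol("<unk>");                          // index 0
        System.out.println(indexer.getOrAddIndex("the"));       // prints 1
        System.out.println(indexer.getOrAddIndex("cat"));       // prints 2
        System.out.println(indexer.getOrAddIndex("the"));       // prints 1 (already present)
        System.out.println(indexer.getIndexPossiblyUnk("dog")); // prints 0 (unk index)
        System.out.println(indexer.numWords());                 // prints 3 ("dog" was not added)
        System.out.println(indexer.getWord(2));                 // prints cat
    }
}
```

In this sketch a lookup of an unseen word leaves numWords() unchanged, which is the property the description above asks for.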
-
getWord
W getWord(int index)
Gets the word object for an index.
- Parameters:
index
-
numWords
int numWords()
Number of words that have been added so far.
-
getStartSymbol
W getStartSymbol()
Returns the start symbol (usually something like <s>).
-
setStartSymbol
-
getEndSymbol
W getEndSymbol()
Returns the end symbol (usually something like </s>).
-
setEndSymbol
-
getUnkSymbol
W getUnkSymbol()
Returns the unk symbol (usually something like <unk>).
-
setUnkSymbol
-
trimAndLock
void trimAndLock()
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
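One way an implementation might honor this contract is sketched below. LockableIndexer is a hypothetical class, not part of the library; the choice of IllegalStateException as the error and ArrayList.trimToSize() as the space optimization are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not an actual WordIndexer implementation.
class LockableIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final ArrayList<String> indexToWord = new ArrayList<>();
    private boolean locked = false;

    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing; // known words can still be looked up after locking
        }
        if (locked) {
            // Assumption: the "error" is signalled with an unchecked exception.
            throw new IllegalStateException("Vocabulary is locked; cannot add: " + word);
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }

    int numWords() {
        return indexToWord.size();
    }

    void trimAndLock() {
        indexToWord.trimToSize(); // release unused backing-array capacity
        locked = true;
    }

    public static void main(String[] args) {
        LockableIndexer indexer = new LockableIndexer();
        indexer.getOrAddIndex("hello");
        indexer.trimAndLock();
        System.out.println(indexer.getOrAddIndex("hello")); // prints 0
        try {
            indexer.getOrAddIndex("world");
        } catch (IllegalStateException e) {
            System.out.println("rejected after lock");
        }
    }
}
```

In this sketch, words already in the vocabulary remain retrievable after locking, and only genuinely new words are rejected. Whether a given implementation makes the same choice is not stated in the interface description.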
-