All Classes and Interfaces
Class
Description
Default implementation of all NGramLanguageModel functionality except
AbstractArrayEncodedNgramLanguageModel.getLogProb(int[], int, int).
Default implementation of all ContextEncodedNgramLanguageModel functionality
except AbstractContextEncodedNgramLanguageModel.getLogProb(long, int, int, LmContextInfo),
{@link #getOffsetForNgram(int[], int, int), and {
Contains some limited shared functionality between Custom[type]Maps
Just a fancy-pants comment.
Fields annotated with this annotation will have their memory usage added
to the memory usage map returned by countApproximateMemoryUsage.
A parser for ARPA LM files.
Callback that is called for each n-gram in the collection
This class wraps ArrayEncodedNgramLanguageModel with a cache.
A direct-mapped cache.
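
To illustrate what "direct-mapped" means here: each key hashes to exactly one slot, and a new entry simply overwrites whatever occupied that slot. The following is a minimal sketch under stated assumptions, not the library's cache class. The name DirectMappedCacheSketch, the long keys, and the NaN miss signal are all hypothetical choices made for this example.

```java
// Hypothetical sketch of a direct-mapped cache; not the library's implementation.
final class DirectMappedCacheSketch {
    private final long[] keys;
    private final float[] values;
    private final boolean[] occupied;

    DirectMappedCacheSketch(int capacity) {
        keys = new long[capacity];
        values = new float[capacity];
        occupied = new boolean[capacity];
    }

    // Each key maps to exactly one slot; there is no probing and no chaining.
    private int slot(long key) {
        int h = (int) (key ^ (key >>> 32));
        return Math.floorMod(h, keys.length);
    }

    /** Returns the cached value, or NaN on a miss (assumed convention). */
    float get(long key) {
        int i = slot(key);
        return (occupied[i] && keys[i] == key) ? values[i] : Float.NaN;
    }

    /** Stores the value, evicting whatever previously lived in the slot. */
    void put(long key, float value) {
        int i = slot(key);
        keys[i] = key;
        values[i] = value;
        occupied[i] = true;
    }
}
```

The trade-off is that lookups cost one array access, at the price of collisions evicting entries even when the cache is mostly empty.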
Top-level interface for an n-gram language model which accepts n-gram in an
array-of-integers encoding.
Language model implementation which uses Kneser-Ney-style backoff
computation.
Wraps a portion of a long[] array with iterator-like functionality over a
stream of bits.
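
As a rough illustration of iterator-like access to bits packed in a long[], the sketch below reads bits one at a time from a starting position. The class name BitStreamSketch and the most-significant-bit-first ordering are assumptions made for this example and may not match the library's layout.

```java
// Hypothetical sketch of a bit stream over a long[]; bit order is an assumption.
final class BitStreamSketch {
    private final long[] data;
    private long bitPos; // absolute bit position within data

    BitStreamSketch(long[] data, long startBit) {
        this.data = data;
        this.bitPos = startBit;
    }

    boolean hasNext() {
        return bitPos < (long) data.length * 64;
    }

    /** Reads one bit, most significant bit of each word first. */
    boolean nextBit() {
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        bitPos++;
        return ((data[word] >>> (63 - offset)) & 1L) != 0;
    }

    /** Reads the next n bits (n <= 64) as an unsigned value. */
    long next(int n) {
        long result = 0;
        for (int i = 0; i < n; i++) {
            result = (result << 1) | (nextBit() ? 1L : 0L);
        }
        return result;
    }
}
```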
List which returns special boundary symbols when get() is called outside the
range of the list.
Computes the log probability of a list of files.
Stores some configuration options, with useful defaults.
This class wraps ContextEncodedNgramLanguageModel with a cache.
Interface for language models which expose the internal context-encoding for
more efficient queries.
Simple class for returning context offsets
Language model implementation which uses Kneser-Ney style backoff
computation.
A map from objects to doubles.
An array with a custom word "width" in bits.
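
The idea of an array whose elements are a fixed number of bits wide can be sketched as follows. This is an illustrative example only: the class name PackedArraySketch and the low-bits-first layout inside each long are assumptions, not the library's actual storage format.

```java
// Hypothetical sketch of a fixed-width packed array backed by a long[].
final class PackedArraySketch {
    private final long[] words;
    private final int width; // bits per element, 1..64
    private final long mask;

    PackedArraySketch(int length, int width) {
        if (width < 1 || width > 64) {
            throw new IllegalArgumentException("width must be in 1..64");
        }
        this.width = width;
        this.mask = (width == 64) ? -1L : (1L << width) - 1;
        this.words = new long[(int) (((long) length * width + 63) / 64)];
    }

    void set(int index, long value) {
        long bit = (long) index * width;
        int w = (int) (bit >>> 6);
        int off = (int) (bit & 63);
        value &= mask;
        words[w] = (words[w] & ~(mask << off)) | (value << off);
        int spill = off + width - 64; // bits that do not fit in this word
        if (spill > 0) {
            long hiMask = mask >>> (width - spill);
            words[w + 1] = (words[w + 1] & ~hiMask) | (value >>> (width - spill));
        }
    }

    long get(int index) {
        long bit = (long) index * width;
        int w = (int) (bit >>> 6);
        int off = (int) (bit & 63);
        long v = words[w] >>> off;
        int spill = off + width - 64;
        if (spill > 0) {
            v |= words[w + 1] << (width - spill); // pull in the spilled high bits
        }
        return v & mask;
    }
}
```

With 5-bit elements, for instance, 20 values fit in two longs instead of twenty.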
Reader callback which adds n-grams to an NgramMap
Reads in n-gram count collections in the format that the Google n-grams Web1T
corpus comes in.
Maintains a two-way map between a set of objects and contiguous integers from
0 to the number of objects.
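
A two-way map of this kind is commonly built from a list (index to object) plus a hash map (object to index). The sketch below shows that pattern; the class name IndexerSketch and its method names are hypothetical and are not taken from the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a two-way object/integer index.
final class IndexerSketch<T> {
    private final List<T> objects = new ArrayList<>();
    private final Map<T, Integer> indices = new HashMap<>();

    /** Returns the existing index of obj, or assigns the next contiguous one. */
    int getOrAdd(T obj) {
        Integer existing = indices.get(obj);
        if (existing != null) {
            return existing;
        }
        int id = objects.size();
        objects.add(obj);
        indices.put(obj, id);
        return id;
    }

    /** Returns the index of obj, or -1 if it has never been added. */
    int indexOf(T obj) {
        Integer i = indices.get(obj);
        return i == null ? -1 : i;
    }

    T get(int index) {
        return objects.get(index);
    }

    int size() {
        return objects.size();
    }
}
```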
Some IO utility functions.
Utilities for dealing with Iterators
Wraps a two-level iteration scenario in an iterator.
Wraps a base iterator with a transformation function.
Stored type and token counts necessary for estimating a Kneser-Ney language
model.
Warning: type counts are stored internally as 32-bit ints.
Class for producing a Kneser-Ney language model in ARPA format from raw text.
Class for producing a Kneser-Ney language model in ARPA format from raw text.
Callback that is called for each n-gram in the collection
This class contains a number of static methods for reading/writing/estimating
n-gram language models.
Basic logging singleton class.
Convenience class for stringing together loggers.
Logging interface.
Default logging goes nowhere.
Logs to System.out and System.err
Open address hash map with linear probing.
Open address hash map with linear probing.
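
For readers unfamiliar with the technique, open addressing with linear probing stores entries directly in flat arrays and resolves collisions by scanning forward to the next free slot. The sketch below is a minimal illustration, not the library's classes: the name LinearProbingMapSketch, the long-to-int typing, the -1 empty marker (which assumes non-negative keys), and the 0.5 load factor are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical sketch of an open-address hash map with linear probing.
final class LinearProbingMapSketch {
    private static final long EMPTY = -1L; // assumption: real keys are non-negative
    private long[] keys;
    private int[] values;
    private int size;

    LinearProbingMapSketch(int capacity) {
        keys = new long[Math.max(2, capacity)];
        Arrays.fill(keys, EMPTY);
        values = new int[keys.length];
    }

    // Scans forward from the home slot until the key or an empty slot is found.
    private int find(long key) {
        int i = Math.floorMod(Long.hashCode(key), keys.length);
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(long key, int value) {
        if (key < 0) {
            throw new IllegalArgumentException("keys must be non-negative");
        }
        if (2 * (size + 1) > keys.length) {
            grow(); // keep the load factor at or below 0.5
        }
        int i = find(key);
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(long key, int defaultValue) {
        if (key < 0) {
            return defaultValue;
        }
        int i = find(key);
        return keys[i] == key ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }

    private void grow() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        values = new int[keys.length];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

Keeping the table at most half full guarantees that every probe sequence reaches an empty slot, so lookups always terminate.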
Estimates a Kneser-Ney language model from raw text, and writes the language
model out in ARPA-format.
Given a language model in ARPA format, builds a binary representation of the
language model and writes it to disk.
Given a directory in Google n-grams format, builds a binary representation of
a stupid-backoff language model and writes it to disk.
Like MakeLmBinaryFromGoogle, except it only writes the NgramMap portion of the
LM, meaning the binary does not contain the vocabulary.
Experimental class for reading Moses phrase tables and storing them
efficiently in memory using a trie.
Class for representing phrase tables efficiently in memory.
Taken/modified from
http://d3s.mff.cuni.cz/~holub/sw/javamurmurhash/MurmurHash.java
Wraps an NgramMap as an Iterable, so it is easy to iterate over the n-grams
and associated values.
Base interface for an n-gram language model, which exposes only inefficient
convenience methods.
Reader callback which adds n-grams to an NgramMap
Wraps an NgramMap as a Java Map, with ngrams of all orders mixed together.
Callback that is called for each n-gram in the collection
Wraps an NgramMap as an Iterable, so it is easy to iterate over the n-grams
of a particular order.
Wraps an NgramMap as a Java Map, but only ngrams of a particular order.
A generic-typed pair of objects.
Stored type and token counts necessary for estimating a Kneser-Ney language
model
Implementation of a WordIndexer in which words are represented as strings.
Language model implementation which uses stupid backoff (Brants et al., 2007)
computation.
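
As background, stupid backoff (Brants et al., 2007) scores an n-gram by its relative frequency if it was observed, and otherwise backs off to the shorter context with a fixed multiplicative penalty (0.4 in the paper). The result is a score, not a normalized probability. The sketch below illustrates the recursion on plain string counts. It is not this library's implementation, and the class name StupidBackoffSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of stupid-backoff scoring over raw n-gram counts.
final class StupidBackoffSketch {
    static final double ALPHA = 0.4; // backoff factor suggested by Brants et al. (2007)
    private final Map<List<String>, Long> counts = new HashMap<>();
    private long totalTokens;

    /** Counts every n-gram of order 1..maxOrder in the sentence. */
    void addSentence(List<String> words, int maxOrder) {
        totalTokens += words.size();
        for (int i = 0; i < words.size(); i++) {
            for (int n = 1; n <= maxOrder && i + n <= words.size(); n++) {
                counts.merge(new ArrayList<>(words.subList(i, i + n)), 1L, Long::sum);
            }
        }
    }

    /** Scores the last word of ngram given the preceding words. */
    double score(List<String> ngram) {
        double penalty = 1.0;
        for (int start = 0; start < ngram.size(); start++) {
            List<String> sub = ngram.subList(start, ngram.size());
            long c = counts.getOrDefault(sub, 0L);
            if (c > 0) {
                long denom = (sub.size() == 1)
                        ? totalTokens
                        : counts.getOrDefault(sub.subList(0, sub.size() - 1), 0L);
                return penalty * c / denom;
            }
            penalty *= ALPHA; // drop the leftmost context word and retry
        }
        return 0.0; // word never observed at all
    }
}
```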
Class for reading raw text files.
Provides a map from objects to non-negative integers.
Manages storage of arbitrary values in an NgramMap
Enumerates words in the vocabulary of a language model.