Package edu.berkeley.nlp.lm.io
Class MakeLmBinaryFromGoogle
java.lang.Object
edu.berkeley.nlp.lm.io.MakeLmBinaryFromGoogle
Given a directory in Google n-grams format, builds a binary representation of
a stupid-backoff language model and writes it to disk.
Language model binaries are significantly smaller and faster to load than the
text-format input. Note: actually running this code on the full Google n-grams
corpus can be very slow and memory-intensive; on our machines, it takes about
32GB of memory and 15 hours.
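To illustrate what a stupid-backoff model computes from such counts, here is a minimal sketch. It assumes the Google n-grams text layout of one n-gram per line, with space-separated words followed by a tab and the count, and a backoff factor of 0.4, the value commonly used for stupid backoff. The class `StupidBackoffSketch` and its methods are hypothetical and not part of this package's API; the sketch also keeps every count in a `HashMap`, which would not scale to the full corpus.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stupid-backoff scoring; not the berkeleylm implementation.
final class StupidBackoffSketch {
    // Assumed backoff multiplier applied each time we drop the leftmost context word.
    static final double ALPHA = 0.4;

    // Maps a space-separated n-gram (e.g. "the cat") to its raw count.
    private final Map<String, Long> counts = new HashMap<>();
    // Sum of all unigram counts, used to normalize the unigram base case.
    private long totalUnigrams = 0;

    // Parses one line in the assumed layout "w1 w2 ... wn<TAB>count".
    void addLine(String line) {
        int tab = line.lastIndexOf('\t');
        String ngram = line.substring(0, tab);
        long count = Long.parseLong(line.substring(tab + 1).trim());
        counts.merge(ngram, count, Long::sum);
        if (ngram.indexOf(' ') < 0) {
            totalUnigrams += count;
        }
    }

    // Returns the stupid-backoff score S(w_n | w_1 .. w_{n-1}).
    // This is a relative-frequency score, not a normalized probability.
    double score(String[] words) {
        double multiplier = 1.0;
        for (int start = 0; start < words.length; start++) {
            String ngram = String.join(" ", Arrays.copyOfRange(words, start, words.length));
            long c = counts.getOrDefault(ngram, 0L);
            if (c > 0) {
                if (start == words.length - 1) {
                    // Base case: unigram relative frequency.
                    return multiplier * c / (double) totalUnigrams;
                }
                String context = String.join(" ", Arrays.copyOfRange(words, start, words.length - 1));
                long ctx = counts.getOrDefault(context, 0L);
                if (ctx > 0) {
                    return multiplier * c / (double) ctx;
                }
            }
            // Unseen n-gram (or missing context count): back off to a shorter context.
            multiplier *= ALPHA;
        }
        return 0.0; // the final word was never seen even as a unigram
    }
}
```

For example, with counts `the` = 6, `cat` = 3, `dog` = 1 and `the cat` = 2, the score of `cat` after `the` is 2/6, while the unseen `the dog` backs off to 0.4 × 1/10. Stupid backoff skips the normalization of Katz or Kneser-Ney smoothing, which is what makes it practical at the scale of the Google corpus.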
Note that if the input/output files have a .gz suffix, they will be
unzipped/zipped as necessary.
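The suffix-based behavior described above can be sketched with the standard `java.util.zip` streams. This is an illustration of the general technique only; the class `GzAwareIo` and its methods are hypothetical and are not the helpers this package actually uses.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: picks gzip or plain I/O based only on a ".gz" file-name suffix.
final class GzAwareIo {
    // Opens a UTF-8 text reader, decompressing on the fly if the path ends in ".gz".
    static BufferedReader openReader(String path) throws IOException {
        InputStream raw = new FileInputStream(path);
        try {
            InputStream in = path.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
            return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            raw.close(); // avoid leaking the file handle if the gzip header is invalid
            throw e;
        }
    }

    // Opens a buffered output stream, compressing on the fly if the path ends in ".gz".
    // Closing the returned stream also finishes the gzip trailer.
    static OutputStream openOutput(String path) throws IOException {
        OutputStream raw = new FileOutputStream(path);
        try {
            OutputStream out = path.endsWith(".gz") ? new GZIPOutputStream(raw) : raw;
            return new BufferedOutputStream(out);
        } catch (IOException e) {
            raw.close();
            throw e;
        }
    }
}
```

Keying compression on the file name keeps callers unaware of the encoding: the same code path reads a compressed n-gram shard or a plain text file, and writes a compressed or uncompressed binary.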
Author:
adampauls
Constructor Summary
Constructors
MakeLmBinaryFromGoogle()
Method Summary
main
Constructor Details
MakeLmBinaryFromGoogle
public MakeLmBinaryFromGoogle()
Method Details
main