Package com.optimaize.langdetect
Class LanguageDetectorBuilder
java.lang.Object
com.optimaize.langdetect.LanguageDetectorBuilder
Builder for LanguageDetector.
This class does no internal synchronization.
-
Field Summary
Fields
Modifier and Type, Field, Description
private double
private static final double
private final @NotNull Set<LanguageProfile>
private double
private final @NotNull NgramExtractor
private double
private double
private com.google.common.base.Optional<Long>
private int
private double
-
Constructor Summary
Constructors
Modifier, Constructor, Description
private LanguageDetectorBuilder(@NotNull NgramExtractor ngramExtractor)
-
Method Summary
Modifier and Type, Method, Description
affixFactor(double affixFactor) - Sets prefixFactor() and suffixFactor() both to the given value.
alpha(double alpha)
build()
static LanguageDetectorBuilder create(@NotNull NgramExtractor ngramExtractor)
languagePriorities(@Nullable Map<LdLocale, Double> langWeightingMap) - TODO document exactly.
minimalConfidence(double minimalConfidence) - LanguageDetector.detect(java.lang.CharSequence) returns a language if the best detected language has at least this probability.
prefixFactor(double prefixFactor) - To weight n-grams that are on the left border of a word differently from n-grams in the middle of words, assign a value here.
probabilityThreshold(double probabilityThreshold) - LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages with less probability than this.
seed(long seed)
shortTextAlgorithm(int shortTextAlgorithm) - Defaults to 0, which means don't use this feature.
suffixFactor(double suffixFactor) - Defaults to 1.0, which means don't use this feature.
withProfile(LanguageProfile languageProfile)
withProfiles(Iterable<LanguageProfile> languageProfiles)
-
Field Details
-
ALPHA_DEFAULT
private static final double ALPHA_DEFAULT
-
ngramExtractor
-
alpha
private double alpha -
seed
-
shortTextAlgorithm
private int shortTextAlgorithm -
prefixFactor
private double prefixFactor -
suffixFactor
private double suffixFactor -
probabilityThreshold
private double probabilityThreshold -
minimalConfidence
private double minimalConfidence -
langWeightingMap
-
languageProfiles
-
langsAdded
-
-
Constructor Details
-
LanguageDetectorBuilder
-
-
Method Details
-
create
-
alpha
-
seed
-
seed
-
shortTextAlgorithm
Defaults to 0, which disables this feature and keeps the old behavior.
-
affixFactor
Sets both prefixFactor() and suffixFactor() to the given value.
-
prefixFactor
To weight n-grams on the left border of a word differently from n-grams in the middle of words, assign a value here. Affixes (prefixes and suffixes) often carry the distinguishing features of a language. A value greater than 1.0 weights these n-grams higher; 2.0 doubles their weight. Defaults to 1.0, which means don't use this feature.
Parameters:
prefixFactor - 0.0 to 10.0, a suggested value is 1.5
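The effect of prefixFactor and suffixFactor can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class AffixWeightSketch, the method weightedNgrams, and the rule that "border" means the first or last n-gram of a single word are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of com.optimaize.langdetect.
final class AffixWeightSketch {

    // Splits one word into n-grams and assigns each a weight.
    // Assumption: the first n-gram counts as the prefix, the last as the suffix,
    // and every other n-gram keeps the neutral weight 1.0.
    static Map<String, Double> weightedNgrams(String word, int n,
                                              double prefixFactor, double suffixFactor) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (int start = 0; start + n <= word.length(); start++) {
            String gram = word.substring(start, start + n);
            double weight = 1.0;
            if (start == 0) {
                weight *= prefixFactor;   // left border of the word
            }
            if (start + n == word.length()) {
                weight *= suffixFactor;   // right border of the word
            }
            // Repeated n-grams within the word accumulate their weights.
            weights.merge(gram, weight, Double::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Uses the suggested values 1.5 (prefix) and 2.0 (suffix).
        System.out.println(weightedNgrams("hello", 3, 1.5, 2.0));
        // prints {hel=1.5, ell=1.0, llo=2.0}
    }
}
```

With both factors left at 1.0, every n-gram has weight 1.0, which corresponds to the feature being off.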
-
suffixFactor
Defaults to 1.0, which means don't use this feature.
Parameters:
suffixFactor - 0.0 to 10.0, a suggested value is 2.0
-
probabilityThreshold
LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages with a probability lower than this. The default is currently 0.1 (the old hardcoded value), but don't rely on it; if you need a specific value, set it explicitly.
-
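The filtering that probabilityThreshold describes can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's code: the class ProbabilityThresholdSketch, the method languagesAtOrAbove, and the use of plain String language codes in place of the library's own types are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not part of com.optimaize.langdetect.
final class ProbabilityThresholdSketch {

    // Keeps only languages whose probability is at least the threshold,
    // ordered from most to least probable.
    static List<String> languagesAtOrAbove(Map<String, Double> probabilities,
                                           double probabilityThreshold) {
        return probabilities.entrySet().stream()
                .filter(e -> e.getValue() >= probabilityThreshold)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = Map.of("en", 0.7, "de", 0.25, "fr", 0.05);
        // With a threshold of 0.1, "fr" (0.05) is dropped.
        System.out.println(languagesAtOrAbove(probabilities, 0.1)); // prints [en, de]
    }
}
```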
minimalConfidence
LanguageDetector.detect(java.lang.CharSequence) returns a language if the best detected language has at least this probability. The default is currently 0.9999d, but don't rely on it; if you need a specific value, set it explicitly.
-
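The rule that minimalConfidence describes can be sketched as follows: only the single best candidate is considered, and it is returned only if it reaches the confidence bound. This is an illustration under stated assumptions, not the library's code: the class MinimalConfidenceSketch and its detect method are hypothetical, languages are plain strings, and java.util.Optional is used here for self-containment.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not part of com.optimaize.langdetect.
final class MinimalConfidenceSketch {

    // Returns the most probable language, but only if its probability
    // is at least minimalConfidence; otherwise returns an empty Optional.
    static Optional<String> detect(Map<String, Double> probabilities,
                                   double minimalConfidence) {
        return probabilities.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .filter(best -> best.getValue() >= minimalConfidence)
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // The best candidate (0.7) is below 0.9999, so no language is returned.
        System.out.println(detect(Map.of("en", 0.7, "de", 0.3), 0.9999).isPresent()); // prints false
        // A lower bound accepts the same candidate.
        System.out.println(detect(Map.of("en", 0.7, "de", 0.3), 0.5).get()); // prints en
    }
}
```

A high bound such as 0.9999 makes the single-answer call conservative: ambiguous input yields no answer rather than a weak guess.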
languagePriorities
public LanguageDetectorBuilder languagePriorities(@Nullable Map<LdLocale, Double> langWeightingMap)
TODO document exactly. Also explain how it influences the results. Maybe check for unsupported languages at some point, or not, but document whether it throws or ignores them. Key = language (LdLocale), value = priority (Double, probably 0-1).
-
withProfile
public LanguageDetectorBuilder withProfile(LanguageProfile languageProfile) throws IllegalStateException
Throws:
IllegalStateException - if a profile for the same language was added already (must be a userland bug).
-
withProfiles
public LanguageDetectorBuilder withProfiles(Iterable<LanguageProfile> languageProfiles) throws IllegalStateException
Throws:
IllegalStateException - if a profile for the same language was added already (must be a userland bug).
-
build
Throws:
IllegalStateException - if no LanguageProfile was added.
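The IllegalStateException conditions documented for withProfile, withProfiles and build can be sketched with a small stand-in builder. This is an illustration under stated assumptions, not the library's code: the class ProfileChecksSketch is hypothetical, profiles are reduced to plain language-code strings, build returns the collected set in place of a detector, and the exception messages are made up.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not part of com.optimaize.langdetect.
// Like the documented builder, it does no internal synchronization.
final class ProfileChecksSketch {

    // Languages for which a profile was already registered.
    private final Set<String> langsAdded = new HashSet<>();

    ProfileChecksSketch withProfile(String language) {
        // Set.add returns false when the element is already present.
        if (!langsAdded.add(language)) {
            throw new IllegalStateException(
                    "A profile for language " + language + " was added already");
        }
        return this;
    }

    ProfileChecksSketch withProfiles(Iterable<String> languages) {
        for (String language : languages) {
            withProfile(language);
        }
        return this;
    }

    // Stand-in for build(): fails when no profile was added.
    Set<String> build() {
        if (langsAdded.isEmpty()) {
            throw new IllegalStateException("At least one profile is required");
        }
        return Collections.unmodifiableSet(new HashSet<>(langsAdded));
    }

    public static void main(String[] args) {
        System.out.println(new ProfileChecksSketch()
                .withProfile("en").withProfile("de").build().size()); // prints 2
    }
}
```

Failing fast on a duplicate profile matches the documented intent that a repeated language indicates a bug in the calling code.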
-