Class DictionaryMetadata

java.lang.Object
morfologik.stemming.DictionaryMetadata

public final class DictionaryMetadata extends Object
Description of attributes, their types and default values.
  • Field Details

    • DEFAULT_ATTRIBUTES

      private static Map<DictionaryAttribute,String> DEFAULT_ATTRIBUTES
      Default attribute values.
    • REQUIRED_ATTRIBUTES

      private static EnumSet<DictionaryAttribute> REQUIRED_ATTRIBUTES
      Required attributes.
    • separator

      private byte separator
      A separator character between fields (stem, lemma, form). The character must be within byte range (FSA uses bytes internally).
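The byte-range requirement can be illustrated with a small check that a candidate separator encodes to exactly one byte in the dictionary's encoding. This is a minimal sketch using only JDK classes; the class name `SeparatorCheck` and the method `toSingleByte` are hypothetical and are not part of this library.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of morfologik.
final class SeparatorCheck {
  /** Returns the single byte that encodes the separator, or throws if it needs more than one byte. */
  static byte toSingleByte(char separator, Charset charset) {
    CharsetEncoder encoder = charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      ByteBuffer encoded = encoder.encode(CharBuffer.wrap(new char[] {separator}));
      if (encoded.remaining() != 1) {
        throw new IllegalArgumentException(
            "Separator must encode to exactly one byte in " + charset.name());
      }
      return encoded.get();
    } catch (CharacterCodingException e) {
      throw new IllegalArgumentException(
          "Separator cannot be encoded in " + charset.name(), e);
    }
  }
}
```

With UTF-8, an ASCII separator such as `+` passes this check, while a character such as `ł` is rejected because it needs two bytes.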
    • separatorChar

      private char separatorChar
    • encoding

      private String encoding
      Encoding used for converting bytes to characters and vice versa.
    • charset

      private Charset charset
    • locale

      private Locale locale
    • replacementPairs

      private LinkedHashMap<String,List<String>> replacementPairs
      Replacement pairs for non-obvious candidate search in a speller dictionary.
    • inputConversion

      private LinkedHashMap<String,String> inputConversion
      Conversion pairs for input conversion, for example to replace ligatures.
    • outputConversion

      private LinkedHashMap<String,String> outputConversion
      Conversion pairs for output conversion, for example to replace ligatures.
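As an illustration of how conversion pairs might be applied, the sketch below replaces each key with its value in insertion order. The replacement strategy this class actually uses is not described here, so the sequential `String.replace` approach is an assumption, and `ConversionSketch` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical helper, not part of morfologik.
final class ConversionSketch {
  /** Applies each (from, to) pair in the map's iteration order. */
  static String applyConversions(String input, Map<String, String> pairs) {
    String result = input;
    for (Map.Entry<String, String> pair : pairs.entrySet()) {
      // Replace every occurrence of the key with its value.
      result = result.replace(pair.getKey(), pair.getValue());
    }
    return result;
  }
}
```

A `LinkedHashMap` keeps the pairs in the order they were declared, which matters when one replacement could produce text matched by a later pair.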
    • equivalentChars

      private LinkedHashMap<Character,List<Character>> equivalentChars
      Equivalent characters, treated the same way as characters with and without diacritics. For example, Polish ł can be specified as equivalent to l. This implements a feature similar to the hunspell MAP directive in the affix file.
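A lookup over a map of this shape could look like the sketch below. It assumes that equivalence is checked in both directions; whether the speller treats the mapping as symmetric is not stated here, and `EquivalentCharsSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of morfologik.
final class EquivalentCharsSketch {
  /** Returns true if a and b are equal or listed as equivalent in either direction. */
  static boolean areEquivalent(Map<Character, List<Character>> equivalents, char a, char b) {
    if (a == b) {
      return true;
    }
    List<Character> forA = equivalents.get(a);
    if (forA != null && forA.contains(b)) {
      return true;
    }
    // Assumption: the mapping is treated as symmetric.
    List<Character> forB = equivalents.get(b);
    return forB != null && forB.contains(a);
  }
}
```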
    • attributes

      private final EnumMap<DictionaryAttribute,String> attributes
      All attributes.
    • boolAttributes

      private final EnumMap<DictionaryAttribute,Boolean> boolAttributes
      All "enabled" boolean attributes.
    • encoderType

      private EncoderType encoderType
      Sequence encoder.
    • METADATA_FILE_EXTENSION

      public static final String METADATA_FILE_EXTENSION
      Expected metadata file extension.
  • Constructor Details

  • Method Details

    • getAttributes

      public Map<DictionaryAttribute,String> getAttributes()
      Returns:
      All metadata attributes.
    • getEncoding

      public String getEncoding()
    • getSeparator

      public byte getSeparator()
    • getLocale

      public Locale getLocale()
    • getInputConversionPairs

      public LinkedHashMap<String,String> getInputConversionPairs()
    • getOutputConversionPairs

      public LinkedHashMap<String,String> getOutputConversionPairs()
    • getReplacementPairs

      public LinkedHashMap<String,List<String>> getReplacementPairs()
    • getEquivalentChars

      public LinkedHashMap<Character,List<Character>> getEquivalentChars()
    • isFrequencyIncluded

      public boolean isFrequencyIncluded()
    • isIgnoringPunctuation

      public boolean isIgnoringPunctuation()
    • isIgnoringNumbers

      public boolean isIgnoringNumbers()
    • isIgnoringCamelCase

      public boolean isIgnoringCamelCase()
    • isIgnoringAllUppercase

      public boolean isIgnoringAllUppercase()
    • isIgnoringDiacritics

      public boolean isIgnoringDiacritics()
    • isConvertingCase

      public boolean isConvertingCase()
    • isSupportingRunOnWords

      public boolean isSupportingRunOnWords()
    • getDecoder

      public CharsetDecoder getDecoder()
      Returns:
      Returns a new CharsetDecoder for the encoding.
    • getEncoder

      public CharsetEncoder getEncoder()
      Returns:
      Returns a new CharsetEncoder for the encoding.
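Both methods return a new object on each call. A likely reason is that JDK `CharsetDecoder` and `CharsetEncoder` instances are stateful and not safe for concurrent use. The sketch below shows that pattern with plain JDK classes; `CodecSketch` is a hypothetical stand-in, and the choice of `CodingErrorAction.REPORT` is an assumption about error handling.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical stand-in for a metadata object that hands out fresh coders.
final class CodecSketch {
  private final Charset charset;

  CodecSketch(String encoding) {
    this.charset = Charset.forName(encoding);
  }

  /** A fresh decoder per call; decoders must not be shared between threads. */
  CharsetDecoder newDecoder() {
    return charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  /** A fresh encoder per call, configured the same way. */
  CharsetEncoder newEncoder() {
    return charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  /** Encodes and then decodes the text, to show the two coders used together. */
  String roundTrip(String text) {
    try {
      ByteBuffer bytes = newEncoder().encode(CharBuffer.wrap(text));
      return newDecoder().decode(bytes).toString();
    } catch (CharacterCodingException e) {
      throw new IllegalStateException("Text is not representable in " + charset.name(), e);
    }
  }
}
```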
    • getSequenceEncoderType

      public EncoderType getSequenceEncoderType()
      Returns:
      The sequence encoder type.
    • getSeparatorAsChar

      public char getSeparatorAsChar()
      Returns:
      Returns the separator byte converted to a single char.
      Throws:
      RuntimeException - if this conversion is impossible (the byte does not decode to exactly one char, for example because the result would be a surrogate pair, or the FSA's encoding is not available).
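The conversion described above can be sketched by decoding a one-byte buffer and requiring exactly one resulting char. This is an illustration with JDK classes only, not the library's implementation; `SeparatorAsChar` and `decode` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of morfologik.
final class SeparatorAsChar {
  /** Decodes a single separator byte to a single char, or throws RuntimeException. */
  static char decode(byte separator, Charset charset) {
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      CharBuffer chars = decoder.decode(ByteBuffer.wrap(new byte[] {separator}));
      if (chars.remaining() != 1) {
        throw new RuntimeException("Separator byte does not decode to a single char.");
      }
      return chars.get();
    } catch (CharacterCodingException e) {
      // For example, a lone lead byte of a multi-byte UTF-8 sequence.
      throw new RuntimeException("Separator byte cannot be decoded in " + charset.name(), e);
    }
  }
}
```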
    • builder

      public static DictionaryMetadataBuilder builder()
      Returns:
      A shortcut returning DictionaryMetadataBuilder.
    • getExpectedMetadataFileName

      public static String getExpectedMetadataFileName(String dictionaryFile)
      Returns the expected name of the metadata file, based on the name of the dictionary file. The expected name is resolved by truncating any file extension of dictionaryFile and appending METADATA_FILE_EXTENSION.
      Parameters:
      dictionaryFile - The name of the dictionary (*.dict) file.
      Returns:
      Returns the expected name of the metadata file.
    • getExpectedMetadataLocation

      public static Path getExpectedMetadataLocation(Path dictionary)
      Parameters:
      dictionary - The location of the dictionary file.
      Returns:
      Returns the expected location of a metadata file.
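The name resolution described for these two methods can be sketched as follows. The value of `METADATA_FILE_EXTENSION` is not given on this page, so `"info"` is an assumption. The use of `Path.resolveSibling` for the location is also an assumption, and `MetadataNameSketch` is a hypothetical name.

```java
import java.nio.file.Path;

// Hypothetical helper, not part of morfologik.
final class MetadataNameSketch {
  // Assumed value; the constant's value is not stated in this document.
  static final String METADATA_FILE_EXTENSION = "info";

  /** Truncates any extension of the file name and appends the metadata extension. */
  static String expectedMetadataFileName(String dictionaryFile) {
    int dot = dictionaryFile.lastIndexOf('.');
    String base = dot >= 0 ? dictionaryFile.substring(0, dot) : dictionaryFile;
    return base + "." + METADATA_FILE_EXTENSION;
  }

  /** Places the metadata file next to the dictionary file. */
  static Path expectedMetadataLocation(Path dictionary) {
    String name = expectedMetadataFileName(dictionary.getFileName().toString());
    return dictionary.resolveSibling(name);
  }
}
```

Under these assumptions, `polish.dict` maps to `polish.info` in the same directory.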
    • read

      public static DictionaryMetadata read(InputStream metadataStream) throws IOException
      Read dictionary metadata from a property file (stream).
      Parameters:
      metadataStream - The stream with metadata.
      Returns:
      Returns DictionaryMetadata read from the stream (property file).
      Throws:
      IOException - Thrown if an I/O exception occurs.
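Since the metadata is a property file, its raw key/value pairs can be inspected with `java.util.Properties`. The sketch below does only that and does not construct a DictionaryMetadata. The UTF-8 reader and the property keys shown in the usage note are assumptions, and `MetadataReadSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of morfologik.
final class MetadataReadSketch {
  /** Loads key/value pairs from a property-file stream (assumed to be UTF-8). */
  static Map<String, String> readProperties(InputStream in) throws IOException {
    Properties properties = new Properties();
    properties.load(new InputStreamReader(in, StandardCharsets.UTF_8));
    Map<String, String> result = new TreeMap<>();
    for (String key : properties.stringPropertyNames()) {
      result.put(key, properties.getProperty(key));
    }
    return result;
  }

  /** Convenience wrapper for in-memory text; wraps IOException as unchecked. */
  static Map<String, String> readFromString(String text) {
    try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))) {
      return readProperties(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For instance, `readFromString("fsa.dict.separator=+\nfsa.dict.encoding=UTF-8\n")` yields a map with two entries; the key names here are assumed for illustration.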
    • write

      public void write(Writer writer) throws IOException
      Write dictionary attributes (metadata).
      Parameters:
      writer - The writer to write to.
      Throws:
      IOException - Thrown when an I/O error occurs.
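Writing attributes to a `Writer` in property-file form can be sketched with `Properties.store`, which handles escaping of special characters. Whether this class writes its attributes exactly this way is an assumption; `MetadataWriteSketch` and the key names used below are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of morfologik.
final class MetadataWriteSketch {
  /** Writes the attributes in java.util.Properties format. */
  static void writeAttributes(Map<String, String> attributes, Writer writer) throws IOException {
    Properties properties = new Properties();
    for (Map.Entry<String, String> attribute : attributes.entrySet()) {
      properties.setProperty(attribute.getKey(), attribute.getValue());
    }
    // store() emits a timestamp comment line, then one key=value line per entry.
    properties.store(writer, null);
  }

  /** Convenience wrapper that returns the written text; wraps IOException as unchecked. */
  static String writeToString(Map<String, String> attributes) {
    StringWriter writer = new StringWriter();
    try {
      writeAttributes(attributes, writer);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return writer.toString();
  }
}
```

Note that `Properties` does not preserve insertion order, so the order of lines in the output may differ from the order of the input map.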