Class FSA5

All Implemented Interfaces:
Iterable<ByteBuffer>

public final class FSA5 extends FSA
FSA binary format implementation for version 5.

Version 5 indicates the dictionary was built with these flags: FSAFlags.FLEXIBLE, FSAFlags.STOPBIT and FSAFlags.NEXTBIT. The internal representation of the FSA must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).

 ---- this node header present only if automaton was compiled with NUMBERS option.
 Byte
        +-+-+-+-+-+-+-+-+\
      0 | | | | | | | | | \  LSB
        +-+-+-+-+-+-+-+-+  +
      1 | | | | | | | | |  |      number of strings recognized
        +-+-+-+-+-+-+-+-+  +----- by the automaton starting
        : : : : : : : : :  |      from this node.
        +-+-+-+-+-+-+-+-+  +
  ctl-1 | | | | | | | | | /  MSB
        +-+-+-+-+-+-+-+-+/
        
 ---- remaining part of the node
 
 Byte
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/
 
                  +------------- node pointed to is next
                  | +----------- the last arc of the node
                  | | +--------- the arc is final
                  | | |
             +-----------+
             |    | | |  |
         ___+___  | | |  |
        /       \ | | |  |
       MSB           LSB |
        7 6 5 4 3 2 1 0  |
       +-+-+-+-+-+-+-+-+ |
     1 | | | | | | | | | \ \
       +-+-+-+-+-+-+-+-+  \ \  LSB
       +-+-+-+-+-+-+-+-+     +
     2 | | | | | | | | |     |
       +-+-+-+-+-+-+-+-+     |
     3 | | | | | | | | |     +----- target node address (in bytes)
       +-+-+-+-+-+-+-+-+     |      (not present except for the byte
       : : : : : : : : :     |       with flags if the node pointed to
       +-+-+-+-+-+-+-+-+     +       is next)
   gtl | | | | | | | | |    /  MSB
       +-+-+-+-+-+-+-+-+   /
 gtl+1                           (gtl = gotoLength)
 
  • Field Details

    • DEFAULT_FILLER

      public static final byte DEFAULT_FILLER
      Default filler byte.
      See Also:
    • DEFAULT_ANNOTATION

      public static final byte DEFAULT_ANNOTATION
      Default annotation byte.
      See Also:
    • VERSION

      public static final byte VERSION
      Automaton version as in the file header.
      See Also:
    • BIT_FINAL_ARC

      public static final int BIT_FINAL_ARC
      Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
      See Also:
    • BIT_LAST_ARC

      public static final int BIT_LAST_ARC
      Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
      See Also:
    • BIT_TARGET_NEXT

      public static final int BIT_TARGET_NEXT
      Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
      See Also:
    • ADDRESS_OFFSET

      public static final int ADDRESS_OFFSET
      An offset in the arc structure, where the address and flags field begins. In version 5 of FSA automata, this value is constant (1, skip label).
      See Also:
    • arcs

      public final byte[] arcs
      An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
    • nodeDataLength

      public final int nodeDataLength
      The length of the node header structure (if the automaton was compiled with NUMBERS option). Otherwise zero.
    • flags

      private Set<FSAFlags> flags
      Flags for this automaton version.
    • gtl

      public final int gtl
      Number of bytes each address takes in full, expanded form (goto length).
    • filler

      public final byte filler
      Filler character.
    • annotation

      public final byte annotation
      Annotation character.
  • Constructor Details

  • Method Details

    • getRootNode

      public int getRootNode()
      Returns the start node of this automaton.
      Specified by:
      getRootNode in class FSA
      Returns:
      Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
    • getFirstArc

      public final int getFirstArc(int node)
      Specified by:
      getFirstArc in class FSA
      Parameters:
      node - Identifier of the node.
      Returns:
      Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
    • getNextArc

      public final int getNextArc(int arc)
      Specified by:
      getNextArc in class FSA
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.
    • getArc

      public int getArc(int node, byte label)
      Specified by:
      getArc in class FSA
      Parameters:
      node - Identifier of the node.
      label - The arc's label.
      Returns:
      Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
    • getEndNode

      public int getEndNode(int arc)
      Specified by:
      getEndNode in class FSA
      Parameters:
      arc - The arc's identifier.
      Returns:
      Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
    • getArcLabel

      public byte getArcLabel(int arc)
      Specified by:
      getArcLabel in class FSA
      Parameters:
      arc - The arc's identifier.
      Returns:
      Return the label associated with a given arc.
    • isArcFinal

      public boolean isArcFinal(int arc)
      Specified by:
      isArcFinal in class FSA
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
    • isArcTerminal

      public boolean isArcTerminal(int arc)
      Specified by:
      isArcTerminal in class FSA
      Parameters:
      arc - The arc's identifier.
      Returns:
      Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).
    • getRightLanguageCount

      public int getRightLanguageCount(int node)
      Returns the number encoded at the given node. The number equals the count of the set of suffixes reachable from node (called its right language).
      Overrides:
      getRightLanguageCount in class FSA
      Parameters:
      node - Identifier of the node.
      Returns:
      Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.
    • getFlags

      public Set<FSAFlags> getFlags()

      For this automaton version, an additional FSAFlags.NUMBERS flag may be set to indicate the automaton contains extra fields for each node.

      Specified by:
      getFlags in class FSA
      Returns:
      Returns a set of flags for this FSA instance.
    • isArcLast

      public boolean isArcLast(int arc)
      Returns true if this arc has NEXT bit set.
      Parameters:
      arc - The node's arc identifier.
      Returns:
      Returns true if the argument is the last arc of a node.
      See Also:
    • isNextSet

      public boolean isNextSet(int arc)
      Parameters:
      arc - The node's arc identifier.
      Returns:
      Returns true if BIT_TARGET_NEXT is set for this arc.
      See Also:
    • decodeFromBytes

      static final int decodeFromBytes(byte[] arcs, int start, int n)
      Returns an n-byte integer encoded in byte-packed representation.
    • getDestinationNodeOffset

      final int getDestinationNodeOffset(int arc)
      Returns the address of the node pointed to by this arc.
    • skipArc

      private int skipArc(int offset)
      Read the arc's layout and skip as many bytes, as needed.