Class CESU8Encoding

All Implemented Interfaces:
Cloneable

public final class CESU8Encoding extends UnicodeEncoding
  • Field Details

    • USE_INVALID_CODE_SCHEME

      static final boolean USE_INVALID_CODE_SCHEME
    • INVALID_CODE_FE

      private static final int INVALID_CODE_FE
    • INVALID_CODE_FF

      private static final int INVALID_CODE_FF
    • VALID_CODE_LIMIT

      private static final int VALID_CODE_LIMIT
    • CESU8EncLen

      private static final int[] CESU8EncLen
    • CESU8Trans

      static final int[][] CESU8Trans
    • INSTANCE

      public static final CESU8Encoding INSTANCE
  • Constructor Details

    • CESU8Encoding

      protected CESU8Encoding()
  • Method Details

    • getCharsetName

      public String getCharsetName()
      Description copied from class: Encoding
      The name of the equivalent Java Charset for this encoding. Defaults to the name of the encoding. Subclasses can override this to provide a different name.
      Overrides:
      getCharsetName in class UnicodeEncoding
      Returns:
      the name of the equivalent Java Charset for this encoding
    • length

      public int length(byte[] bytes, int p, int end)
      Description copied from class: Encoding
      Returns the character length, given the byte stream, the character position, and the stream end. Returns 1 for single-byte encodings. For multibyte encodings, it performs sanity validations and returns the character length, or otherwise a value that indicates how many bytes of the character are missing from the stream.
      Specified by:
      length in class Encoding
      Returns:
      0 - never returned
      > 0 - valid character; the length is returned
      -1 - illegal/malformed character
      < -1 - (-1 - n), where n is the number of missing bytes for the character in the p...end range
      Oniguruma equivalent: mbc_enc_len, modified for 1.9 purposes
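      The return-value convention above can be unpacked with a small helper. The following is a minimal sketch; the class LengthResultSketch and its methods are hypothetical names introduced for illustration and are not part of the library.

```java
/** Hypothetical helper (not part of the library): interprets the int returned by length(). */
final class LengthResultSketch {
    /** A positive result is the length in bytes of a valid character. */
    static boolean isValid(int r) {
        return r > 0;
    }

    /** A result below -1 encodes (-1 - n); this recovers n, or 0 if nothing is missing. */
    static int missingBytes(int r) {
        return r < -1 ? -1 - r : 0;
    }

    /** Renders the result as text, following the documented convention. */
    static String describe(int r) {
        if (r > 0) return "valid, " + r + " byte(s)";
        if (r == -1) return "illegal or malformed";
        if (r < -1) return "truncated, " + missingBytes(r) + " byte(s) missing";
        return "unexpected"; // 0 is documented as never returned
    }
}
```

      For example, a result of -3 would mean that two more bytes are needed to complete the character.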
    • lengthForOneUptoSix

      private int lengthForOneUptoSix(byte[] bytes, int p, int end, int b, int s)
    • isNewLine

      public boolean isNewLine(byte[] bytes, int p, int end)
      Description copied from class: AbstractEncoding
      Oniguruma equivalent: onigenc_is_mbc_newline_0x0a. Also used by multibyte encodings.
      Overrides:
      isNewLine in class AbstractEncoding
    • codeToMbcLength

      public int codeToMbcLength(int code)
      Description copied from class: Encoding
      Returns the character length for a given code point.
      Oniguruma equivalent: code_to_mbclen
      Specified by:
      codeToMbcLength in class Encoding
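      As an illustration of the lengths involved, here is a minimal sketch based on the general definition of CESU-8, in which a supplementary code point is stored as a surrogate pair of two 3-byte sequences. The class Cesu8LengthSketch is a hypothetical name, and the sketch does not model this class's INVALID_CODE_FE / INVALID_CODE_FF handling or its error return values.

```java
/** Hypothetical sketch (not the library's code): CESU-8 byte length of a code point. */
final class Cesu8LengthSketch {
    static int encodedLength(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;     // ASCII
        if (codePoint < 0x800) return 2;    // two-byte form
        if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane
        return 6;                           // surrogate pair, 3 bytes per surrogate
    }
}
```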
    • mbcToCode

      public int mbcToCode(byte[] bytes, int p, int end)
      Description copied from class: Encoding
      Returns the code point for a character.
      Oniguruma equivalent: mbc_to_code
      Specified by:
      mbcToCode in class Encoding
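      To make the decoding direction concrete, here is a minimal sketch that reads one code point from CESU-8 bytes, joining an encoded high/low surrogate pair into a single supplementary code point. It assumes well-formed input; the class Cesu8DecodeSketch and its methods are hypothetical names and do not reproduce the library's validation or invalid-code handling.

```java
/** Hypothetical sketch (not the library's code): decodes one code point from CESU-8. */
final class Cesu8DecodeSketch {
    static int decode(byte[] bytes, int p, int end) {
        int b0 = bytes[p] & 0xFF;
        if (b0 < 0x80) return b0; // single byte
        if ((b0 & 0xE0) == 0xC0) { // two-byte form
            return ((b0 & 0x1F) << 6) | (bytes[p + 1] & 0x3F);
        }
        if ((b0 & 0xF0) != 0xE0) {
            throw new IllegalArgumentException("not a CESU-8 lead byte at " + p);
        }
        int unit = three(bytes, p); // three-byte form: one UTF-16 code unit
        if (Character.isHighSurrogate((char) unit) && p + 6 <= end
                && (bytes[p + 3] & 0xF0) == 0xE0) {
            int low = three(bytes, p + 3);
            if (Character.isLowSurrogate((char) low)) {
                // Two encoded surrogates form one supplementary code point.
                return Character.toCodePoint((char) unit, (char) low);
            }
        }
        return unit;
    }

    /** Reads the 16-bit value stored in a three-byte sequence starting at p. */
    private static int three(byte[] bytes, int p) {
        return ((bytes[p] & 0x0F) << 12) | ((bytes[p + 1] & 0x3F) << 6) | (bytes[p + 2] & 0x3F);
    }
}
```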
    • trailS

      static byte trailS(int code, int shift)
    • trail0

      static byte trail0(int code)
    • trailS

      static byte trailS(long code, int shift)
    • trail0

      static byte trail0(long code)
    • codeToMbc

      public int codeToMbc(int code, byte[] bytes, int p)
      Description copied from class: Encoding
      Encodes a code point into its multibyte representation.
      Specified by:
      codeToMbc in class Encoding
      Returns:
      the character length for the given code point
      Oniguruma equivalent: code_to_mbc
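      The encoding direction can be sketched as follows, again based on the general definition of CESU-8 rather than on this class's source. The class Cesu8EncodeSketch and its methods are hypothetical names; the sketch mirrors the (code, bytes, p) parameter order but does not handle the library's invalid-code scheme.

```java
/** Hypothetical sketch (not the library's code): encodes one code point as CESU-8. */
final class Cesu8EncodeSketch {
    /** Writes the bytes for codePoint into out starting at p; returns the byte count. */
    static int encode(int codePoint, byte[] out, int p) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint < 0x80) {
            out[p] = (byte) codePoint;
            return 1;
        }
        if (codePoint < 0x800) {
            out[p] = (byte) (0xC0 | (codePoint >> 6));
            out[p + 1] = (byte) (0x80 | (codePoint & 0x3F));
            return 2;
        }
        if (codePoint < 0x10000) {
            return putThree(codePoint, out, p);
        }
        // Supplementary code point: encode each UTF-16 surrogate as three bytes.
        putThree(Character.highSurrogate(codePoint), out, p);
        putThree(Character.lowSurrogate(codePoint), out, p + 3);
        return 6;
    }

    /** Writes a 16-bit value as a three-byte sequence. */
    private static int putThree(int unit, byte[] out, int p) {
        out[p] = (byte) (0xE0 | (unit >> 12));
        out[p + 1] = (byte) (0x80 | ((unit >> 6) & 0x3F));
        out[p + 2] = (byte) (0x80 | (unit & 0x3F));
        return 3;
    }
}
```

      Under these assumptions, U+1F600 becomes the six bytes ED A0 BD ED B8 80, where UTF-8 would use the four bytes F0 9F 98 80.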
    • mbcCaseFold

      public int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold)
      Description copied from class: AbstractEncoding
      onigenc_ascii_mbc_case_fold
      Overrides:
      mbcCaseFold in class UnicodeEncoding
      Parameters:
      flag - case fold flag
      pp - an IntHolder that points at character head
      fold - a buffer that receives the case-folded character
      Oniguruma equivalent: mbc_case_fold
    • ctypeCodeRange

      public int[] ctypeCodeRange(int ctype, IntHolder sbOut)
      Description copied from class: Encoding
      Returns the code range for a given character type.
      Oniguruma equivalent: get_ctype_code_range
      Specified by:
      ctypeCodeRange in class Encoding
    • utf8IsLead

      private static boolean utf8IsLead(int c)
    • leftAdjustCharHead

      public int leftAdjustCharHead(byte[] bytes, int p, int s, int end)
      Description copied from class: Encoding
      Seeks the previous character head in a stream.
      Oniguruma equivalent: left_adjust_char_head
      Specified by:
      leftAdjustCharHead in class Encoding
      Parameters:
      bytes - byte stream
      p - position
      s - stop
      end - end
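      One way to picture the seek is the sketch below. It steps back over continuation bytes (those of the form 10xxxxxx) and, if it lands on an encoded low surrogate that directly follows an encoded high surrogate, steps back three more bytes to the start of the pair. The class Cesu8HeadSketch, its methods, and the parameter names start and pos are hypothetical and chosen for illustration; this is not the library's implementation and it assumes well-formed input.

```java
/** Hypothetical sketch (not the library's code): finds the head of the character covering pos. */
final class Cesu8HeadSketch {
    /** start is the lowest index that may be returned; pos is the index to adjust. */
    static int leftAdjust(byte[] bytes, int start, int pos) {
        if (pos <= start) return pos;
        int q = pos;
        // Continuation bytes look like 10xxxxxx; walk left until a lead byte.
        while (q > start && (bytes[q] & 0xC0) == 0x80) q--;
        // An encoded low surrogate (ED B0..BF xx) right after an encoded high
        // surrogate (ED A0..AF xx) belongs to the same six-byte character.
        if (q - 3 >= start && isSurrogateHead(bytes, q, 0xB0)
                && isSurrogateHead(bytes, q - 3, 0xA0)) {
            q -= 3;
        }
        return q;
    }

    private static boolean isSurrogateHead(byte[] b, int i, int secondByteHigh) {
        return i + 1 < b.length && (b[i] & 0xFF) == 0xED && (b[i + 1] & 0xF0) == secondByteHigh;
    }
}
```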
    • isReverseMatchAllowed

      public boolean isReverseMatchAllowed(byte[] bytes, int p, int end)
      Description copied from class: Encoding
      Returns true if it is safe to use the reverse Boyer-Moore fail-fast search algorithm.
      Oniguruma equivalent: is_allowed_reverse_match
      Specified by:
      isReverseMatchAllowed in class Encoding