Class CharsetInfoManager

java.lang.Object
ghidra.util.charset.CharsetInfoManager

public class CharsetInfoManager extends Object
Maintains a list of charsets and info about each charset. More common charsets are ordered toward the beginning of the list.

Created instances are immutable, but the "INSTANCE" singleton can be replaced by a new value when reinitializeWithUserDefinedCharsets() is called. Deferring this step avoids reading the user config file, and the resulting slowdown, during certain stages of startup.
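The replace-rather-than-mutate design can be sketched in plain Java as follows. This is a minimal illustration, not Ghidra's implementation: the class names ReplaceableSingleton and Manager, and the sample charset names, are hypothetical. Each instance is immutable, and reinitialization publishes a completely new instance through a volatile field.

```java
import java.util.List;

public class ReplaceableSingleton {

    // Hypothetical stand-in for an immutable manager: its list is copied on
    // construction and can never be modified afterwards.
    static final class Manager {
        private final List<String> charsetNames;

        Manager(List<String> charsetNames) {
            this.charsetNames = List.copyOf(charsetNames);
        }

        List<String> charsetNames() {
            return charsetNames;
        }
    }

    // Starts with generic defaults; volatile so a replacement is visible to all threads.
    private static volatile Manager instance = new Manager(List.of("US-ASCII", "UTF-8"));

    static Manager getInstance() {
        return instance;
    }

    // Builds a complete new instance, then publishes it with a single write.
    // Callers still holding the old instance keep a consistent, if stale, view.
    static void reinitialize(List<String> userCharsetNames) {
        instance = new Manager(userCharsetNames);
    }

    public static void main(String[] args) {
        Manager before = getInstance();
        reinitialize(List.of("UTF-8", "Shift_JIS"));
        System.out.println(before.charsetNames());        // [US-ASCII, UTF-8]
        System.out.println(getInstance().charsetNames()); // [UTF-8, Shift_JIS]
    }
}
```

Because the old instance is never mutated, code that fetched it before reinitialization cannot observe a half-updated state.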

  • Field Details

  • Method Details

    • getInstance

      public static CharsetInfoManager getInstance()
      Get the global singleton instance of this CharsetInfoManager.

      This singleton will only have generic information until reinitializeWithUserDefinedCharsets() is called.

      Returns:
      global singleton instance
    • isBOMCharset

      public static boolean isBOMCharset(String charsetName)
      Returns true if the specified charset needs additional care for handling byte-order-mark byte values (e.g. UTF-16/32). If the charset is an LE/BE variant, no extra care is needed.
      Parameters:
      charsetName - name of charset
      Returns:
      true if the specified charset needs additional care for handling byte-order-mark byte values (e.g. UTF-16/32); LE/BE variants need no extra care.
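      The difference is visible with the JDK's own encoders: Java's endian-neutral UTF-16 encoder writes a big-endian byte-order mark, while UTF-16LE writes none. The helper looksLikeBomCharset below is a hypothetical approximation of isBOMCharset that only checks the endian-neutral names; the real method may recognize other charsets.

      ```java
      import java.nio.charset.StandardCharsets;
      import java.util.Arrays;

      public class BomDemo {

          // Hypothetical approximation of isBOMCharset: only the endian-neutral
          // UTF-16 and UTF-32 names are treated as needing BOM handling.
          static boolean looksLikeBomCharset(String charsetName) {
              return charsetName.equalsIgnoreCase("UTF-16")
                      || charsetName.equalsIgnoreCase("UTF-32");
          }

          public static void main(String[] args) {
              byte[] neutral = "A".getBytes(StandardCharsets.UTF_16);  // BOM, then big-endian data
              byte[] little = "A".getBytes(StandardCharsets.UTF_16LE); // no BOM
              System.out.println(Arrays.toString(neutral)); // [-2, -1, 0, 65]
              System.out.println(Arrays.toString(little));  // [65, 0]
              System.out.println(looksLikeBomCharset("UTF-16"));   // true
              System.out.println(looksLikeBomCharset("UTF-16LE")); // false
          }
      }
      ```

      A reader of raw UTF-16 data therefore has to check for (and usually skip) the leading 0xFE 0xFF or 0xFF 0xFE bytes, which is unnecessary once the byte order is fixed by an LE/BE charset name.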
    • getCharsetNames

      public List<String> getCharsetNames()
      Returns a list of the names of the currently configured charsets.
      Returns:
      list of the names of the currently configured charsets
    • getCharsets

      public List<CharsetInfo> getCharsets()
      Returns a list of all available charsets.
      Returns:
      list of all available charsets
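      Both getCharsetNames() and getCharsets() return lists ordered with more common charsets first. The sketch below shows one way to build such an ordering from the JVM's own charsets; the PREFERRED list and the class name CharsetOrdering are hypothetical, not Ghidra's actual configuration.

      ```java
      import java.nio.charset.Charset;
      import java.util.ArrayList;
      import java.util.LinkedHashSet;
      import java.util.List;
      import java.util.Set;

      public class CharsetOrdering {

          // Hypothetical "most common" charsets; Ghidra's actual ordering may differ.
          static final List<String> PREFERRED = List.of("US-ASCII", "UTF-8", "UTF-16", "ISO-8859-1");

          // Returns canonical charset names: preferred ones first, then every other
          // charset the JVM supports, in the sorted order of availableCharsets().
          static List<String> orderedCharsetNames() {
              Set<String> ordered = new LinkedHashSet<>();
              for (String name : PREFERRED) {
                  if (Charset.isSupported(name)) {
                      ordered.add(Charset.forName(name).name()); // canonical name
                  }
              }
              // availableCharsets() is keyed by canonical name, so duplicates collapse here.
              ordered.addAll(Charset.availableCharsets().keySet());
              return new ArrayList<>(ordered);
          }

          public static void main(String[] args) {
              List<String> names = orderedCharsetNames();
              System.out.println(names.subList(0, 4)); // [US-ASCII, UTF-8, UTF-16, ISO-8859-1]
          }
      }
      ```

      The LinkedHashSet keeps insertion order while discarding repeats, so the preferred names stay at the front without appearing twice.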
    • getCharsetCharSize

      public int getCharsetCharSize(String charsetName)
      Returns the number of bytes that the specified charset uses to encode a character.
      Parameters:
      charsetName - charset name
      Returns:
      number of bytes in a character, i.e. 1, 2, 4, etc.; defaults to 1 if the charset is unknown or not specified in the config file.
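      The documented fallback amounts to a lookup with a default of 1. In this sketch the CONFIGURED_SIZES table and the class name CharSizeLookup are hypothetical stand-ins for whatever the config file actually provides.

      ```java
      import java.util.Map;

      public class CharSizeLookup {

          // Hypothetical config values: bytes per code unit for some fixed-width encodings.
          static final Map<String, Integer> CONFIGURED_SIZES = Map.of(
                  "UTF-16", 2,
                  "UTF-16LE", 2,
                  "UTF-32", 4);

          // Mirrors the documented fallback: unknown or unconfigured charsets report 1.
          static int charSize(String charsetName) {
              if (charsetName == null) {
                  return 1; // Map.of maps reject null lookups, so guard explicitly
              }
              return CONFIGURED_SIZES.getOrDefault(charsetName, 1);
          }

          public static void main(String[] args) {
              System.out.println(charSize("UTF-32"));       // 4
              System.out.println(charSize("windows-1252")); // 1 (not configured)
          }
      }
      ```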
    • getCharsetNamesWithCharSize

      public List<String> getCharsetNamesWithCharSize(int size)
      Returns a list of the names of charsets that encode characters with the specified number of bytes.
      Parameters:
      size - the number of bytes per character in the charset encoding.
      Returns:
      names of the charsets whose characters are size bytes long.
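      A filter like getCharsetNamesWithCharSize can preserve the most-common-first ordering by walking an insertion-ordered table. The table contents and the class name CharSizeFilter are hypothetical; UTF-8 is listed with size 1 on the assumption that size counts code-unit width rather than the variable length of a full character.

      ```java
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.stream.Collectors;

      public class CharSizeFilter {

          // Hypothetical table in "most common first" order; LinkedHashMap keeps that order.
          static final Map<String, Integer> SIZES = new LinkedHashMap<>();
          static {
              SIZES.put("US-ASCII", 1);
              SIZES.put("UTF-8", 1);   // assumption: code-unit width, not full character length
              SIZES.put("UTF-16", 2);
              SIZES.put("ISO-8859-1", 1);
              SIZES.put("UTF-32", 4);
          }

          // Returns the names whose size matches, in table order.
          static List<String> namesWithCharSize(int size) {
              return SIZES.entrySet().stream()
                      .filter(e -> e.getValue() == size)
                      .map(Map.Entry::getKey)
                      .collect(Collectors.toList());
          }

          public static void main(String[] args) {
              System.out.println(namesWithCharSize(1)); // [US-ASCII, UTF-8, ISO-8859-1]
              System.out.println(namesWithCharSize(4)); // [UTF-32]
          }
      }
      ```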
    • get

      public CharsetInfo get(Charset cs)
      Returns the charset info object that represents the specified charset.
      Parameters:
      cs - charset
      Returns:
      charset info object that represents the specified charset
    • get

      public CharsetInfo get(String name)
      Returns the charset info object that represents the specified charset.
      Parameters:
      name - charset name
      Returns:
      charset info object that represents the specified charset
    • get

      public CharsetInfo get(String name, Charset defaultCS)
      Returns the charset info object that represents the specified charset, falling back to the defaultCS value if the charset is not found.
      Parameters:
      name - charset name
      defaultCS - default value to return if not found
      Returns:
      charset info object that represents the specified charset, or, if it is not found, the defaultCS fallback
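      The fallback behavior of get(String, Charset) resembles the following lookup against the JVM's charset registry. This sketch works with java.nio.charset.Charset rather than Ghidra's CharsetInfo, and the resolve helper and class name CharsetLookup are hypothetical. It also shows that JVM aliases such as "UTF8" resolve to a canonical name.

      ```java
      import java.nio.charset.Charset;
      import java.nio.charset.IllegalCharsetNameException;
      import java.nio.charset.StandardCharsets;

      public class CharsetLookup {

          // Resolves a name (or alias) to a JVM Charset, falling back to defaultCS
          // when the name is null, malformed, or unsupported.
          static Charset resolve(String name, Charset defaultCS) {
              if (name == null) {
                  return defaultCS; // Charset.isSupported(null) would throw
              }
              try {
                  return Charset.isSupported(name) ? Charset.forName(name) : defaultCS;
              } catch (IllegalCharsetNameException e) {
                  return defaultCS; // e.g. names containing spaces
              }
          }

          public static void main(String[] args) {
              System.out.println(resolve("UTF8", StandardCharsets.US_ASCII));            // UTF-8
              System.out.println(resolve("no-such-charset", StandardCharsets.US_ASCII)); // US-ASCII
          }
      }
      ```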
    • getMostImplementedScripts

      public List<Character.UnicodeScript> getMostImplementedScripts()
      Returns a hopefully short list of non-LATIN UnicodeScripts that are supported by some charset present in this JVM, ignoring any charsets that support all scripts. This list can be useful when presenting the user with a list of scripts or of things related to a script. Typically the list will contain: ARABIC, BOPOMOFO, CYRILLIC, DEVANAGARI, HANGUL, HAN, HEBREW, HIRAGANA, KATAKANA, THAI.
      Returns:
      a hopefully short list of non-LATIN UnicodeScripts supported by charsets present in this JVM (ignoring charsets that support all scripts)
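      One way to approximate which scripts a charset supports is to ask its encoder whether it can encode a representative character from each script. This is a heuristic sketch, not necessarily how getMostImplementedScripts works: the single sample code point per script and the class name ScriptProbe are assumptions, and one encodable character does not prove full coverage of a script.

      ```java
      import java.nio.charset.Charset;
      import java.nio.charset.StandardCharsets;
      import java.util.EnumSet;
      import java.util.Set;

      public class ScriptProbe {

          // Hypothetical sample characters, one per script.
          static final int[] SAMPLES = {
              0x0416, // CYRILLIC CAPITAL LETTER ZHE
              0x0628, // ARABIC LETTER BEH
              0x05D0, // HEBREW LETTER ALEF
              0x0E01, // THAI CHARACTER KO KAI
          };

          // Returns the scripts whose sample character this charset can encode.
          static Set<Character.UnicodeScript> scriptsCoveredBy(Charset cs) {
              Set<Character.UnicodeScript> covered = EnumSet.noneOf(Character.UnicodeScript.class);
              if (!cs.canEncode()) {
                  return covered; // decode-only charsets cannot encode anything
              }
              for (int cp : SAMPLES) {
                  // a fresh encoder per probe avoids any leftover encoder state
                  if (cs.newEncoder().canEncode(new String(Character.toChars(cp)))) {
                      covered.add(Character.UnicodeScript.of(cp));
                  }
              }
              return covered;
          }

          public static void main(String[] args) {
              System.out.println(scriptsCoveredBy(StandardCharsets.ISO_8859_1)); // []
              System.out.println(scriptsCoveredBy(StandardCharsets.UTF_8));
          }
      }
      ```

      A charset such as UTF-8 that covers every sample would be one of the "support all scripts" charsets the method description says to ignore.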
    • getStandardCharsetNames

      public static List<String> getStandardCharsetNames()
    • reinitializeWithUserDefinedCharsets

      public static void reinitializeWithUserDefinedCharsets()
      Replaces the current singleton with a new singleton that has been initialized with the optional information found in the charset_info.json file.
    • getConfigFileLocation

      public static ResourceFile getConfigFileLocation()
      Returns the location of the config file.
      Returns:
      location of the config file