Package ghidra.util.charset
Class CharsetInfoManager
java.lang.Object
ghidra.util.charset.CharsetInfoManager
Maintains a list of charsets and info about each charset. More common charsets are ordered
toward the beginning of the list.
Created instances are immutable, but the "INSTANCE" singleton can be replaced by a new value
when reinitializeWithUserDefinedCharsets() is called. (This avoids reading the user config
file, and the slowdown that causes, during certain stages of startup.)
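The combination of immutable instances and a replaceable singleton can be sketched as follows. This is a minimal illustration of the pattern only, not Ghidra's actual code: the class name ReplaceableSingletonSketch, the method reinitializeWith, and the hard-coded charset names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pattern described above; not Ghidra's code.
public final class ReplaceableSingletonSketch {
    // volatile so that a replaced instance becomes visible to other threads
    private static volatile ReplaceableSingletonSketch instance =
        new ReplaceableSingletonSketch(List.of("US-ASCII", "UTF-8"));

    // Never modified after construction, so each instance is immutable
    private final List<String> charsetNames;

    private ReplaceableSingletonSketch(List<String> names) {
        this.charsetNames = List.copyOf(names);
    }

    public static ReplaceableSingletonSketch getInstance() {
        return instance;
    }

    // Builds a new immutable instance and swaps it in; the old instance is untouched
    public static void reinitializeWith(List<String> extraNames) {
        List<String> merged = new ArrayList<>(instance.charsetNames);
        merged.addAll(extraNames);
        instance = new ReplaceableSingletonSketch(merged);
    }

    public List<String> getCharsetNames() {
        return charsetNames;
    }

    public static void main(String[] args) {
        ReplaceableSingletonSketch before = getInstance();
        reinitializeWith(List.of("x-custom"));
        System.out.println(before.getCharsetNames());
        System.out.println(getInstance().getCharsetNames());
    }
}
```

Callers that hold on to an old instance keep seeing its original contents, so code that needs the user-defined entries should fetch the singleton again after reinitialization.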
-
Nested Class Summary
Nested Classes (modifier and type, description):
- static class: Class to represent the charsetinfo json configuration file.
-
Field Summary
Fields (modifier and type, field, description):
- static Comparator<CharsetInfo> CHARSET_COMP: Comparator that ignores charset name "x-" prefixes
- static Comparator<String> CHARSET_NAME_COMP: Comparator that ignores charset name "x-" prefixes
- static final String USASCII
- static final String UTF16
- static final String UTF32
- static final String UTF8
-
Method Summary
Methods (modifier and type, method, description):
- get: Returns the charset info object that represents the specified charset.
- get: Returns the charset info object that represents the specified charset, or the defaultCS value if it is not found.
- get: Returns the charset info object that represents the specified charset.
- int getCharsetCharSize(String charsetName): Returns the number of bytes that the specified charset needs to specify a character.
- getCharsetNames: Returns a List of the names of the currently configured charsets.
- getCharsetNamesWithCharSize(int size): Returns a list of Charsets that encode with the number of bytes specified.
- getCharsets: Returns a list of all available charsets.
- static ResourceFile getConfigFileLocation: Returns the filename of the config file.
- static CharsetInfoManager getInstance: Gets the global singleton instance of this CharsetInfoManager.
- getMostImplementedScripts: Returns a hopefully short list of non-LATIN UnicodeScripts that are supported by a charset present in this JVM (ignoring any charsets that support all scripts). This list of scripts can be useful when presenting the user with a list of scripts or things related to a script. Typically the list will contain: ARABIC, BOPOMOFO, CYRILLIC, DEVANAGARI, HANGUL, HAN, HEBREW, HIRAGANA, KATAKANA, THAI.
- static boolean isBOMCharset(String charsetName): Returns true if the specified charset needs additional care for handling byte-order-mark byte values (e.g. UTF-16/32). If the charset is a LE/BE variant, no extra care is needed.
- static void reinitializeWithUserDefinedCharsets(): Replaces the current singleton with a new singleton that has been initialized with the optional information found in the charset_info.json file.
-
Field Details
-
UTF8
-
UTF16
-
UTF32
-
USASCII
-
CHARSET_NAME_COMP
Comparator that ignores charset name "x-" prefixes
-
CHARSET_COMP
Comparator that ignores charset name "x-" prefixes
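As an illustration of what ignoring the "x-" prefix means for ordering, here is a minimal sketch using only the standard library. It is not the Ghidra implementation: the helper stripXPrefix, the class name, and the choice of case-insensitive ordering are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class XPrefixComparatorSketch {
    // Hypothetical helper: drops a leading "x-" or "X-" from a charset name
    static String stripXPrefix(String name) {
        return name.regionMatches(true, 0, "x-", 0, 2) ? name.substring(2) : name;
    }

    // Assumption: names are compared case-insensitively once the prefix is removed
    static final Comparator<String> NAME_COMP =
        Comparator.comparing(XPrefixComparatorSketch::stripXPrefix,
            String.CASE_INSENSITIVE_ORDER);

    // Returns a sorted copy, leaving the input list unchanged
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(NAME_COMP);
        return copy;
    }

    public static void main(String[] args) {
        // "x-EUC-TW" sorts among the E names rather than after all the others
        System.out.println(sorted(List.of("x-EUC-TW", "UTF-8", "Big5")));
    }
}
```

With a comparator like this, experimental or vendor charsets such as "x-EUC-TW" appear next to similarly named standard charsets instead of being grouped at the end of a list.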
-
-
Method Details
-
getInstance
Gets the global singleton instance of this CharsetInfoManager. This singleton will only have generic information until reinitializeWithUserDefinedCharsets() is called.
- Returns:
- global singleton instance
-
isBOMCharset
Returns true if the specified charset needs additional care for handling byte-order-mark byte values (e.g. UTF-16/32). If the charset is a LE/BE variant, no extra care is needed.
- Parameters:
charsetName - name of charset
- Returns:
- true if the specified charset needs additional care for handling byte-order-mark byte values
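The reason the generic UTF-16 charset needs extra care while its LE/BE variants do not can be seen with the JDK's own encoders. The sketch below uses only java.nio.charset.StandardCharsets; the class name and the helper startsWithUtf16Bom are hypothetical and are not part of Ghidra.

```java
import java.nio.charset.StandardCharsets;

public class BomSketch {
    // Hypothetical helper: true if the bytes begin with a UTF-16 byte-order mark
    static boolean startsWithUtf16Bom(byte[] b) {
        if (b.length < 2) {
            return false;
        }
        int b0 = b[0] & 0xFF;
        int b1 = b[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    public static void main(String[] args) {
        // The generic "UTF-16" encoder in the JDK prepends a byte-order mark
        byte[] generic = "A".getBytes(StandardCharsets.UTF_16);
        // The explicit little-endian variant writes only the character bytes
        byte[] le = "A".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(generic.length + " bytes, BOM: " + startsWithUtf16Bom(generic));
        System.out.println(le.length + " bytes, BOM: " + startsWithUtf16Bom(le));
    }
}
```

Code that computes offsets or lengths from encoded bytes has to account for those extra leading bytes when the generic charset is used.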
-
getCharsetNames
Returns a List of the names of the currently configured charsets.
- Returns:
- List of names of the currently configured charsets
-
getCharsets
Returns a list of all available charsets.
- Returns:
- list of all available charsets
-
getCharsetCharSize
Returns the number of bytes that the specified charset needs to specify a character.
- Parameters:
charsetName - charset name
- Returns:
- number of bytes in a character, i.e. 1, 2, 4, etc.; defaults to 1 if the charset is unknown or not specified in the config file
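The lookup-with-default behavior described here, together with the reverse query by size, can be sketched with a small table. Everything in this example is an assumption for illustration: the class name, the method names, and the table contents stand in for whatever the real configuration provides.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CharSizeSketch {
    // Hypothetical table of bytes per character unit; case-insensitive keys
    static final Map<String, Integer> SIZES = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    static {
        SIZES.put("US-ASCII", 1);
        SIZES.put("UTF-8", 1);
        SIZES.put("UTF-16", 2);
        SIZES.put("UTF-32", 4);
    }

    // Unknown (or null) names fall back to 1, mirroring the documented default
    static int charSize(String name) {
        return name == null ? 1 : SIZES.getOrDefault(name, 1);
    }

    // Names of all table entries with the requested size, in key order
    static List<String> namesWithCharSize(int size) {
        return SIZES.entrySet().stream()
            .filter(e -> e.getValue() == size)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(charSize("utf-16"));
        System.out.println(charSize("no-such-charset"));
        System.out.println(namesWithCharSize(1));
    }
}
```

Because unknown names silently report a size of 1, a caller that must distinguish "one byte" from "not configured" would need a separate existence check.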
-
getCharsetNamesWithCharSize
Returns a list of Charsets that encode with the number of bytes specified.
- Parameters:
size - the number of bytes for the Charset encoding
- Returns:
- Charsets that encode characters with the specified number of bytes
-
get
Returns the charset info object that represents the specified charset.
- Parameters:
cs - charset
- Returns:
- charset info object that represents the specified charset
-
get
Returns the charset info object that represents the specified charset.
- Parameters:
name - charset name
- Returns:
- charset info object that represents the specified charset
-
get
Returns the charset info object that represents the specified charset, or the defaultCS value if it is not found.
- Parameters:
name - charset name
defaultCS - default value to return if not found
- Returns:
- charset info object that represents the specified charset, or the defaultCS value if it is not found
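The "return a default when not found" contract resembles a pattern that can be written against the JDK's own Charset registry. The sketch below is an analogy only: it returns java.nio.charset.Charset objects rather than charset info objects, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LookupWithDefaultSketch {
    // Returns the named charset, or defaultCS if the name is unknown or invalid
    static Charset lookup(String name, Charset defaultCS) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : defaultCS;
        }
        catch (IllegalArgumentException e) {
            // Covers null names and syntactically illegal charset names
            return defaultCS;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("utf-8", StandardCharsets.US_ASCII));
        System.out.println(lookup("no-such-charset-xyz", StandardCharsets.US_ASCII));
    }
}
```

A fallback of this kind lets callers avoid null checks; which default is appropriate depends on the caller.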
-
getMostImplementedScripts
Returns a hopefully short list of non-LATIN UnicodeScripts that are supported by a charset present in this JVM (ignoring any charsets that support all scripts). This list of scripts can be useful when presenting the user with a list of scripts or things related to a script. Typically the list will contain: ARABIC, BOPOMOFO, CYRILLIC, DEVANAGARI, HANGUL, HAN, HEBREW, HIRAGANA, KATAKANA, THAI.
- Returns:
- list of non-LATIN UnicodeScripts that are supported by a charset present in this JVM
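One way to probe which scripts a charset can represent is to try encoding a sample character from each script. The sketch below does this with the JDK's CharsetEncoder and Character.UnicodeScript. How Ghidra actually derives its list is not stated here, so the approach, the class name, and the hand-picked sample code points are all assumptions.

```java
import java.lang.Character.UnicodeScript;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;
import java.util.Set;

public class ScriptProbeSketch {
    // One sample code point per script (illustrative): Cyrillic, Hebrew, Thai
    static final int[] SAMPLES = { 0x0416, 0x05D0, 0x0E01 };

    // Returns the scripts whose sample character the charset is able to encode
    static Set<UnicodeScript> encodableScripts(Charset cs) {
        Set<UnicodeScript> result = EnumSet.noneOf(UnicodeScript.class);
        if (!cs.canEncode()) {
            return result; // decode-only charsets have no encoder
        }
        CharsetEncoder encoder = cs.newEncoder();
        for (int cp : SAMPLES) {
            if (encoder.canEncode(new String(Character.toChars(cp)))) {
                result.add(UnicodeScript.of(cp));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encodableScripts(StandardCharsets.US_ASCII));
        System.out.println(encodableScripts(StandardCharsets.UTF_8));
    }
}
```

UTF-8 reports every sampled script, which is the kind of "supports all scripts" charset that the method above is documented to ignore.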
-
getStandardCharsetNames
-
reinitializeWithUserDefinedCharsets
public static void reinitializeWithUserDefinedCharsets()
Replaces the current singleton with a new singleton that has been initialized with the optional information found in the charset_info.json file.
-
getConfigFileLocation
Returns the filename of the config file.
- Returns:
- filename of the config file
-