_title: Strings

_section: Strings

Utilities for converting between strings, UTF-8 bytes and ``bytes32`` values.

_subsection: Bytes32String @

A string in Solidity is prefixed with its 256-bit (32 byte) length, which means that even short strings require 2 words (64 bytes) of storage.

In many cases we deal with short strings, so instead of prefixing the string with its length, we can null-terminate it and fit it in a single word (32 bytes). Since the null terminator needs only a single byte, a word can hold a string of up to 31 bytes.

_note: Note
Strings that are 31 __//bytes//__ long may contain fewer than 31 __//characters//__, since UTF-8 encodes every non-ASCII character using multiple bytes.

_property: utils.parseBytes32String(aBytesLike) => string @ @TS
Returns the decoded string represented by the ``Bytes32`` encoded data.

_property: utils.formatBytes32String(text) => string @ @TS
Returns a ``bytes32`` string representation of //text//. If the length of //text// exceeds 31 bytes, it will throw an error.

_subsection: UTF-8 Strings @

_property: utils.toUtf8Bytes(text [ , form = current ] ) => Uint8Array @ @TS
Returns the UTF-8 bytes of //text//, optionally normalizing it using the [[unicode-normalization-form]] //form//.

_property: utils.toUtf8CodePoints(text [ , form = current ] ) => Array @ @TS
Returns the Array of codepoints of //text//, optionally normalizing it using the [[unicode-normalization-form]] //form//.

**Note:** This function splits the string into Unicode codepoints, keeping each surrogate pair together as a single codepoint. This should not be confused with ``string.split("")``, which destroys surrogate pairs by splitting between each UTF-16 code unit instead.

_property: utils.toUtf8String(aBytesLike [ , ignoreErrors = false ] ) => string @ @TS
Returns the string represented by the UTF-8 bytes of //aBytesLike//.

This will throw an error for invalid surrogates, overlong sequences or other UTF-8 issues, unless //ignoreErrors// is ``true``.
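The encoding rules described in this section can be illustrated without the library. The following is a minimal sketch that uses only the built-in ``TextEncoder`` and ``TextDecoder``; the names ``utf8Length``, ``sketchFormatBytes32String`` and ``sketchParseBytes32String`` are hypothetical and are not part of ``utils``, and the error messages are assumptions rather than the library's own.

```javascript
// Standalone sketch: no ethers import, and every function name here is hypothetical.
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8", { fatal: true });

// Number of bytes the UTF-8 encoding of text occupies.
function utf8Length(text) {
  return encoder.encode(text).length;
}

// Encode text as a 0x-prefixed, null-padded 32-byte hex string.
function sketchFormatBytes32String(text) {
  const bytes = encoder.encode(text);
  if (bytes.length > 31) {
    throw new Error("text does not fit in 31 bytes");
  }
  const word = new Uint8Array(32); // zero-filled, so the null terminator is already present
  word.set(bytes);
  return "0x" + Array.from(word, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Decode a 0x-prefixed 32-byte hex string up to its first null byte.
function sketchParseBytes32String(hex) {
  if (!/^0x[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("expected exactly 32 bytes of hex data");
  }
  const word = Uint8Array.from(hex.slice(2).match(/../g), (h) => parseInt(h, 16));
  if (word[31] !== 0) {
    throw new Error("missing null terminator");
  }
  return decoder.decode(word.subarray(0, word.indexOf(0)));
}

const encoded = sketchFormatBytes32String("Hello");
console.log(encoded.length);                    // 66: "0x" plus 64 hex digits
console.log(sketchParseBytes32String(encoded)); // Hello

// 16 copies of "é" are 16 characters but 32 bytes, so they do not fit in one word.
console.log(utf8Length("\u00e9".repeat(16)));   // 32

// Splitting by codepoint keeps a surrogate pair together; split("") does not.
const emoji = "\u{1F600}";
console.log(Array.from(emoji).length, emoji.split("").length); // 1 2
```

The length check counts encoded bytes rather than ``text.length``, which is why a 16-character string can still be rejected.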
_heading: UnicodeNormalizationForm @

There are several [commonly used forms](https://en.wikipedia.org/wiki/Unicode_equivalence) when normalizing UTF-8 data, which allow strings to be compared or hashed in a stable way.

_property: utils.UnicodeNormalizationForm.current
Maintain the current normalization form.

_property: utils.UnicodeNormalizationForm.NFC
The Composed Normalization Form. This form uses single codepoints which represent the fully composed character.

For example, the **é** is a single codepoint, ``0x00e9``.

_property: utils.UnicodeNormalizationForm.NFD
The Decomposed Normalization Form. This form uses multiple codepoints (when necessary) to compose a character.

For example, the **é** is made up of two codepoints: ``0x0065`` (the letter ``"e"``) and ``0x0301``, a combining diacritic codepoint which indicates that the previous character should have an acute accent.

_property: utils.UnicodeNormalizationForm.NFKC
The Composed Normalization Form with Compatibility Equivalence. The Compatibility representation folds characters which have a similar visual representation but a different semantic meaning.

For example, the Roman Numeral One, **Ⅰ**, which has the codepoint ``0x2160``, is folded into the capital letter I, ``0x0049``.

_property: utils.UnicodeNormalizationForm.NFKD
The Decomposed Normalization Form with Compatibility Equivalence. See NFKC for an example.

_note: Note
Only certain specified characters are folded in Compatibility Equivalence, and thus it should **not** be considered a method to achieve //any// level of security from [homoglyph attacks](https://en.wikipedia.org/wiki/IDN_homograph_attack).
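The effect of each form can be observed with the JavaScript built-in ``String.prototype.normalize``. This is a minimal sketch that does not use the library; ``codePointsOf`` and ``hex`` are hypothetical helpers introduced only for illustration.

```javascript
// Standalone sketch using the built-in String.prototype.normalize;
// codePointsOf and hex are hypothetical helpers, not ethers functions.
function codePointsOf(text, form) {
  const normalized = form ? text.normalize(form) : text;
  // Array.from iterates by codepoint, so surrogate pairs stay intact.
  return Array.from(normalized, (ch) => ch.codePointAt(0));
}

// Render codepoints as zero-padded hex strings for display.
const hex = (points) => points.map((p) => "0x" + p.toString(16).padStart(4, "0"));

const composed = "\u00e9";    // é as a single codepoint
const decomposed = "e\u0301"; // e followed by a combining acute accent

// The two spellings look identical but compare unequal until normalized.
console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize("NFC")); // true

console.log(hex(codePointsOf(composed, "NFD")));  // [ '0x0065', '0x0301' ]

// Only the compatibility forms fold the Roman Numeral One into the letter I.
console.log(hex(codePointsOf("\u2160", "NFC")));  // [ '0x2160' ]
console.log(hex(codePointsOf("\u2160", "NFKC"))); // [ '0x0049' ]
```

Normalizing both sides to the same form before comparing or hashing is what makes the result stable across inputs that were typed or stored differently.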