History

Richard Moore 0333a76f4f Added initial flatworm documentation stubs.		2019-08-21 01:53:47 -04:00
..
index.html	Added initial flatworm documentation stubs.	2019-08-21 01:53:47 -04:00
README.md	Added initial flatworm documentation stubs.	2019-08-21 01:53:47 -04:00

README.md

Strings

Tra la la

Bytes32String

A string in Solidity is length prefixed with its 256-bit (32 byte) length, which means that even short strings require 2 words (64 bytes) of storage.

In many cases, we deal with short strings, so instead of prefixing the string with its length, we can null-terminate it and fit it in a single word (32 bytes). Since we need only a single byte for the null termination, we can store strings up to 31 bytes long in a word.

Note:

Strings that are 31 bytes long may contain fewer than 31 characters, since UTF-8 requires multiple bytes to encode international characters.

utils . parseBytes32String ( aBytesLike ) => string

Returns the decoded string represented by the Bytes32 encoded data.

utils . formatBytes32String ( text ) => string

Returns a bytes32 string representation of text. If the length of text exceeds 31 bytes, it will throw an error.

UTF-8 Strings

utils . toUtf8Bytes ( text [ , form=current ] ) => Uint8Array

Returns the UTF-8 bytes of text, optionally normalizing it using the UnicodeNormalizationForm form.

utils . toUtf8CodePoints ( aBytesLike [ , form=current ] ) => Array< number >

Returns the Array of codepoints of aBytesLike, optionally normalizing it using the UnicodeNormalizationForm form.

Note: This function correctly splits each user-perceived character into its codepoint, accounting for surrogate pairs. This should not be confused with string.split(""), which destroys surrogate pairs, spliting between each UTF-16 codeunit instead.

utils . toUtf8String ( aBytesLike [ , ignoreErrors=false ] ) => string

Returns the string represented by the UTF-8 bytes of aBytesLike. This will throw an error for invalid surrogates, overlong sequences or other UTF-8 issues, unless ignoreErrors is specified.

UnicodeNormalizationForm

There are several commonly used forms when normalizing UTF-8 data, which allow strings to be compared or hashed in a stable way.

utils . UnicodeNormalizationForm . current

Maintain the current normalization form.

utils . UnicodeNormalizationForm . NFC

The Composed Normalization Form. This form uses single codepoints which represent the fully composed character.

For example, the é is a single codepoint, 0x00e9.

utils . UnicodeNormalizationForm . NFD

The Decomposed Normalization Form. This form uses multiple codepoints (when necessary) to compose a character.

For example, the é is made up of two codepoints, "0x0065" (which is the letter "e") and "0x0301" which is a special diacritic UTF-8 codepoint which indicates the previous character should have an acute accent.

utils . UnicodeNormalizationForm . NFKC

The Composed Normalization Form with Canonical Equivalence. The Canonical representation folds characters which have the same syntactic representation but different semantic meaning.

For example, the Roman Numeral I, which has a UTF-8 codepoint "0x2160", is folded into the capital letter I, "0x0049".

utils . UnicodeNormalizationForm . NFKD

The Decomposed Normalization Form with Canonical Equivalence. See NFKC for more an example.

Note:

Only certain specified characters are folded in Canonical Equivalence, and thus it should not be considered a method to acheive any level of security from homoglyph attacks.

Content Hash: f6a51816edca0ae4b74c16012629f26108f16204ff9d3aa3879fd44adb8d0d7f