ethers.js/docs/v5/api/utils/strings/index.html

<!DOCTYPE html>
<html class="paged">
  <head>
    <title>Strings</title>
    <link rel="stylesheet" type="text/css" href="/v5/static/style.css">
  </head>
  <body>
    <div class="sidebar">
      <div class="header">
        <div class="logo"><a href="/v5/"><div class="image"></div><div class="name">ethers</div><div class="version">v5.0</div></a></div>
      </div>
      <div class="toc"><div>
        <div class="link title"><a href="/v5/">Documentation</a></div><div class="base show link depth-1"><a href="/v5/getting-started/">Getting Started</a></div><div class="base show link depth-1"><a href="/v5/concepts/">Ethereum Basics</a></div><div class="hide link depth-2"><a href="/v5/concepts/events/">Events</a></div><div class="hide link depth-2"><a href="/v5/concepts/gas/">Gas</a></div><div class="hide link depth-2"><a href="/v5/concepts/security/">Security</a></div><div class="base ancestor show link depth-1"><a href="/v5/api/">Application Programming Interface</a></div><div class="show link depth-2"><a href="/v5/api/contract/">Contract Interaction</a></div><div class="hide link depth-3"><a href="/v5/api/contract/contract/">Contract</a></div><div class="hide link depth-3"><a href="/v5/api/contract/contract-factory/">ContractFactory</a></div><div class="hide link depth-3"><a href="/v5/api/contract/example/">Example: ERC-20 Contract</a></div><div class="show link depth-2"><a href="/v5/api/signer/">Signers</a></div><div class="show link depth-2"><a href="/v5/api/providers/">Providers</a></div><div class="hide link depth-3"><a href="/v5/api/providers/provider/">Provider</a></div><div class="hide link depth-3"><a href="/v5/api/providers/jsonrpc-provider/">JsonRpcProvider</a></div><div class="hide link depth-3"><a href="/v5/api/providers/api-providers/">API Providers</a></div><div class="hide link depth-3"><a href="/v5/api/providers/other/">Other Providers</a></div><div class="hide link depth-3"><a href="/v5/api/providers/types/">Types</a></div><div class="ancestor show link depth-2"><a href="/v5/api/utils/">Utilities</a></div><div class="show link depth-3"><a href="/v5/api/utils/abi/">Application Binary Interface</a></div><div class="hide link depth-4"><a href="/v5/api/utils/abi/coder/">AbiCoder</a></div><div class="hide link depth-4"><a href="/v5/api/utils/abi/formats/">ABI Formats</a></div><div class="hide link depth-4"><a 
href="/v5/api/utils/abi/fragments/">Fragments</a></div><div class="hide link depth-4"><a href="/v5/api/utils/abi/interface/">Interface</a></div><div class="show link depth-3"><a href="/v5/api/utils/address/">Addresses</a></div><div class="show link depth-3"><a href="/v5/api/utils/bignumber/">BigNumber</a></div><div class="show link depth-3"><a href="/v5/api/utils/bytes/">Byte Manipulation</a></div><div class="show link depth-3"><a href="/v5/api/utils/constants/">Constants</a></div><div class="show link depth-3"><a href="/v5/api/utils/display-logic/">Display Logic and Input</a></div><div class="show link depth-3"><a href="/v5/api/utils/encoding/">Encoding Utilities</a></div><div class="show link depth-3"><a href="/v5/api/utils/fixednumber/">FixedNumber</a></div><div class="show link depth-3"><a href="/v5/api/utils/hashing/">Hashing Algorithms</a></div><div class="show link depth-3"><a href="/v5/api/utils/hdnode/">HD Wallet</a></div><div class="show link depth-3"><a href="/v5/api/utils/logger/">Logging</a></div><div class="show link depth-3"><a href="/v5/api/utils/properties/">Property Utilities</a></div><div class="show link depth-3"><a href="/v5/api/utils/signing-key/">Signing Key</a></div><div class="myself ancestor ancestor show link depth-3"><a href="/v5/api/utils/strings/">Strings</a></div><div class="show link depth-3"><a href="/v5/api/utils/transactions/">Transactions</a></div><div class="show link depth-3"><a href="/v5/api/utils/web/">Web Utilities</a></div><div class="show link depth-3"><a href="/v5/api/utils/wordlists/">Wordlists</a></div><div class="show link depth-2"><a href="/v5/api/other/">Other Libraries</a></div><div class="hide link depth-3"><a href="/v5/api/other/assembly/">Assembly</a></div><div class="hide link depth-4"><a href="/v5/api/other/assembly/dialect/">Ethers ASM Dialect</a></div><div class="hide link depth-4"><a href="/v5/api/other/assembly/api/">Utilities</a></div><div class="hide link depth-4"><a 
href="/v5/api/other/assembly/ast/">Abstract Syntax Tree</a></div><div class="hide link depth-3"><a href="/v5/api/other/hardware/">Hardware Wallets</a></div>
      </div></div>
    </div>
    <div class="content">

<a name="strings"></a><h1 class="show-anchors"><div>Strings<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings"></a></div></div></h1><p>Utilities for converting between strings, UTF-8 bytes and null-terminated <code class="inline">bytes32</code> strings.</p>

<a name="Bytes32String"></a><a name="strings--Bytes32String"></a><h2 class="show-anchors"><div>Bytes32String<div class="anchors"><a class="self" href="/v5/api/utils/strings/#Bytes32String"></a></div></div></h2><p>A string in Solidity is length prefixed with its 256-bit (32 byte) length, which means that even short strings require 2 words (64 bytes) of storage.</p>

<p>In many cases, we deal with short strings, so instead of prefixing the string with its length, we can null-terminate it and fit it in a single word (32 bytes). Since we need only a single byte for the null termination, we can store strings up to 31 bytes long in a word.</p>

<div class="definition container-box note"><div class="term">Note</div><div class="body"><p>Strings that are 31 <u><i>bytes</i></u> long may contain fewer than 31 <u><i>characters</i></u>, since UTF-8 requires multiple bytes to encode each non-ASCII character.</p>
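<p>The difference between bytes and characters can be checked with platform built-ins alone. The following is a minimal sketch; the helper names <code class="inline">utf8ByteLength</code> and <code class="inline">codePointCount</code> are hypothetical and are not part of ethers.</p>

```javascript
// Hypothetical helpers (not part of ethers) comparing UTF-8 byte length
// with the number of codepoints in a string.
const byteCounter = new TextEncoder();

function utf8ByteLength(text) {
  return byteCounter.encode(text).length;
}

function codePointCount(text) {
  // Array.from iterates a string by codepoint
  return Array.from(text).length;
}

const plain = "hello";
const accented = "h\u00e9llo";          // U+00E9 needs 2 bytes in UTF-8
const kanji = "\u65e5\u672c\u8a9e";     // each codepoint needs 3 bytes

console.log(utf8ByteLength(plain), codePointCount(plain));       // 5 5
console.log(utf8ByteLength(accented), codePointCount(accented)); // 6 5
console.log(utf8ByteLength(kanji), codePointCount(kanji));       // 9 3
```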

</div></div><a name="utils-parseBytes32"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="method">parseBytes32String</span><span class="symbol">(</span> <span class="param">aBytesLike</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">string</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#utils-parseBytes32"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/bytes32.ts#L21">source</a></div></div><div class="body"><p>Returns the decoded string represented by the <code class="inline">Bytes32</code> encoded data.</p>
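<p>To illustrate the layout being decoded, here is a rough sketch written with Node.js built-ins only. The helper <code class="inline">decodeBytes32Sketch</code> is hypothetical and is not the ethers implementation; it assumes the input is a 0x-prefixed hex string and treats everything before the first null byte as the UTF-8 payload.</p>

```javascript
// Hypothetical helper illustrating the null-terminated 32-byte layout.
// This is a sketch, not the ethers implementation.
function decodeBytes32Sketch(hex) {
  const bytes = Buffer.from(hex.replace(/^0x/, ""), "hex");
  if (bytes.length !== 32) {
    throw new Error("expected exactly 32 bytes");
  }
  if (bytes[31] !== 0) {
    throw new Error("missing null terminator");
  }
  // Everything before the first null byte is the UTF-8 payload
  const end = bytes.indexOf(0);
  const strictDecoder = new TextDecoder("utf-8", { fatal: true });
  return strictDecoder.decode(bytes.subarray(0, end));
}

// "hello" followed by 27 null bytes
const encodedHello = "0x" + Buffer.from("hello").toString("hex").padEnd(64, "0");
const decodedHello = decodeBytes32Sketch(encodedHello);
console.log(decodedHello); // hello
```

<p>With ethers itself, the equivalent call would be <code class="inline">ethers.utils.parseBytes32String(encodedHello)</code>.</p>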

</div></div><a name="utils-formatBytes32"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="method">formatBytes32String</span><span class="symbol">(</span> <span class="param">text</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">string&lt; <a href="/v5/api/utils/bytes/#DataHexString">DataHexString</a>&lt; 32 &gt; &gt;</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#utils-formatBytes32"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/bytes32.ts#L9">source</a></div></div><div class="body"><p>Returns a <code class="inline">bytes32</code> string representation of <i>text</i>. If the length of <i>text</i> exceeds 31 bytes, it will throw an error.</p>
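<p>The 31 byte limit follows from reserving the final byte for the null terminator. As a rough sketch of the encoding, using Node.js built-ins only (the helper <code class="inline">encodeBytes32Sketch</code> is hypothetical and is not the ethers implementation):</p>

```javascript
// Hypothetical helper illustrating how text fits in a single 32-byte word.
// This is a sketch, not the ethers implementation.
function encodeBytes32Sketch(text) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length > 31) {
    throw new Error("bytes32 string must be at most 31 bytes");
  }
  const word = new Uint8Array(32); // zero-filled, so the tail is null padding
  word.set(bytes);
  return "0x" + Buffer.from(word).toString("hex");
}

const helloWord = encodeBytes32Sketch("hello");
console.log(helloWord.length); // 66 ("0x" plus 64 hex digits)
console.log(helloWord.startsWith("0x68656c6c6f")); // true
```

<p>With ethers itself, the equivalent call would be <code class="inline">ethers.utils.formatBytes32String("hello")</code>.</p>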

</div></div><a name="strings-utf8"></a><a name="strings--strings-utf8"></a><h2 class="show-anchors"><div>UTF-8 Strings<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings-utf8"></a></div></div></h2>
<a name="utils-toUtf8Bytes"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="method">toUtf8Bytes</span><span class="symbol">(</span> <span class="param">text</span> <span class="symbol">[</span> <span class="symbol">,</span> <span class="param">form</span> = <span class="param">current</span> <span class="symbol">]</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">Uint8Array</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#utils-toUtf8Bytes"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L202">source</a></div></div><div class="body"><p>Returns the UTF-8 bytes of <i>text</i>, optionally normalizing it using the <a href="/v5/api/utils/strings/#strings--unicode-normalization-form">UnicodeNormalizationForm</a> <i>form</i>.</p>
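<p>The effect of the normalization form on the resulting bytes can be seen with the standard <code class="inline">TextEncoder</code> and <code class="inline">String.prototype.normalize</code>. The helper <code class="inline">utf8BytesSketch</code> below is hypothetical and only approximates the behavior described above.</p>

```javascript
// Hypothetical helper: encode text as UTF-8, optionally normalizing first.
function utf8BytesSketch(text, form) {
  // When no form is given, the text is encoded as-is
  const prepared = form ? text.normalize(form) : text;
  return new TextEncoder().encode(prepared);
}

const eAcute = "\u00e9"; // composed form of the accented letter

const asIsBytes = Array.from(utf8BytesSketch(eAcute));
const nfdBytes = Array.from(utf8BytesSketch(eAcute, "NFD"));

console.log(asIsBytes); // [ 195, 169 ]
console.log(nfdBytes);  // [ 101, 204, 129 ]
```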

</div></div><a name="utils-toUtf8CodePoints"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="method">toUtf8CodePoints</span><span class="symbol">(</span> <span class="param">text</span> <span class="symbol">[</span> <span class="symbol">,</span> <span class="param">form</span> = <span class="param">current</span> <span class="symbol">]</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">Array&lt; number &gt;</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#utils-toUtf8CodePoints"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L293">source</a></div></div><div class="body"><p>Returns the Array of codepoints of <i>text</i>, optionally normalized using the <a href="/v5/api/utils/strings/#strings--unicode-normalization-form">UnicodeNormalizationForm</a> <i>form</i>.</p>

</div></div><div class="definition container-box note"><div class="term">Note</div><div class="body"><p>This function correctly splits <i>text</i> into its codepoints, keeping each surrogate pair together as a single codepoint. This should not be confused with <code class="inline">string.split("")</code>, which destroys surrogate pairs by splitting between each UTF-16 code unit instead.</p>
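<p>The difference is visible with any character outside the Basic Multilingual Plane. In this sketch, <code class="inline">codePointsSketch</code> is a hypothetical stand-in built on <code class="inline">Array.from</code>, not the ethers function.</p>

```javascript
// Hypothetical stand-in that lists codepoints using built-in iteration.
function codePointsSketch(text) {
  // Array.from iterates by codepoint, keeping surrogate pairs together
  return Array.from(text, (ch) => ch.codePointAt(0));
}

const sample = "a\u{1F600}"; // "a" followed by U+1F600

const codeUnits = sample.split("");
const codePoints = codePointsSketch(sample);

console.log(codeUnits.length); // 3 (the pair is split into two halves)
console.log(codePoints);       // [ 97, 128512 ]
```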

</div></div><a name="utils-toUtf8String"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="method">toUtf8String</span><span class="symbol">(</span> <span class="param">aBytesLike</span> <span class="symbol">[</span> <span class="symbol">,</span> <span class="param">onError</span> = <span class="param">error</span> <span class="symbol">]</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">string</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#utils-toUtf8String"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L289">source</a></div></div><div class="body"><p>Returns the string represented by the UTF-8 bytes of <i>aBytesLike</i>.</p>

<p>The <i>onError</i> argument is a <a href="/v5/api/utils/strings/#strings--error-handling">Custom UTF-8 Error function</a>. If it is not specified, it defaults to the <a href="/v5/api/utils/strings/#strings--Utf8Error">error</a> function, which throws an error on <b>any</b> UTF-8 error.</p>
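<p>A comparable strict behavior exists in the standard <code class="inline">TextDecoder</code> when it is created with <code class="inline">fatal: true</code>. The sketch below uses that as an analogy only; it does not call ethers, and the exact error raised by ethers may differ.</p>

```javascript
// Analogy only: the standard TextDecoder in fatal mode rejects invalid UTF-8.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

const validBytes = new Uint8Array([0x68, 0x69]);         // "hi"
const invalidBytes = new Uint8Array([0x68, 0xff, 0x69]); // 0xff never occurs in UTF-8

const decodedValid = strictUtf8.decode(validBytes);
console.log(decodedValid); // hi

let rejected = false;
try {
  strictUtf8.decode(invalidBytes);
} catch (error) {
  rejected = true;
}
console.log(rejected); // true
```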

</div></div><a name="strings--unicode-normalization-form"></a><a name="strings--strings--unicode-normalization-form"></a><h2 class="show-anchors"><div>UnicodeNormalizationForm<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--unicode-normalization-form"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L11">source</a></div></div></h2><p>There are several <a href="https://en.wikipedia.org/wiki/Unicode_equivalence">commonly used forms</a> when normalizing UTF-8 data, which allow strings to be compared or hashed in a stable way.</p>
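<p>As a short illustration of why normalization matters for comparison, the following sketch uses the built-in <code class="inline">String.prototype.normalize</code> rather than any ethers function.</p>

```javascript
// Two visually identical strings with different codepoint sequences.
const cafeComposed = "caf\u00e9";    // accented letter as one codepoint
const cafeDecomposed = "cafe\u0301"; // "e" followed by a combining acute accent

const rawEqual = cafeComposed === cafeDecomposed;
const nfcEqual = cafeComposed.normalize("NFC") === cafeDecomposed.normalize("NFC");
const nfdEqual = cafeComposed.normalize("NFD") === cafeDecomposed.normalize("NFD");

console.log(rawEqual); // false
console.log(nfcEqual); // true
console.log(nfdEqual); // true
```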

<div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">UnicodeNormalizationForm</span><span class="symbol">.</span><span class="method">current</span><div class="anchors"></div></div><div class="body"><p>Maintain the current normalization form.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">UnicodeNormalizationForm</span><span class="symbol">.</span><span class="method">NFC</span><div class="anchors"></div></div><div class="body"><p>The Composed Normalization Form. This form uses single codepoints which represent the fully composed character.</p>

<p>For example, the <b>&eacute;</b> is a single codepoint, <code class="inline">0x00e9</code>.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">UnicodeNormalizationForm</span><span class="symbol">.</span><span class="method">NFD</span><div class="anchors"></div></div><div class="body"><p>The Decomposed Normalization Form. This form uses multiple codepoints (when necessary) to compose a character.</p>

<p>For example, the <b>&eacute;</b> is made up of two codepoints, <code class="inline">0x0065</code> (which is the letter <code class="inline">"e"</code>) and <code class="inline">0x0301</code>, a combining diacritic codepoint which indicates the previous character should have an acute accent.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">UnicodeNormalizationForm</span><span class="symbol">.</span><span class="method">NFKC</span><div class="anchors"></div></div><div class="body"><p>The Composed Normalization Form with Compatibility Equivalence. The Compatibility representation folds characters which have the same syntactic representation but different semantic meaning.</p>

<p>For example, the Roman Numeral <b>I</b>, which has the Unicode codepoint <code class="inline">0x2160</code>, is folded into the capital letter I, <code class="inline">0x0049</code>.</p>
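<p>This folding can be observed with the built-in <code class="inline">String.prototype.normalize</code>, shown here as a small sketch independent of ethers.</p>

```javascript
// U+2160 (ROMAN NUMERAL ONE) is unchanged by NFC but folded by NFKC.
const romanNumeralOne = "\u2160";

const keptByNfc = romanNumeralOne.normalize("NFC") === "\u2160";
const foldedByNfkc = romanNumeralOne.normalize("NFKC") === "I";

console.log(keptByNfc);    // true
console.log(foldedByNfkc); // true
```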

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">UnicodeNormalizationForm</span><span class="symbol">.</span><span class="method">NFKD</span><div class="anchors"></div></div><div class="body"><p>The Decomposed Normalization Form with Compatibility Equivalence. See NFKC for an example.</p>

</div></div><div class="definition container-box note"><div class="term">Note</div><div class="body"><p>Only certain specified characters are folded in Compatibility Equivalence, and thus it should <b>not</b> be considered a method to achieve <i>any</i> level of security from <a href="https://en.wikipedia.org/wiki/IDN_homograph_attack">homoglyph attacks</a>.</p>
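<p>For instance, the Cyrillic small letter a (U+0430) looks like the Latin letter a in most fonts, yet no normalization form maps one to the other. A minimal check with the built-in <code class="inline">String.prototype.normalize</code>:</p>

```javascript
// Look-alike letters from different scripts survive normalization unchanged.
const latinA = "a";          // U+0061
const cyrillicA = "\u0430";  // U+0430, visually similar in most fonts

const foldedToLatin = cyrillicA.normalize("NFKC") === latinA;
const leftUnchanged = cyrillicA.normalize("NFKC") === cyrillicA;

console.log(foldedToLatin); // false
console.log(leftUnchanged); // true
```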

</div></div><a name="strings--error-handling"></a><a name="strings--strings--error-handling"></a><h2 class="show-anchors"><div>Custom UTF-8 Error Handling<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--error-handling"></a></div></div></h2><p>When converting UTF-8 bytes to a string or its codepoints, there is the possibility of invalid byte sequences. Since certain situations may need specific ways to handle UTF-8 errors, a custom error handling function can be used, which has the signature:</p>

<div class="property show-anchors"><div class="signature"><span class="method">errorFunction</span><span class="symbol">(</span> <span class="param">reason</span> <span class="symbol">,</span> <span class="param">offset</span> <span class="symbol">,</span> <span class="param">bytes</span> <span class="symbol">,</span> <span class="param">output</span> <span class="symbol">[</span> <span class="symbol">,</span> <span class="param">badCodepoint</span> <span class="symbol">]</span> <span class="symbol">)</span> <span class="arrow">&rArr;</span> <span class="returns">number</span><div class="anchors"></div></div><div class="body"><p>The <i>reason</i> is one of the <a href="/v5/api/utils/strings/#strings--error-reasons">UTF-8 Error Reasons</a>, <i>offset</i> is the index into <i>bytes</i> where the error was first encountered, and <i>output</i> is the list of codepoints already processed (which may be modified). For certain Error Reasons, <i>badCodepoint</i> is the codepoint that was computed but would be rejected because its value is invalid.</p>

<p>This function should return the number of bytes to skip past, keeping in mind that the value at <i>offset</i> will already be consumed.</p>
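<p>As a sketch of a handler that follows this signature, the hypothetical <code class="inline">questionMarkOnError</code> below appends a question mark for every error and skips no additional bytes. The call at the end only simulates how a decoder might invoke it; the reason value passed there is a placeholder, not a real constant.</p>

```javascript
// Hypothetical handler following the signature described above.
function questionMarkOnError(reason, offset, bytes, output, badCodepoint) {
  output.push(0x3f); // append "?" in place of the invalid sequence
  return 0;          // skip no bytes beyond the one already consumed
}

// Simulated invocation: "h" was decoded, then an invalid byte at offset 1.
const processed = [0x68];
const skipCount = questionMarkOnError(
  "placeholder reason",
  1,
  new Uint8Array([0x68, 0xff, 0x69]),
  processed
);

console.log(processed); // [ 104, 63 ]
console.log(skipCount); // 0
```

<p>Based on the signature of <code class="inline">toUtf8String</code>, such a handler would be passed as its second argument, for example <code class="inline">ethers.utils.toUtf8String(bytes, questionMarkOnError)</code>.</p>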

</div></div><a name="strings--error-reasons"></a><a name="strings--strings--error-handling--strings--error-reasons"></a><h3 class="show-anchors"><div>UTF-8 Error Reasons<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--error-reasons"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L19">source</a></div></div></h3>
<div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">BAD_PREFIX</span><div class="anchors"></div></div><div class="body"><p>A byte was encountered which is invalid to begin a UTF-8 byte sequence with.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">MISSING_CONTINUE</span><div class="anchors"></div></div><div class="body"><p>A UTF-8 sequence was begun, but did not have enough continuation bytes for the sequence. For this error the <i>offset</i> is the index at which a continuation byte was expected.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">OUT_OF_RANGE</span><div class="anchors"></div></div><div class="body"><p>The computed codepoint is outside the range of valid Unicode codepoints (i.e. the codepoint is greater than 0x10ffff). This reason will pass the computed <i>badCodepoint</i> into the custom error function.</p>
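<p>To see how a well-formed four-byte sequence can still exceed the limit, the sketch below combines the payload bits by hand. The helper <code class="inline">decodeFourByteSketch</code> is hypothetical and performs no validation of its own.</p>

```javascript
// Hypothetical helper: combine the payload bits of a 4-byte UTF-8 sequence.
// Layout: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (3 + 6 + 6 + 6 payload bits)
function decodeFourByteSketch(b0, b1, b2, b3) {
  return (b0 & 0x07) * 0x40000 +
         (b1 & 0x3f) * 0x1000 +
         (b2 & 0x3f) * 0x40 +
         (b3 & 0x3f);
}

const highestValid = decodeFourByteSketch(0xf4, 0x8f, 0xbf, 0xbf);
const tooLarge = decodeFourByteSketch(0xf4, 0x90, 0x80, 0x80);

console.log(highestValid.toString(16)); // 10ffff
console.log(tooLarge.toString(16));     // 110000
console.log(tooLarge > 0x10ffff);       // true, so this would be out of range
```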

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">OVERLONG</span><div class="anchors"></div></div><div class="body"><p>Because UTF-8 allows variable length byte sequences, it is possible to have multiple representations of the same character. These <a href="https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings">overlong sequences</a> allow for a non-distinguished string to be formed, which can impact security as multiple strings that are otherwise equal can have different hashes.</p>

<p>Generally, overlong sequences are an attempt to circumvent some part of security, but in rare cases may be produced by lazy libraries or used to encode the null terminating character in a way that is safe to include in a <code class="inline">char*</code>.</p>

<p>This reason will pass the computed <i>badCodepoint</i> into the custom error function, which is actually a valid codepoint, just one that was arrived at through unsafe methods.</p>
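<p>A classic example is the slash character encoded in two bytes instead of one. The sketch below combines the payload bits by hand; <code class="inline">decodeTwoByteSketch</code> is a hypothetical helper with no validation of its own.</p>

```javascript
// Hypothetical helper: combine the payload bits of a 2-byte UTF-8 sequence.
// Layout: 110xxxxx 10xxxxxx (5 + 6 payload bits)
function decodeTwoByteSketch(b0, b1) {
  return (b0 & 0x1f) * 0x40 + (b1 & 0x3f);
}

// A two-byte sequence is only needed for codepoints of at least 0x80.
function isOverlongTwoByte(codepoint) {
  return 0x80 > codepoint;
}

const overlongSlash = decodeTwoByteSketch(0xc0, 0xaf); // same value as "/"
const properEAcute = decodeTwoByteSketch(0xc3, 0xa9);  // U+00E9

console.log(overlongSlash.toString(16), isOverlongTwoByte(overlongSlash)); // 2f true
console.log(properEAcute.toString(16), isOverlongTwoByte(properEAcute));   // e9 false
```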

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">OVERRUN</span><div class="anchors"></div></div><div class="body"><p>The data does not have enough bytes remaining for the length of this sequence.</p>

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">UNEXPECTED_CONTINUE</span><div class="anchors"></div></div><div class="body"><p>This error is similar to BAD_PREFIX, since a continuation byte cannot begin a valid sequence. It is reported separately for those who wish to process it differently, although most developers would want to trap this and perform the same operation as a BAD_PREFIX.</p>
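<p>The distinction comes from the high bits of the first byte of a sequence. The following sketch classifies a byte by those bits; <code class="inline">classifyLeadByteSketch</code> and its return labels are hypothetical and are not ethers constants.</p>

```javascript
// Hypothetical classifier for the first byte of a UTF-8 sequence.
function classifyLeadByteSketch(byte) {
  if ((byte & 0x80) === 0x00) return "ascii";        // 0xxxxxxx
  if ((byte & 0xc0) === 0x80) return "continuation"; // 10xxxxxx, cannot start a sequence
  if ((byte & 0xe0) === 0xc0) return "2-byte lead";  // 110xxxxx
  if ((byte & 0xf0) === 0xe0) return "3-byte lead";  // 1110xxxx
  if ((byte & 0xf8) === 0xf0) return "4-byte lead";  // 11110xxx
  return "bad prefix";                               // 11111xxx
}

console.log(classifyLeadByteSketch(0x41)); // ascii
console.log(classifyLeadByteSketch(0x80)); // continuation
console.log(classifyLeadByteSketch(0xc3)); // 2-byte lead
console.log(classifyLeadByteSketch(0xff)); // bad prefix
```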

</div></div><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorReason</span><span class="symbol">.</span><span class="method">UTF16_SURROGATE</span><div class="anchors"></div></div><div class="body"><p>The computed codepoint represents a value reserved for UTF-16 surrogate pairs. This reason will pass the computed surrogate half <i>badCodepoint</i> into the custom error function.</p>

</div></div><a name="strings--strings--error-handling--provided-utf-8-error-handling-functions"></a><h3 class="show-anchors"><div>Provided UTF-8 Error Handling Functions<div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--strings--error-handling--provided-utf-8-error-handling-functions"></a></div></div></h3><p>There are already several functions available for the most common situations.</p>

<a name="strings--Utf8Error"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorFuncs</span><span class="symbol">.</span><span class="method">error</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--Utf8Error"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L55">source</a></div></div><div class="body"><p>This will throw an error on <b>any</b> error with a UTF-8 sequence, including invalid prefix bytes, overlong sequences and UTF-16 surrogate pairs.</p>

</div></div><a name="strings--Utf8Ignore"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorFuncs</span><span class="symbol">.</span><span class="method">ignore</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--Utf8Ignore"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L59">source</a></div></div><div class="body"><p>This will drop all invalid sequences (by consuming invalid prefix bytes and any following continuation bytes) from the final string as well as permit overlong sequences to be converted to their equivalent string.</p>

</div></div><a name="strings--Utf8Replace"></a><div class="property show-anchors"><div class="signature"><span class="path">ethers</span><span class="symbol">.</span><span class="path">utils</span><span class="symbol">.</span><span class="path">Utf8ErrorFuncs</span><span class="symbol">.</span><span class="method">replace</span><div class="anchors"><a class="self" href="/v5/api/utils/strings/#strings--Utf8Replace"></a><a class="source" href="https://github.com/ethers-io/ethers.js/blob/master/packages/strings/src.ts/utf8.ts#L81">source</a></div></div><div class="body"><p>This will replace all invalid sequences (by consuming invalid prefix bytes and any following continuation bytes) with the <a href="https://en.wikipedia.org/wiki/Specials_%28Unicode_block%29#Replacement_character">UTF-8 Replacement Character</a> (i.e. U+FFFD).</p>
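<p>The standard <code class="inline">TextDecoder</code> in its default, non-fatal mode follows a similar replacement strategy and can serve as a rough illustration. This is an analogy only; how many replacement characters are emitted for longer invalid runs may differ between the two.</p>

```javascript
// Analogy only: the standard TextDecoder substitutes U+FFFD for invalid bytes.
const lenientUtf8 = new TextDecoder("utf-8");

const damagedBytes = new Uint8Array([0x68, 0xff, 0x69]); // "h", invalid byte, "i"
const repairedText = lenientUtf8.decode(damagedBytes);

console.log(repairedText === "h\ufffdi"); // true
console.log(repairedText.length);         // 3
```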

</div></div>

      <div class="footer">
        <div class="nav previous"><a href="/v5/api/utils/signing-key/"><span class="arrow">&larr;</span>Signing Key</a></div>
        <div class="nav next"><a href="/v5/api/utils/transactions/">Transactions<span class="arrow">&rarr;</span></a></div>
      </div>
      <div class="copyright">The content of this site is licensed under the <a href="https://choosealicense.com/licenses/cc-by-4.0/">Creative Commons License</a>. Generated on July 5, 2020, 12:0am.</div>
    </div>
    <script src="/v5/static/script.js" type="text/javascript"></script>
  </body>
</html>