Network Working Group                                       S. Josefsson
Internet-Draft                                             February 2003
Expires: August 2, 2003

                    Nameprep and IDNA Test Vectors
                   draft-josefsson-idn-test-vectors

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 2, 2003.

Abstract

   This document contains test vectors for Nameprep and IDNA.

Josefsson                Expires August 2, 2003                 [Page 1]

Internet-Draft       Nameprep and IDNA Test Vectors        February 2003

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
   2. Format of Nameprep Test Vectors . . . . . . . . . . . . . . 5
   3. Format of IDNA Test Vectors . . . . . . . . . . . . . . . . 6
   4. Nameprep Test Vectors . . . . . . . . . . . . . . . . . . . 7
   4.1 Map to nothing . . . . . . . . . . . . . . . . . . . . . . . 7
   4.2 Case folding ASCII U+0043 U+0041 U+0046 U+0045 . . . . . . . 8
   4.3 Case folding 8bit U+00DF (german sharp s) . . . . . . . . . 8
   4.4 Case folding U+0130 (turkish capital I with dot) . . . . . . 9
   4.5 Case folding multibyte U+0143 U+037A . . . . . . . . . . . . 9
   4.6 Case folding U+2121 U+33C6 U+1D7BB . . . . . . . . . . . . . 10
   4.7 Normalization of U+006a U+030c U+00A0 U+00AA . . . . . . . . 10
   4.8 Case folding U+1FB7 and normalization . . . . . . . . . . . 
11 4.9 Self-reverting case folding U+01F0 and normalization . . . . 11 4.10 Self-reverting case folding U+0390 and normalization . . . . 12 4.11 Self-reverting case folding U+03B0 and normalization . . . . 12 4.12 Self-reverting case folding U+1E96 and normalization . . . . 13 4.13 Self-reverting case folding U+1F56 and normalization . . . . 13 4.14 ASCII space character U+0020 . . . . . . . . . . . . . . . . 13 4.15 Non-ASCII 8bit space character U+00A0 . . . . . . . . . . . 14 4.16 Non-ASCII multibyte space character U+1680 . . . . . . . . . 14 4.17 Non-ASCII multibyte space character U+2000 . . . . . . . . . 14 4.18 Zero Width Space U+200b . . . . . . . . . . . . . . . . . . 15 4.19 Non-ASCII multibyte space character U+3000 . . . . . . . . . 15 4.20 ASCII control characters U+0010 U+007F . . . . . . . . . . . 15 4.21 Non-ASCII 8bit control character U+0085 . . . . . . . . . . 16 4.22 Non-ASCII multibyte control character U+180E . . . . . . . . 16 4.23 Zero Width No-Break Space U+FEFF . . . . . . . . . . . . . . 16 4.24 Non-ASCII control character U+1D175 . . . . . . . . . . . . 16 4.25 Plane 0 private use character U+F123 . . . . . . . . . . . . 17 4.26 Plane 15 private use character U+F1234 . . . . . . . . . . . 17 4.27 Plane 16 private use character U+10F234 . . . . . . . . . . 17 4.28 Non-character code point U+8FFFE . . . . . . . . . . . . . . 17 4.29 Non-character code point U+10FFFF . . . . . . . . . . . . . 18 4.30 Surrogate code U+DF42 . . . . . . . . . . . . . . . . . . . 18 4.31 Non-plain text character U+FFFD . . . . . . . . . . . . . . 18 4.32 Ideographic description character U+2FF5 . . . . . . . . . . 18 4.33 Display property character U+0341 . . . . . . . . . . . . . 19 4.34 Left-to-right mark U+200E . . . . . . . . . . . . . . . . . 19 4.35 Deprecated U+202A . . . . . . . . . . . . . . . . . . . . . 19 4.36 Language tagging character U+E0001 . . . . . . . . . . . . . 19 4.37 Language tagging character U+E0042 . . . . . . . . . . . . . 
20 4.38 Bidi: RandALCat character U+05BE and LCat characters . . . . 20 4.39 Bidi: RandALCat character U+FD50 and LCat characters . . . . 20 4.40 Bidi: RandALCat character U+FB38 and LCat characters . . . . 21 4.41 Bidi: RandALCat without trailing RandALCat U+0627 U+0031 . . 21 4.42 Bidi: RandALCat character U+0627 U+0031 U+0628 . . . . . . . 21 Josefsson Expires August 2, 2003 [Page 2] Internet-Draft Nameprep and IDNA Test Vectors February 2003 4.43 Unassigned code point U+E0002 . . . . . . . . . . . . . . . 22 4.44 Larger test (shrinking) . . . . . . . . . . . . . . . . . . 22 4.45 Larger test (expanding) . . . . . . . . . . . . . . . . . . 23 5. IDNA Test Vectors . . . . . . . . . . . . . . . . . . . . . 23 5.1 Arabic (Egyptian) . . . . . . . . . . . . . . . . . . . . . 23 5.2 Chinese (simplified) . . . . . . . . . . . . . . . . . . . . 24 5.3 Chinese (traditional) . . . . . . . . . . . . . . . . . . . 24 5.4 Czech . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.5 Hebrew . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5.6 Hindi (Devanagari) . . . . . . . . . . . . . . . . . . . . . 25 5.7 Japanese (kanji and hiragana) . . . . . . . . . . . . . . . 25 5.8 Russian (Cyrillic) . . . . . . . . . . . . . . . . . . . . . 26 5.9 Spanish . . . . . . . . . . . . . . . . . . . . . . . . . . 26 5.10 Vietnamese . . . . . . . . . . . . . . . . . . . . . . . . . 27 5.11 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 27 5.12 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 27 5.13 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.14 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.15 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.16 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.17 Japanese . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.18 Greek . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.19 Maltese (Malti) . . . . . . . . . . . . . . . . 
. . . . . . 29
   5.20 Russian (Cyrillic) . . . . . . . . . . . . . . . . . . . . . 30
   6. Security Considerations . . . . . . . . . . . . . . . . . . 30
   Author's Address . . . . . . . . . . . . . . . . . . . . . . 31
   Normative References . . . . . . . . . . . . . . . . . . . . 30
   Informative References . . . . . . . . . . . . . . . . . . . 30
   A. Nameprep test vectors in C syntax . . . . . . . . . . . . . 31
   B. IDNA test vectors in C syntax . . . . . . . . . . . . . . . 36
   Intellectual Property and Copyright Statements . . . . . . . 40

1. Introduction

   The Nameprep and IDNA specifications lack thorough examples that
   would aid implementors.  This document acts as a complement to those
   specifications by providing such examples.  Note that this document
   is not normative: errors in it do not define Nameprep or IDNA.  Where
   conforming to the specifications conflicts with producing the output
   shown in this document, implementations should conform to the
   specifications.

2. Format of Nameprep Test Vectors

   The tests follow a fixed syntax, described here by showing one
   complete example with comments intermixed.  The comments are prefixed
   with the '#' character.

   # First the (UTF-8) string is printed as a C octet string, with
   # characters [A-Za-z .0-9] shown inline and other characters shown
   # escaped with \xAB where AB is the hexadecimal value of that octet.
   # The number of octets is also shown.

   in (length 3 bytes):
   \xE1\xBE\xB7

   # The input is also printed as Unicode code points.

   input (length 1):
   U+1fb7

   # After printing the input, the Nameprep steps start.  When the
   # string is modified, the specific operation that caused it is printed
   # along with the new string of Unicode code points.

   # 1) Map -- For each character in the input, check if it has a mapping
   #    and, if so, replace it with its mapping.  This is described in
   #    section 3.

   Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
   U+03b1 U+0342 U+03b9

   # 2) Normalize -- Possibly normalize the result of step 1 using Unicode
   #    normalization.  This is described in section 4.

   Unicode normalization with form KC maps string into:
   U+1fb6 U+03b9

   # 3) Prohibit -- Check for any characters that are not allowed in the
   #    output.  If any are found, return an error.  This is described in
   #    section 5.

   # 4) Check bidi -- Possibly check for right-to-left characters, and if
   #    any are found, make sure that the whole string satisfies the
   #    requirements for bidirectional strings.  If the string does not
   #    satisfy the requirements for bidirectional strings, return an
   #    error.  This is described in section 6.
   #
   #    1) The characters in section 5.8 MUST be prohibited.
   #    2) If a string contains any RandALCat character, the string MUST NOT
   #       contain any LCat character.
   #    3) If a string contains any RandALCat character, a RandALCat
   #       character MUST be the first character of the string, and a
   #       RandALCat character MUST be the last character of the string.

   # The output is printed as Unicode code points.

   output (length 2):
   U+1fb6 U+03b9

   # And finally the output is printed as UTF-8.

   out (length 5 bytes):
   \xE1\xBE\xB6\xCE\xB9

3. Format of IDNA Test Vectors

   The tests follow a fixed syntax, described here by showing one
   complete example with comments intermixed.  The comments are prefixed
   with the '#' character.

   # First the (UTF-8) string is printed as a C octet string, with
   # characters [A-Za-z .0-9] shown inline and other characters shown
   # escaped with \xAB where AB is the hexadecimal value of that octet.
   # The number of octets is also shown.

   in (length 39 bytes):
   'Hello\x2DAnother\x2DWa'
   'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
   '\xAE\xE5\xA0\xB4\xE6\x89\x80

   # The input is also printed as Unicode code points.

   input (length 39):
   U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e U+006f
   U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061 U+0079 U+002d
   U+305d U+308c U+305e U+308c U+306e U+5834 U+6240

   # After printing the input, the IDNA ToASCII step starts.  The output
   # is printed as an ASCII string.

   out: xn--hello-another-way--fc4qua05auwb3674vfr0b

4. Nameprep Test Vectors

4.1 Map to nothing

   in (length 37 bytes):
   foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8Bbar'
   '\xE2\x80\x8B\xE2\x81\xA0baz\xEF\xB8\x80\xEF\xB8\x88\xEF'
   '\xB8\x8F\xEF\xBB\xBF

   input (length 19):
   U+0066 U+006f U+006f U+00ad U+034f U+1806 U+180b U+0062 U+0061
   U+0072 U+200b U+2060 U+0062 U+0061 U+007a U+fe00 U+fe08 U+fe0f
   U+feff

   Table B.1 maps U+00ad to nothing.
   Table B.1 maps U+034f to nothing.
   Table B.1 maps U+1806 to nothing.
   Table B.1 maps U+180b to nothing.
   Table B.1 maps U+200b to nothing.
   Table B.1 maps U+2060 to nothing.
   Table B.1 maps U+fe00 to nothing.
   Table B.1 maps U+fe08 to nothing.
   Table B.1 maps U+fe0f to nothing.
   Table B.1 maps U+feff to nothing.
   U+0066 U+006f U+006f U+0062 U+0061 U+0072 U+0062 U+0061 U+007a

   output (length 9):
   U+0066 U+006f U+006f U+0062 U+0061 U+0072 U+0062 U+0061 U+007a

   out (length 9 bytes):
   foobarbaz

4.2 Case folding ASCII U+0043 U+0041 U+0046 U+0045

   in (length 4 bytes):
   CAFE

   input (length 4):
   U+0043 U+0041 U+0046 U+0045

   Table B.2 maps U+0043 to U+0063.
   Table B.2 maps U+0041 to U+0061.
   Table B.2 maps U+0046 to U+0066.
   Table B.2 maps U+0045 to U+0065.
   U+0063 U+0061 U+0066 U+0065

   output (length 4):
   U+0063 U+0061 U+0066 U+0065

   out (length 4 bytes):
   cafe

4.3 Case folding 8bit U+00DF (german sharp s)

   in (length 2 bytes):
   \xC3\x9F

   input (length 1):
   U+00df

   Table B.2 maps U+00df to U+0073 U+0073.
   U+0073 U+0073

   output (length 2):
   U+0073 U+0073

   out (length 2 bytes):
   ss

4.4 Case folding U+0130 (turkish capital I with dot)

   in (length 2 bytes):
   \xC4\xB0

   input (length 1):
   U+0130

   Table B.2 maps U+0130 to U+0069 U+0307.
   U+0069 U+0307

   output (length 2):
   U+0069 U+0307

   out (length 3 bytes):
   i\xCC\x87

4.5 Case folding multibyte U+0143 U+037A

   in (length 4 bytes):
   \xC5\x83\xCD\xBA

   input (length 2):
   U+0143 U+037a

   Table B.2 maps U+0143 to U+0144.
   Table B.2 maps U+037a to U+0020 U+03b9.
   U+0144 U+0020 U+03b9

   output (length 3):
   U+0144 U+0020 U+03b9

   out (length 5 bytes):
   \xC5\x84 \xCE\xB9

4.6 Case folding U+2121 U+33C6 U+1D7BB

   in (length 10 bytes):
   \xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB

   input (length 3):
   U+2121 U+33c6 U+1d7bb

   Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
   Table B.2 maps U+33c6 to U+0063 U+2215 U+006b U+0067.
   Table B.2 maps U+1d7bb to U+03c3.
   U+0074 U+0065 U+006c U+0063 U+2215 U+006b U+0067 U+03c3

   output (length 8):
   U+0074 U+0065 U+006c U+0063 U+2215 U+006b U+0067 U+03c3

   out (length 11 bytes):
   telc\xE2\x88\x95kg\xCF\x83

4.7 Normalization of U+006a U+030c U+00A0 U+00AA

   in (length 7 bytes):
   j\xCC\x8C\xC2\xA0\xC2\xAA

   input (length 4):
   U+006a U+030c U+00a0 U+00aa

   Unicode normalization with form KC maps string into:
   U+01f0 U+0020 U+0061

   output (length 3):
   U+01f0 U+0020 U+0061

   out (length 4 bytes):
   \xC7\xB0 a

4.8 Case folding U+1FB7 and normalization

   in (length 3 bytes):
   \xE1\xBE\xB7

   input (length 1):
   U+1fb7

   Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
   U+03b1 U+0342 U+03b9

   Unicode normalization with form KC maps string into:
   U+1fb6 U+03b9

   output (length 2):
   U+1fb6 U+03b9

   out (length 5 bytes):
   \xE1\xBE\xB6\xCE\xB9

4.9 Self-reverting case folding U+01F0 and normalization

   in (length 2 bytes):
   \xC7\xB0

   input (length 1):
   U+01f0

   Table B.2 maps U+01f0 to U+006a U+030c.
   U+006a U+030c

   Unicode normalization with form KC maps string into:
   U+01f0

   output (length 1):
   U+01f0

   out (length 2 bytes):
   \xC7\xB0

4.10 Self-reverting case folding U+0390 and normalization

   in (length 2 bytes):
   \xCE\x90

   input (length 1):
   U+0390

   Table B.2 maps U+0390 to U+03b9 U+0308 U+0301.
   U+03b9 U+0308 U+0301

   Unicode normalization with form KC maps string into:
   U+0390

   output (length 1):
   U+0390

   out (length 2 bytes):
   \xCE\x90

4.11 Self-reverting case folding U+03B0 and normalization

   in (length 2 bytes):
   \xCE\xB0

   input (length 1):
   U+03b0

   Table B.2 maps U+03b0 to U+03c5 U+0308 U+0301.
   U+03c5 U+0308 U+0301

   Unicode normalization with form KC maps string into:
   U+03b0

   output (length 1):
   U+03b0

   out (length 2 bytes):
   \xCE\xB0

4.12 Self-reverting case folding U+1E96 and normalization

   in (length 3 bytes):
   \xE1\xBA\x96

   input (length 1):
   U+1e96

   Table B.2 maps U+1e96 to U+0068 U+0331.
   U+0068 U+0331

   Unicode normalization with form KC maps string into:
   U+1e96

   output (length 1):
   U+1e96

   out (length 3 bytes):
   \xE1\xBA\x96

4.13 Self-reverting case folding U+1F56 and normalization

   in (length 3 bytes):
   \xE1\xBD\x96

   input (length 1):
   U+1f56

   Table B.2 maps U+1f56 to U+03c5 U+0313 U+0342.
   U+03c5 U+0313 U+0342

   Unicode normalization with form KC maps string into:
   U+1f56

   output (length 1):
   U+1f56

   out (length 3 bytes):
   \xE1\xBD\x96

4.14 ASCII space character U+0020

   in (length 1 bytes):
    

   input (length 1):
   U+0020

   output (length 1):
   U+0020

   out (length 1 bytes):
    

4.15 Non-ASCII 8bit space character U+00A0

   in (length 2 bytes):
   \xC2\xA0

   input (length 1):
   U+00a0

   Unicode normalization with form KC maps string into:
   U+0020

   output (length 1):
   U+0020

   out (length 1 bytes):
    

4.16 Non-ASCII multibyte space character U+1680

   in (length 3 bytes):
   \xE1\x9A\x80

   input (length 1):
   U+1680

   Table C.1.2 prohibits string (character U+1680).

4.17 Non-ASCII multibyte space character U+2000

   in (length 3 bytes):
   \xE2\x80\x80

   input (length 1):
   U+2000

   Unicode normalization with form KC maps string into:
   U+0020

   output (length 1):
   U+0020

   out (length 1 bytes):
    

4.18 Zero Width Space U+200b

   in (length 3 bytes):
   \xE2\x80\x8B

   input (length 1):
   U+200b

   Table B.1 maps U+200b to nothing.

   output (length 0):

   out (length 0 bytes):

4.19 Non-ASCII multibyte space character U+3000

   in (length 3 bytes):
   \xE3\x80\x80

   input (length 1):
   U+3000

   Unicode normalization with form KC maps string into:
   U+0020

   output (length 1):
   U+0020

   out (length 1 bytes):
    

4.20 ASCII control characters U+0010 U+007F

   in (length 2 bytes):
   \x10\x7F

   input (length 2):
   U+0010 U+007f

   output (length 2):
   U+0010 U+007f

   out (length 2 bytes):
   \x10\x7F

4.21 Non-ASCII 8bit control character U+0085

   in (length 2 bytes):
   \xC2\x85

   input (length 1):
   U+0085

   Table C.2.2 prohibits string (character U+0085).

4.22 Non-ASCII multibyte control character U+180E

   in (length 3 bytes):
   \xE1\xA0\x8E

   input (length 1):
   U+180e

   Table C.2.2 prohibits string (character U+180e).

4.23 Zero Width No-Break Space U+FEFF

   in (length 3 bytes):
   \xEF\xBB\xBF

   input (length 1):
   U+feff

   Table B.1 maps U+feff to nothing.

   output (length 0):

   out (length 0 bytes):

4.24 Non-ASCII control character U+1D175

   in (length 4 bytes):
   \xF0\x9D\x85\xB5

   input (length 1):
   U+1d175

   Table C.2.2 prohibits string (character U+1d175).

4.25 Plane 0 private use character U+F123

   in (length 3 bytes):
   \xEF\x84\xA3

   input (length 1):
   U+f123

   Table C.3 prohibits string (character U+f123).

4.26 Plane 15 private use character U+F1234

   in (length 4 bytes):
   \xF3\xB1\x88\xB4

   input (length 1):
   U+f1234

   Table C.3 prohibits string (character U+f1234).

4.27 Plane 16 private use character U+10F234

   in (length 4 bytes):
   \xF4\x8F\x88\xB4

   input (length 1):
   U+10f234

   Table C.3 prohibits string (character U+10f234).

4.28 Non-character code point U+8FFFE

   in (length 4 bytes):
   \xF2\x8F\xBF\xBE

   input (length 1):
   U+8fffe

   Table C.4 prohibits string (character U+8fffe).
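
   Each vector above gives the same input twice: once as UTF-8 octets
   ("in") and once as code points ("input").  As a cross-check, the two
   can be compared by decoding the octets.  The following is a minimal
   sketch under stated assumptions, not part of Nameprep or of any
   existing library: the name utf8_decode_one is hypothetical, and the
   function deliberately does not reject overlong forms, surrogates or
   non-characters, because several vectors contain such code points on
   purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s and
 * store the code point in *cp.  Returns the number of octets consumed,
 * or 0 if the lead octet or a continuation octet is malformed.
 * Overlong forms, surrogates and non-characters are NOT rejected. */
static size_t utf8_decode_one(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);

    if (s[0] < 0x80) {                      /* single ASCII octet */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2 octets */
        c = s[0] & 0x1F;
        len = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3 octets */
        c = s[0] & 0x0F;
        len = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4 octets */
        c = s[0] & 0x07;
        len = 4;
    } else {
        return 0;                           /* stray continuation octet */
    }

    for (i = 1; i < len; i++) {
        /* A terminating NUL fails this test, so a truncated
         * NUL-terminated string is reported as malformed. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = c;
    return len;
}
```

   For example, decoding the octets \xF2\x8F\xBF\xBE of section 4.28
   with this helper consumes four octets and yields U+8FFFE, which
   matches the "input" line of that vector.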

4.29 Non-character code point U+10FFFF

   in (length 4 bytes):
   \xF4\x8F\xBF\xBF

   input (length 1):
   U+10ffff

   Table C.4 prohibits string (character U+10ffff).

4.30 Surrogate code U+DF42

   in (length 3 bytes):
   \xED\xBD\x82

   input (length 1):
   U+df42

   Table C.5 prohibits string (character U+df42).

4.31 Non-plain text character U+FFFD

   in (length 3 bytes):
   \xEF\xBF\xBD

   input (length 1):
   U+fffd

   Table C.6 prohibits string (character U+fffd).

4.32 Ideographic description character U+2FF5

   in (length 3 bytes):
   \xE2\xBF\xB5

   input (length 1):
   U+2ff5

   Table C.7 prohibits string (character U+2ff5).

4.33 Display property character U+0341

   in (length 2 bytes):
   \xCD\x81

   input (length 1):
   U+0341

   Unicode normalization with form KC maps string into:
   U+0301

   output (length 1):
   U+0301

   out (length 2 bytes):
   \xCC\x81

4.34 Left-to-right mark U+200E

   in (length 3 bytes):
   \xE2\x80\x8E

   input (length 1):
   U+200e

   Table C.8 prohibits string (character U+200e).

4.35 Deprecated U+202A

   in (length 3 bytes):
   \xE2\x80\xAA

   input (length 1):
   U+202a

   Table C.8 prohibits string (character U+202a).

4.36 Language tagging character U+E0001

   in (length 4 bytes):
   \xF3\xA0\x80\x81

   input (length 1):
   U+e0001

   Table C.9 prohibits string (character U+e0001).

4.37 Language tagging character U+E0042

   in (length 4 bytes):
   \xF3\xA0\x81\x82

   input (length 1):
   U+e0042

   Table C.9 prohibits string (character U+e0042).

4.38 Bidi: RandALCat character U+05BE and LCat characters

   in (length 8 bytes):
   foo\xD6\xBEbar

   input (length 7):
   U+0066 U+006f U+006f U+05be U+0062 U+0061 U+0072

   String contains both L and RAL characters.
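
   The bidi vectors in sections 4.38 through 4.42 exercise
   requirements 2) and 3) listed under "Check bidi" in section 2.  The
   following is a minimal sketch of that check, assuming the string has
   already been mapped, normalized and checked for prohibited
   characters.  The names check_bidi, is_randalcat and is_lcat and the
   result codes are hypothetical, and the two classification functions
   cover only a small excerpt of the RandALCat and LCat tables (enough
   for these vectors); a real implementation needs the complete tables
   from the Stringprep specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Incomplete excerpt of the RandALCat table: U+05BE, Hebrew letters
 * and two runs of Arabic letters only.  An assumption for illustration. */
static int is_randalcat(uint32_t c)
{
    return c == 0x05BE ||
           (c >= 0x05D0 && c <= 0x05EA) ||
           (c >= 0x0621 && c <= 0x063A) ||
           (c >= 0x0640 && c <= 0x064A);
}

/* Incomplete excerpt of the LCat table: ASCII letters only. */
static int is_lcat(uint32_t c)
{
    return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

/* Hypothetical result codes for this sketch. */
enum bidi_result {
    BIDI_OK = 0,
    BIDI_BOTH_L_AND_RAL,   /* requirement 2 violated */
    BIDI_EDGES_NOT_RAL     /* requirement 3 violated */
};

/* Check requirements 2 and 3 on a string of n code points. */
static enum bidi_result check_bidi(const uint32_t *s, size_t n)
{
    int has_ral = 0, has_l = 0;
    size_t i;

    assert(s != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (is_randalcat(s[i]))
            has_ral = 1;
        if (is_lcat(s[i]))
            has_l = 1;
    }
    if (!has_ral)
        return BIDI_OK;            /* no RandALCat: nothing to enforce */
    if (has_l)
        return BIDI_BOTH_L_AND_RAL;
    /* has_ral implies n > 0, so s[0] and s[n - 1] are valid. */
    if (!is_randalcat(s[0]) || !is_randalcat(s[n - 1]))
        return BIDI_EDGES_NOT_RAL;
    return BIDI_OK;
}
```

   With these excerpts, the code points of section 4.38 are rejected
   because the string mixes L and RAL characters, those of section 4.41
   are rejected because the string does not end with a RandALCat
   character, and those of section 4.42 pass.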

4.39 Bidi: RandALCat character U+FD50 and LCat characters

   in (length 9 bytes):
   foo\xEF\xB5\x90bar

   input (length 7):
   U+0066 U+006f U+006f U+fd50 U+0062 U+0061 U+0072

   Unicode normalization with form KC maps string into:
   U+0066 U+006f U+006f U+062a U+062c U+0645 U+0062 U+0061 U+0072

   String contains both L and RAL characters.

4.40 Bidi: RandALCat character U+FB38 and LCat characters

   in (length 9 bytes):
   foo\xEF\xB9\xB6bar

   input (length 7):
   U+0066 U+006f U+006f U+fe76 U+0062 U+0061 U+0072

   Unicode normalization with form KC maps string into:
   U+0066 U+006f U+006f U+0020 U+064e U+0062 U+0061 U+0072

   output (length 8):
   U+0066 U+006f U+006f U+0020 U+064e U+0062 U+0061 U+0072

   out (length 9 bytes):
   foo \xD9\x8Ebar

4.41 Bidi: RandALCat without trailing RandALCat U+0627 U+0031

   in (length 3 bytes):
   \xD8\xA71

   input (length 2):
   U+0627 U+0031

   Bidi string does not start/end with RAL characters.

4.42 Bidi: RandALCat character U+0627 U+0031 U+0628

   in (length 5 bytes):
   \xD8\xA71\xD8\xA8

   input (length 3):
   U+0627 U+0031 U+0628

   output (length 3):
   U+0627 U+0031 U+0628

   out (length 5 bytes):
   \xD8\xA71\xD8\xA8

4.43 Unassigned code point U+E0002

   in (length 4 bytes):
   \xF3\xA0\x80\x82

   input (length 1):
   U+e0002

   Table A.1 prohibits string (unassigned character U+e0002).

4.44 Larger test (shrinking)

   in (length 22 bytes):
   X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1j\xCC\x8C\xC2\xA0\xC2'
   '\xAA\xCE\xB0\xE2\x80\x80

   input (length 11):
   U+0058 U+00ad U+00df U+0130 U+2121 U+006a U+030c U+00a0 U+00aa
   U+03b0 U+2000

   Table B.1 maps U+00ad to nothing.
   U+0058 U+00df U+0130 U+2121 U+006a U+030c U+00a0 U+00aa U+03b0
   U+2000

   Table B.2 maps U+0058 to U+0078.
   Table B.2 maps U+00df to U+0073 U+0073.
   Table B.2 maps U+0130 to U+0069 U+0307.
   Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
   Table B.2 maps U+03b0 to U+03c5 U+0308 U+0301.
   U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c U+006a
   U+030c U+00a0 U+00aa U+03c5 U+0308 U+0301 U+2000

   Unicode normalization with form KC maps string into:
   U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c U+01f0
   U+0020 U+0061 U+03b0 U+0020

   output (length 13):
   U+0078 U+0073 U+0073 U+0069 U+0307 U+0074 U+0065 U+006c U+01f0
   U+0020 U+0061 U+03b0 U+0020

   out (length 16 bytes):
   xssi\xCC\x87tel\xC7\xB0 a\xCE\xB0 

4.45 Larger test (expanding)

   in (length 17 bytes):
   X\xC3\x9F\xE3\x8C\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8C'
   '\x80

   input (length 7):
   U+0058 U+00df U+3316 U+0130 U+2121 U+249f U+3300

   Table B.2 maps U+0058 to U+0078.
   Table B.2 maps U+00df to U+0073 U+0073.
   Table B.2 maps U+0130 to U+0069 U+0307.
   Table B.2 maps U+2121 to U+0074 U+0065 U+006c.
   U+0078 U+0073 U+0073 U+3316 U+0069 U+0307 U+0074 U+0065 U+006c
   U+249f U+3300

   Unicode normalization with form KC maps string into:
   U+0078 U+0073 U+0073 U+30ad U+30ed U+30e1 U+30fc U+30c8 U+30eb
   U+0069 U+0307 U+0074 U+0065 U+006c U+0028 U+0064 U+0029 U+30a2
   U+30d1 U+30fc U+30c8

   output (length 21):
   U+0078 U+0073 U+0073 U+30ad U+30ed U+30e1 U+30fc U+30c8 U+30eb
   U+0069 U+0307 U+0074 U+0065 U+006c U+0028 U+0064 U+0029 U+30a2
   U+30d1 U+30fc U+30c8

   out (length 42 bytes):
   xss\xE3\x82\xAD\xE3\x83\xAD\xE3\x83\xA1\xE3\x83\xBC\xE3'
   '\x83\x88\xE3\x83\xABi\xCC\x87tel\x28d\x29\xE3\x82'
   '\xA2\xE3\x83\x91\xE3\x83\xBC\xE3\x83\x88

5.
IDNA Test Vectors 5.1 Arabic (Egyptian) in (length 34 bytes): '\xD9\x84\xD9\x8A\xD9\x87\xD9\x85\xD8\xA7\xD8\xA8\xD8\xAA\xD9\x83' '\xD9\x84\xD9\x85\xD9\x88\xD8\xB4\xD8\xB9\xD8\xB1\xD8\xA8\xD9\x8A' '\xD8\x9F input (length 34): U+0644 U+064a U+0647 U+0645 U+0627 U+0628 U+062a U+0643 U+0644 U+0645 U+0648 U+0634 U+0639 U+0631 U+0628 U+064a U+061f out: xn--egbpdaj6bu4bxfgehfvwxn Josefsson Expires August 2, 2003 [Page 23] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.2 Chinese (simplified) in (length 27 bytes): '\xE4\xBB\x96\xE4\xBB\xAC\xE4\xB8\xBA\xE4\xBB\x80\xE4\xB9\x88\xE4' '\xB8\x8D\xE8\xAF\xB4\xE4\xB8\xAD\xE6\x96\x87 input (length 27): U+4ed6 U+4eec U+4e3a U+4ec0 U+4e48 U+4e0d U+8bf4 U+4e2d U+6587 out: xn--ihqwcrb4cv8a8dqg056pqjye 5.3 Chinese (traditional) in (length 27 bytes): '\xE4\xBB\x96\xE5\x80\x91\xE7\x88\xB2\xE4\xBB\x80\xE9\xBA\xBD\xE4' '\xB8\x8D\xE8\xAA\xAA\xE4\xB8\xAD\xE6\x96\x87 input (length 27): U+4ed6 U+5011 U+7232 U+4ec0 U+9ebd U+4e0d U+8aaa U+4e2d U+6587 out: xn--ihqwctvzc91f659drss3x8bo0yb 5.4 Czech in (length 26 bytes): 'Pro\xC4\x8Dprost\xC4\x9Bneml' 'uv\xC3\xAD\xC4\x8Desky input (length 26): U+0050 U+0072 U+006f U+010d U+0070 U+0072 U+006f U+0073 U+0074 U+011b U+006e U+0065 U+006d U+006c U+0075 U+0076 U+00ed U+010d U+0065 U+0073 U+006b U+0079 out: xn--proprostnemluvesky-uyb24dma41a Josefsson Expires August 2, 2003 [Page 24] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.5 Hebrew in (length 44 bytes): '\xD7\x9C\xD7\x9E\xD7\x94\xD7\x94\xD7\x9D\xD7\xA4\xD7\xA9\xD7\x95' '\xD7\x98\xD7\x9C\xD7\x90\xD7\x9E\xD7\x93\xD7\x91\xD7\xA8\xD7\x99' '\xD7\x9D\xD7\xA2\xD7\x91\xD7\xA8\xD7\x99\xD7\xAA input (length 44): U+05dc U+05de U+05d4 U+05d4 U+05dd U+05e4 U+05e9 U+05d5 U+05d8 U+05dc U+05d0 U+05de U+05d3 U+05d1 U+05e8 U+05d9 U+05dd U+05e2 U+05d1 U+05e8 U+05d9 U+05ea out: xn--4dbcagdahymbxekheh6e0a7fei0b 5.6 Hindi (Devanagari) in (length 90 bytes): '\xE0\xA4\xAF\xE0\xA4\xB9\xE0\xA4\xB2\xE0\xA5\x8B\xE0\xA4\x97\xE0' 
'\xA4\xB9\xE0\xA4\xBF\xE0\xA4\xA8\xE0\xA5\x8D\xE0\xA4\xA6\xE0\xA5' '\x80\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xAF\xE0\xA5\x8B\xE0\xA4\x82' '\xE0\xA4\xA8\xE0\xA4\xB9\xE0\xA5\x80\xE0\xA4\x82\xE0\xA4\xAC\xE0' '\xA5\x8B\xE0\xA4\xB2\xE0\xA4\xB8\xE0\xA4\x95\xE0\xA4\xA4\xE0\xA5' '\x87\xE0\xA4\xB9\xE0\xA5\x88\xE0\xA4\x82 input (length 90): U+092f U+0939 U+0932 U+094b U+0917 U+0939 U+093f U+0928 U+094d U+0926 U+0940 U+0915 U+094d U+092f U+094b U+0902 U+0928 U+0939 U+0940 U+0902 U+092c U+094b U+0932 U+0938 U+0915 U+0924 U+0947 U+0939 U+0948 U+0902 out: xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd 5.7 Japanese (kanji and hiragana) in (length 54 bytes): '\xE3\x81\xAA\xE3\x81\x9C\xE3\x81\xBF\xE3\x82\x93\xE3\x81\xAA\xE6' '\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\xE3\x82\x92\xE8\xA9\xB1\xE3\x81' '\x97\xE3\x81\xA6\xE3\x81\x8F\xE3\x82\x8C\xE3\x81\xAA\xE3\x81\x84' '\xE3\x81\xAE\xE3\x81\x8B input (length 54): U+306a U+305c U+307f U+3093 U+306a U+65e5 U+672c U+8a9e U+3092 U+8a71 U+3057 U+3066 U+304f U+308c U+306a U+3044 U+306e U+304b out: xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa Josefsson Expires August 2, 2003 [Page 25] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.8 Russian (Cyrillic) in (length 56 bytes): '\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5' '\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2' '\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83' '\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8 input (length 56): U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432 U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443 U+0441 U+0441 U+043a U+0438 out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l 5.9 Spanish in (length 42 bytes): 'Porqu\xC3\xA9nopuedens' 'implementehablar' 'enEspa\xC3\xB1ol input (length 42): U+0050 U+006f U+0072 U+0071 U+0075 U+00e9 U+006e U+006f U+0070 U+0075 U+0065 U+0064 U+0065 U+006e U+0073 U+0069 U+006d U+0070 U+006c U+0065 U+006d U+0065 U+006e U+0074 U+0065 
U+0068 U+0061 U+0062 U+006c U+0061 U+0072 U+0065 U+006e U+0045 U+0073 U+0070 U+0061 U+00f1 U+006f U+006c out: xn--porqunopuedensimplementehablarenespaol-fmd56a Josefsson Expires August 2, 2003 [Page 26] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.10 Vietnamese in (length 45 bytes): 'T\xE1\xBA\xA1isaoh\xE1\xBB\x8Dkh\xC3\xB4' 'ngth\xE1\xBB\x83ch\xE1\xBB\x89n\xC3\xB3i' 'ti\xE1\xBA\xBFngVi\xE1\xBB\x87t input (length 45): U+0054 U+1ea1 U+0069 U+0073 U+0061 U+006f U+0068 U+1ecd U+006b U+0068 U+00f4 U+006e U+0067 U+0074 U+0068 U+1ec3 U+0063 U+0068 U+1ec9 U+006e U+00f3 U+0069 U+0074 U+0069 U+1ebf U+006e U+0067 U+0056 U+0069 U+1ec7 U+0074 out: xn--tisaohkhngthchnitingvit-kjcr8268qyxafd2f1b9g 5.11 Japanese in (length 20 bytes): '3\xE5\xB9\xB4B\xE7\xB5\x84\xE9\x87\x91\xE5\x85\xAB\xE5\x85' '\x88\xE7\x94\x9F input (length 20): U+0033 U+5e74 U+0042 U+7d44 U+91d1 U+516b U+5148 U+751f out: xn--3b-ww4c5e180e575a65lsy2b 5.12 Japanese in (length 34 bytes): '\xE5\xAE\x89\xE5\xAE\xA4\xE5\xA5\x88\xE7\xBE\x8E\xE6\x81\xB5\x2D' 'with\x2DSUPER\x2DMONKE' 'YS input (length 34): U+5b89 U+5ba4 U+5948 U+7f8e U+6075 U+002d U+0077 U+0069 U+0074 U+0068 U+002d U+0053 U+0055 U+0050 U+0045 U+0052 U+002d U+004d U+004f U+004e U+004b U+0045 U+0059 U+0053 out: xn---with-super-monkeys-pc58ag80a8qai00g7n9n Josefsson Expires August 2, 2003 [Page 27] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.13 Japanese in (length 39 bytes): 'Hello\x2DAnother\x2DWa' 'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81' '\xAE\xE5\xA0\xB4\xE6\x89\x80 input (length 39): U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e U+006f U+0074 U+0068 U+0065 U+0072 U+002d U+0057 U+0061 U+0079 U+002d U+305d U+308c U+305e U+308c U+306e U+5834 U+6240 out: xn--hello-another-way--fc4qua05auwb3674vfr0b 5.14 Japanese in (length 22 bytes): '\xE3\x81\xB2\xE3\x81\xA8\xE3\x81\xA4\xE5\xB1\x8B\xE6\xA0\xB9\xE3' '\x81\xAE\xE4\xB8\x8B2 input (length 22): U+3072 U+3068 U+3064 U+5c4b U+6839 U+306e U+4e0b 
U+0032 out: xn--2-u9tlzr9756bt3uc0v 5.15 Japanese in (length 23 bytes): 'Maji\xE3\x81\xA7Koi\xE3\x81\x99\xE3\x82\x8B' '5\xE7\xA7\x92\xE5\x89\x8D input (length 23): U+004d U+0061 U+006a U+0069 U+3067 U+004b U+006f U+0069 U+3059 U+308b U+0035 U+79d2 U+524d out: xn--majikoi5-783gue6qz075azm5e Josefsson Expires August 2, 2003 [Page 28] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.16 Japanese in (length 23 bytes): '\xE3\x83\x91\xE3\x83\x95\xE3\x82\xA3\xE3\x83\xBCde\xE3\x83' '\xAB\xE3\x83\xB3\xE3\x83\x90 input (length 23): U+30d1 U+30d5 U+30a3 U+30fc U+0064 U+0065 U+30eb U+30f3 U+30d0 out: xn--de-jg4avhby1noc0d 5.17 Japanese in (length 21 bytes): '\xE3\x81\x9D\xE3\x81\xAE\xE3\x82\xB9\xE3\x83\x94\xE3\x83\xBC\xE3' '\x83\x89\xE3\x81\xA7 input (length 21): U+305d U+306e U+30b9 U+30d4 U+30fc U+30c9 U+3067 out: xn--d9juau41awczczp 5.18 Greek in (length 16 bytes): '\xCE\xB5\xCE\xBB\xCE\xBB\xCE\xB7\xCE\xBD\xCE\xB9\xCE\xBA\xCE\xAC input (length 16): U+03b5 U+03bb U+03bb U+03b7 U+03bd U+03b9 U+03ba U+03ac out: xn--hxargifdar 5.19 Maltese (Malti) in (length 13 bytes): 'bon\xC4\xA1usa\xC4\xA7\xC4\xA7a input (length 13): U+0062 U+006f U+006e U+0121 U+0075 U+0073 U+0061 U+0127 U+0127 U+0061 out: xn--bonusaa-5bb1da Josefsson Expires August 2, 2003 [Page 29] Internet-Draft Nameprep and IDNA Test Vectors February 2003 5.20 Russian (Cyrillic) in (length 56 bytes): '\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5' '\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2' '\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83' '\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8 input (length 56): U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 U+043e U+043d U+0438 U+043d U+0435 U+0433 U+043e U+0432 U+043e U+0440 U+044f U+0442 U+043f U+043e U+0440 U+0443 U+0441 U+0441 U+043a U+0438 out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l 6. Security Considerations The security considerations from Nameprep and IDNA are inherited. 
   These test vectors are not believed to introduce new security
   considerations or to disrupt the operation of the Internet, but they
   may expose security weaknesses in existing implementations.  Any such
   incident should not be regarded as a problem with this document, but
   rather as evidence that this document served its purpose.

Normative References

   [1]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
        Internationalized Domain Names (IDN)", RFC 3491, March 2003.

   [2]  Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing
        Domain Names in Applications (IDNA)", RFC 3490, March 2003.

Informative References

   [3]  Costello, A., "Punycode: A Bootstring encoding of Unicode for
        Internationalized Domain Names in Applications (IDNA)", RFC
        3492, March 2003.

Author's Address

   Simon Josefsson
   Drottningholmsv. 70
   Stockholm  112 42
   Sweden

   EMail: simon@josefsson.org

Acknowledgments

   Some IDNA test vectors were borrowed from Punycode [3].

Appendix A. Nameprep test vectors in C syntax

   To spare implementors from typing in the test vectors above, a C
   structure with the data is provided.  The comment field holds the
   section title used in this document.  The in field contains the
   UTF-8 encoded input string.  The out field contains the expected
   output, or NULL if the expected result is an error.  The profile
   field can be ignored.  The only significant setting for the flags
   field is STRINGPREP_NO_UNASSIGNED, which tells the Nameprep
   implementation to perform unassigned code point checking (see the
   "AllowUnassigned" flag).  The rc field contains the expected return
   code: 0 indicates success, and the names of the other values should
   be self-explanatory.
   struct stringprep {
     char *comment;   /* section title used in this document */
     char *in;        /* UTF-8 encoded input */
     char *out;       /* expected output, or NULL on expected error */
     char *profile;   /* can be ignored */
     int flags;       /* 0 or STRINGPREP_NO_UNASSIGNED */
     int rc;          /* expected error code, 0 for success */
   } strprep[] = {
     {
       "Map to nothing",
       "foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8B"
       "bar""\xE2\x80\x8B\xE2\x81\xA0""baz\xEF\xB8\x80\xEF\xB8\x88"
       "\xEF\xB8\x8F\xEF\xBB\xBF", "foobarbaz"
     }, {
       "Case folding ASCII U+0043 U+0041 U+0046 U+0045",
       "CAFE", "cafe"
     }, {
       "Case folding 8bit U+00DF (german sharp s)",
       "\xC3\x9F", "ss"
     }, {
       "Case folding U+0130 (turkish capital I with dot)",
       "\xC4\xB0", "i\xcc\x87"
     }, {
       "Case folding multibyte U+0143 U+037A",
       "\xC5\x83\xCD\xBA", "\xC5\x84 \xCE\xB9"
     }, {
       "Case folding U+2121 U+33C6 U+1D7BB",
       "\xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB",
       "telc\xE2\x88\x95""kg\xCF\x83"
     }, {
       "Normalization of U+006a U+030c U+00A0 U+00AA",
       "\x6A\xCC\x8C\xC2\xA0\xC2\xAA", "\xC7\xB0 a"
     }, {
       "Case folding U+1FB7 and normalization",
       "\xE1\xBE\xB7", "\xE1\xBE\xB6\xCE\xB9"
     }, {
       "Self-reverting case folding U+01F0 and normalization",
       "\xC7\xB0", "\xC7\xB0"
     }, {
       "Self-reverting case folding U+0390 and normalization",
       "\xCE\x90", "\xCE\x90"
     }, {
       "Self-reverting case folding U+03B0 and normalization",
       "\xCE\xB0", "\xCE\xB0"
     }, {
       "Self-reverting case folding U+1E96 and normalization",
       "\xE1\xBA\x96", "\xE1\xBA\x96"
     }, {
       "Self-reverting case folding U+1F56 and normalization",
       "\xE1\xBD\x96", "\xE1\xBD\x96"
     }, {
       "ASCII space character U+0020",
       "\x20", "\x20"
     }, {
       "Non-ASCII 8bit space character U+00A0",
       "\xC2\xA0", "\x20"
     }, {
       "Non-ASCII multibyte space character U+1680",
       "\xE1\x9A\x80", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Non-ASCII multibyte space character U+2000",
       "\xE2\x80\x80", "\x20"
     }, {
       "Zero Width Space U+200b",
       "\xE2\x80\x8b", ""
     }, {
       "Non-ASCII multibyte space character U+3000",
       "\xE3\x80\x80", "\x20"
     }, {
       "ASCII control characters U+0010 U+007F",
       "\x10\x7F", "\x10\x7F"
     }, {
       "Non-ASCII 8bit control character U+0085",
       "\xC2\x85", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Non-ASCII multibyte control character U+180E",
       "\xE1\xA0\x8E", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Zero Width No-Break Space U+FEFF",
       "\xEF\xBB\xBF", ""
     }, {
       "Non-ASCII control character U+1D175",
       "\xF0\x9D\x85\xB5", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Plane 0 private use character U+F123",
       "\xEF\x84\xA3", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Plane 15 private use character U+F1234",
       "\xF3\xB1\x88\xB4", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Plane 16 private use character U+10F234",
       "\xF4\x8F\x88\xB4", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Non-character code point U+8FFFE",
       "\xF2\x8F\xBF\xBE", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Non-character code point U+10FFFF",
       "\xF4\x8F\xBF\xBF", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Surrogate code U+DF42",
       "\xED\xBD\x82", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Non-plain text character U+FFFD",
       "\xEF\xBF\xBD", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Ideographic description character U+2FF5",
       "\xE2\xBF\xB5", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Display property character U+0341",
       "\xCD\x81", "\xCC\x81"
     }, {
       "Left-to-right mark U+200E",
       "\xE2\x80\x8E", "\xCC\x81", "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Deprecated U+202A",
       "\xE2\x80\xAA", "\xCC\x81", "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Language tagging character U+E0001",
       "\xF3\xA0\x80\x81", "\xCC\x81", "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Language tagging character U+E0042",
       "\xF3\xA0\x81\x82", NULL, "Nameprep", 0,
       STRINGPREP_CONTAINS_PROHIBITED
     }, {
       "Bidi: RandALCat character U+05BE and LCat characters",
       "foo\xD6\xBE""bar", NULL, "Nameprep", 0,
       STRINGPREP_BIDI_BOTH_L_AND_RAL
     }, {
       "Bidi: RandALCat character U+FD50 and LCat characters",
       "foo\xEF\xB5\x90""bar", NULL, "Nameprep", 0,
       STRINGPREP_BIDI_BOTH_L_AND_RAL
     }, {
       "Bidi: RandALCat character U+FB38 and LCat characters",
       "foo\xEF\xB9\xB6""bar", "foo \xd9\x8e""bar"
     }, {
       "Bidi: RandALCat without trailing RandALCat U+0627 U+0031",
       "\xD8\xA7\x31", NULL, "Nameprep", 0,
       STRINGPREP_BIDI_LEADTRAIL_NOT_RAL
     }, {
       "Bidi: RandALCat character U+0627 U+0031 U+0628",
       "\xD8\xA7\x31\xD8\xA8", "\xD8\xA7\x31\xD8\xA8"
     }, {
       "Unassigned code point U+E0002",
       "\xF3\xA0\x80\x82", NULL, "Nameprep", STRINGPREP_NO_UNASSIGNED,
       STRINGPREP_CONTAINS_UNASSIGNED
     }, {
       "Larger test (shrinking)",
       "X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1\x6a\xcc\x8c\xc2\xa0\xc2"
       "\xaa\xce\xb0\xe2\x80\x80",
       "xssi\xcc\x87""tel\xc7\xb0 a\xce\xb0 ", "Nameprep"
     }, {
       "Larger test (expanding)",
       "X\xC3\x9F\xe3\x8c\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8c\x80",
       "xss\xe3\x82\xad\xe3\x83\xad\xe3\x83\xa1\xe3\x83\xbc\xe3\x83\x88"
       "\xe3\x83\xab""i\xcc\x87""tel\x28""d\x29\xe3\x82\xa2\xe3\x83\x91"
       "\xe3\x83\xbc\xe3\x83\x88"
     },
   };

Appendix B. IDNA test vectors in C syntax

   To spare implementors from typing in the IDNA test vectors above, a
   C structure with the data is provided.  The name field holds the
   section title used in this document.  The inlen and in fields
   contain the number of Unicode code points and the code points
   themselves.  The out field contains the expected ToASCII output.
   The allowunassigned and usestd3asciirules fields can be ignored.
   The toasciirc and tounicoderc fields contain the expected error
   codes, where 0 indicates success and the other values should be
   self-explanatory.
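   The code points in the in field and the UTF-8 byte strings shown for
   the same vectors in Section 5 describe the same input, so one can be
   checked against the other.  The following is a minimal sketch of
   such a conversion, assuming nothing beyond standard C; the function
   names utf8_encode and utf8_encode_all are hypothetical and are not
   part of this document or of any IDNA library.  Unlike the Nameprep
   table, which deliberately includes a surrogate, this sketch refuses
   to encode surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes one code point as UTF-8 into dst (at least 4 bytes).
   Returns the number of bytes written, or 0 for a surrogate or a
   value above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *dst)
{
  if (cp < 0x80) {
    dst[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    dst[0] = (unsigned char)(0xC0 | (cp >> 6));
    dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                           /* surrogates are rejected */
  if (cp < 0x10000) {
    dst[0] = (unsigned char)(0xE0 | (cp >> 12));
    dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {
    dst[0] = (unsigned char)(0xF0 | (cp >> 18));
    dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Encodes inlen code points into dst.  Returns the total number of
   bytes written, or 0 if a code point cannot be encoded or dst is
   too small. */
size_t utf8_encode_all(const unsigned long *in, size_t inlen,
                       unsigned char *dst, size_t dstlen)
{
  size_t used = 0;
  size_t i, j;

  assert(in != NULL && dst != NULL);
  for (i = 0; i < inlen; i++) {
    unsigned char tmp[4];
    size_t n = utf8_encode(in[i], tmp);

    if (n == 0 || n > dstlen - used)
      return 0;
    for (j = 0; j < n; j++)
      dst[used++] = tmp[j];
  }
  return used;
}
```

   Applied to the eight code points of the Greek vector, this produces
   the 16 bytes listed for it in Section 5.18; the ten code points of
   the Maltese vector produce 13 bytes.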
   struct idna {
     char *name;
     size_t inlen;
     unsigned long in[100];
     char *out;
     int allowunassigned;
     int usestd3asciirules;
     int toasciirc;
     int tounicoderc;
   } idna[] = {
     {
       "Arabic (Egyptian)", 17,
       {
         0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
         0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
         0x061F},
       IDNA_ACE_PREFIX "egbpdaj6bu4bxfgehfvwxn", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Chinese (simplified)", 9,
       {
         0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48, 0x4E0D, 0x8BF4, 0x4E2D,
         0x6587},
       IDNA_ACE_PREFIX "ihqwcrb4cv8a8dqg056pqjye", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Chinese (traditional)", 9,
       {
         0x4ED6, 0x5011, 0x7232, 0x4EC0, 0x9EBD, 0x4E0D, 0x8AAA, 0x4E2D,
         0x6587},
       IDNA_ACE_PREFIX "ihqwctvzc91f659drss3x8bo0yb", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Czech", 22,
       {
         0x0050, 0x0072, 0x006F, 0x010D, 0x0070, 0x0072, 0x006F, 0x0073,
         0x0074, 0x011B, 0x006E, 0x0065, 0x006D, 0x006C, 0x0075, 0x0076,
         0x00ED, 0x010D, 0x0065, 0x0073, 0x006B, 0x0079},
       IDNA_ACE_PREFIX "Proprostnemluvesky-uyb24dma41a", 0, 0,
       IDNA_SUCCESS, IDNA_SUCCESS},
     {
       "Hebrew", 22,
       {
         0x05DC, 0x05DE, 0x05D4, 0x05D4, 0x05DD, 0x05E4, 0x05E9, 0x05D5,
         0x05D8, 0x05DC, 0x05D0, 0x05DE, 0x05D3, 0x05D1, 0x05E8, 0x05D9,
         0x05DD, 0x05E2, 0x05D1, 0x05E8, 0x05D9, 0x05EA},
       IDNA_ACE_PREFIX "4dbcagdahymbxekheh6e0a7fei0b", 0, 0,
       IDNA_SUCCESS, IDNA_SUCCESS},
     {
       "Hindi (Devanagari)", 30,
       {
         0x092F, 0x0939, 0x0932, 0x094B, 0x0917, 0x0939, 0x093F, 0x0928,
         0x094D, 0x0926, 0x0940, 0x0915, 0x094D, 0x092F, 0x094B, 0x0902,
         0x0928, 0x0939, 0x0940, 0x0902, 0x092C, 0x094B, 0x0932, 0x0938,
         0x0915, 0x0924, 0x0947, 0x0939, 0x0948, 0x0902},
       IDNA_ACE_PREFIX "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd",
       0, 0, IDNA_SUCCESS},
     {
       "Japanese (kanji and hiragana)", 18,
       {
         0x306A, 0x305C, 0x307F, 0x3093, 0x306A, 0x65E5, 0x672C, 0x8A9E,
         0x3092, 0x8A71, 0x3057, 0x3066, 0x304F, 0x308C, 0x306A, 0x3044,
         0x306E, 0x304B},
       IDNA_ACE_PREFIX "n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa", 0, 0,
       IDNA_SUCCESS},
     {
       "Russian (Cyrillic)", 28,
       {
         0x043F, 0x043E, 0x0447, 0x0435, 0x043C, 0x0443, 0x0436, 0x0435,
         0x043E, 0x043D, 0x0438, 0x043D, 0x0435, 0x0433, 0x043E, 0x0432,
         0x043E, 0x0440, 0x044F, 0x0442, 0x043F, 0x043E, 0x0440, 0x0443,
         0x0441, 0x0441, 0x043A, 0x0438},
       IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
       IDNA_SUCCESS, IDNA_SUCCESS},
     {
       "Spanish", 40,
       {
         0x0050, 0x006F, 0x0072, 0x0071, 0x0075, 0x00E9, 0x006E, 0x006F,
         0x0070, 0x0075, 0x0065, 0x0064, 0x0065, 0x006E, 0x0073, 0x0069,
         0x006D, 0x0070, 0x006C, 0x0065, 0x006D, 0x0065, 0x006E, 0x0074,
         0x0065, 0x0068, 0x0061, 0x0062, 0x006C, 0x0061, 0x0072, 0x0065,
         0x006E, 0x0045, 0x0073, 0x0070, 0x0061, 0x00F1, 0x006F, 0x006C},
       IDNA_ACE_PREFIX "PorqunopuedensimplementehablarenEspaol-fmd56a",
       0, 0, IDNA_SUCCESS},
     {
       "Vietnamese", 31,
       {
         0x0054, 0x1EA1, 0x0069, 0x0073, 0x0061, 0x006F, 0x0068, 0x1ECD,
         0x006B, 0x0068, 0x00F4, 0x006E, 0x0067, 0x0074, 0x0068, 0x1EC3,
         0x0063, 0x0068, 0x1EC9, 0x006E, 0x00F3, 0x0069, 0x0074, 0x0069,
         0x1EBF, 0x006E, 0x0067, 0x0056, 0x0069, 0x1EC7, 0x0074},
       IDNA_ACE_PREFIX "TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g",
       0, 0, IDNA_SUCCESS},
     {
       "Japanese", 8,
       {
         0x0033, 0x5E74, 0x0042, 0x7D44, 0x91D1, 0x516B, 0x5148, 0x751F},
       IDNA_ACE_PREFIX "3B-ww4c5e180e575a65lsy2b", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Japanese", 24,
       {
         0x5B89, 0x5BA4, 0x5948, 0x7F8E, 0x6075, 0x002D, 0x0077, 0x0069,
         0x0074, 0x0068, 0x002D, 0x0053, 0x0055, 0x0050, 0x0045, 0x0052,
         0x002D, 0x004D, 0x004F, 0x004E, 0x004B, 0x0045, 0x0059, 0x0053},
       IDNA_ACE_PREFIX "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n", 0, 0,
       IDNA_SUCCESS},
     {
       "Japanese", 25,
       {
         0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002D, 0x0041, 0x006E,
         0x006F, 0x0074, 0x0068, 0x0065, 0x0072, 0x002D, 0x0057, 0x0061,
         0x0079, 0x002D, 0x305D, 0x308C, 0x305E, 0x308C, 0x306E, 0x5834,
         0x6240},
       IDNA_ACE_PREFIX "Hello-Another-Way--fc4qua05auwb3674vfr0b", 0, 0,
       IDNA_SUCCESS},
     {
       "Japanese", 8,
       {
         0x3072, 0x3068, 0x3064, 0x5C4B, 0x6839, 0x306E, 0x4E0B, 0x0032},
       IDNA_ACE_PREFIX "2-u9tlzr9756bt3uc0v", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Japanese", 13,
       {
         0x004D, 0x0061, 0x006A, 0x0069, 0x3067, 0x004B, 0x006F, 0x0069,
         0x3059, 0x308B, 0x0035, 0x79D2, 0x524D},
       IDNA_ACE_PREFIX "MajiKoi5-783gue6qz075azm5e", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Japanese", 9,
       {
         0x30D1, 0x30D5, 0x30A3, 0x30FC, 0x0064, 0x0065, 0x30EB, 0x30F3,
         0x30D0},
       IDNA_ACE_PREFIX "de-jg4avhby1noc0d", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Japanese", 7,
       {
         0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067},
       IDNA_ACE_PREFIX "d9juau41awczczp", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Greek", 8,
       {0x03b5, 0x03bb, 0x03bb, 0x03b7, 0x03bd, 0x03b9, 0x03ba, 0x03ac},
       IDNA_ACE_PREFIX "hxargifdar", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
     {
       "Maltese (Malti)", 10,
       {0x0062, 0x006f, 0x006e, 0x0121, 0x0075, 0x0073, 0x0061, 0x0127,
        0x0127, 0x0061},
       IDNA_ACE_PREFIX "bonusaa-5bb1da", 0, 0, IDNA_SUCCESS,
       IDNA_SUCCESS},
     {
       "Russian (Cyrillic)", 28,
       {0x043f, 0x043e, 0x0447, 0x0435, 0x043c, 0x0443, 0x0436, 0x0435,
        0x043e, 0x043d, 0x0438, 0x043d, 0x0435, 0x0433, 0x043e, 0x0432,
        0x043e, 0x0440, 0x044f, 0x0442, 0x043f, 0x043e, 0x0440, 0x0443,
        0x0441, 0x0441, 0x043a, 0x0438},
       IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
       IDNA_SUCCESS, IDNA_SUCCESS},
   };

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to
   identify any such rights.  Information on the IETF's procedures with
   respect to rights in standards-track and standards-related
   documentation can be found in BCP-11.  Copies of claims of rights
   made available for publication and any assurances of licenses to be
   made available, or the result of an attempt made to obtain a general
   license or permission for the use of such proprietary rights by
   implementors or users of this specification can be obtained from the
   IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

Full Copyright Statement

   Copyright (C) Simon Josefsson (2003).  All Rights Reserved.

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.