Internet Engineering Task Force (IETF)                        J. Klensin
Request for Comments: 5891                                   August 2010
Obsoletes: 3490, 3491
Updates: 3492
Category: Standards Track
ISSN: 2070-1721
|
||
|
||
|
||
Internationalized Domain Names in Applications (IDNA): Protocol
|
||
|
||
Abstract
|
||
|
||
This document is the revised protocol definition for
|
||
Internationalized Domain Names (IDNs). The rationale for changes,
|
||
the relationship to the older specification, and important
|
||
terminology are provided in other documents. This document specifies
|
||
the protocol mechanism, called Internationalized Domain Names in
|
||
Applications (IDNA), for registering and looking up IDNs in a way
|
||
that does not require changes to the DNS itself. IDNA is only meant
|
||
for processing domain names, not free text.
|
||
|
||
Status of This Memo
|
||
|
||
This is an Internet Standards Track document.
|
||
|
||
This document is a product of the Internet Engineering Task Force
|
||
(IETF). It represents the consensus of the IETF community. It has
|
||
received public review and has been approved for publication by the
|
||
Internet Engineering Steering Group (IESG). Further information on
|
||
Internet Standards is available in Section 2 of RFC 5741.
|
||
|
||
Information about the current status of this document, any errata,
|
||
and how to provide feedback on it may be obtained at
|
||
http://www.rfc-editor.org/info/rfc5891.
|
||
|
||
Klensin Standards Track [Page 1]
|
||
|
||
RFC 5891 IDNA2008 Protocol August 2010
|
||
|
||
|
||
Copyright Notice
|
||
|
||
Copyright (c) 2010 IETF Trust and the persons identified as the
|
||
document authors. All rights reserved.
|
||
|
||
This document is subject to BCP 78 and the IETF Trust's Legal
|
||
Provisions Relating to IETF Documents
|
||
(http://trustee.ietf.org/license-info) in effect on the date of
|
||
publication of this document. Please review these documents
|
||
carefully, as they describe your rights and restrictions with respect
|
||
to this document. Code Components extracted from this document must
|
||
include Simplified BSD License text as described in Section 4.e of
|
||
the Trust Legal Provisions and are provided without warranty as
|
||
described in the Simplified BSD License.
|
||
|
||
This document may contain material from IETF Documents or IETF
|
||
Contributions published or made publicly available before November
|
||
10, 2008. The person(s) controlling the copyright in some of this
|
||
material may not have granted the IETF Trust the right to allow
|
||
modifications of such material outside the IETF Standards Process.
|
||
Without obtaining an adequate license from the person(s) controlling
|
||
the copyright in such materials, this document may not be modified
|
||
outside the IETF Standards Process, and derivative works of it may
|
||
not be created outside the IETF Standards Process, except to format
|
||
it for publication as an RFC or to translate it into languages other
|
||
than English.
|
||
|
||
Table of Contents
|
||
|
||
1.  Introduction
2.  Terminology
3.  Requirements and Applicability
  3.1.  Requirements
  3.2.  Applicability
    3.2.1.  DNS Resource Records
    3.2.2.  Non-Domain-Name Data Types Stored in the DNS
4.  Registration Protocol
  4.1.  Input to IDNA Registration
  4.2.  Permitted Character and Label Validation
    4.2.1.  Input Format
    4.2.2.  Rejection of Characters That Are Not Permitted
    4.2.3.  Label Validation
    4.2.4.  Registration Validation Requirements
  4.3.  Registry Restrictions
  4.4.  Punycode Conversion
  4.5.  Insertion in the Zone
5.  Domain Name Lookup Protocol
  5.1.  Label String Input
  5.2.  Conversion to Unicode
  5.3.  A-label Input
  5.4.  Validation and Character List Testing
  5.5.  Punycode Conversion
  5.6.  DNS Name Resolution
6.  Security Considerations
7.  IANA Considerations
8.  Contributors
9.  Acknowledgments
10. References
  10.1.  Normative References
  10.2.  Informative References
Appendix A.  Summary of Major Changes from IDNA2003
|
||
|
||
1. Introduction
|
||
|
||
This document supplies the protocol definition for Internationalized
|
||
Domain Names in Applications (IDNA), with the version specified here
|
||
known as IDNA2008. Essential definitions and terminology for
|
||
understanding this document and a road map of the collection of
|
||
documents that make up IDNA2008 appear in a separate Definitions
|
||
document [RFC5890]. Appendix A discusses the relationship between
|
||
this specification and the earlier version of IDNA (referred to here
|
||
as "IDNA2003"). The rationale for these changes, along with
|
||
considerable explanatory material and advice to zone administrators
|
||
who support IDNs, is provided in another document, known informally
|
||
in this series as the "Rationale document" [RFC5894].
|
||
|
||
IDNA works by allowing applications to use certain ASCII [ASCII]
|
||
string labels (beginning with a special prefix) to represent
|
||
non-ASCII name labels. Lower-layer protocols need not be aware of
|
||
this; therefore, IDNA does not change any infrastructure. In
|
||
particular, IDNA does not depend on any changes to DNS servers,
|
||
resolvers, or DNS protocol elements, because the ASCII name service
|
||
provided by the existing DNS can be used for IDNA.
|
||
|
||
IDNA applies only to a specific subset of DNS labels. The base DNS
|
||
standards [RFC1034] [RFC1035] and their various updates specify how
|
||
to combine labels into fully-qualified domain names and parse labels
|
||
out of those names.
|
||
|
||
This document describes two separate protocols, one for IDN
|
||
registration (Section 4) and one for IDN lookup (Section 5). These
|
||
two protocols share some terminology, reference data, and operations.
|
||
|
||
2. Terminology
|
||
|
||
As mentioned above, terminology used as part of the definition of
|
||
IDNA appears in the Definitions document [RFC5890]. It is worth
|
||
noting that some of this terminology overlaps with, and is consistent
|
||
with, that used in Unicode or other character set standards and the
|
||
DNS. Readers of this document are assumed to be familiar with the
|
||
associated Definitions document and with the DNS-specific terminology
|
||
in RFC 1034 [RFC1034].
|
||
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
||
document are to be interpreted as described in BCP 14, RFC 2119
|
||
[RFC2119].
|
||
|
||
3. Requirements and Applicability
|
||
|
||
3.1. Requirements
|
||
|
||
IDNA makes the following requirements:
|
||
|
||
1. Whenever a domain name is put into a domain name slot that is not
|
||
IDNA-aware (see Section 2.3.2.6 of the Definitions document
|
||
[RFC5890]), it MUST contain only ASCII characters (i.e., its
|
||
labels must be either A-labels or NR-LDH labels), unless the DNS
|
||
application is not subject to historical recommendations for
|
||
"hostname"-style names (see RFC 1034 [RFC1034] and
|
||
Section 3.2.1).
|
||
|
||
2. Labels MUST be compared using equivalent forms: either both
|
||
A-label forms or both U-label forms. Because A-labels and
|
||
U-labels can be transformed into each other without loss of
|
||
information, these comparisons are equivalent (however, in
|
||
practice, comparison of U-labels requires first verifying that
|
||
they actually are U-labels and not just Unicode strings). A pair
|
||
of A-labels MUST be compared as case-insensitive ASCII (as with
|
||
all comparisons of ASCII DNS labels). U-labels MUST be compared
|
||
as-is, without case folding or other intermediate steps. While
|
||
it is not necessary to validate labels in order to compare them,
|
||
successful comparison does not imply validity. In many cases,
|
||
not limited to comparison, validation may be important for other
|
||
reasons and SHOULD be performed.
|
||
|
||
3. Labels being registered MUST conform to the requirements of
|
||
Section 4. Labels being looked up and the lookup process MUST
|
||
conform to the requirements of Section 5.
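The comparison rule in requirement 2 can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this example. The sketch assumes both inputs are already in the same form and, as the requirement itself notes, it does not establish that either string is a valid label.

```python
def a_labels_equal(a: str, b: str) -> bool:
    """Compare two A-labels as case-insensitive ASCII."""
    if not (a.isascii() and b.isascii()):
        raise ValueError("A-labels must consist of ASCII characters only")
    return a.lower() == b.lower()


def u_labels_equal(a: str, b: str) -> bool:
    """Compare two U-labels as-is: no case folding, no normalization."""
    return a == b
```

Under these assumptions, "XN--BCHER-KVA" and "xn--bcher-kva" compare equal as A-labels, while a precomposed and a decomposed spelling of the same visible text compare unequal as Unicode strings because no intermediate step is applied.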
|
||
|
||
3.2. Applicability
|
||
|
||
IDNA applies to all domain names in all domain name slots in
|
||
protocols except where it is explicitly excluded. It does not apply
|
||
to domain name slots that do not use the LDH syntax rules as
|
||
described in the Definitions document [RFC5890].
|
||
|
||
Because it uses the DNS, IDNA applies to many protocols that were
|
||
specified before it was designed. IDNs occupying domain name slots
|
||
in those older protocols MUST be in A-label form until and unless
|
||
those protocols and their implementations are explicitly upgraded to
|
||
be aware of IDNs and to accept the U-label form. IDNs actually
|
||
appearing in DNS queries or responses MUST be A-labels.
|
||
|
||
IDNA-aware protocols and implementations MAY accept U-labels,
|
||
A-labels, or both as those particular protocols specify. IDNA is not
|
||
defined for extended label types (see RFC 2671 [RFC2671], Section 3).
|
||
|
||
3.2.1. DNS Resource Records
|
||
|
||
IDNA applies only to domain names in the NAME and RDATA fields of DNS
|
||
resource records whose CLASS is IN. See the DNS specification
|
||
[RFC1035] for precise definitions of these terms.
|
||
|
||
The application of IDNA to DNS resource records depends entirely on
|
||
the CLASS of the record, and not on the TYPE except as noted below.
|
||
This will remain true, even as new TYPEs are defined, unless a new
|
||
TYPE defines TYPE-specific rules. Special naming conventions for SRV
|
||
records (and "underscore labels" more generally) are incompatible
|
||
with IDNA coding as discussed in the Definitions document [RFC5890],
|
||
especially Section 2.3.2.3. Of course, underscore labels may be part
|
||
of a domain that uses IDN labels at higher levels in the tree.
|
||
|
||
3.2.2. Non-Domain-Name Data Types Stored in the DNS
|
||
|
||
Although IDNA enables the representation of non-ASCII characters in
|
||
domain names, that does not imply that IDNA enables the
|
||
representation of non-ASCII characters in other data types that are
|
||
stored in domain names, specifically in the RDATA field for types
|
||
that have structured RDATA format. For example, an email address
|
||
local part is stored in a domain name in the RNAME field as part of
|
||
the RDATA of an SOA record (e.g., hostmaster@example.com would be
|
||
represented as hostmaster.example.com). IDNA does not update the
|
||
existing email standards, which allow only ASCII characters in local
|
||
parts. Even though work is in progress to define
|
||
internationalization for email addresses [RFC4952], changes to the
|
||
email address part of the SOA RDATA would require action in, or
|
||
updates to, other standards, specifically those that specify the
|
||
format of the SOA RR.
|
||
|
||
4. Registration Protocol
|
||
|
||
This section defines the model for registering an IDN. The model is
|
||
implementation independent; any sequence of steps that produces
|
||
exactly the same result for all labels is considered a valid
|
||
implementation.
|
||
|
||
Note that, while the registration (this section) and lookup protocols
|
||
(Section 5) are very similar in most respects, they are not
|
||
identical, and implementers should carefully follow the steps
|
||
described in this specification.
|
||
|
||
4.1. Input to IDNA Registration
|
||
|
||
Registration processes, especially processing by entities (often
|
||
called "registrars") who deal with registrants before the request
|
||
actually reaches the zone manager ("registry"), are outside the scope
|
||
of this definition and may differ significantly depending on local
|
||
needs. By the time a string enters the IDNA registration process as
|
||
described in this specification, it MUST be in Unicode and in
|
||
Normalization Form C (NFC [Unicode-UAX15]). Entities responsible for
|
||
zone files ("registries") MUST accept only the exact string for which
|
||
registration is requested, free of any mappings or local adjustments.
|
||
They MAY accept that input in any of three forms:
|
||
|
||
1. As a pair of A-label and U-label.
|
||
|
||
2. As an A-label only.
|
||
|
||
3. As a U-label only.
|
||
|
||
The first two of these forms are RECOMMENDED because the use of
|
||
A-labels avoids any possibility of ambiguity. The first is normally
|
||
preferred over the second because it permits further verification of
|
||
user intent (see Section 4.2.1).
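As an illustration of the NFC precondition, a registry front end might reject non-normalized input rather than silently normalize it, since the registry is to accept only the exact string requested. The following Python sketch uses the standard `unicodedata` module; the function name `require_nfc` is an assumption made for this example.

```python
import unicodedata


def require_nfc(candidate: str) -> str:
    """Return the candidate unchanged if it is in NFC, otherwise raise.

    The string is deliberately not normalized here: applying a mapping
    would change the exact string for which registration was requested.
    """
    if unicodedata.normalize("NFC", candidate) != candidate:
        raise ValueError("registration input is not in Normalization Form C")
    return candidate
```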
|
||
|
||
4.2. Permitted Character and Label Validation
|
||
|
||
4.2.1. Input Format
|
||
|
||
If both the U-label and A-label forms are available, the registry
|
||
MUST ensure that the A-label form is in lowercase, perform a
|
||
conversion to a U-label, perform the steps and tests described below
|
||
on that U-label, and then verify that the A-label produced by the
|
||
step in Section 4.4 matches the one provided as input. In addition,
|
||
the U-label that was provided as input and the one obtained by
|
||
conversion of the A-label MUST match exactly. If, for some reason,
|
||
these tests fail, the registration MUST be rejected.
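The cross-check described above can be sketched in Python using the built-in "punycode" codec. The helper names are hypothetical, the sketch reads "ensure that the A-label form is in lowercase" as converting to lowercase (one possible reading), and the character and label tests of Sections 4.2.2 through 4.3 are only marked by a comment.

```python
ACE_PREFIX = "xn--"


def to_u_label(a_label: str) -> str:
    """Decode an A-label with the Punycode codec (no validation performed)."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("missing ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def to_a_label(u_label: str) -> str:
    """Encode a Unicode label with Punycode and prepend the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def check_pair(a_label: str, u_label: str) -> None:
    """Raise ValueError if the supplied A-label and U-label disagree."""
    a_lower = a_label.lower()
    decoded = to_u_label(a_lower)
    if decoded != u_label:
        raise ValueError("supplied U-label does not match the decoded A-label")
    # The tests of Sections 4.2.2 through 4.3 would be applied to `decoded` here.
    if to_a_label(decoded) != a_lower:
        raise ValueError("A-label does not match the re-encoded U-label")
```

With these helpers, `check_pair("xn--bcher-kva", "bücher")` returns without error, while pairing the same A-label with "bucher" raises.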
|
||
|
||
If only an A-label was provided and the conversion to a U-label is
|
||
not performed, the registry MUST still verify that the A-label is
|
||
superficially valid, i.e., that it does not violate any of the rules
|
||
of Punycode encoding [RFC3492] such as the prohibition on trailing
|
||
hyphen-minus, the requirement that all characters be ASCII, and so
|
||
on. Strings that appear to be A-labels (e.g., they start with
|
||
"xn--") and strings that are supplied to the registry in a context
|
||
reserved for A-labels (such as a field in a form to be filled out),
|
||
but that are not valid A-labels as described in this paragraph, MUST
|
||
NOT be placed in DNS zones that support IDNA.
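A minimal sketch of such a superficial check follows. It is not a complete statement of the Punycode rules: it tests only the conditions named in the paragraph above plus the 63-octet label limit, and it relies on Python's built-in "punycode" codec raising `UnicodeError` for input it cannot decode. The function name is hypothetical.

```python
def superficially_valid_a_label(label: str) -> bool:
    """Cheap syntactic screening of a string offered as an A-label."""
    if not label.isascii():
        return False                      # all characters must be ASCII
    lowered = label.lower()
    if not lowered.startswith("xn--") or len(lowered) <= 4:
        return False                      # ACE prefix plus a non-empty body
    if len(lowered) > 63:
        return False                      # DNS label length limit
    if lowered.endswith("-"):
        return False                      # no trailing hyphen-minus
    try:
        lowered[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False                      # body is not decodable Punycode
    return True
```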
|
||
|
||
If only an A-label is provided, the conversion to a U-label is not
|
||
performed, but the superficial tests described in the previous
|
||
paragraph are performed, registration procedures MAY, and usually
|
||
will, bypass the tests and actions in the balance of Section 4.2 and
|
||
in Sections 4.3 and 4.4.
|
||
|
||
4.2.2. Rejection of Characters That Are Not Permitted
|
||
|
||
The candidate Unicode string MUST NOT contain characters that appear
|
||
in the "DISALLOWED" and "UNASSIGNED" lists specified in the Tables
|
||
document [RFC5892].
|
||
|
||
4.2.3. Label Validation
|
||
|
||
The proposed label (in the form of a Unicode string, i.e., a string
|
||
that at least superficially appears to be a U-label) is then examined
|
||
using tests that require examination of more than one character.
|
||
Character order is considered to be the on-the-wire order. That
|
||
order may not be the same as the display order.
|
||
|
||
4.2.3.1. Hyphen Restrictions
|
||
|
||
The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
|
||
the third and fourth character positions and MUST NOT start or end
|
||
with a "-" (hyphen).
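The hyphen restrictions translate directly into string tests. In this Python sketch (the function name is illustrative), positions are counted in Unicode code points, which is how Python indexes `str` values.

```python
def passes_hyphen_restrictions(label: str) -> bool:
    """Apply the registration-time hyphen tests to a Unicode label."""
    if label.startswith("-") or label.endswith("-"):
        return False
    # Third and fourth character positions are indexes 2 and 3.
    return label[2:4] != "--"
```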
|
||
|
||
4.2.3.2. Leading Combining Marks
|
||
|
||
The Unicode string MUST NOT begin with a combining mark or combining
|
||
character (see The Unicode Standard, Section 2.11 [Unicode] for an
|
||
exact definition).
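One common way to approximate this test is to inspect the Unicode General_Category of the first code point. The sketch below assumes that any category beginning with "M" (Mn, Mc, Me) counts as a combining mark; the Unicode Standard remains the authority for the exact definition, and the result depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first code point has General_Category Mn, Mc, or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```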
|
||
|
||
4.2.3.3. Contextual Rules
|
||
|
||
The Unicode string MUST NOT contain any characters whose validity is
|
||
context-dependent, unless the validity is positively confirmed by a
|
||
contextual rule. To check this, each code point identified as
|
||
CONTEXTJ or CONTEXTO in the Tables document [RFC5892] MUST have a
|
||
non-null rule. If such a code point is missing a rule, the label is
|
||
invalid. If the rule exists but the result of applying the rule is
|
||
negative or inconclusive, the proposed label is invalid.
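The decision logic above (a missing rule, a negative result, or an inconclusive result all make the label invalid) can be sketched as a lookup table of rule functions. Everything in this example is a deliberately incomplete stand-in for the data in the Tables document: only two code points are listed, and only ZERO WIDTH JOINER has a rule, paraphrased here as "valid only immediately after a character whose canonical combining class is Virama". The Tables document does define a rule for ZERO WIDTH NON-JOINER as well; it is omitted purely to exercise the missing-rule path.

```python
import unicodedata
from typing import Callable, Dict, Optional

VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def zwj_rule(label: str, pos: int) -> Optional[bool]:
    """Paraphrase of the ZERO WIDTH JOINER rule: preceded by a Virama."""
    if pos == 0:
        return False
    return unicodedata.combining(label[pos - 1]) == VIRAMA_CCC


# Hypothetical, partial tables for illustration only.
CONTEXT_CODE_POINTS = {0x200C, 0x200D}
CONTEXT_RULES: Dict[int, Callable[[str, int], Optional[bool]]] = {
    0x200D: zwj_rule,
}


def passes_contextual_rules(label: str) -> bool:
    for pos, ch in enumerate(label):
        cp = ord(ch)
        if cp not in CONTEXT_CODE_POINTS:
            continue
        rule = CONTEXT_RULES.get(cp)
        if rule is None:
            return False                 # no rule defined: label is invalid
        if rule(label, pos) is not True:
            return False                 # negative or inconclusive result
    return True
```

Returning `None` from a rule function models an inconclusive result, which the caller treats the same way as `False`.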
|
||
|
||
4.2.3.4. Labels Containing Characters Written Right to Left
|
||
|
||
If the proposed label contains any characters from scripts that are
|
||
written from right to left, it MUST meet the Bidi criteria [RFC5893].
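Deciding whether the Bidi criteria need to be applied at all can be sketched with the Unicode Bidi_Class property. This example assumes that a label is treated as right-to-left when it contains at least one character of class R, AL, or AN, following the terminology of the Bidi document; it only detects such labels and does not implement the criteria themselves.

```python
import unicodedata

RTL_BIDI_CLASSES = {"R", "AL", "AN"}


def needs_bidi_check(label: str) -> bool:
    """True if any character has a right-to-left Bidi_Class (R, AL, AN)."""
    return any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in label)
```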
|
||
|
||
4.2.4. Registration Validation Requirements
|
||
|
||
Strings that contain at least one non-ASCII character, have been
|
||
produced by the steps above, whose contents pass all of the tests in
|
||
Section 4.2.3, and are 63 or fewer characters long in
|
||
ASCII-compatible encoding (ACE) form (see Section 4.4), are U-labels.
|
||
|
||
To summarize, tests are made in Section 4.2 for invalid characters,
|
||
invalid combinations of characters, for labels that are invalid even
|
||
if the characters they contain are valid individually, and for labels
|
||
that do not conform to the restrictions for strings containing
|
||
right-to-left characters.
|
||
|
||
4.3. Registry Restrictions
|
||
|
||
In addition to the rules and tests above, there are many reasons why
|
||
a registry could reject a label. Registries at all levels of the
|
||
DNS, not just the top level, are expected to establish policies about
|
||
label registrations. Policies are likely to be informed by the local
|
||
languages and the scripts that are used to write them and may depend
|
||
on many factors including what characters are in the label (for
|
||
example, a label may be rejected based on other labels already
|
||
registered). See the Rationale document [RFC5894], Section 3.2, for
|
||
further discussion and recommendations about registry policies.
|
||
|
||
The string produced by the steps in Section 4.2 is checked and
|
||
processed as appropriate to local registry restrictions. Application
|
||
of those registry restrictions may result in the rejection of some
|
||
labels or the application of special restrictions to others.
|
||
|
||
4.4. Punycode Conversion
|
||
|
||
The resulting U-label is converted to an A-label (defined in Section
|
||
2.3.2.1 of the Definitions document [RFC5890]). The A-label is the
|
||
encoding of the U-label according to the Punycode algorithm [RFC3492]
|
||
with the ACE prefix "xn--" added at the beginning of the string. The
|
||
resulting string must, of course, conform to the length limits
|
||
imposed by the DNS. This document does not update or alter the
|
||
Punycode algorithm specified in RFC 3492 in any way. RFC 3492 does
|
||
make a non-normative reference to the information about the value and
|
||
construction of the ACE prefix that appears in RFC 3490 or Nameprep
|
||
[RFC3491]. For consistency and reader convenience, IDNA2008
|
||
effectively updates that reference to point to this document. That
|
||
change does not alter the prefix itself. The prefix, "xn--", is the
|
||
same in both sets of documents.
|
||
|
||
With the exception of the maximum string length test on Punycode
|
||
output, the failure conditions identified in the Punycode encoding
|
||
procedure cannot occur if the input is a U-label as determined by the
|
||
steps in Sections 4.1 through 4.3 above.
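As a worked example of the conversion, Python's built-in "punycode" codec implements the RFC 3492 encoding without the prefix, so the A-label is obtained by prepending "xn--". The function name and the explicit length test are illustrative; the input is assumed to be a U-label that already passed the earlier steps.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def u_label_to_a_label(u_label: str) -> str:
    """Punycode-encode a U-label, add the ACE prefix, enforce the length limit."""
    a_label = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    if len(a_label) > MAX_LABEL_OCTETS:
        raise ValueError("A-label exceeds the 63-octet DNS label limit")
    return a_label


print(u_label_to_a_label("bücher"))  # xn--bcher-kva
```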
|
||
|
||
4.5. Insertion in the Zone
|
||
|
||
The label is registered in the DNS by inserting the A-label into a
|
||
zone.
|
||
|
||
5. Domain Name Lookup Protocol
|
||
|
||
Lookup is different from registration, and different tests are applied
|
||
on the client. Although some validity checks are necessary to avoid
|
||
serious problems with the protocol, the lookup-side tests are more
|
||
permissive and rely on the assumption that names that are present in
|
||
the DNS are valid. That assumption is, however, a weak one because
|
||
the presence of wildcards in the DNS might cause a string that is not
|
||
actually registered in the DNS to be successfully looked up.
|
||
|
||
5.1. Label String Input
|
||
|
||
The user supplies a string in the local character set, for example,
|
||
by typing it, clicking on it, or copying and pasting it from a
|
||
resource identifier, e.g., a Uniform Resource Identifier (URI)
|
||
[RFC3986] or an Internationalized Resource Identifier (IRI)
|
||
[RFC3987], from which the domain name is extracted. Alternatively,
|
||
some process not directly involving the user may read the string from
|
||
a file or obtain it in some other way. Processing in this step and
|
||
the one specified in Section 5.2 are local matters, to be
|
||
accomplished prior to actual invocation of IDNA.
|
||
|
||
5.2. Conversion to Unicode
|
||
|
||
The string is converted from the local character set into Unicode, if
|
||
it is not already in Unicode. Depending on local needs, this
|
||
conversion may involve mapping some characters into other characters
|
||
as well as coding conversions. Those issues are discussed in the
|
||
mapping-related sections (Sections 4.2, 4.4, 6, and 7.3) of the
|
||
Rationale document [RFC5894] and in the separate Mapping document
|
||
[IDNA2008-Mapping]. The result MUST be a Unicode string in NFC form.
|
||
|
||
5.3. A-label Input
|
||
|
||
If the input to this procedure appears to be an A-label (i.e., it
|
||
starts with "xn--", interpreted case-insensitively), the lookup
|
||
application MAY attempt to convert it to a U-label, first ensuring
|
||
that the A-label is entirely in lowercase (converting it to lowercase
|
||
if necessary), and apply the tests of Section 5.4 and the conversion
|
||
of Section 5.5 to that form. If the label is converted to Unicode
|
||
(i.e., to U-label form) using the Punycode decoding algorithm, then
|
||
the processing specified in those two sections MUST be performed, and
|
||
the label MUST be rejected if the resulting label is not identical to
|
||
the original. See Section 8.1 of the Rationale document [RFC5894]
|
||
for additional discussion on this topic.
|
||
|
||
Conversion from the A-label and testing that the result is a U-label
|
||
SHOULD be performed if the domain name will later be presented to the
|
||
user in native character form (this requires that the lookup
|
||
application be IDNA-aware). If those steps are not performed, the
|
||
lookup process SHOULD at least test to determine that the string is
|
||
actually an A-label, examining it for the invalid formats specified
|
||
in the Punycode decoding specification. Applications that are not
|
||
IDNA-aware will obviously omit that testing; others MAY treat the
|
||
string as opaque to avoid the additional processing at the expense of
|
||
providing less protection and information to users.
|
||
|
||
5.4. Validation and Character List Testing
|
||
|
||
As with the registration procedure described in Section 4, the
|
||
Unicode string is checked to verify that all characters that appear
|
||
in it are valid as input to IDNA lookup processing. As discussed
|
||
above and in the Rationale document [RFC5894], the lookup check is
|
||
more liberal than the registration one. Labels that have not been
|
||
fully evaluated for conformance to the applicable rules are referred
|
||
to as "putative" labels as discussed in Section 2.3.2.1 of the
|
||
Definitions document [RFC5890]. Putative U-labels with any of the
|
||
following characteristics MUST be rejected prior to DNS lookup:
|
||
|
||
o Labels that are not in NFC [Unicode-UAX15].
|
||
|
||
o Labels containing "--" (two consecutive hyphens) in the third and
|
||
fourth character positions.
|
||
|
||
o Labels whose first character is a combining mark (see The Unicode
|
||
Standard, Section 2.11 [Unicode]).
|
||
|
||
o Labels containing prohibited code points, i.e., those that are
|
||
assigned to the "DISALLOWED" category of the Tables document
|
||
[RFC5892].
|
||
|
||
o Labels containing code points that are identified in the Tables
|
||
document as "CONTEXTJ", i.e., requiring exceptional contextual
|
||
rule processing on lookup, but that do not conform to those rules.
|
||
Note that this implies that a rule must be defined, not null: a
|
||
character that requires a contextual rule but for which the rule
|
||
is null is treated in this step as having failed to conform to the
|
||
rule.
|
||
|
||
o Labels containing code points that are identified in the Tables
|
||
document as "CONTEXTO", but for which no such rule appears in the
|
||
table of rules. Applications resolving DNS names or carrying out
|
||
equivalent operations are not required to test contextual rules
|
||
for "CONTEXTO" characters, only to verify that a rule is defined
|
||
(although they MAY make such tests to provide better protection or
|
||
give better information to the user).
|
||
|
||
o Labels containing code points that are unassigned in the version
|
||
of Unicode being used by the application, i.e., in the UNASSIGNED
|
||
category of the Tables document.
|
||
|
||
This requirement means that the application must use a list of
|
||
unassigned characters that is matched to the version of Unicode
|
||
that is being used for the other requirements in this section. It
|
||
is not required that the application know which version of Unicode
|
||
is being used; that information might be part of the operating
|
||
environment in which the application is running.
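Several of the lookup-time rejection tests listed above can be expressed with the standard `unicodedata` module alone. The sketch below covers only the NFC, hyphen, leading-combining-mark, and unassigned-code-point tests; the DISALLOWED, CONTEXTJ, and CONTEXTO tests need data from the Tables document and are left out. Two assumptions are worth stating: General_Category "Cn" is used as an approximation of the UNASSIGNED category, and the result depends on the Unicode version bundled with the Python interpreter. The function name is hypothetical.

```python
import unicodedata
from typing import List


def lookup_rejection_reasons(label: str) -> List[str]:
    """Return the reasons a putative U-label fails the table-free lookup tests."""
    reasons = []
    if unicodedata.normalize("NFC", label) != label:
        reasons.append("not in NFC")
    if label[2:4] == "--":
        reasons.append("hyphens in third and fourth positions")
    if label and unicodedata.category(label[0]).startswith("M"):
        reasons.append("leading combining mark")
    if any(unicodedata.category(ch) == "Cn" for ch in label):
        reasons.append("unassigned code point")
    return reasons
```

An empty list means only that these particular tests passed. Returning reasons rather than a bare boolean makes it easier to give the user the kind of diagnostic information that a DNS failure alone would not provide.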
|
||
|
||
In addition, the application SHOULD apply the following test.
|
||
|
||
o Verification that the string is compliant with the requirements
|
||
for right-to-left characters specified in the Bidi document
|
||
[RFC5893].
|
||
|
||
This test may be omitted in special circumstances, such as when the
|
||
lookup application knows that the conditions are enforced elsewhere,
|
||
because an attempt to look up and resolve such strings will almost
|
||
certainly lead to a DNS lookup failure except when wildcards are
|
||
present in the zone. However, applying the test is likely to give
|
||
much better information about the reason for a lookup failure --
|
||
information that may be usefully passed to the user when that is
|
||
feasible -- than DNS resolution failure information alone.
|
||
|
||
For all other strings, the lookup application MUST rely on the
|
||
presence or absence of labels in the DNS to determine the validity of
|
||
those labels and the validity of the characters they contain. If
|
||
they are registered, they are presumed to be valid; if they are not,
|
||
their possible validity is not relevant. While a lookup application
|
||
may reasonably issue warnings about strings it believes may be
|
||
problematic, applications that decline to process a string that
|
||
conforms to the rules above (i.e., does not look it up in the DNS)
|
||
are not in conformance with this protocol.
|
||
|
||
5.5. Punycode Conversion
|
||
|
||
The string that has now been validated for lookup is converted to ACE
|
||
form by applying the Punycode algorithm to the string and then adding
|
||
the ACE prefix ("xn--").
|
||
|
||
5.6. DNS Name Resolution
|
||
|
||
The A-label resulting from the conversion in Section 5.5 or supplied
|
||
directly (see Section 5.3) is combined with other labels as needed to
|
||
form a fully-qualified domain name that is then looked up in the DNS,
|
||
using normal DNS resolver procedures. The lookup can obviously
|
||
either succeed (returning information) or fail.
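The last two lookup steps can be sketched together: each non-ASCII label is converted to an A-label, ASCII labels (NR-LDH labels or A-labels supplied directly) are passed through, and the labels are rejoined into a name for the resolver. The sketch assumes that every label has already passed the Section 5.4 tests and that labels are separated by U+002E only; the function name is illustrative, and the actual DNS query is left to whatever resolver interface the application uses.

```python
def to_lookup_name(domain: str) -> str:
    """Convert each non-ASCII label to its A-label and rejoin the name."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # NR-LDH label or A-label supplied directly
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_lookup_name("bücher.example"))  # xn--bcher-kva.example
```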
|
||
|
||
6. Security Considerations
|
||
|
||
Security Considerations for this version of IDNA are described in the
|
||
Definitions document [RFC5890], except for the special issues
|
||
associated with right-to-left scripts and characters. The latter are
|
||
discussed in the Bidi document [RFC5893].
|
||
|
||
In order to avoid intentional or accidental attacks from labels that
|
||
might be confused with others, special problems in rendering, and so
|
||
on, the IDNA model requires that registries exercise care and
|
||
thoughtfulness about what labels they choose to permit. That issue
|
||
is discussed in Section 4.3 of this document which, in turn, points
|
||
to a somewhat more extensive discussion in the Rationale document
|
||
[RFC5894].
|
||
|
||
7. IANA Considerations
|
||
|
||
IANA actions for this version of IDNA are specified in the Tables
|
||
document [RFC5892] and discussed informally in the Rationale document
|
||
[RFC5894]. The components of IDNA described in this document do not
|
||
require any IANA actions.
|
||
|
||
8. Contributors
|
||
|
||
While the listed editor held the pen, the original versions of this
|
||
document represent the joint work and conclusions of an ad hoc design
|
||
team consisting of the editor and, in alphabetic order, Harald
|
||
Alvestrand, Tina Dam, Patrik Faltstrom, and Cary Karp. This document
|
||
draws significantly on the original version of IDNA [RFC3490] both
|
||
conceptually and for specific text. This second-generation version
|
||
would not have been possible without the work that went into that
|
||
first version and especially the contributions of its authors Patrik
|
||
Faltstrom, Paul Hoffman, and Adam Costello. While Faltstrom was
|
||
actively involved in the creation of this version, Hoffman and
|
||
Costello were not and should not be held responsible for any errors
|
||
or omissions.
|
||
|
||
9. Acknowledgments

This revision to IDNA would have been impossible without the
accumulated experience since RFC 3490 was published and resulting
comments and complaints of many people in the IETF, ICANN, and other
communities (too many people to list here). Nor would it have been
possible without RFC 3490 itself and the efforts of the Working Group
that defined it. Those people whose contributions are acknowledged
in RFC 3490, RFC 4690 [RFC4690], and the Rationale document [RFC5894]
were particularly important.

Specific textual changes were incorporated into this document after
suggestions from the other contributors, Stephane Bortzmeyer, Vint
Cerf, Lisa Dusseault, Paul Hoffman, Kent Karlsson, James Mitchell,
Erik van der Poel, Marcos Sanz, Andrew Sullivan, Wil Tan, Ken
Whistler, Chris Wright, and other WG participants and reviewers
including Martin Duerst, James Mitchell, Subramanian Moonesamy, Peter
Saint-Andre, Margaret Wasserman, and Dan Winship who caught specific
errors and recommended corrections. Special thanks are due to Paul
Hoffman for permission to extract material to form the basis for
Appendix A from a draft document that he prepared.

10. References

10.1. Normative References

[RFC1034]  Mockapetris, P., "Domain names - concepts and
           facilities", STD 13, RFC 1034, November 1987.

[RFC1035]  Mockapetris, P., "Domain names - implementation and
           specification", STD 13, RFC 1035, November 1987.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3492]  Costello, A., "Punycode: A Bootstring encoding of
           Unicode for Internationalized Domain Names in
           Applications (IDNA)", RFC 3492, March 2003.

[RFC5890]  Klensin, J., "Internationalized Domain Names for
           Applications (IDNA): Definitions and Document
           Framework", RFC 5890, August 2010.

[RFC5892]  Faltstrom, P., Ed., "The Unicode Code Points and
           Internationalized Domain Names for Applications (IDNA)",
           RFC 5892, August 2010.

[RFC5893]  Alvestrand, H., Ed. and C. Karp, "Right-to-Left Scripts
           for Internationalized Domain Names for Applications
           (IDNA)", RFC 5893, August 2010.

[Unicode-UAX15]
           The Unicode Consortium, "Unicode Standard Annex #15:
           Unicode Normalization Forms", September 2009,
           <http://www.unicode.org/reports/tr15/>.

10.2. Informative References

[ASCII]    American National Standards Institute (formerly United
           States of America Standards Institute), "USA Code for
           Information Interchange", ANSI X3.4-1968, 1968. ANSI
           X3.4-1968 has been replaced by newer versions with
           slight modifications, but the 1968 version remains
           definitive for the Internet.

[IDNA2008-Mapping]
           Resnick, P. and P. Hoffman, "Mapping Characters in
           Internationalized Domain Names for Applications (IDNA)",
           Work in Progress, April 2010.

[RFC2671]  Vixie, P., "Extension Mechanisms for DNS (EDNS0)",
           RFC 2671, August 1999.

[RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
           "Internationalizing Domain Names in Applications
           (IDNA)", RFC 3490, March 2003.

[RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
           Profile for Internationalized Domain Names (IDN)",
           RFC 3491, March 2003.

[RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
           Resource Identifier (URI): Generic Syntax", STD 66,
           RFC 3986, January 2005.

[RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
           Identifiers (IRIs)", RFC 3987, January 2005.

[RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
           and Recommendations for Internationalized Domain Names
           (IDNs)", RFC 4690, September 2006.

[RFC4952]  Klensin, J. and Y. Ko, "Overview and Framework for
           Internationalized Email", RFC 4952, July 2007.

[RFC5894]  Klensin, J., "Internationalized Domain Names for
           Applications (IDNA): Background, Explanation, and
           Rationale", RFC 5894, August 2010.

[Unicode]  The Unicode Consortium, "The Unicode Standard, Version
           5.0", 2007. Boston, MA, USA: Addison-Wesley. ISBN
           0-321-48091-0. This printed reference has now been
           updated online to reflect additional code points. For
           code points, the reference at the time this document was
           published is to Unicode 5.2.

Appendix A. Summary of Major Changes from IDNA2003

1.  Update the base character set from Unicode 3.2 to be Unicode
    version-agnostic.

2.  Separate the definitions for the "registration" and "lookup"
    activities.

3.  Disallow symbol and punctuation characters except where special
    exceptions are necessary.

4.  Remove the mapping and normalization steps from the protocol and
    have them, instead, done by the applications themselves,
    possibly in a local fashion, before invoking the protocol.

5.  Change the way that the protocol specifies which characters are
    allowed in labels from "humans decide what the table of code
    points contains" to "decisions about code points are based on
    Unicode properties plus a small exclusion list created by
    humans".

6.  Introduce the new concept of characters that can be used only in
    specific contexts.

7.  Allow typical words and names in languages such as Dhivehi and
    Yiddish to be expressed.

8.  Make bidirectional domain names (delimited strings of labels,
    not just labels standing on their own) display in a less
    surprising fashion, whether they appear in obvious domain name
    contexts or as part of running text in paragraphs.

9.  Remove the dot separator from the mandatory part of the
    protocol.

10. Make some currently valid labels that are not actually IDNA
    labels invalid.

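As a rough illustration of item 4 in the list above, the following
Python sketch (not part of this specification) keeps a hypothetical
application-side mapping step separate from the conversion of a label
to its ASCII-compatible form. The names local_mapping and to_a_label,
and the choice of lowercasing plus NFC as the local mapping, are
assumptions made for this example only. The sketch performs none of
the code point, contextual, or Bidi checks that registration and
lookup require.

```python
import unicodedata

ACE_PREFIX = "xn--"


def local_mapping(label: str) -> str:
    """Hypothetical application-side mapping, applied BEFORE the protocol.

    Assumption for this sketch: lowercase the input, then normalize to NFC.
    A real application may choose a different, locale-aware mapping.
    """
    return unicodedata.normalize("NFC", label.lower())


def to_a_label(u_label: str) -> str:
    """Convert an already-mapped label to its ASCII-compatible form.

    The conversion itself does no mapping: input that is not already in
    NFC is rejected rather than silently repaired.
    """
    if unicodedata.normalize("NFC", u_label) != u_label:
        raise ValueError("label is not in Unicode Normalization Form C")
    if u_label.isascii():
        # Plain ASCII labels need no Punycode encoding.
        return u_label
    # Python's built-in "punycode" codec implements the RFC 3492 encoding.
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")
```

With these definitions, to_a_label(local_mapping("Bücher")) yields
"xn--bcher-kva", while passing a decomposed string such as
"bu\u0308cher" directly to to_a_label raises ValueError, because the
normalization step is no longer performed inside the conversion.
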

Author's Address

John C Klensin
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
USA

Phone: +1 617 245 1457
EMail: john+ietf@jck.com