XIDNA-ZONE

From IUCG - Internet Users Contributing Group


While the Domain Name System (DNS) has allowed arbitrary octets in labels and resource records since its inception, the hostnames, email addresses, and other names actually used in the DNS have traditionally been limited to ASCII characters.

This memo defines an extension to allow the use of a wide range of internationalised addresses that need to be converted to ACE form on the wire, including domain names and email addresses in zone master files, without removing the ability to include arbitrary octets.




1. Introduction

The X-IDNA base specification ([I-D.teint-xidna-base]) provides a generic framework for internationalisation of addresses, based on IDNA. This memo defines an X-IDNA Profile for use with DNS zone master files, as defined in Section 5 of [RFC1035].

It also defines the charset encoding for zone files.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].


2. Profile Definition

2.1. Applicability

This X-IDNA Profile applies to "<domain-name>" syntax elements defined in Section 5.1 of [RFC1035].

2.2. Normalisation

The "<domain-name>" syntax elements MUST be normalised in a way that satisfies the following conditions:

  • Numeric escapes ("\DDD", [RFC1035], Section 5.1) MUST NOT encode label characters ([I-D.teint-xidna-base], Section 4.5).
  • Numeric escapes ("\DDD", [RFC1035], Section 5.1) MUST be separated from subsequent label characters. Implementations MAY simply insert a "\" (U+005C) after the numeric escape sequence, or they MAY treat the whole sequence as a separator.
  • A "\" (U+005C) character MUST NOT appear between two label characters ([I-D.teint-xidna-base], Section 4.5).

This can be achieved by doing the following substitutions, in this order:

1. In sequences of an unquoted "\" (U+005C) followed by an ASCII letter (U+0041..U+005A, U+0061..U+007A) or a HYPHEN-MINUS (U+002D), the "\" is removed.

2. A sequence of an unquoted "\" (U+005C) followed by three ASCII digits that, if interpreted as a decimal number ([RFC1035], Section 5.1), represent an ASCII letter (U+0041..U+005A, U+0061..U+007A), ASCII digit (U+0030..U+0039) or HYPHEN-MINUS (U+002D), is replaced with that character.

3. Between a sequence of an unquoted "\" (U+005C) followed by any other combination of three ASCII digits and a succeeding ASCII letter (U+0041..U+005A, U+0061..U+007A), ASCII digit (U+0030..U+0039) or HYPHEN-MINUS (U+002D), an additional "\" is inserted.

NOTE: Implementations must take care that 'unquoted "\" (U+005C)' means neither 'any "\"' nor 'any "\" not preceded by another "\"'.

For example, in the sequences "\\\A" and "\\A", the first and third character are unquoted; these sequences encode the same string and would both map to "\\A".
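
The three substitutions above can be sketched as a small tokeniser followed by a rewrite pass. This is only an illustration, not a reference implementation: the names "tokenise" and "normalise" are hypothetical, the digits of "\DDD" are read as a decimal number as in Section 5.1 of [RFC1035], and raising an error on a malformed escape is an assumption this memo does not specify.

```python
import string

ASCII_DIGITS = set(string.digits)
LETTERS_HYPHEN = set(string.ascii_letters) | {"-"}
LABEL_CHARS = LETTERS_HYPHEN | ASCII_DIGITS


def tokenise(name):
    """Split a name into ('lit', c), ('quoted', c) and ('num', 'DDD') tokens.

    Scanning left to right guarantees that every backslash examined here is
    unquoted: a quoted backslash is consumed as the value of a 'quoted' token.
    """
    tokens, i = [], 0
    while i < len(name):
        if name[i] != "\\":
            tokens.append(("lit", name[i]))
            i += 1
        elif len(name) - i >= 4 and all(c in ASCII_DIGITS for c in name[i + 1:i + 4]):
            tokens.append(("num", name[i + 1:i + 4]))
            i += 4
        elif i + 1 < len(name) and name[i + 1] not in ASCII_DIGITS:
            tokens.append(("quoted", name[i + 1]))
            i += 2
        else:
            # Assumption: a dangling "\" or a short digit run is rejected.
            raise ValueError(f"malformed escape at offset {i}")
    return tokens


def normalise(name):
    """Apply substitutions 1 to 3 to a <domain-name> string."""
    tokens = []
    for kind, value in tokenise(name):
        if kind == "quoted" and value in LETTERS_HYPHEN:
            kind = "lit"                              # substitution 1
        elif kind == "num" and chr(int(value)) in LABEL_CHARS:
            kind, value = "lit", chr(int(value))      # substitution 2
        tokens.append((kind, value))

    out = []
    for pos, (kind, value) in enumerate(tokens):
        out.append(value if kind == "lit" else "\\" + value)
        if kind == "num" and pos + 1 < len(tokens):
            next_kind, next_value = tokens[pos + 1]
            if next_kind == "lit" and next_value in LABEL_CHARS:
                out.append("\\")                      # substitution 3
    return "".join(out)
```

With this sketch, both normalise(r"\\\A") and normalise(r"\\A") return the three-character string "\\A", matching the example above.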

2.3. Validation

Validation of domain-names used as host names is subject to the Registration Protocol described in Section 4 of [I-D.ietf-idnabis-protocol].

Validation of domain-names not used as host names depends on the type of data encoded as a domain name. It is expected that when the name is entered into a zone file, it refers to an address defined elsewhere and thus has already been validated. For example, the RNAME field of the SOA record refers to an email address that has already been created on a mail server. Therefore, no additional validation is required.


3. Zone File Charset

Zone master files SHOULD use the UTF-8 charset defined in [RFC2279].

Implementations MAY also allow and detect the UTF-16 and UTF-32 (Section 2.5 of [Unicode]) character encodings.

A BOM (U+FEFF) at the beginning of a master file SHOULD be removed.
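
One possible way to handle the charset rules of this section is to look for a BOM, use it to pick the encoding, and strip it before parsing. The sketch below is illustrative only: the function name "decode_master_file" is hypothetical, and falling back to UTF-8 when no BOM is present is an assumption based on the SHOULD above. Detecting UTF-16 or UTF-32 files without a BOM is not attempted.

```python
import codecs

# Longer BOMs are tested first: the UTF-32-LE BOM begins with the
# same two octets as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_master_file(raw):
    """Decode the raw octets of a master file and drop a leading BOM."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # Assumption: no BOM means the recommended UTF-8 charset.
    return raw.decode("utf-8")
```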


4. Arbitrary Octets

Arbitrary octets can be embedded in domain-names using the "\DDD" escape mechanism defined in Section 5.1 of [RFC1035], i.e. by using a "\" followed by the three-digit decimal value of that octet.

The normalisation ensures that these escaped characters are passed through the X-IDNA processing as-is. When all X-IDNA labels have been converted to ACE form, the zone can be inserted into the server's database normally, decoding the octets during the conversion to binary form.
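
The final decoding step might look like the following sketch, which turns an ACE-form name into a list of binary labels. It is not taken from any server implementation: the function name "presentation_to_labels" is hypothetical, "\DDD" is read as a decimal octet value as in Section 5.1 of [RFC1035], and the error handling is an assumption.

```python
def presentation_to_labels(name):
    """Split an ACE-form name into labels, decoding "\\DDD" and "\\X" escapes."""
    labels, current, i = [], bytearray(), 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            ddd = name[i + 1:i + 4]
            if len(ddd) == 3 and ddd.isascii() and ddd.isdigit():
                value = int(ddd)              # decimal, per RFC 1035
                if value > 255:
                    raise ValueError(f"escape \\{ddd} is not an octet")
                current.append(value)
                i += 4
            elif i + 1 < len(name) and not name[i + 1].isdigit():
                # "\X" quotes X, so a quoted "." does not end the label.
                current += name[i + 1].encode("ascii")
                i += 2
            else:
                raise ValueError(f"malformed escape at offset {i}")
        elif ch == ".":
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            # Non-ASCII input raises here: ACE conversion must come first.
            current += ch.encode("ascii")
            i += 1
    if current:
        labels.append(bytes(current))
    return labels
```

A separator "\" inserted during normalisation disappears at this stage: in "a\000\b" the sequence "\b" simply decodes to the octet for "b".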


5. IANA Considerations

This memo includes no request to IANA.


6. Security Considerations

See the Security Considerations of [I-D.ietf-idnabis-defs] and [I-D.ietf-idnabis-bidi] for information on other issues.


7. References

7.1. Normative References

[I-D.ietf-idnabis-bidi] Alvestrand, H. and C. Karp, "Right-to-left scripts for IDNA", draft-ietf-idnabis-bidi-07 (work in progress), January 2010.

[I-D.ietf-idnabis-defs] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", draft-ietf-idnabis-defs-13 (work in progress), January 2010.

[I-D.ietf-idnabis-protocol] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", draft-ietf-idnabis-protocol-18 (work in progress), January 2010.

[I-D.teint-xidna-base] Teint, N., "Extending IDNA to Other Protocols (X-IDNA)", draft-teint-xidna-base-00 (work in progress), February 2010.

[RFC1035] Mockapetris, P., "Domain names - implementation and specification", STD 13, RFC 1035, November 1987.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2279] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.

[Unicode] Unicode Consortium, "Unicode Standard, Version 5.2", December 2009, <http://www.unicode.org/versions/Unicode5.2.0/>.

7.2. Informative References

[I-D.teint-xidna-email] Teint, N., "An X-IDNA Profile for Electronic Mail Addresses", draft-teint-xidna-base-00 (work in progress), February 2010.

In the plain text version of this memo, the sequence "&#nnnn;" denotes the literal Unicode character number nnnn (decimal).

Unicode: 例。テスト

Normalised: 例.テスト
Extracted: L:"例", S:".", L:"テスト"
Converted: L:"xn--fsq", S:".", L:"xn--zckzah"
Re-Assembled: xn--fsq.xn--zckzah

Unicode: lieselotte\.m\ueller.example.net

Normalised: lieselotte\.mueller.example.net
Extracted: L:"lieselotte" S:"\." L:"mueller" S:"." L:"example" S:"." L:"net"
Converted: L:"lieselotte" S:"\." L:"xn--mller-kva" S:"." L:"example" S:"." L:"net"
Re-Assembled: lieselotte\.xn--mller-kva.example.net

Unicode: -αλφα-βῆτα-γάμμα

Normalised: -αλφα-βῆτα-γάμμα
Extracted: S:"-" L:"αλφα-βῆτα-γάμμα"
Converted: S:"-" L:"xn-----x8brabcel8esaa2hya7368h"
Re-Assembled: -xn-----x8brabcel8esaa2hya7368h

Unicode: hans\x00muell\x65r\x01-foo

Normalised: hans\x00\mueller\x01\-foo
Extracted: L:"hans" S:"\" L:"x00" S:"\" L:"mueller" S:"\" L:"x01" S:"\-" L:"foo"
Converted: L:"hans" S:"\" L:"x00" S:"\" L:"xn--mller-kva" S:"\" L:"x01" S:"\-" L:"foo"
Re-Assembled: hans\x00\xn--mller-kva\x01\-foo
Equivalent: hans\x00xn--mller-kva\x01-foo
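
The "Converted" step in the examples above can be approximated with the Punycode codec from the Python standard library. This is a deliberately minimal sketch: the names "to_ace_label" and "to_ace" are hypothetical, no validation from [I-D.ietf-idnabis-protocol] is performed, the only mapping applied is IDEOGRAPHIC FULL STOP (U+3002) to ".", and every "." is treated as a separator, so escapes are not interpreted.

```python
def to_ace_label(label):
    """Convert one extracted label to ACE form; ASCII labels pass through."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ace(name):
    """Convert a dotted name label by label (no escape handling)."""
    # Assumption: U+3002 is mapped to "." as in the first example.
    labels = name.replace("\u3002", ".").split(".")
    return ".".join(to_ace_label(label) for label in labels)
```

For instance, to_ace("例。テスト") yields "xn--fsq.xn--zckzah", the re-assembled form shown in the first example.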

The following two zone master files define exactly the same content for the actual zone:

Unicode form:

$ORIGIN 例。テスト
@ IN SOA αλφα (lieselotte\.mueller.example.net 20 7200 600 3600000 60 )
NS αλφα
NS βῆτα
NS γάμμα
αλφα A 10.1.1.1
βῆτα A 10.2.2.2
γάμμα A 10.3.3.3

ACE form:

$ORIGIN xn--fsq.xn--zckzah
@ IN SOA xn--mxaa3a7b (lieselotte\.xn--mller-kva.example.net 20 7200 600 3600000 60 )
NS xn--mxaa3a7b
NS xn--mxab8c899n
NS xn--hxake1ba
xn--mxaa3a7b A 10.1.1.1
xn--mxab8c899n A 10.2.2.2
xn--hxake1ba A 10.3.3.3

Please note the email address embedded in the SOA record, the conversion of which is compatible with [I-D.teint-xidna-email].