Annex 6. Punyplus
From IUCG - Internet Users Contributing Group
This memo documents an IDNA proposal (also known as IDNAPLUS) that supports both uppercase and lowercase characters and opens new possibilities for the Internet namespace.
Introduction
According to the current WG-IDNABIS proposal, which is based on the punycode algorithm, entries in domain name locations should be systematically lowercased beforehand, with no later restoration.
This means that uppercase is not supported end to end and that uppercased domain names are resolved as if they were their lowercase equivalents. Half of all the characters of the world's alphabets are thereby ignored by IDNA. The WG-IDNABIS considers this perfectly acceptable and believes that it does not significantly affect overall "internationalized" domain name usage.
The resulting semantic loss by orthotypographic limitation makes IDNA unfit for:
- a good steganographic use of the namespace (mnemonics, encryption, etc.)
- a direct support of the Semantic Addressing System (SAS) that the IUCG is exploring.
This memo documents a punyplus algorithm that fully supports the SAS and offers a strictly compliant IDNA option.
Discussion list
This proposal can be discussed on the iucg@ietf.org mailing list.
The Interplus approach
The Interplus is an Internet "better use" architecture to bring users better Facilitation services. It deploys additional layers on the user side in a manner that is transparent to the Internet layers' operations. It is in this way that it can support extended network functions (active content support) and additional services (such as the support of ambient content).
The Interplus architecture includes an ML-DNS (multi-layer domain name system) stack. This stack comprises three naming layers:
- User Domain Names (UDNs) that can be case sensitive full UTF-8
- Application Domain Names (ADNs) that can be case sensitive NFC
- Internet Domain Names (IDNs) that comprise alphadecimal (0-Z) labels (case insensitive by nature), using strong (".") separators between labels and a weak ("-") separator between sub-labels.
Empty sub-labels (resulting in a "--" separation) are understood as the operationally validated way to support the Internet Presentation virtual layer, in which they separate a presentation prefix (or suffix) from the naming payload. "xn--" is understood as the "extended names presentation" prefix.
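The label structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_presentation` and `parse_idn` are hypothetical, and only the prefix case is handled (the first "--" is taken as the boundary); presentation suffixes are left out.

```python
def split_presentation(label: str) -> tuple:
    """Split one IDN label at its first empty sub-label ("--").

    Returns (presentation_prefix, payload). A label without "--"
    has an empty prefix and is returned unchanged as the payload.
    """
    prefix, sep, payload = label.partition("--")
    if not sep:
        return "", label
    return prefix, payload


def parse_idn(name: str) -> list:
    """Split a name on the strong "." separator, then split each label."""
    return [split_presentation(label) for label in name.split(".")]


if __name__ == "__main__":
    # "xn" is the "extended names presentation" prefix.
    print(parse_idn("xn--etat-abc.fra"))
```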
Punyplus algorithm
The punyplus algorithm is used to convert ADNs into IDNs and back. One punyplus option conforms strictly to the punycode algorithm and does not support uppercase.
To obtain this reliably, punyplus is a strict copy of punycode with the added capacity (as a removable option) to insert the "^" prefix to signal that an uppercase character has been replaced by its lowercase equivalent. There are then two possible ways to support this insert.
Support as a regular character
When executing the punycode algorithm itself, punyplus converts "^" into a non-ASCII upper-case metadata indicator (UMI) and then handles its code point in the regular way.
When converting back to non-ASCII, it restores the UMI as "^", or deletes it if the next character is restored as an uppercase.
Example:
- UDN: User domain name : Etat.fra
- ADN: Application domain name : ^etat.fra
- IDN: Internet domain name : xn--etat-abc.fra (abc depending on the choice of UMI).
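The three-layer conversion above can be sketched as follows. This is a minimal sketch, not the punyplus implementation: the UMI value used here (U+E000, a private-use code point) is a placeholder assumption, since the memo leaves the choice open, and all function names are hypothetical. The punycode step uses Python's built-in "punycode" codec.

```python
UMI = "\ue000"       # ASSUMPTION: placeholder UMI; the memo has not selected a value
ACE_PREFIX = "xn--"  # "extended names presentation" prefix


def udn_to_adn(label: str) -> str:
    """Replace each uppercase character by "^" plus its lowercase form."""
    out = []
    for ch in label:
        low = ch.lower()
        # Only mark characters whose case mapping round-trips exactly.
        if ch != low and low.upper() == ch:
            out.append("^" + low)
        else:
            out.append(ch)
    return "".join(out)


def adn_to_udn(label: str) -> str:
    """Drop "^" when the next character can be restored as an uppercase."""
    out, i = [], 0
    while i < len(label):
        if label[i] == "^" and i + 1 < len(label):
            up = label[i + 1].upper()
            if len(up) == 1 and up != label[i + 1]:
                out.append(up)
                i += 2
                continue
        out.append(label[i])
        i += 1
    return "".join(out)


def adn_to_idn(label: str) -> str:
    """Swap "^" for the UMI, then punycode-encode non-ASCII labels."""
    marked = label.replace("^", UMI)
    if marked.isascii():
        return marked  # pure ASCII labels pass through unchanged
    return ACE_PREFIX + marked.encode("punycode").decode("ascii")


def idn_to_adn(label: str) -> str:
    """Punycode-decode an "xn--" label and restore the UMI as "^"."""
    if label[:4].lower() != ACE_PREFIX:
        return label
    decoded = label[4:].encode("ascii").decode("punycode")
    return decoded.replace(UMI, "^")


def convert(name: str, fn) -> str:
    """Apply a per-label conversion to a dotted name."""
    return ".".join(fn(label) for label in name.split("."))


if __name__ == "__main__":
    adn = convert("Etat.fra", udn_to_adn)
    idn = convert(adn, adn_to_idn)
    print(adn, idn, convert(idn, idn_to_adn))
```

The encoded suffix after "xn--etat-" depends entirely on the UMI code point chosen, which is why the example above writes it as "abc".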
Support through a dedicated format
The "^" insert is replaced by the "---1" sequence. The advantages of this solution are its independence from the coding system, its openness to other needs through sequences of the "---x" or "---zz" type, and its readability: entries are simple and can easily be made by hand. Its disadvantage is the use of four or five bytes to carry a single piece of metadata.
Example:
- UDN: User domain name : Etat.fra
- ADN: Application domain name : ^etat.fra
- IDN: Internet domain name : xn--e---1tat.fra
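The dedicated format can be sketched in Python as below. Two assumptions are made and should be read as such: the "---1" marker is placed after the affected letter, because that is what the example above shows (the text alone could also be read as a replacement in place), and only ASCII lowercase letters are handled. The function names are hypothetical.

```python
import re

MARK = "---1"    # dedicated uppercase marker from the memo
PREFIX = "xn--"  # presentation prefix used in the memo's example


def adn_to_idn_dedicated(label: str) -> str:
    """Turn "^x" into "x---1" and add the presentation prefix.

    ASSUMPTION: the marker follows the letter, as in "xn--e---1tat".
    """
    body = re.sub(r"\^([a-z])", r"\1" + MARK, label)
    if body == label:
        return label  # nothing to mark: the label is left untouched
    return PREFIX + body


def idn_to_adn_dedicated(label: str) -> str:
    """Reverse the transformation: "x---1" becomes "^x"."""
    if not (label.startswith(PREFIX) and MARK in label):
        return label
    body = label[len(PREFIX):]
    return re.sub(r"([a-z])" + re.escape(MARK), r"^\1", body)


def convert_name(name: str, fn) -> str:
    """Apply a per-label conversion to a dotted name."""
    return ".".join(fn(label) for label in name.split("."))


if __name__ == "__main__":
    idn = convert_name("^etat.fra", adn_to_idn_dedicated)
    print(idn)
    print(convert_name(idn, idn_to_adn_dedicated))
```

Because the marker is plain ASCII, this variant needs no punycode step for ASCII labels, which is the coding-system independence the text refers to.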
Value of the UMI codepoint
There are four possibilities for the choice of the UMI code point:
To use a DISALLOWED codepoint
This will be transparent to IDNA unaware applications and uncertain with IDNA2003 applications, but blocked by IDNA2008 compliant applications. There is no risk of confusion with a lowercase described destination.
To use an UNASSIGNED codepoint
This will be transparent to IDNA unaware applications and uncertain with IDNA2003 applications, but blocked by IDNA2008 compliant applications. A conflict may result from a further edition of ISO 10646. Otherwise, there is no risk of confusion with a lowercase described destination.
To use a PVALID codepoint
The code point would be out of regular use sequence, therefore signaling a special use. This will be transparent to IDNA unaware applications and accepted, but not necessarily understood, by IDNA2003 and IDNA2008 compatible applications. There is no risk of confusion with a lowercase described destination.
To register a dedicated codepoint
This will be transparent to IDNA unaware applications and uncertain with IDNA2003 applications, but blocked by IDNA2008 compliant applications. There is no risk of confusion with a lowercase described destination. A further IDNA update could make that code point PVALID and extend IDNAPLUS to all the IDNA users.
Solution
These possibilities will be submitted to tests and discussion with users.
Naming added services
Because the punyplus algorithm is transparent to pure ASCII UDNs/ADNs, it is applicable to all Internet domain names. It can therefore be used as an ML-DNS naming command parser and as a universal UDN/ADN/IDN converter. Its concerted generalization will therefore contribute to an ambient canonicalization of the naming space.
Further extensions
The punyplus algorithm will progressively be extended to support DNS related security projects, linguistic mail addresses, DNS class selection, semantic addressing and Netix (interplus explored interapplication system) commands.
Multilinguistics section
Multilinguistics is understood as the cybernetics of linguistic diversity. This memo enables improved Internet multilinguistic support.
Security section
Punyplus relies either on a larger number of code points that are otherwise denied, or on an additional alphadecimal character in naming (corresponding to the "power" sign) that is blocked by all the naming functions. Neither should create security issues for applications that are properly written or that are interfaced by properly written applications.
The inability to use uppercase dramatically reduces the value of randomly generated or encrypted domain names in access protection strategies.
The global stability of the Internet will probably be affected by some uses that, like IDNA, take better advantage of its protocols. This should motivate a broad, separate study. The author thinks that punyplus practice may bring positive added experience to the study and resolution of the new problems that may be encountered.
IANA section
The selected value of the UMI should be recorded by the IANA.
Acknowledgments
I particularly thank the members of the france@large center of expertise on IDNs for their dedicated support and pertinent contributions.
