"szett" exception

From IUCG - Internet Users Contributing Group


Mayrhofer - NIC.AT 20091025

As you probably know, I'm working for the Austrian domain name registry (nic.at). I recently prepared a presentation for our board on the changes to expect from IDNAbis deployment, and the board asked me to voice our concerns about the "szett" (U+00DF) exception in the current document set. I understand that the documents have progressed very far, and that we should have voiced our concerns earlier; however, I think the information below is still valuable to the group.

Obviously, the DNS is an extremely important identity and naming system that is crucial to the operation of nearly all Internet applications. Therefore, any change to its structure is a delicate operation. This matters when new portions of the namespace are created, but it matters even more when the semantics of an existing namespace (or a portion of it) are changed. The introduction of IDNA2003 was an extension of the namespace, at least from the application perspective (technically, it changed the definition of a sufficiently obscure portion of the namespace, namely labels beginning with "xn--").

Changing the semantics of an existing namespace is *really bad*, and I agree with what Marcos said a long time ago: "Breaking backwards compatibility is to my eyes the big stigma of IDNA2008".

I understand and welcome the introduction of rigid rules in IDNAbis as the primary mechanism for determining codepoint classification and protocol validity. Independence from any particular Unicode version ensures a stable specification and should create few "surprises" (essentially, it shifts the responsibility for character classification from the IETF to Unicode). I also understand and welcome the 1:1 relation between A-labels and U-labels at the protocol level.
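To make the 1:1 relation concrete, here is a minimal Python sketch that converts a U-label to an A-label and back using the standard library's Punycode codec (RFC 3492). The helper names to_alabel and to_ulabel are hypothetical, and the sketch deliberately skips the IDNA2008 validity checks (codepoint classification, contextual rules, bidi rules) that a real implementation must run first.

```python
# Hypothetical helpers illustrating the reversible U-label <-> A-label mapping.
# Assumption: the label has already passed IDNA2008 validation and contains
# at least one non-ASCII character; no validation is performed here.

def to_alabel(ulabel: str) -> str:
    """Encode a (pre-validated) U-label as an A-label."""
    return "xn--" + ulabel.encode("punycode").decode("ascii")


def to_ulabel(alabel: str) -> str:
    """Decode an A-label back into its U-label."""
    if not alabel.startswith("xn--"):
        raise ValueError("not an A-label: " + alabel)
    return alabel[4:].encode("ascii").decode("punycode")


if __name__ == "__main__":
    for label in ["straße", "bücher"]:
        a = to_alabel(label)
        # Round-tripping yields the original label: nothing is folded away.
        assert to_ulabel(a) == label
        print(label, "->", a)
```

Because no mapping step is involved, "straße" and "strasse" stay distinct labels at the protocol level.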

However, introducing *exceptions* that work around those rigid rules, and particularly changing the semantics of part of a deployed, actively used namespace, is *really really bad*, especially when the exception concerns a character as "weird" (in terms of Unicode case folding) as the "szett". Such changes can alter the resolved destination of a given domain name, which in turn creates *major* security issues and badly hurts interoperability: unlike the introduction of IDNA2003, where a label would either work or not, these exceptions create a situation where the same label resolves to destination A in an old application and to destination B in a new one.
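The divergence fits in a few lines of Python. This is a sketch under stated assumptions: Python's built-in "idna" codec implements IDNA2003 (its Nameprep step folds U+00DF to "ss"), so it stands in for an old application, while a bare Punycode encoding of the unmapped label stands in for a new application that treats U+00DF as PVALID under IDNA2008. Both function names are hypothetical, and neither is a complete lookup implementation.

```python
# Two stand-in "applications" resolving the same user input.

def lookup_idna2003(name: str) -> str:
    # Python's "idna" codec applies RFC 3490 ToASCII with Nameprep,
    # which case-folds U+00DF to "ss", so no xn-- label is produced.
    return name.encode("idna").decode("ascii")


def lookup_idna2008_sketch(name: str) -> str:
    # Simplified IDNA2008-style lookup: no mapping, no validity checks.
    labels = []
    for label in name.split("."):
        if label.isascii():
            labels.append(label.lower())
        else:
            # U+00DF is PVALID in IDNA2008, so it is encoded as-is.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


user_input = "straße.at"
old = lookup_idna2003(user_input)          # "strasse.at"
new = lookup_idna2008_sketch(user_input)   # "xn--strae-oqa.at"
print(old, new, old == new)
```

If strasse.at and xn--strae-oqa.at are held by different registrants, the same user input reaches different hosts depending on which application the user happens to run.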

I understand that the Rationale document proposes sensible approaches in Section 7.2; however, I think the Security Considerations could discuss the problems more explicitly, rather than just referring to the Rationale document (which is Informational anyway). I think that the sentence

"...a few characters that were mapped to others in the earlier version; zone administrators should be aware of the problems that might raise and take appropriate measures"

in the Definitions document could easily be overlooked by implementors.

Another issue makes the problem even harder for zone administrators to deal with: actively *encouraging* application developers to create their own fancy mapping definitions, beyond the mappings included in IDNA2003, allows for even more "variations" and is bound to hurt interoperability badly. One example is Unicode TR46, particularly its proposal of "dual lookups" and "trusted registries" for "Deviations", which I believe to be a really really bad idea. But what are the other options?

Shifting the responsibility for mapping to application developers, and thereby allowing a myriad of mapping options, seems risky to me, particularly for the Exception codepoints whose protocol definitions have changed between the two versions. From my point of view, it makes such codepoints unusable: the "mapping du jour" of application X could be entirely different from that of application Y.

The Mapping draft says that it is "unusual" for the IETF to discuss user input processing steps; on the other hand, Section 2.1 of RFC 3761 (the ENUM base specification) clearly provides normative text about how user input should be prepared for a protocol (and I'm sure there are many other examples). So the IETF evidently *is* concerned with how user input is mapped to protocol elements.
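For comparison, the ENUM input preparation that RFC 3761 specifies is short enough to sketch: keep only the digits of a fully qualified E.164 number, reverse them, separate them with dots, and append "e164.arpa". The helper name enum_domain and the sample number are hypothetical; this illustrates the kind of normative input preparation meant here and is not a conformant ENUM client.

```python
# Hypothetical helper: E.164 number -> ENUM domain name, RFC 3761 style.

ASCII_DIGITS = "0123456789"


def enum_domain(e164: str) -> str:
    if not e164.startswith("+"):
        raise ValueError("expected a fully qualified E.164 number starting with '+'")
    # Remove everything except ASCII digits (str.isdigit would also accept
    # non-ASCII digits, which must not leak into the domain name).
    digits = [c for c in e164 if c in ASCII_DIGITS]
    # Reverse the digits, separate them with dots, append the ENUM apex.
    return ".".join(reversed(digits)) + ".e164.arpa"


print(enum_domain("+43-1-234 5678"))  # 8.7.6.5.4.3.2.1.3.4.e164.arpa
```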

To sum up, we would have preferred the "szett" (U+00DF) to remain "DISALLOWED", and the IETF to describe the mapping procedures in a document that is more than just "Informational" (the content of the Mapping document itself is perfectly fine). We also hope that the IETF liaises with application developers, particularly browser vendors, to establish a single "de facto" mapping procedure, so that at least the szett does not become a moving target.

Mark Davis, Unicode, 20101026

The Unicode Consortium shares your concerns about the treatment of deviations, and the security and interoperability issues resulting from that and custom mappings. Unfortunately, while those points were raised consistently during the development of IDNA2008 (some would say too persistently), the working group decided on its current course.

We have been consulting with browser and search engine vendors (many of whom are members of the Consortium), and I anticipate that most will not end up implementing IDNA2008 lookup as is, because of the problems it has. TR46 was designed by those who need to implement IDNA lookup, to provide a bridge specification through which implementations can maximize compatibility with both IDNA2003 and IDNA2008 on the lookup side and avoid these problems. On the "dual lookup" and "trusted registries" point that you mention: the text of TR46 is insufficiently clear. That section discusses alternative approaches that were considered but discarded (because they don't work well, as Marcos pointed out in detail). I'll make sure that this feedback is brought to the committee.

TR46 is not really aimed at the registry side. It is feasible for registries to implement IDNA2008 if they additionally DISALLOW the four deviations (including es-zett). This can be done while being conformant to IDNA2008, because registries can further limit the characters they support.
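A registry policy of the kind Mark Davis describes can be sketched as an extra filter layered on top of IDNA2008 validation. The four UTS #46 deviation characters are U+00DF (es-zett), U+03C2 (Greek final sigma), U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER). The function names below are hypothetical, and the sketch assumes full IDNA2008 validation happens elsewhere.

```python
# Hypothetical registry-side restriction: refuse the four UTS #46 deviations.
# Assumption: IDNA2008 validity of the label is checked separately; this
# function only applies the additional, stricter registry policy.

DEVIATIONS = {
    "\u00DF": "LATIN SMALL LETTER SHARP S (es-zett)",
    "\u03C2": "GREEK SMALL LETTER FINAL SIGMA",
    "\u200C": "ZERO WIDTH NON-JOINER",
    "\u200D": "ZERO WIDTH JOINER",
}


def deviation_violations(label: str) -> list[str]:
    """Return the names of any deviation characters found in the label."""
    return [DEVIATIONS[ch] for ch in label if ch in DEVIATIONS]


def registry_accepts(label: str) -> bool:
    """True if the label contains none of the four deviation characters."""
    return not deviation_violations(label)


for candidate in ["strasse", "straße"]:
    print(candidate, registry_accepts(candidate), deviation_violations(candidate))
```

Since IDNA2008 lets registries restrict the characters they accept, a filter like this keeps the registry conformant while ensuring that none of its registered labels depend on how a given application maps the deviations.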
