Unicode TR46
From IUCG - Internet Users Contributing Group
20091027 Martin J. Dürst
Subject: High-level comment on TR46
I have been thinking a bit more about TR46 (http://www.unicode.org/reports/tr46/) and all the connections.
At a high level, I think TR46 tries to do two things:
- Provide a well-defined mapping for domain names. I think the Unicode consortium is the best place to actually do this, and personally I can easily imagine that some IETF (or W3C) specs may refer to it.
- Try to "fix" some perceived problems in IDNA 2008 and between IDNA 2008 and IDNA 2003 (mainly sz, final sigma,...). This is well-intentioned, and what's proposed is definitely one possible solution. From an IETF perspective, however, the Internet (and the use of IDNs) is far wider than browsers and search engines, and it is highly desirable for all parties involved that a solution to these problems come directly from the IETF.
My proposal would therefore be to split the document, e.g. as follows:
- Keep TR46 with the uncontroversial mappings (essentially the extension of the IDNA 2003 mappings to the IDNA 2008 repertoire)
- Submit the proposal for how to deal with both IDNA 2003 and IDNA 2008 at the same time, and in particular how to deal with the special cases (called "deviations" in TR46) as an Internet-Draft (best with some co-authors from other affected communities, e.g. from DENIC,...). That's the best way to get the IETF to actually face the issues.
This is a refinement of what we discussed with Larry about two weeks ago in Mountain View.
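To make the "deviations" mentioned above concrete, here is a minimal Python sketch. It assumes Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490 with Nameprep), and contrasts it with a hypothetical helper, `to_ascii_unmapped`, that Punycode-encodes each non-ASCII label without any mapping. The helper names are illustrative only; they do not come from TR46 or from any IDNA 2008 document, and the second helper performs none of the validity checks a real implementation would need.

```python
def to_ascii_idna2003(domain: str) -> str:
    """Convert a domain with Python's built-in IDNA 2003 codec.

    Nameprep runs first, so "ß" is mapped to "ss" and final sigma
    is mapped to regular sigma before any Punycode encoding.
    """
    return domain.encode("idna").decode("ascii")


def to_ascii_unmapped(domain: str) -> str:
    """Hypothetical helper: Punycode each non-ASCII label as-is.

    No mapping and no validity checks are applied. This only
    illustrates what happens when "ß" is kept as a distinct letter.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    name = "straße.de"
    print(to_ascii_idna2003(name))   # strasse.de
    print(to_ascii_unmapped(name))   # an xn-- label that keeps the ß
```

The two functions return different ASCII names for the same user input, so the same typed string can resolve to two different registrations. That is the situation the proposed Internet-Draft would have to address.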
20091226 Mark Davis
My bandwidth is extremely limited until we get back to the States, so I will be brief. Please forgive me if, by being brief, I am also overly brusque.
- 1. I have not been able to follow the 4 deviation character discussion, but it appears that there is agreement on some transition strategies that will work; a key approach appears to be to map on the client side if one is sure that the zone bundles, otherwise map.
- 2. Given that, I'd anticipate that the UTC would modify TR46 to cover (a) support of symbols for some transitional period, and (b) a standard mapping. The rest of my comments are on the mapping issue.
- 3. One uniform mapping would be better than multiple, inconsistent mappings.
- 4. While one could argue either way, the advantage of the TR46 mapping is that it preserves compatibility with IDNA2003.
- 5. The current IDNA2008 mapping wouldn't maintain that compatibility, falls short in a number of cases for languages that don't have case/width issues, and has a number of formal problems.
- 6. We have major vendors that intend to implement the TR46 mappings; I don't know of any that have signed up to implement the current IDNA2008 spec.
- 7. The supposed argument from "harm" is specious.
- 8. First, there is a mixup below. If X is confusable with a PVALID Y, it is no problem to map X to Y; it would only be a (theoretical) problem if X were mapped to a PVALID Z.
- 9. Vastly more importantly, the argument from "harm" is faith-based, not data-based. I don't have access here, but I previously posted notes on the relative frequencies of spoofing techniques. From that data:
  - 1. Spoofing with confusable characters is FAR below spoofing with syntax (like http://safe-amazon.com) in frequency.
  - 2. There are essentially no letters that can be spoofed with the mapped characters that can't also be spoofed with other letters that are PVALID.
- 10. In sum, allowing the additional mappings makes *no* significant difference in the ability to spoof.
- 11. Best would be to incorporate the TR46 mappings into IDNA2008. Second best would be to reference them; third would be to remove the IDNA2008 mappings document; and fourth would be to leave them as is, and just deal with the muddle that results.
Mark
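The case and width mappings discussed in points 3 to 5 can be approximated with the Python standard library. The sketch below is only a rough stand-in: it applies NFKC normalization and case folding, which is NOT the TR46 mapping table and not the IDNA2008 mapping either, and the function name `approx_map` is hypothetical. It is meant to show the kind of input a uniform mapping step would normalize before lookup.

```python
import unicodedata


def approx_map(domain: str) -> str:
    """Rough approximation of a case/width mapping step.

    NFKC folds compatibility forms (for example fullwidth letters)
    to their ordinary equivalents, and casefold() removes case
    distinctions. This is an assumption-laden stand-in, not the
    actual TR46 mapping table.
    """
    normalized = unicodedata.normalize("NFKC", domain)
    folded = normalized.casefold()
    # Case folding can produce text that is no longer normalized,
    # so normalize once more.
    return unicodedata.normalize("NFKC", folded)


if __name__ == "__main__":
    print(approx_map("ＥＸＡＭＰＬＥ.COM"))  # example.com
    print(approx_map("Straße.de"))          # strasse.de
```

Note that Python's `casefold()` turns "ß" into "ss", which matches the IDNA2003 behaviour described earlier in the thread. Clients that apply different mapping steps would disagree on inputs like these, which is the inconsistency point 3 warns about.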
