Annex 11. IDNApplication Medley
From IUCG - Internet Users Contributing Group
This is just a documented list of topics that I believe relate to the IDNA2010 work to be done to acclimate the "IDNA2008 tool-kit" with users.
ICANN position regarding IDNA2008 implementations
As far as IDNA2008 deployment and the transition from IDNA2003 are concerned, ICANN accounts for a potential few zone management contracts (the Fast Track project), out of the billions of existing and future DNS zone managers concerned.
The work ICANN achieves in its particular case should be part of the IDNA2010 BCP on IDNA2008 deployment, transition and usage.
- The ICANN implementation guidelines reference page was last updated with a 2007 version, which does not cover IDNA2008 issues.
- The ICANN IDNA technical considerations are based upon the IDNA2003 RFCs, with RFC 4690 calling for their review. They do not mention the existence of the IETF WG/IDNABIS, its Charter page, or its deliverables, even though these have now been sent to the IESG.
On 15 Dec 2009, Cary Karp indicated :
"The reason that text has been unchanged for so long is that the
authorship group has been waiting for the IDNA protocol revision to be
concluded so its details can be reflected in the next version of the
Guidelines."
"ICANN had been similarly holding off on finalizing the terms of the
ccTLD Fast Track <http://www.icann.org/en/topics/idn/fast-track/> to
ensure its conformity with the revised protocol. That goal was abandoned
a while back and a reference group was appointed to advise about key
pending implementation issues. The report produced by that group
<http://www.icann.org/en/announcements/announcement-2-03dec09-en.htm>
will likely be folded into the Guidelines to render them more applicable
to the TLD space, with or without additional benefit of being able to
address the outcome of the protocol revision. The report considers the
matter of "character variant management" which should also be of
relevance to the IDNA-UPDATE discussion of bundling."
On 17 Dec 2009, Cary kindly gave the following indications:
- In this endeavor, I am one of a group of TLD administrators who meet as peers and have volunteered to be their scribe. (Only two of us regard ourselves as native Anglophones and the other guy keeps forgetting to bring a pencil.)
- The group has been active since IDNA2003 was finalized, and its mandated concern has been to provide a basis for the responsible implementation of that protocol in the SLD space. Beyond that, we make every endeavor to formulate the Guidelines in a manner that will have commonsense appeal to the operators of all zones on all levels of the DNS when formulating their own IDN policies. We also make every effort for the Guidelines to reflect the best wisdom about IDN generated in other fora.
- Our work has been in abeyance for an uncomfortably long time while waiting for IDNA2008, and my proposal was intended to hasten progress on both fronts. The transition between IDNA2003 and IDNA2008 is a glaringly obvious concern for the TLDs that have been accepting registration under the former. Since there are no IDN-labeled TLDs, those that will be established are unlikely to have legacy difficulties in this regard.
- The next version of the Guidelines will of necessity contain transition recommendations. It therefore seems rather obvious for them to reflect the massive amount of work done by the IETF WG and the organizations that have participated in it. If the IETF nonetheless opts to establish a separate instrument in which corresponding recommendations are expressed, the gTLDs -- which are contractually bound to implementing the ICANN Guidelines -- can easily end up in the awkward position of needing to reject anything proposed by the IETF (or in any other context) that is at odds with the ICANN Guidelines.
Microsoft IRT the French language orthotypography
Wed, 09 Dec 2009 14:05:51 -0800:
When asked to comment on the WG/IDNABIS members' 25 vs. 4 feedback, which Unicode/Google wants to disregard, Microsoft's representative, Shawn Steele, wrote (Message-ID: <E14011F8737B524BB564B05FF748464A0446592E@TK5EX14MBXC139.redmond.corp.microsoft.com>, In-Reply-To: <D3E448A8-20D2-4CFB-ACE0-8A165B297FC2@google.com>):
- "A couple things concern me about the results here.
- One is that, of course, "france@large" supports PVALID as it is closer to their case of separating Ecole and ecole. To me, this is noise.
- The other votes I could almost group into 3 buckets:
- Those that think names need to be distinct and don't like mapping, or bundling, either.
- Those that think there's a linguistic difference and support is necessary, yet like mapping. This includes people who would bundle, so it's not clear to me if PVALID is voted to support the character, or because differentiation is necessary. Bundling seems to be at odds with the need for differentiation.
- Those who think the proper presentation is needed, yet think that some sort of enforced bundling, or mapping, is not harmful, and is important for compatibility.
- These groups seem to also have different perspectives, and areas of expertise (all valid), so it concerns me that strictly counting numbers might ignore one perspective or area of expertise."
The main idea seems to be that a 25 vs. 4 feedback addressing the French language requirements should actually be read as a nil to 3 expertise opposing it.
NB: france@large does not take "Ecole" and "ecole" as a paradigm. Project.FRA instead points to the difficulty of explaining that two .fra domain names, whose orthotypography would clearly differentiate their meaning, could nevertheless each be read as either "State" or "status". This would represent a semantically unacceptable loss of meaning. It would therefore prevent the Internet DNS from being interoperable with the generalized intertechnology Semantic Address System (SAS) that Project.FRA and other Internet multilinguistic projects are jointly exploring.
Developer information
091104 14:31 Erik van der Poel
There are several different operations that you can perform on the labels of a domain name, and these operations occur at different times. Here are just a few examples:
- (1) single-label registration time
- (2) multi-label DNAME definition time
- (3) multi-label domain name lookup time
- (4) multi-label domain name display time
idnabis-protocol-17 focuses on (1) and (3). For (1), it says:
"If the proposed label contains any characters that are written from right to left it MUST meet the BIDI criteria [IDNA2008-BIDI]."
Note that the above is talking about a single label. For (3), the protocol says:
"Verification that the string is compliant with the requirements for right to left characters, specified in [IDNA2008-BIDI]."
Note that the above is talking about a "string", which presumably might contain more than one label. As far as I can tell, IDNAbis does not say much about operations (2) and (4) for bidi.
idnabis-bidi-06 says:
'A "BIDI domain name" is a domain name that contains at least one RTL label.' and
'The following rule, consisting of six conditions, applies to labels in BIDI domain names.'
Clearly, the above is talking about multi-label domain names. However, the rule itself tells you how to test a single label, so that part of the spec can be used at registration time (1).
Let's take an example. One of our favorite examples is 3com.com. At registration time, when we are registering the label "3com", there is no way of knowing that someone may, at some point in the future, define a DNAME that breaks the IDNAbis bidi rules. Since there is no way of knowing that, the registration is simply allowed.
Later, someone tries to define a DNAME, say, HEBREW.3com.com where HEBREW is a string of right-to-left Hebrew characters. At this point, the implementation might choose to check the IDNAbis bidi rules and either reject the DNAME or emit a warning about it if it breaks the rules.
Even later, someone tries to lookup HEBREW.3com.com. The implementation can check the entire domain name against the IDNAbis bidi rules. It does not have to check since the protocol says "SHOULD".
Yet later again, someone tries to display HEBREW.3com.com. The implementation probably should check against the IDNAbis bidi rules. If the domain name breaks the rules, the implementation can refuse to display it in Unicode form (choosing Punycode instead), or produce a warning of some kind.
So, the IDNAbis drafts are in some sense incomplete, since they don't fully address DNAME time (2) and display time (4). But if the WG does start to discuss these, you can imagine what my position is going to be.
Erik
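The difference between testing a single label at registration time (1) and testing a whole name at DNAME, lookup or display time (2) to (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the six conditions are written as they appear in the later published form of the bidi rule (RFC 5893) and may differ in detail from idnabis-bidi-06, the function names are hypothetical, and HEBREW is replaced by two arbitrary Hebrew letters.

```python
import unicodedata

# Bidi classes permitted in RTL and LTR labels (conditions 2 and 5).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label):
    """Return the Unicode bidi class of every character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label):
    """A label is RTL if it contains at least one R, AL or AN character."""
    return any(c in {"R", "AL", "AN"} for c in bidi_classes(label))


def label_passes_bidi_rule(label):
    """Apply the six conditions to one label (usable at registration time)."""
    classes = bidi_classes(label)
    if not classes:
        return False
    first = classes[0]
    # Condition 1: the first character must be L, R or AL.
    if first not in {"L", "R", "AL"}:
        return False
    # Conditions 3 and 6 ignore trailing nonspacing marks.
    core = list(classes)
    while core and core[-1] == "NSM":
        core.pop()
    if first in {"R", "AL"}:
        if any(c not in RTL_ALLOWED for c in classes):   # condition 2
            return False
        if core[-1] not in {"R", "AL", "EN", "AN"}:      # condition 3
            return False
        if "EN" in classes and "AN" in classes:          # condition 4
            return False
        return True
    if any(c not in LTR_ALLOWED for c in classes):       # condition 5
        return False
    return core[-1] in {"L", "EN"}                       # condition 6


def name_passes_bidi_rule(name):
    """Whole-name check: the rule only applies to bidi domain names."""
    labels = [lab for lab in name.rstrip(".").split(".") if lab]
    if not any(is_rtl_label(lab) for lab in labels):
        return True  # no RTL label, so the rule does not apply
    return all(label_passes_bidi_rule(lab) for lab in labels)


hebrew = "\u05d0\u05d1"  # stand-in for HEBREW
print(name_passes_bidi_rule("3com.com"))
print(name_passes_bidi_rule(hebrew + ".3com.com"))
```

In this sketch "3com.com" on its own is accepted because it contains no RTL label, while the same "3com" label fails condition 1 (it starts with a digit) once an RTL label appears elsewhere in the name. That is the situation a registration-time check of "3com" alone cannot foresee.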
091104 15:38 Gihan Dias
the WG concluded that the registry or registrar had to be cognizant of this kind of anomaly and reject problematic registration requests.
Unfortunately, it is impossible to make sure *all* registrars (or even all registries) would uniformly handle problematic requests.
We need mandatory rules for registrations, which are checked by software in lookups.
091104 15:47 Andrew Sullivan
(2) multi-label DNAME definition time
The problem there is that there is _no way_ to have a rule about this. It's not that we never thought about it; it's that you can't possibly specify a rule that will solve this problem.
If there are zone cuts, it is _just impossible_ to know what might end up lurking on the other side of the cut, not only at the time of registration but also later, since a DNAME could be introduced at a later date (indeed, that's actually one of the target use cases of DNAMEs).
091104 15:59 Vint Cerf
There is no way to guarantee this uniformity; nor was the WG able to define every possible case in which problems would arise. The range of possibilities is too great given the huge number of new symbols introduced by IDNA. This was debated extensively during the last 2 years and that is the working group consensus as I see it.
091104 16:16 Gihan Dias
I understand.
My point is that if the WG was unable to reach a consensus, then registrars would not be able to do so either, so what we'll have is an "anything goes" situation.
At the least, we (software developers, sys admins, website owners, advertising companies, users, etc.) need to be aware of the problems.
091104 16:23 Erik van der Poel
I was talking about a single DNAME definition, not about other DNAMEs that might enter the picture at some point. If somebody were to implement a tool that checks DNAMEs before inserting them into a zone, that tool could check a *single* DNAME against the IDNAbis bidi rules.
Here is an example from http://tools.ietf.org/html/rfc2672
frobozz.example. DNAME frobozz-division.acme.example.
When the tool is about to insert this DNAME into the zone, it can apply the bidi rules to the full domain name (which consists of multiple labels, as you can see).
Even if some other DNAMEs were defined later that would allow the creation of domain names that break the bidi rules, the client can check the rules at lookup time and at display time.
091104 16:32 Andrew Sullivan <ajs@shinkuro.com>
My point is that if the WG was unable to reach a consensus, then registrars would not be able to do so either, so what we'll have is an "anything goes" situation.
Yes, but not by registrars. Registry operators need to make policy that makes sense given the characters they're willing to support. Note that the documents explicitly say, "Registries need to develop policy in this area." Will this vary from registry to registry? Yes: that's part of the goal. The idea is to make the protocol flexible enough that different operators can make different policies to match their different circumstances.
At the least, we (software developers, sys admins, website owners, advertising companies, users, etc.) need to be aware of the problems.
You bet. But how is that different from being aware of the linguistic environment in which your software runs, your systems work, or your advertising and websites are interpreted? For instance, if you're aiming to attract fans of 1970s Detroit automobiles, you'd be pretty foolish to name your site "underthebonnet.com". How is this different?
091104 16:37 John C Klensin
This would create more vulnerabilities than it would eliminate. The argument for "more run time checking" involves not trusting zone administrators. I think we have found a decent balance on that subject -- not going quite as far as we possibly could, but striking a balance with the other considerations, including different practices by different languages using the same script. If I were running a registry, I'd certainly consider it reasonable to apply some of the tests that you suggest. But remember that DNAMEs can be created by anyone with appropriate dynamic update privileges and in zones anywhere in the tree pointing to anywhere else in the tree. If we say "the client MUST make these checks", then people will rely on their being made... and bad-guy clients simply won't.
I do believe that, at some point, the world is going to need better sets of "guidelines to registries and zone administrators for best practices" -- maybe globally (e.g., "you should think about these issues") but more likely on a per-language basis. But Rationale already does some of the former (personally, I'd welcome going a bit further, but WG consensus about what to say has been hard to achieve and it is even harder to know where to stop) and it is worth noting that there are already several per-script efforts for the latter (starting with the JET work that also brought us variants).
But trying to make a normative requirement would, I fear, just backfire.
091104 16:48 Andrew Sullivan
I was talking about a single DNAME definition, not about other DNAMEs that might enter the picture at some point.
That sounds like operational guidance, not protocol. "Be careful" is not a useful thing to put in a protocol document, even if it's perfectly good operational advice.
I think you're right that it'd be nice to put together a set of guidelines for what people ought to do when operating IDNA-aware zones. There are several sets of considerations that need to be taken into account, and such advice ought indeed to be offered.
The IDNABIS WG is not the forum that should generate that advice: hardly anybody here is a DNS operator. I in fact previously offered (in a fit of insanity) to put together a -00 draft outlining such advice. I think it should probably be taken to DNSOP to see if there's any interest. But the response to me (one with which I have considerable sympathy) was that it'd be crazy to try to produce operational suggestions for a protocol that hasn't made it out of the IETF's process yet.
If somebody were to implement a tool that checks DNAMEs before inserting them into a zone, that tool could check a *single* DNAME against the IDNAbis bidi rules. Here is an example from http://tools.ietf.org/html/rfc2672 frobozz.example. DNAME frobozz-division.acme.example.
When the tool is about to insert this DNAME into the zone, it can apply the bidi rules to the full domain name (which consists of multiple labels, as you can see).
I think you don't fully understand how DNAMEs work. Your example does not alias "frobozz.example." to "frobozz-division.acme.example.", because DNAMEs don't alias the owner name where the DNAME appears. That DNAME _does_ alias www.frobozz.example to www.frobozz-division.acme.example. It also aliases א1.frobozz.example to א1.frobozz-division.acme.example. In fact, it aliases *.frobozz.example. The problem is to know whether there is in fact the label א1 in frobozz-division.acme.example, and without controlling both zones, the administrator of frobozz.example doesn't know (and can't know unless zone transfer is permitted or frobozz-division.acme.example is running DNSSEC with NSEC). This is why there is a problem. DNAMEs don't do what a lot of people think they do.
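The substitution described above can be sketched in a few lines of Python. This is a simplified illustration, not a resolver: the function name `apply_dname` is hypothetical, names are handled as plain dotted strings, and length limits and other DNAME processing details are ignored.

```python
def apply_dname(qname, owner, target):
    """Rewrite qname through a DNAME at owner, or return None if it does not apply.

    A DNAME redirects names strictly below its owner name; the owner
    name itself is not aliased.
    """
    q = qname.lower().rstrip(".").split(".")
    o = owner.lower().rstrip(".").split(".")
    # The query must have the owner as a proper suffix (at least one extra label).
    if len(q) <= len(o) or q[-len(o):] != o:
        return None
    prefix = q[:-len(o)]
    return ".".join(prefix + [target.rstrip(".")]) + "."


owner = "frobozz.example."
target = "frobozz-division.acme.example."
print(apply_dname("www.frobozz.example.", owner, target))
print(apply_dname("frobozz.example.", owner, target))
```

Because any prefix is carried over unchanged, the administrator who inserts the DNAME cannot enumerate, at insertion time, the full names that will eventually resolve through it.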
Cross/inter label testing
091104 11:11 Vint Cerf
the question of inter-label or cross-label testing was extensively discussed on the WG list and rejected as overly complex at the protocol level. As with a number of cases, the WG concluded that the registry or registrar had to be cognizant of this kind of anomaly and reject problematic registration requests.
Unicode TR46
20091027 Martin J. Dürst
Subject: High-level comment on TR46
I have been thinking a bit more about TR46 (http://www.unicode.org/reports/tr46/) and all the connections.
I think TR46 on a high level does try to do two things:
- Provide a well-defined mapping for domain names. I think the Unicode consortium is the best place to actually do this, and personally I can easily imagine that some IETF (or W3C) specs may refer to it.
- Try to "fix" some perceived problems in IDNA 2008 and between IDNA 2008 and IDNA 2003 (mainly sz, final sigma,...). While the intent here is very well-meant, and what's proposed is definitely one possible solution, from an IETF perspective, the Internet (and the use of IDNs) is far wider than browsers and search engines, and it is highly desirable for all parties involved if a solution to these problems comes directly from the IETF.
My proposal would therefore be to split the document, e.g. as follows:
- Keep TR46 with the uncontroversial mappings (essentially the extension of the IDNA 2003 mappings to the IDNA 2008 repertoire)
- Submit the proposal for how to deal with both IDNA 2003 and IDNA 2008 at the same time, and in particular how to deal with the special cases (called "deviations" in TR46) as an Internet-Draft (best with some co-authors from other affected communities, e.g. from DENIC,...). That's the best way to get the IETF to actually face the issues.
This is a 'refinement' from what we discussed with Larry about two weeks ago in Mountain View.
20091226 MARK DAVIS
My bandwidth is extremely limited until we get back to the states, so I will be brief. Please forgive me if by being brief, I am also overly brusk.
- 1. I have not been able to follow the 4 deviation character discussion, but it appears that there is agreement on some transition strategies that will work; a key approach appears to be to map on the client side if one is sure that the zone bundles, otherwise map.
- 2. Given that, I'd anticipate that the UTC would modify TR46 to be (a) support of symbols for some transitional period, and (b) a standard mapping. The rest of my comments are on the mapping issue.
- 3. One uniform mapping would be better than multiple, inconsistent mappings.
- 4. While one could argue either way, the advantage of the TR46 mapping is that it preserves compatibility with IDNA2003.
- 5. The current IDNA2008 mapping wouldn't maintain that compatibility, falls short in a number of cases for languages that don't have case/width issues, and has a number of formal problems.
- 6. We have major vendors that intend to implement the TR46 mappings; I don't know of any that have signed up to implement the current idna2008 spec.
- 7. The supposed argument from "harm" is specious.
- 8. First, there is a mixup below. If X is confusable with a PVALID Y, it is no problem to map X to Y; it would only be a (theoretical) problem if X were mapped to a PVALID Z.
- 9. Vastly more importantly, the argument from "harm" is faith-based, not data-based. I don't have access here, but I previously posted notes on the relative frequencies of spoofing techniques. From that data:
- 1. Spoofing with confusable characters is FAR below spoofing with syntax (like http://safe-amazon.com) in frequency.
- 2. There are essentially no letters that can be spoofed with the mapped characters that can't also be spoofed with other letters that are PVALID.
- 10. In sum, allowing the additional mappings makes *no* significant difference in the ability to spoof.
- 11. Best would be to incorporate the TR46 mappings into IDNA2008. Second best would be to reference them; third would be to remove the idna2008 mappings document, and fourth would be to leave them as is, and just deal with the muddle that results.
Mark
"eszett" exception
Mayrhofer - NIC.AT 20091025
As you probably know, i'm working for the Austrian Domain Name Registry (nic.at). I've recently prepared a presentation to our board regarding the changes to expect from IDNAbis deployment, and I've been asked by our board to voice our concerns about the "szett" (U+00DF) exception in the current document set. I understand that the documents have progressed very far, and that we should have voiced our concerns earlier - however, i think that the information below is still valuable to the group.
Obviously, the DNS is an extremely important identity and naming system that is crucial to the operation of nearly all internet applications. Therefore, any changes to that structure are delicate operations. This is important for the creation of new portions of namespace, but particularly important when the semantics of a namespace (portion) are changed. The introduction of IDNA2003 was an extension of the namespace, at least from the application perspective (technically, it was changing the definition of an awkward-enough portion of the namespace, namely labels with "xn--").
Changing the semantics of a certain namespace is *really bad*, and i agree to what Marcos said long time ago "Breaking backwards compatibility is to my eyes the big stigma of IDNA2008".
I understand and welcome the introduction of rigid rules in IDNAbis as the primary mechanism to identify codepoint classification and protocol validity. Independence from a certain Unicode revision ensures a stable specification, and should create few "surprises" (essentially, it shifts responsibility of character classification from the IETF to Unicode). I also understand and welcome the 1:1 relation on the protocol level between A-label and U-label.
However, the introduction of *exceptions* that work around those rigid rules, and particularly changing the semantics of a part of a deployed, used namespace is *really really bad* - particularly if the exception concerns such a "weird" character as the "szett" (Unicode folding-wise). Such changes generally have the potential to change the resolved destination for a certain domain name, which in turn creates *major* security issues, and hurts interoperability badly, because unlike the introduction of IDNA2003, where a label would either work or not, those exceptions now create a situation where such a label would resolve to either destination A (old application) or destination B (new application).
I understand that the Rationale document proposes sensible approaches in Section 7.2 - however, i think the security issues could discuss the problems more explicitly, rather than just referring to the rationale document (which is informational anyways). I think that the sentence
- "...a few characters that were mapped to others in the earlier version; zone administrators should be aware of the problems that might raise and take appropriate measures"
In the definitions document could easily be overlooked by implementors.
Another issue makes it even harder for zone administrators to deal with the problem: Actually *encouraging* application developers to create their own fancy mapping definitions, beyond the mappings that were included in IDNA2003 allows for even more "variations", and is bound to hurt interoperability badly. One example of this is the Unicode TR46, particularly the proposal of "dual lookups" and "trusted registries" for "Deviations", which i believe to be a really really bad idea - but what are the other options?
Shifting the responsibility of mapping, and therefore allowing for creating a myriad of mapping options to application developers seems risky to me, particularly for the Exception codepoints for which protocol definitions have changed between the two versions. From my point of view, it makes such codepoints unusable - the "mapping du jour" of application X could be entirely different than that of application Y.
The Mapping draft says that it's "unusual" for the IETF to discuss user input processing steps - but on the other hand, Section 2.1 of RFC 3761 (the ENUM base specification) clearly provides normative text about how user input should be prepared for a protocol (and i'm sure there are many other examples). So it seems the IETF *is* concerned about how user input is mapped to protocol elements.
To sum up, we would have preferred the "szett" (U+00DF) to be kept "DISALLOWED", and to have the IETF describe the mapping procedures not just "Informational" (The contents of the mapping document itself is perfectly fine). We also hope that the IETF liaises with application developers, particularly browser vendors, to establish one single "de facto" mapping procedure, so that at least the szett does not become a moving target.
Mark Davis, Unicode, 20091026
The Unicode Consortium shares your concerns about the treatment of deviations, and the security and interoperability issues resulting from that and custom mappings. Unfortunately, while those points were raised consistently during the development of IDNA2008 (some would say too persistently), the working group decided on its current course.
We have been consulting with browser and search engine vendors (many of whom are members of the consortium), and I would anticipate that most will not end up implementing IDNA2008 lookup as is because of the problems it has. TR46 is designed by those needing to implement IDNA lookup so as to provide a bridge specification, whereby implementations can maximize compatibility with IDNA2003 and IDNA2008 on the lookup side, and avoid these problems. On the "dual lookup" and "trusted registries" point that you mention: the text of TR46 is insufficiently clear. That section is discussing alternative approaches that were considered, but discarded (because they don't work well, as Marcos pointed out in detail). I'll make sure that that feedback is brought into the committee.
TR46 is not really aimed at the registry side. It is feasible for registries to implement IDNA2008 if they additionally DISALLOW the four deviations (including es-zett). This can be done while being conformant to IDNA2008, because registries can further limit the characters they support.
WG/IDNABIS Chair dialog with UNICODE before finalizing IDNA2008 text and sending it to IESG.
Vint Cerf letter to Unicode
Letter to Unicode
- Ms. Lisa Moore
- Chairman, Unicode Technical Committee
- via email: lisam@us.ibm.com
- CC:
- Eric Muller
- Vice Chairman, Unicode Technical Committee
- via email: emuller@adobe.com
- Mark Davis
- President, Unicode Consortium
- via email: markdavis@google.com
28 November 2009
Dear Ms. Moore:
I am writing to you in my role as chairman of the IDNABIS working group, addressing this request to you as president of the Unicode Consortium. As you know, treatment of the two characters, Greek Small Letter Final Sigma (U+03C2) and Latin Small Letter Sharp S (U+00DF) have been the source of considerable discussion during the IDNABIS Working Group effort on specifying the IDNA2008 proposed replacement of the IDNA2003 standard for the use of Unicode in Internationalized Domain Names. Latin Capital Letter Sharp S (U+1E9E) was added in Unicode version 5.1.0 but recommended rules for its use were provided as shown below:
Begin quote from Unicode Version 5.1.0
Tailored Casing Operations
The Unicode Standard provides default casing operations. There are circumstances in which the default operations need to be tailored for specific locales or environments. Some of these tailorings have data that is in the standard, in the SpecialCasing.txt file, notably for the Turkish dotted capital I and dotless small i. In other cases, more specialized tailored casing operations may be appropriate. These include:
- Titlecasing of IJ at the start of words in Dutch
- Removal of accents when uppercasing letters in Greek
- Uppercasing U+00DF ( ß ) LATIN SMALL LETTER SHARP S to the new U+1E9E LATIN CAPITAL LETTER SHARP S
However, these tailorings may or may not be desired, depending on the implementation in question.
In particular, capital sharp s is intended for typographical representations of signage and uppercase titles, and other environments where users require the sharp s to be preserved in uppercase. Overall, such usage is rare. In contrast, standard German orthography uses the string "SS" as uppercase mapping for small sharp s. Thus, with the default Unicode casing operations, capital sharp s will lowercase to small sharp s, but not the reverse: small sharp s uppercases to "SS". In those instances where the reverse casing operation is needed, a tailored operation would be required.
End quote from Unicode Version 5.1.0
In IDNA2003, Sharp S was mapped to "ss" by means of a casing operation that mapped lower case Sharp S to uppercase "SS" and then down to lowercase "ss". Registrations and lookups using the IDNA2003 rules applied this mechanism.
During the discussions in the IDNABIS Working Group on IDNA2008, a strong consensus developed around not mapping for example for registration purposes and also for preserving the property that the IDNA2008-defined A-Label and U-Label forms be fully symmetric (i.e., convertible into one another without change or loss).
During these same discussions, a consensus seemed to develop to permit (i.e., make "PVALID" in IDNA2008 parlance) Latin Small Letter Sharp S (U+00DF) and Greek Small Letter Final Sigma (U+03C2). The recommended casing actions of Unicode (i.e., toCaseFold) on Sharp S and Final Sigma produce "ss" in the case of Sharp S and Greek Small Letter Sigma (U+03C3) in the case of Final Sigma.
To make the lowercase forms PVALID using the functional rules of IDNA2008, exceptions were required to overcome the recommended casing mechanics of Unicode (i.e. application of CaseFolding).
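The casing behaviour described in the letter can be observed with Python 3, assuming that its string methods follow the default Unicode casing operations and that its built-in `idna` codec implements the IDNA2003 (Nameprep) mapping. The variable names and the label `fußball` are chosen here for illustration only; the last lines show the kind of lossless A-label/U-label round trip that making the character PVALID without mapping is meant to preserve.

```python
sharp_s = "\u00DF"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1E9E"  # LATIN CAPITAL LETTER SHARP S
final_sigma = "\u03C2"      # GREEK SMALL LETTER FINAL SIGMA
sigma = "\u03C3"            # GREEK SMALL LETTER SIGMA

# Default casing: small sharp s uppercases to "SS", which lowercases to "ss".
upper = sharp_s.upper()
round_trip = upper.lower()

# Capital sharp s lowercases to small sharp s, but not the reverse.
lowered_capital = capital_sharp_s.lower()

# Case folding (toCaseFold) gives "ss" and the non-final sigma.
folded_sharp_s = sharp_s.casefold()
folded_sigma = final_sigma.casefold()

# IDNA2003-style processing maps the sharp s away before encoding.
idna2003 = "fu\u00DFball".encode("idna")

# Without mapping, the label is Punycode-encoded as is and can be
# converted back without change or loss.
a_label = "xn--" + "fu\u00DFball".encode("punycode").decode("ascii")
u_label = a_label[4:].encode("ascii").decode("punycode")

print(upper, round_trip, folded_sharp_s, folded_sigma == sigma)
print(idna2003, a_label, u_label)
```

The contrast between `idna2003` and `a_label` is the compatibility problem in miniature: the same user input yields two different ASCII names depending on which rules the application applies.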
Note that IDNA2008 explicitly permits mapping for User Interface purposes:
- a) draft-ietf-idnabis-protocol-17#section-5.2
- c) draft-ietf-idnabis-rationale-14#section-4.4
- d) draft-ietf-idnabis-rationale-14#section-6
- e) draft-ietf-idnabis-rationale-14#section-7.3
- f) draft-ietf-idnabis-mappings-05
If Small Letter Sharp S and Small Letter Final Sigma were to be made DISALLOWED, these mapping provisions would permit these characters to be handled as a User Interface matter prior to lookup.
Because the practices of IDNA2003 are in conflict with the proposed practices of IDNA2008, and because the Last Call discussions have surfaced controversy over the incorporation of the two lowercase forms in question, I request an organizational recommendation from UTC as to the treatment of these characters. Taking into account the prohibition of mapping on registration, which I take to be firm, and the requirement that A-Label and U-Label forms must be unambiguously convertible into each other, would the UTC recommend to exclude the use of Small Letter Sharp S and Small Letter Final Sigma in IDNA2008 by removing their exceptions and making each DISALLOWED?
A prompt response would be much appreciated considering we have delayed reporting the results of the IETF LAST CALL to the Internet Engineering Steering Group while this matter is debated.
Sincerely,
Vinton Cerf
Chairman, IDNABIS Working Group of the Internet Engineering Task Force
Unicode response
- Dear Vint,
The UTC appreciates the difficulty for users of IDNs, the registries, and all involved if lowercase sharp s (Latin Small Letter Sharp S (U+00DF)) and small final sigma (Greek Small Letter Final Sigma (U+03C2)), in particular, are mapped to other characters. As you know, our concerns are compatibility and potential security issues. However, based on the many ongoing discussions and much thought, the UTC would not be opposed to have lowercase sharp s, final sigma, and even joiner and non-joiner be valid and not mapped, as long as there can be policies in place for a transition period (of say 5 years) that will manage the expected compatibility issues.
The key for us is having policies for a well-managed transition with sufficient time for browser and other application upgrades. Without such policies in place, we would favor continuing the IDNA2003 treatment of the four above-mentioned characters.
- Best regards,
- Lisa Moore
- Chair, Unicode Technical Committee
Mark Davis, Unicode President
Problem
We would like to have the 4 deviation characters be valid, at some point. The key problem is that we don't want current URLs in web pages, etc. to go to two different locations depending on the browser, nor do we want joe@fußball.com to go sometimes to joe@fußball.com and sometimes to joe@fussball.com. Even once IDNA2008 is approved, for a long time a majority of the implementations will still be IDNA2003, so this also goes for new label registrations during the transition period.
Proposal
IDNA2008 changes as follows:
- The 4 deviation characters get the property PVALID_AFTER_2015
The requirements are:
- On registration, PVALID_AFTER_2015 is equivalent to PVALID
- On lookup, PVALID_AFTER_2015 is treated as DISALLOWED up until 2016 Jan 1, 00:00:00 GMT, and treated as PVALID thereafter.
- Implementations must not map the characters after the switchover date.
- Implementations that map the characters before that date, must map as in IDNA2003.
The goal is to:
- allow the 4 characters to become valid, as soon as possible;
- avoid the 'nightmare' scenario of the same URL going to two different locations, as much as possible.
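As a rough illustration of the requirements above, the following Python sketch expresses the registration/lookup split as a single function. The name `effective_status`, the operation strings, and the decision to treat the exact switchover instant as already PVALID are assumptions made for the example; they are not part of the proposal.

```python
from datetime import datetime, timezone

# Switchover instant taken from the proposal: 2016 Jan 1, 00:00:00 GMT.
SWITCHOVER = datetime(2016, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# The four deviation characters named in the UTC response.
DEVIATIONS = {
    "\u00DF",  # LATIN SMALL LETTER SHARP S
    "\u03C2",  # GREEK SMALL LETTER FINAL SIGMA
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
}


def effective_status(ch, operation, now):
    """Hypothetical helper: status of a PVALID_AFTER_2015 character."""
    if ch not in DEVIATIONS:
        raise ValueError("not a PVALID_AFTER_2015 character")
    if operation == "registration":
        # On registration the property is equivalent to PVALID.
        return "PVALID"
    if operation == "lookup":
        # On lookup: DISALLOWED before the switchover, PVALID from then on
        # (treating the exact instant as PVALID is an assumption).
        return "DISALLOWED" if now < SWITCHOVER else "PVALID"
    raise ValueError("operation must be 'registration' or 'lookup'")
```

A lookup client built this way needs a trustworthy clock, which is one practical cost of a date-dependent property.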
Scenarios
Let's see what happens with fußball.xxx over time, where xxx is some registry (eg .de, .blogspot.com, or others). Background: essentially all browsers and other major implementations are planning to map for compatibility. We'll look at browsers, but this also applies to email, etc.
- Early 2010 (just as IDNA2008 is approved)
- At this time the world browsers are 100% IDNA2003
- browsers map fußball.xxx to fussball.xxx.
- registries can start accepting eszett, and should bundle with ss.
- fußball shows up as fussball in the address bar
- note: it is only by convention that fussball is seen in the address bar in this case; a browser could also display fußball, as in UTS46.
- results:
- if the registry bundles, both fußball.xxx and fussball.xxx go to the same owner.
- if the registry doesn't bundle, both fußball.xxx and fussball.xxx go to the same owner.
- The odd IDNA2008 browser that doesn't map just fails, because ß is not PVALID; it doesn't take fußball.xxx to a different location than the vast majority of browsers.
- In 2013
- At this time the world browsers are 50% IDNA2003, 50% IDNA2008
- same as above. No ambiguity in results.
- In 2016 Feb
- At this time the world browsers are 1% IDNA2003, 99% IDNA2008
- 99% of browsers switch to not mapping fußball.xxx.
- Registries no longer need to bundle; they can have different owners for fußball.xxx and fussball.xxx.
- fußball shows up as fußball in the address bar
- results:
- if the registry bundles, both fußball.xxx and fussball.xxx go to the same owner.
- if the registry doesn't bundle, fußball.xxx and fussball.xxx go to different owners.
- The odd IDNA2003 browser that is left goes to the wrong location for the affected languages; people that use them need to upgrade.
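The before/after behaviour in these scenarios can be reproduced with Python's standard library. The built-in `idna` codec implements IDNA2003, including the Nameprep mapping of ß to ss. The function `unmapped_lookup` is a hypothetical stand-in for a post-switchover client that performs no mapping; it skips every other IDNA2008 validity check and is only meant to show where the two names diverge.

```python
def idna2003_lookup(name):
    """What an IDNA2003 client puts on the wire (stdlib 'idna' codec)."""
    return name.encode("idna").decode("ascii")


def unmapped_lookup(name):
    """Simplified stand-in for a client that does not map: each non-ASCII
    label is Punycode-encoded as is and prefixed with 'xn--'."""
    labels = []
    for label in name.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


if __name__ == "__main__":
    for name in ("fußball.xxx", "fussball.xxx"):
        print(name, "->", idna2003_lookup(name), "|", unmapped_lookup(name))
```

With these two functions, `idna2003_lookup("fußball.xxx")` gives `fussball.xxx`, while `unmapped_lookup("fußball.xxx")` gives `xn--fuball-cta.xxx`. The two only reach the same owner if the registry bundles, which is the split the 2016 scenario describes.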
Acknowledgment of the WG/IDNABIS Chair
- Dear Lisa,
Thank you and the UTC for its rapid response.
I believe that the discussions of the past week have confirmed a general consensus on the preference that Final Sigma and Sharp-S be PVALID. We did not poll for the joiner/non-joiner question because a consensus already existed, in my opinion, as chair, for these to be contextually valid (CONTEXTJ).
The method of introduction of IDNA2008 is important to all of us, to promote its utility. At the close of the day, I will review all of the comments received and attempt to synthesize what I believe is a plan around which consensus can be obtained.
Vint Cerf
Pete Resnick's Recipe
>> One slightly more solid question for browsers is, would it be entirely crazy to have different mapping algorithms for typed-in domain names than for links followed? There might be a locale-dependent mapping as well as a global mapping. (I assume that having every established locale mapping installed would be complete craziness.)
> IMO crazy. People type URLs into web browsers. Other people type URLs into HTML documents. We can't have the same letters in those two locations taking you to different places.
Generally staying out of this discussion, but: If we were considering to do this correctly (IMO), typed-in domains with DISALLOWED characters would get mapped, and links followed would get "Bite Me!" in big bold letters and the equivalent of an NXDOMAIN. Anything that takes typed input that maps should dynamically "correct" to PVALID things, and anything that is taking data should scream bloody murder if the domain labels are not all PVALID.
Yes, I know I'm not convincing anyone of this. But it's the right thing to do.
Back to my hidey-hole.
pr
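The recipe above (map what is typed, reject what arrives as data) can be sketched in a few lines of Python. Everything here is illustrative: `looks_pvalid` is a toy check that treats only uppercase letters as DISALLOWED, `str.lower()` stands in for a real UI mapping, and the names `accept` and `NotAllPValid` are invented for the example.

```python
class NotAllPValid(ValueError):
    """Raised when data (as opposed to typed input) is not all PVALID."""


def looks_pvalid(name):
    # Toy stand-in for the real per-code-point test: only uppercase
    # letters are considered DISALLOWED in this sketch.
    return not any(ch.isupper() for ch in name)


def accept(name, source):
    """Map typed input; refuse to map anything that arrives as data."""
    if source == "typed":
        name = name.lower()  # stand-in for the UI-level mapping
    if not looks_pvalid(name):
        raise NotAllPValid(name)
    return name
```

Under these assumptions `accept("Example.COM", "typed")` is corrected to `example.com`, while the same string with `source="link"` raises `NotAllPValid`.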
IDNA 2008 in light of DYN DNS
John Klensin 20091025
> Uh, yes. If dynamic update is configured to require that an RRSET (from the viewpoint of IDNA, a label) is already present, then one has a lookup situation. If it is configured for the "RRSET does not exist" or "Name not in use" cases, then one has a registration situation. That said, my personal recommendation would be to use the more conservative Registration rules any time one is going to start modifying DNS zones rather than simply looking something up. But the WG has not discussed this topic. If people are convinced that something must be said on the subject, we will need to have that discussion.
Bernard Aboba 20091026
I do think that something needs to be said about this, since the issue has come up in implementation. For example, based on the distinction above, a client handling a dynamic update on its own using TKEY would implement the lookup protocol, whereas a DHCP server handling a dynamic update on behalf of the client might implement the registration protocol.
Vint Cerf 20091026
another alternative would be for you to issue an informational RFC about interpretation of IDNA 2008 in light of DYN DNS, would it not?
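The distinction drawn in this exchange could be written down as a small decision function. This is only a sketch of the reasoning quoted above: the enum members mirror the three dynamic-update cases John Klensin mentions, and the `conservative` flag models his personal recommendation. None of it comes from an IDNA2008 document.

```python
from enum import Enum


class Prerequisite(Enum):
    # The three dynamic-update cases named in the quoted message.
    RRSET_EXISTS = "RRSET already present"
    RRSET_DOES_NOT_EXIST = "RRSET does not exist"
    NAME_NOT_IN_USE = "Name not in use"


def idna_rules_for_update(prerequisite, conservative=False):
    """Return which IDNA2008 rule set ('lookup' or 'registration') applies."""
    if conservative:
        # Recommendation in the message: use Registration rules whenever
        # a DNS zone is going to be modified.
        return "registration"
    if prerequisite is Prerequisite.RRSET_EXISTS:
        return "lookup"
    return "registration"
```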
RTL & Labels
Abdulrahman I. ALGhadir 20091013
while I was reading draft-ietf-idnabis-bidi-06.txt I found this:
“4.3. Strings with numbers
By requiring that the first or last character of a string be category R or AL, RFC 3454 prohibited a string containing right-to-left characters from ending with a number.
Consider the strings ALEF 5 (HEBREW LETTER ALEF + DIGIT FIVE) and 5 ALEF. Displayed in an LTR context, the first one will be displayed from left to right as 5 ALEF (with the 5 being considered right-to-left because of the leading ALEF), while 5 ALEF will be displayed in exactly the same order (5 taking the direction from context). Clearly, only one of those should be permitted as a registered label, but barring them both seems unnecessary.”
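The RFC 3454 requirement mentioned in the quoted section (first and last character of category R or AL) can be checked with Python's `unicodedata` module. The sketch below covers only that first/last requirement, not the other bidi rules of RFC 3454 or of the bidi draft, and the function name is invented for the example.

```python
import unicodedata

RANDAL = {"R", "AL"}  # right-to-left bidi categories


def rfc3454_first_last_ok(label):
    """Check only the first/last-character requirement of RFC 3454."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RANDAL for c in classes):
        return True  # the requirement only applies to right-to-left strings
    return classes[0] in RANDAL and classes[-1] in RANDAL


ALEF = "\u05D0"  # HEBREW LETTER ALEF
```

Under this rule both `ALEF + "5"` and `"5" + ALEF` are rejected, which is the situation the draft describes as barring both strings.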
Why is this permission decided at the protocol level? Shouldn't this be done at the registry level?
Ex. If someone wants to register 3COM (where COM is an RTL word), the registry will register 3COM for him/her and will lock COM3. At least this gives users a choice, rather than forcing one form on them.
Vint Cerf 20091013
for the most part, the IDNA2008 specification does place a great deal of responsibility on the registry but for some cases, it was considered important to bar particularly confusing situations at the protocol level. This was debated substantially during the course of the development of IDNA2008.
Abdulrahman I. ALGhadir 20091014
Well, yes, it is confusing, and yes, there are other confusing problems which the protocol simply assigns to the registry. What I am saying is that it is unfair to rule out, as a choice for users, all domains which start with digits. If this problem were moved to the registry level, a registry could simply:
- 1) Act as the protocol does now (preventing all domains which start with digits).
- 2) Register domains which start with digits and lock the same domain ending with those digits, and vice versa.
And any further problems which may result from using 2) can be solved as the registry wants. Plus, if the protocol allowed this, maybe the bidi algorithm would come to support domain names, who knows.
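Option 2) could look roughly like the following toy registry, written in Python. It is a sketch under simplifying assumptions: labels are plain strings, only labels with digits at exactly one end are handled, and ASCII "COM" is used as a placeholder for the RTL word, as in the message above. The class and method names are invented.

```python
import re


def mirror(label):
    """Return the label with its leading or trailing digits moved to the
    other end, or None if the digits are not at exactly one end."""
    m = re.fullmatch(r"(\d+)(\D+)", label)
    if m:
        return m.group(2) + m.group(1)
    m = re.fullmatch(r"(\D+)(\d+)", label)
    if m:
        return m.group(2) + m.group(1)
    return None


class ToyRegistry:
    def __init__(self):
        self.registered = set()
        self.locked = set()

    def register(self, label):
        """Register a label and lock its mirror; return False if refused."""
        if label in self.registered or label in self.locked:
            return False
        self.registered.add(label)
        other = mirror(label)
        if other is not None:
            self.locked.add(other)
        return True
```

After `register("3COM")` succeeds, a later `register("COM3")` is refused, so the first registrant decides which of the two forms exists.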
Full stop Mapping
20091011 Sarmad Hussain
In earlier discussions on U+06D4 (ARABIC FULL STOP), which is necessary for Urdu as a label separator (the reasons have been given on this list earlier), it was suggested that the various full stops will not be allowed and be mapped. It was subsequently requested to include the mapping reference in IDNA200x documents to ensure that the application providers incorporate it, but the request was not considered positively as it was perhaps suggested that such recommendations can not be made part of the protocol. However, the recent mapping document (http://tools.ietf.org/html/draft-ietf-idnabis-mappings-04) says on pg. 2:
- 4. [I-D.ietf-idnabis-protocol] is specified such that the protocol acts on the individual labels of the domain name. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the following character can be mapped to the FULL STOP before label separation occurs:
- IDEOGRAPHIC FULL STOP (U+3002)
There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly.
If this is being explicitly done for U+3002, it could be done explicitly for ARABIC FULL STOP (U+06D4) as well. What is the reason for not including other such possible cases explicitly?
20091011 Vint Cerf
keep in mind that the Mappings document is NOT normative. It is intended to give some ideas for localization and pre-processing. The important point is that only U+002E will be recognized in protocol as a label separator. For purposes of exchanging IDNs, that's important. For local contexts, one might allow alternative full-stop inputs but these would need to be converted to the U+002E form prior to initiating a DNS query. It would probably be wise also to convert to U+002E for purposes of canonical exchange of domain names with other parties.
For Pete Resnick and Paul Hoffman:
this email might be interpreted as a request to add U+06D4 to the Mappings list of potential local mappings to U+002E. Have you an opinion whether this edit would be appropriate?
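To make the UI-versus-interchange distinction concrete, here is a minimal Python sketch. `normalize_separators` plays the role of the local, non-normative input mapping, with the set of accepted local dots left to the caller; `to_length_label_pairs` plays the role of a parser that, like the protocol, only knows U+002E. Both function names, and the use of UTF-8 octets in the second one, are assumptions made for illustration.

```python
IDEOGRAPHIC_FULL_STOP = "\u3002"
ARABIC_FULL_STOP = "\u06D4"


def normalize_separators(typed, local_dots=(IDEOGRAPHIC_FULL_STOP,)):
    """UI-level step: convert locally accepted full stops to U+002E."""
    for dot in local_dots:
        typed = typed.replace(dot, ".")
    return typed


def to_length_label_pairs(name):
    """Split on U+002E only and return (length, octets) pairs, the way a
    parser unaware of any alternative separators would."""
    pairs = []
    for label in name.split("."):
        octets = label.encode("utf-8")
        pairs.append((len(octets), octets))
    return pairs
```

If the local mapping is skipped, `to_length_label_pairs("a\u3002b")` yields a single label instead of two, which is why conversion to U+002E has to happen before a DNS query is initiated.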
200910111703 Erik van der Poel
In my opinion, it would be premature to include U+06D4 in the IDNAbis mapping draft (apart from the fact that it is rather late in the Last Calls process to be making such a change). U+3002 has a much longer history in IDNA and is much more firmly established. If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently. Of course, the IDNAbis mapping draft explicitly states that it is intended to be used at the UI level (e.g. keyboard input), but, frankly, I don't think we have much experience with IDNA implementations that distinguish between the UI and data interchange levels.
20091011 At 17:24 Vint Cerf
Thanks for these observations.
Clearly we would not advocate use of U+06D4 for interchange. Mappings was intended to focus on non-interchange UI treatments.
I agree that this is a rather substantive change; may we hear from others in IDNABIS WG please?
We are going to have to draw these last call discussions to a close soon.
vint
20091011 At 18:52 11/10/2009, Paul Hoffman
It is not late at all. We are still in IETF Last Call, which is exactly the time we are supposed to be having these community-wide discussions.
> U+3002 has a much longer history in IDNA and is much more firmly established.
Correct.
> If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently.
No one has suggested that we do that, of course. If someone wants to mis-implement draft-ietf-idnabis-mappings, there is nothing we can do to stop them.
> Of course, the IDNAbis mapping draft explicitly states that it is intended to be used at the UI level (e.g. keyboard input), but, frankly, I don't think we have much experience with IDNA implementations that distinguish between the UI and data interchange levels.
So, what do you propose instead? If this is a proposal to abandon draft-ietf-idnabis-mappings, you need to be more explicit about it. If it is not such a proposal, then you need to say what parts of the draft need to be abandoned to deal with your concern.
20091012 At 02:17 YAO Jiankang
> In my opinion, it is not necessary to abandon idnabis-mappings.
+1
> It is trying to solve a perceived problem at the UI level (which may be somewhat unusual for an IETF document). As long as it is not normative, it's fine with me. Maybe it should become an Experimental RFC instead of Informational.
I think that at least it should be informational. Mapping is also very important. It is one way to make IDNA2008 compatible with the IDNA2003 protocol. The suggested category is Proposed Standard.
20091012 At 14:53 John C Klensin
I see two issues with this idea. The second leads to a suggestion.
(1) While very local mapping of full stops makes perfectly good sense, any leakage is going to interfere with programs that need to parse dot-separated-label form into length-label pairs. Such programs may exist because they are not IDNA-aware or because they are trying to resolve names in private name spaces and follow the RFC 2181 observation that the DNS can accommodate any string of octets. We know that things "leak", so these mappings need to be performed very carefully (even more carefully than mappings of characters within labels), and any document should reflect that.
(2) One of the advantages of the list of characters that are now discussed in the mapping document is that the list is fairly stable. Because it is largely motivated by within-label IDNA2003 compatibility, there should be little or no need to expand the list as Unicode evolves. That is a good thing because we don't have comprehensive rules to generate the character and mapping list (even though much of it is either NFKC or CaseFold) -- the mappings document is essentially Unicode version-dependent.
By contrast, I think the general rule for candidate alternate full-stop characters is going to be:
- (i) The traditional, DNS-specified, label separator U+002E (ASCII dot) is unnatural or hard to type or render in the local environment.
- (ii) There is a character in the local environment and script that is in common use, that is an obvious logical substitute label separator, and possibly that users will expect it to be a substitute regardless of what we have to say on the subject.
Although I think one can make a slightly stronger argument for U+06D4 because of bidi implications, I don't personally see a very strong justification for a recommendation to map one of these characters and not others. Erik and others have made that point for some specific cases.
The list of alternate full stops (referred to on the IDNA list for a while as "dot-oids") would be initialized with the three East Asian full stops called out in RFC 3490, i.e.,
- U+3002 (ideographic full stop),
- U+FF0E (fullwidth full stop), and
- U+FF61 (halfwidth ideographic full stop)
plus U+06D4 (Arabic full stop) for the reasons identified in Sarmad's note.
But a search in the UnicodeData file shows 38 characters with names containing "FULL STOP" and 60 (22 more) containing "STOP". Some of these are clearly irrelevant compatibility characters (e.g., those in the 2488..249B range, which encode numerals followed by full stop as single code points) or are used for some completely different purpose (e.g., a series of Glottal Stop code points). We've also been told repeatedly that putting too much reliance on the assumption that Unicode names encode characteristic information is a bad idea -- there may well be characters out there that could be seen locally as reasonable label separators that are not identified as "Full Stop" in their names.
So building a comprehensive list would require character-by-character examinations and decisions based on local contexts and usage. While we clearly have the expertise available to make those judgments for East Asian scripts and the Urdu use of Arabic script, we do not have that knowledge in the general case. The list is not close-ended either. As new scripts are added to Unicode, it is safe to assume that at least some of them will have their own full stop (or equivalent) characters that will need to be considered for addition to the list.
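The kind of name search described above can be repeated with Python's `unicodedata` module, as sketched below. The counts it produces depend on the Unicode version bundled with the interpreter, so they will generally differ from the figures quoted in the message, and, as the message itself warns, a name match is only a starting point for a character-by-character review. The function name is invented for the example.

```python
import sys
import unicodedata


def chars_with_name_containing(fragment):
    """List (code point, name, general category) for every character
    whose Unicode name contains the given fragment."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if fragment in name:
            found.append((cp, name, unicodedata.category(ch)))
    return found


if __name__ == "__main__":
    for cp, name, category in chars_with_name_containing("FULL STOP"):
        print(f"U+{cp:04X} {category} {name}")
```

The general category column makes it easy to separate punctuation (Po) such as U+3002 or U+06D4 from compatibility characters such as those in the 2488..249B range.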
Whether publishing a list of recommended mappings to U+002E is a good idea or not depends on (1) above and causes me to again recall that only ASCII separators are permitted in IRI and URI syntax -- separators that raise many of the same issues as the label separator in domain names. But, assuming that there is rough consensus that doing so is appropriate, I suggest that the right way to proceed is to create yet another document that
- focuses on the label-separator mapping alone
- explains the issues and risks of doing such mappings
- creates an IANA registry, initially populated by the four characters identified above, and with a mechanism for adding additional characters to it as the need for them is identified.
Since that document would presumably be informational anyway, if Lisa is agreeable, it could be handled as an AD-sponsored individual submission, thereby separating it from the WG's schedule and task list. I would hope we would not go forward with it (and that Lisa would not sponsor it) unless there were rough consensus on this list that having such a document and list of characters would be wise. But decoupling it from the current Mapping document seems to me to be appropriate, if only because of the expandability and open-endedness of the list of characters.
20091012 At 15:22 Paul Hoffman
> But decoupling it from the current Mapping document seems to me to be appropriate, if only because of the expandability and open-endedness of the list of characters.
Sarmad's message did not seem to be a request to add all full stops to draft-ietf-idnabis-mappings, just U+06D4. (Sarmad, please correct me if I'm wrong.) The list that is given in the draft is clearly labelled as examples of full stops that a UI implementer might consider, not as the full list. The reason that the list exists at all is because of input from experts in a particular script; it seems reasonable to take input from experts on other scripts as well. If we get bombarded with experts from more than half a dozen languages before the end of IETF Last Call, I could see your view, but this is a single request from someone in a language community that has been part of the IDNAbis process for quite some time.
Before I consider adding U+06D4, I would want to hear from additional experts, but other than that, I see no danger in adding another character to an optional list in an optional document.
20091012 Shawn Steele
> I think that at least it should be informational. Mapping is also very important. It is one way to make IDNA2008 compatible with the IDNA2003 protocol. The suggested category is Proposed Standard.
Mapping isn't at all sufficient for IDNA2003 compatibility.
20091012 John C Klensin
First of all, if there is going to be a list in the document, I think Sarmad's request (which has been supported by other analyses, by the way) is a completely reasonable one and that it should be added to the list. I am, of course, not an expert on either Urdu or Arabic script usage generally, but the consensus among those who are seems to be that adding this is at least as strongly justified as the East Asian characters we have and, at worst, harmless (whether you believe Sarmad, or me, about that or whether you seek other experts is up to you... and Vint).
One can generalize from the "harmless" part: it looks like all of the likely possibilities are going to be Po characters and hence DISALLOWED. As long as that relationship holds, and as long as they don't leak, they are all harmless. If they leak, we get into a whole new family of visual confusability issues and questions, but it seems to me that "won't leak" is a fairly basic assumption of the mapping document.
We discussed label-separator mappings (including, if I recall, U+06D4) much earlier in the life of the WG. What I think Sarmad's request points to was part of that earlier discussion: the East Asian character list is not a complete list of all reasonable "full stop" characters that might be justified and recommended for mapping as label separators. That pointer is independent of the specifics of U+06D4, whether he intended it or not.
> Before I consider adding U+06D4, I would want to hear from additional experts, but other than that, I see no danger in adding another character to an optional list in an optional document.
I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.
So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things. The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority. I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMV on that likelihood assessment.
My suggestion about careful explanation and cautions is separate but, if we preserve the current model for separation of materials, if the registry is created by the mapping document, the explanation would need to be in Rationale anyway.
- john
At 17:06 12/10/2009, Paul Hoffman wrote:
At 11:13 AM -0400 10/12/09, John C Klensin wrote:
> I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.
Fully agree. I can't speak for Pete, but I am not willing to re-open the document to add characters to a list of optional characters. In fact, I'm not even sure there will be a "we" around in six months: the WG should close down after we have fulfilled our charter.
> So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things.
And here we fully disagree. I think defining the rules for registries for any of the optional parts of the document will take at least another year, and we have already hit the exhaustion point. I see absolutely no upside for a registry of optional suggestions for an Informational RFC, with a significant downside of taking a lot of review time that could be better spent elsewhere.
> The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority.
I see no "need" to reopen the document, ever. If someone wants to prepare a document with a different view, or with the same view but additional characters, that's fine, and quite easy in the RFC process.
> I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMV on that likelihood assessment.
We certainly agree on the likelihood of desired additions, but I think not on the "need" for a revision to the WG's document.
At 17:16 12/10/2009, Vint Cerf wrote:
Folks,
I think we might be able to come to a conclusion this way:
1. it is important that we agree that the canonical, exchange format for domain names contain ONLY U+002E as the label separator
2. UI contexts and practices will likely vary and reasonable choices for local use of "full-stop" characters will vary
3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive
4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.
If ICANN sees some value in (4), perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??). I don't see this as an IETF function, however.
comments?
vint
Bidi discussion
From: jefsey Sent: 13/Feb/2010 4:15 PM
As you know, I am preparing an appeal to the IESG over the IDNA issue. In a nutshell, this appeal concerns the way the IESG/IAB handles IDNA2008 as a standalone architectural issue. From a usage point of view, the architectural concepts approved on the occasion of IDNA2008 legitimize a much more innovative vision of the same Internet technology. However, that vision introduces new issues which should be addressed, or at least announced, at the same time in order to avoid confusion. The real matter therefore most probably lies with the IAB. My evaluation is that IDNA2008 uses the intrinsic capacity of Internet technology for multiplicity (which we are not used to) to address diversity.
The purpose of the appeal is therefore to obtain an (we think urgent) authoritative IESG and probably IAB position about:
- (1) the way they consider the IDNA2008 architectural insertion and impact
- (2) if such an impact exists who is to address its consequences. IETF, users, someone else, under which form.
In order to clarify the issues I consider three areas needing to be documented:
- 1. IDNA and the Internet system - this is what IDNA documents - where IETF can specify MUSTs
- 2. IDNA and the IDNs Internet peritem - this is what Mapping considers - where IETF can specify SHOULDs
- 3. IDNA and the Users Internet exotem - this is the IDNA open use architecture by users' real world, which is not considered - where IETF can specify MAYs
I am not really familiar with Bidi. My question therefore is: where in the above scheme do you locate your discussion?
- 1. does that affect IDNA2008 as the interface between the DNS and IDNs?
- 2. does that relate to IDNs conversion from/to A-labels/U-labels?
- 3. does that belong to the User experience?
My question is how Bidi should be best inserted in the appeal I consider to be of best use?
Slim Amamou <slim@alixsys.com> 14 February 2010 04:57
hi,
It probably belongs to User experience, although the issue is broader than that and deals with the question of the universality of the representation of the domain name. In other words: will a domain name look the same in all countries, for all cultures and on every media (screen, print, etc.) or not?
Abdulrahman I. ALGhadir <aghadir@citc.gov.sa> 14 February 2010 05:39
> 1. does that affect IDNA2008 as the interface between the DNS and IDNs?
No, it doesn't.
> 2. does that relate to IDNs conversion from/to A-labels/U-labels
No, it doesn't.
> 3. does that belong to the User experience
Well, it doesn't give the user him/herself any problem in reading domain names (because they are rendered in the correct way); the issue is that the logical order is sometimes not the same as the network order (which goes against some rules in the IDNA RFC and some other RFCs).
> My question is how Bidi should be best inserted in the appeal I consider to be of best use?
Well, I don't see that IDNA has a major impact on this problem (it is a rendering problem), so I am not sure about this part.
Slim Amamou <slim@alixsys.com> 14 February 2010 11:23
>> does that belong to the User experience
> Well, it doesn't give the user him/herself any problem in reading domain names (because they are rendered in the correct way)
Actually Abdulrahman and I are not agreeing here. I started the thread because I consider rendering L1.R2.R3.L4 as L1.R3.R2.L4 obviously not correct. And I still don't understand how this could be acceptable for anyone. Note that this will be a very common usage, since most domain names begin with LTR www. and end with LTR TLD.
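The reordering Slim objects to can be reproduced with a deliberately simplified model of left-to-right display, sketched here in Python. It treats each label as an atom, classifies it by its first strong character, and reverses every run of consecutive right-to-left labels (the dots between two right-to-left labels take their direction). This is not an implementation of the Unicode Bidirectional Algorithm, it ignores digits and the reversal of characters inside a label, and the function names are invented.

```python
import unicodedata


def label_direction(label):
    """'R' if the first strong character is R or AL, otherwise 'L'."""
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "R"
        if bidi == "L":
            return "L"
    return "L"


def display_order_ltr(labels):
    """Visual left-to-right order of labels in an LTR context
    (simplified: runs of adjacent RTL labels are reversed as a block)."""
    out, run = [], []
    for label in labels:
        if label_direction(label) == "R":
            run.append(label)
        else:
            out.extend(reversed(run))
            run = []
            out.append(label)
    out.extend(reversed(run))
    return out
```

With `labels = ["www", R2, R3, "com"]`, where R2 and R3 are Hebrew or Arabic labels, the result is `["www", R3, R2, "com"]`, i.e. L1.R3.R2.L4.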
Vint Cerf <vint@google.com> 14 February 2010 15:36
unless there are strong indicators that a string IS a domain name, this is going to be an unsolvable problem I think. Won't it also depend on the actual label contents and the appearance of numerics in the labels, to add more complexity?
vint
Slim Amamou <slim@alixsys.com> 14 February 2010 20:41
> unless there are strong indicators that a string IS a domain name, this is going to be an unsolvable problem I think.
I think dealing with cases where a domain name can not be identified as such is out of the scope. It always was, there is no way to tell that example. is a domain name. But our problem could be narrowed to displayed URLs (or should we say IRLs?) and email addresses. In the two contexts the domain name is obvious.
> Won't it also depend on the actual label contents and the appearance of numerics in the labels, to add more complexity?
No, because
- 1. The structure of the domain name will prevail. DNS was not meant to be used like del.icio.us or to write sentences using the period in place of the space. Some components of the Internet depend on the implied hierarchy of authority, like the HTML5 same-origin policy for example, and in general whenever the word "subdomain" is used in a spec.
- 2. If someone really can't help using a domain name for writing a sentence like www.this.is.an8.label.domain.name.com in Arabic, it won't matter to him to input it in reverse order: name.domain.label.an8.is.this.www during the domain registration process. What will matter to him is that the domain will *always* be displayed the same whether it is in an LTR or RTL context. (If this is not clear, please ask, I will develop more.)
I'm aware that it would be confusing for someone writing RTL to input a domain name including periods during the registration process, just to have it inverted on display. But this is due to the limitations of the currently deployed domain registration platforms, and could be mitigated by simple GUI improvements (GUI improvements have to be made anyway to make them BIDI capable).
Vint Cerf <vint@google.com> 14 February 2010 21:03
Slim,
one doesn't register the whole domain name in one place - one registers a label with each zone manager.
so I am not sure we should take your example below as literally as you might have meant it?
vint
Slim Amamou <slim@alixsys.com> 14 February 2010 22:50
> one doesn't register the whole domain name in one place - one registers a label with each zone manager.
But current registrar web interfaces allow for authority zone management. And from what I recall, they unfortunately allow periods in subdomains. If that is not the case, that's even better because enforcing network order for label ordering won't introduce any usability problem.
Slim Amamou <slim@alixsys.com> 16 February 2010 10:21
> (...) The way I approach semantic addressing is to locate a notion through its coordinated concepts. So, this is a genitive chain: abc of cde of ghi of ijk which can also be ijk's ghi's cde's abc. When it is a domain name it will use the first format, in the case it is a C structure it will be the second format. In that perspective I presuppose that order within labels is to be purely conventional respecting the punycode order respecting the user entered/registered order.
I agree with this, and I chose the network order out of convenience. But what is a C structure?
(...) To do that I just consider that the DNS uses alphanumeric digits (i.e. from 0 to Z, which fits with the way it relates to uppercase). I then consider that a label may be made of several sublabels united by a "-" subjoiner/subseparator, and that labels are tied into LTR or RTL name sequences using the "." joiner/separator (the xn-- header is therefore a sublabel separated from the other sublabels by an empty sublabel).
So you are introducing a new separator class, namely the "-" within the label, thus structuring the label in its punycode form (A-label) for the sake of meaningfulness. Right?
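The sublabel reading described above can be sketched in a few lines, assuming "." joins labels and "-" joins sublabels, so that the "xn--" header shows up as the sublabel "xn" followed by an empty sublabel. The function names are hypothetical, and the sketch only splits and rejoins ASCII strings; it does no punycode decoding.

```python
def split_sublabels(name: str) -> list[list[str]]:
    """Split a name into labels on '.', and each label into sublabels on '-'."""
    return [label.split("-") for label in name.split(".")]


def join_sublabels(parsed: list[list[str]]) -> str:
    """Inverse of split_sublabels: rebuild the original string unchanged."""
    return ".".join("-".join(sublabels) for sublabels in parsed)


def has_ace_header(sublabels: list[str]) -> bool:
    """True when a label starts with 'xn', an empty sublabel, then more sublabels."""
    return len(sublabels) > 2 and sublabels[0].lower() == "xn" and sublabels[1] == ""


parsed = split_sublabels("xn--bcher-kva.example")
print(parsed)                     # [['xn', '', 'bcher', 'kva'], ['example']]
print(has_ace_header(parsed[0]))  # True
print(join_sublabels(parsed))     # xn--bcher-kva.example
```

Because splitting and rejoining is lossless, this structuring leaves the A-label itself untouched.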
In such a syntax only the URN is reordered if it is a domain name or a structure, not the sublabels. This means that this is transparent to punycode.
For that matter, I support the non-reordering of URN components and subcomponents.
As far as I understand, Bidi is only treated on the "UDN" (user-entered domain name, which can still be more complex than the IDN U-label). This is why I am quite concerned by your mention that Bidi registries could "rigidify" the label order by entering ".", which would be some kind of intermediary between what I call a subjoiner and a joiner. This could be addressed as a special joiner, but it would complicate the whole syntax analysis. In the same way, could a "to be hidden" subsubjoiner/separator indicate sequences indifferent to Bidi?
Yes. If it's hidden, it's not a problem. In the layer you are working on, only ASCII is allowed, so BIDI and i18n altogether are not a concern.
I wish to underline that, in one way or another, a semantic address will have to indicate that it is a semantic address and whether it uses the DNS or the C structure format. The idea I currently have (to be script and language transparent) is to use a prefix such as "|+ |" and "| +|" or "++ +", which are universal. Such a prefix/postfix, to be removed before using the URL, could also indicate that a string is a domain name (out of the habit in semantic addresses)?
I have no problem with that, it's just out of the scope of the issue I indicated.
Jefsey
I have no problem with that, it's just out of the scope of the issue I indicated.
Yes.
This is totally out of the WG Charter. Yet, IMHO, the WG Charter, if correctly completed (as it has been, except that the Mapping document is not approved yet), can only raise the question: "is its continuation on the user side in the scope of the whole IETF? If yes, who takes care of it? If not, who should be the leader, because the IETF has to interface with them?" This is why I consider that this is in the IAB scope/area of decision and that the IESG should not have approved the IDNA2008 document set without making it extremely clear. So, no one respecting the IETF positions, as ICANN and responsible lead users do, attempts to use IDNA2008 in operational (test or not) conditions.
