Full stop Mapping

From IUCG - Internet Users Contributing Group

20091011 Sarmad Hussain

In earlier discussions on U+06D4 (ARABIC FULL STOP), which is necessary for Urdu as a label separator (the reasons have been given on this list earlier), it was suggested that the various full stops would be disallowed and mapped. It was subsequently requested that the mapping be referenced in the IDNA200x documents to ensure that application providers incorporate it, but the request was not considered favorably, as it was perhaps suggested that such recommendations cannot be made part of the protocol. However, the recent mapping document (http://tools.ietf.org/html/draft-ietf-idnabis-mappings-04) says on page 2:


4. [I-D.ietf-idnabis-protocol] is specified such that the protocol acts on the individual labels of the domain name. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the following character can be mapped to the FULL STOP before label separation occurs:
  • IDEOGRAPHIC FULL STOP (U+3002)
    There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly.
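The step the draft describes can be sketched as follows: map U+3002 to U+002E first, then split on U+002E only. This is a minimal illustration, assuming a plain string in and a list of labels out; the function and constant names are hypothetical and are not taken from the draft.

```python
# Hypothetical sketch of the optional UI-level step quoted above:
# map IDEOGRAPHIC FULL STOP (U+3002) to FULL STOP (U+002E) *before*
# separating the name into labels.
IDEOGRAPHIC_FULL_STOP = "\u3002"
FULL_STOP = "\u002e"


def map_and_split(user_input: str) -> list[str]:
    """Map U+3002 to U+002E, then split on U+002E, the only protocol separator."""
    mapped = user_input.replace(IDEOGRAPHIC_FULL_STOP, FULL_STOP)
    return mapped.split(FULL_STOP)


print(map_and_split("example\u3002jp"))  # ['example', 'jp']
```

The order matters: if the name were split first, U+3002 would simply remain inside a label.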

If this is being explicitly done for U+3002, it could be done explicitly for ARABIC FULL STOP (U+06D4) as well. What is the reason for not including other such possible cases explicitly?

20091011 Vint Cerf

Keep in mind that the Mappings document is NOT normative. It is intended to give some ideas for localization and pre-processing. The important point is that only U+002E will be recognized in protocol as a label separator. For purposes of exchanging IDNs, that's important. For local contexts, one might allow alternative full-stop inputs but these would need to be converted to the U+002E form prior to initiating a DNS query. It would probably be wise also to convert to U+002E for purposes of canonical exchange of domain names with other parties.
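The distinction between local input and the exchanged form might look like this in an application. This is a sketch under stated assumptions: the helper names are hypothetical, and the set of alternative full stops is only the handful discussed in this thread, not a complete list.

```python
# Hypothetical helpers; names are illustrative, not from any draft or library.
# Alternative full stops mentioned in this thread; deliberately not exhaustive.
ALTERNATE_FULL_STOPS = frozenset({"\u3002", "\uff0e", "\uff61", "\u06d4"})


def to_interchange_form(local_input: str, local_separators: frozenset[str]) -> str:
    """Local (UI) step: convert whichever full stops this locale accepts to U+002E."""
    return "".join("." if ch in local_separators else ch for ch in local_input)


def is_interchange_ready(name: str) -> bool:
    """Return True if none of the listed alternative full stops remain in the name."""
    return not any(ch in ALTERNATE_FULL_STOPS for ch in name)


typed = "example\u06d4pk"  # user typed ARABIC FULL STOP in a local UI
print(is_interchange_ready(typed))  # False
query_name = to_interchange_form(typed, frozenset({"\u06d4"}))
print(query_name, is_interchange_ready(query_name))  # example.pk True
```

Only the converted string would be used for the DNS query or handed to other parties.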

For Pete Resnick and Paul Hoffman:

This email might be interpreted as a request to add U+06D4 to the Mappings list of potential local mappings to U+002E. Do you have an opinion on whether this edit would be appropriate?

20091011 At 17:03 Erik van der Poel

In my opinion, it would be premature to include U+06D4 in the IDNAbis mapping draft (apart from the fact that it is rather late in the Last Calls process to be making such a change). U+3002 has a much longer history in IDNA and is much more firmly established. If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently. Of course, the IDNAbis mapping draft explicitly states that it is intended to be used at the UI level (e.g. keyboard input), but, frankly, I don't think we have much experience with IDNA implementations that distinguish between the UI and data interchange levels.
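To make the interoperability concern concrete, the sketch below compares two separator policies. The first set is the one RFC 3490 (IDNA2003), section 3.1, requires applications to recognize as dots. The second is a hypothetical implementation that also maps U+06D4 at the interchange level. The sample name is illustrative only.

```python
# Separators that RFC 3490 (IDNA2003), section 3.1, requires to be treated as dots.
IDNA2003_SEPARATORS = frozenset("\u002e\u3002\uff0e\uff61")
# Hypothetical implementation that ALSO maps ARABIC FULL STOP at the interchange level.
EXTENDED_SEPARATORS = IDNA2003_SEPARATORS | {"\u06d4"}


def split_labels(name: str, separators: frozenset[str]) -> list[str]:
    """Split a domain name into labels, treating every character in `separators` as a dot."""
    labels, current = [], []
    for ch in name:
        if ch in separators:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


# Illustrative name: Arabic-script label, ARABIC FULL STOP, Arabic-script label.
name = "\u0645\u062b\u0627\u0644\u06d4\u067e\u0627\u06a9"
print(len(split_labels(name, IDNA2003_SEPARATORS)))  # 1
print(len(split_labels(name, EXTENDED_SEPARATORS)))  # 2
```

The same string yields one label under one policy and two under the other, so the two implementations would look up different names.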

20091011 At 17:24 Vint Cerf

Thanks for these observations.

Clearly we would not advocate use of U+06D4 for interchange. Mappings was intended to focus on non-interchange UI treatments.

I agree that this is a rather substantive change; may we hear from others in IDNABIS WG please?

We are going to have to draw these last call discussions to a close soon.

vint

20091011 At 18:52 Paul Hoffman

It is not late at all. We are still in IETF Last Call, which is exactly the time we are supposed to be having these community-wide discussions.

> U+3002 has a much longer history in IDNA and is much more firmly established.

Correct.

> If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently.

No one has suggested that we do that, of course. If someone wants to mis-implement draft-ietf-idnabis-mappings, there is nothing we can do to stop them.

> Of course, the IDNAbis mapping draft explicitly states that it is intended to be used at the UI level (e.g. keyboard input), but, frankly, I don't think we have much experience with IDNA implementations that distinguish between the UI and data interchange levels.

So, what do you propose instead? If this is a proposal to abandon draft-ietf-idnabis-mappings, you need to be more explicit about it. If it is not such a proposal, then you need to say what parts of the draft need to be abandoned to deal with your concern.

20091012 At 02:17 YAO Jiankang

> In my opinion, it is not necessary to abandon idnabis-mappings.

+1

> It is trying to solve a perceived problem at the UI level (which may be somewhat unusual for an IETF document). As long as it is not normative, it's fine with me. Maybe it should become an Experimental RFC instead of Informational.

I think that at least it should be Informational. Mapping is also very important: it is one way to make IDNA2008 compatible with the IDNA2003 protocol. The suggested category is Proposed Standard.

20091012 At 14:53 John C Klensin

I see two issues with this idea. The second leads to a suggestion.

(1) While very local mapping of full stops makes perfectly good sense, any leakage is going to interfere with programs that need to parse dot-separated-label form into length-label pairs. Such programs may exist because they are not IDNA-aware or because they are trying to resolve names in private name spaces and follow the RFC 2181 observation that the DNS can accommodate any string of octets. We know that things "leak", so these mappings need to be performed very carefully (even more carefully than mappings of characters within labels), and any document should reflect that.
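A small illustration of the leakage problem: the sketch below converts dot-separated form into DNS length-prefixed labels, splitting on U+002E only, as a non-IDNA-aware program would. Encoding each label as raw UTF-8 is an assumption made purely to show what such a program might do with octets (per the RFC 2181 observation); real IDNA processing would convert labels to A-labels first. The function name is hypothetical.

```python
def to_wire_format(name: str) -> bytes:
    """Encode a dot-separated name as DNS length-prefixed labels.

    Only U+002E is treated as a separator. Labels are UTF-8 encoded here
    only to mimic a non-IDNA-aware program handling arbitrary octets.
    """
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length {len(raw)}: {label!r}")
        wire.append(len(raw))  # length octet
        wire += raw            # label octets
    wire.append(0)  # zero-length root label terminates the name
    return bytes(wire)


print(to_wire_format("example.com"))        # two labels: 7 + 3 octets
print(to_wire_format("example\u3002com"))   # one 13-octet label: U+3002 leaked
```

If an unmapped U+3002 reaches such a program, it becomes part of a single label instead of acting as a separator.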

(2) One of the advantages of the list of characters that are now discussed in the mapping document is that the list is fairly stable. Because it is largely motivated by within-label IDNA2003 compatibility, there should be little or no need to expand the list as Unicode evolves. That is a good thing because we don't have comprehensive rules to generate the character and mapping list (even though much of it is either NFKC or CaseFold) -- the mappings document is essentially Unicode version-dependent.

By contrast, I think the general rule for candidate alternate full-stop characters is going to be:

(i) The traditional, DNS-specified, label separator U+002E (ASCII dot) is unnatural or hard to type or render in the local environment.

(ii) There is a character in the local environment and script that is in common use, that is an obvious logical substitute label separator, and possibly that users will expect it to be a substitute regardless of what we have to say on the subject.

Although I think one can make a slightly stronger argument for U+06D4 because of bidi implications, I don't personally see a very strong justification for a recommendation to map one of these characters and not others. Erik and others have made that point for some specific cases.

The list of alternate full stops (referred to on the IDNA list for a while as "dot-oids") would be initialized with the three East Asian full stops called out in RFC 3490, i.e.,

U+3002 (ideographic full stop), U+FF0E (fullwidth full stop), and U+FF61 (halfwidth ideographic full stop)

plus U+06D4 (Arabic full stop) for the reasons identified in Sarmad's note.
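One way to see why these four need an explicit mapping rule rather than falling out of normalization is to check their NFKC forms with Python's standard `unicodedata` module. This is an illustrative check, not part of any draft; the helper name is hypothetical.

```python
import unicodedata

# The four candidate separators listed above.
CANDIDATES = ["\u3002", "\uff0e", "\uff61", "\u06d4"]


def nfkc_of(ch: str) -> str:
    """Return the NFKC normalization of a single character."""
    return unicodedata.normalize("NFKC", ch)


for ch in CANDIDATES:
    target = nfkc_of(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC U+{ord(target):04X}")
```

Under NFKC, only U+FF0E becomes U+002E. U+FF61 folds to U+3002, and U+3002 and U+06D4 are unchanged.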

But a search of the UnicodeData file shows 38 characters with names containing "FULL STOP" and 60 (22 more) containing "STOP". Some of these are clearly irrelevant: compatibility characters that encode a numeral followed by a full stop as a single code point (e.g., those in the U+2488..U+249B range), or characters used for some completely different purpose (e.g., a series of Glottal Stop code points). We've also been told repeatedly that relying too heavily on the assumption that Unicode names encode characteristic information is a bad idea; there may well be characters out there that could be seen locally as reasonable label separators but are not identified as "Full Stop" in their names.
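The name search described above can be approximated with Python's `unicodedata` module. This is a sketch: the counts depend on the Unicode version bundled with the interpreter, so they will not necessarily match the 38 and 60 quoted above, and, as noted, a name search cannot find separators whose names do not say "STOP".

```python
import sys
import unicodedata


def named_characters():
    """Yield (code point, name) for every character the local unicodedata module names."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            yield cp, name


ALL_NAMES = list(named_characters())


def names_containing(fragment: str) -> list[tuple[str, str]]:
    """Return ("U+XXXX", name) pairs whose character name contains `fragment`."""
    return [(f"U+{cp:04X}", name) for cp, name in ALL_NAMES if fragment in name]


full_stops = names_containing("FULL STOP")
stops = names_containing("STOP")
print(f"Unicode {unicodedata.unidata_version}: "
      f"{len(full_stops)} names contain 'FULL STOP', {len(stops)} contain 'STOP'")
```

The output includes the numeral compatibility characters (for example U+2488 DIGIT ONE FULL STOP) and glottal stops, which is exactly why a raw name search still needs character-by-character review.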

So building a comprehensive list would require character-by-character examinations and decisions based on local contexts and usage. While we clearly have the expertise available to make those judgments for East Asian scripts and the Urdu use of Arabic script, we do not have that knowledge in the general case. The list is not closed-ended either. As new scripts are added to Unicode, it is safe to assume that at least some of them will have their own full stop (or equivalent) characters that will need to be considered for addition to the list.

Whether publishing a list of recommended mappings to U+002E is a good idea or not depends on (1) above, and it reminds me again that only ASCII separators are permitted in IRI and URI syntax, separators that raise many of the same issues as the label separator in domain names. But, assuming that there is rough consensus that doing so is appropriate, I suggest that the right way to proceed is to create yet another document that

  • focuses on the label-separator mapping alone
  • explains the issues and risks of doing such mappings
  • creates an IANA registry, initially populated by the four characters identified above, and with a mechanism for adding additional characters to it as the need for them is identified.

Since that document would presumably be informational anyway, if Lisa is agreeable, it could be handled as an AD-sponsored individual submission, thereby separating it from the WG's schedule and task list. I would hope we would not go forward with it (and that Lisa would not sponsor it) unless there were rough consensus on this list that having such a document and list of characters would be wise. But decoupling it from the current Mapping document seems to me to be appropriate, if only because of the expandability and open-endedness of the list of characters.

20091012 At 15:22 Paul Hoffman

> But decoupling it from the current Mapping document seems to me to be appropriate, if only because of the expandability and open-endedness of the list of characters.

Sarmad's message did not seem to be a request to add all full stops to draft-ietf-idnabis-mappings, just U+06D4. (Sarmad, please correct me if I'm wrong.) The list that is given in the draft is clearly labelled as examples of full stops that a UI implementer might consider, not as the full list. The reason that the list exists at all is because of input from experts in a particular script; it seems reasonable to take input from experts on other scripts as well. If we get bombarded with experts from more than half a dozen languages before the end of IETF Last Call, I could see your view, but this is a single request from someone in a language community that has been part of the IDNAbis process for quite some time.

Before I consider adding U+06D4, I would want to hear from additional experts, but other than that, I see no danger in adding another character to an optional list in an optional document.

20091012 Shawn Steele

> I think that at least it should be Informational. Mapping is also very important: it is one way to make IDNA2008 compatible with the IDNA2003 protocol. The suggested category is Proposed Standard.

Mapping isn't at all sufficient for IDNA2003 compatibility.

20091012 John C Klensin

First of all, if there is going to be a list in the document, I think Sarmad's request (which has been supported by other analyses, by the way) is a completely reasonable one and that it should be added to the list. I am, of course, not an expert on either Urdu or Arabic script usage generally, but the consensus among those who are seems to be that adding this is at least as strongly justified as the East Asian characters we have and, at worst, harmless (whether you believe Sarmad, or me, about that or whether you seek other experts is up to you... and Vint).

One can generalize from the "harmless" part: it looks like all of the likely possibilities are going to be Po characters and hence DISALLOWED. As long as that relationship holds, and as long as they don't leak, they are all harmless. If they leak, we get into a whole new family of visual confusability issues and questions, but it seems to me that "won't leak" is a fairly basic assumption of the mapping document.
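The "all likely possibilities are Po" observation is easy to spot-check with `unicodedata.category`. The list below is an illustrative subset: the four characters above plus U+0589, U+166E and U+1803, which are mentioned later in this thread. Note that General_Category alone is only a first approximation of IDNA2008 status: RFC 5892 has a short exceptions list that gives a few Po characters other values (for example, U+00B7 MIDDLE DOT is CONTEXTO).

```python
import unicodedata

# Candidate separators mentioned in this thread (illustrative, not exhaustive).
CANDIDATES = ["\u3002", "\uff0e", "\uff61", "\u06d4", "\u0589", "\u166e", "\u1803"]


def general_categories(chars: list[str]) -> dict[str, str]:
    """Map "U+XXXX NAME" to the character's Unicode General_Category."""
    return {f"U+{ord(c):04X} {unicodedata.name(c)}": unicodedata.category(c) for c in chars}


for label, category in general_categories(CANDIDATES).items():
    print(category, label)
```

Each of these candidates reports Po (Punctuation, other).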

We discussed label-separator mappings (including, if I recall, U+06D4) much earlier in the life of the WG. What I think Sarmad's request points to was part of that earlier discussion: the East Asian character list is not a complete list of all reasonable "full stop" characters that might be justified and recommended for mapping as label separators. That pointer is independent of the specifics of U+06D4, whether he intended it or not.

> Before I consider adding U+06D4, I would want to hear from additional experts, but other than that, I see no danger in adding another character to an optional list in an optional document.

I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.

So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things. The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority. I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMD on that likelihood assessment.

My suggestion about careful explanation and cautions is separate. But if we preserve the current model for separation of materials, and the registry is created by the mapping document, the explanation would need to be in Rationale anyway.

   john

_______________________________________________ Idna-arabicscript mailing list Arabic Script IDN Working Group (ASIWG) Idna-arabicscript@lists.irnic.ir http://lists.irnic.ir/mailman/listinfo/idna-arabicscript

20091012 At 17:06 Paul Hoffman

> At 11:13 AM -0400 10/12/09, John C Klensin wrote:
> I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.

Fully agree. I can't speak for Pete, but I am not willing to re-open the document to add characters to a list of optional characters. In fact, I'm not even sure there will be a "we" around in six months: the WG should close down after we have fulfilled our charter.

> So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things.

And here we fully disagree. I think defining the rules for registries for any of the optional parts of the document will take at least another year, and we have already hit the exhaustion point. I see absolutely no upside for a registry of optional suggestions for an Informational RFC, with a significant downside of taking a lot of review time that could be better spent elsewhere.

> The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority.

I see no "need" to reopen the document, ever. If someone wants to prepare a document with a different view, or with the same view but additional characters, that's fine, and quite easy in the RFC process.

> I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMD on that likelihood assessment.

We certainly agree on the likelihood of desired additions, but I think not on the "need" for a revision to the WG's document.

_______________________________________________ Idna-update mailing list Idna-update@alvestrand.no http://www.alvestrand.no/mailman/listinfo/idna-update


20091012 At 17:16 Vint Cerf

Folks,

I think we might be able to come to a conclusion this way:

1. It is important that we agree that the canonical exchange format for domain names contains ONLY U+002E as the label separator.
2. UI contexts and practices will likely vary, and reasonable choices for local use of "full-stop" characters will vary.
3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive.
4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.

If ICANN sees some value in (4), perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??). I don't see this as an IETF function, however.

comments?

vint



At 17:16 12/10/2009, Vint Cerf wrote: Folks,

I think we might be able to come to a conclusion this way:

1. it is important that we agree that the canonical, exchange format for domain names contain ONLY U+002E as the label separator 2. UI contexts and practices will likely vary and reasonable choices for local use of "full-stop" characters will vary 3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive 4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.

If ICANN sees some value in (4) perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??) I don't see this as an IETF function, however.

comments?

vint


On Oct 12, 2009, at 12:06 PM, Paul Hoffman wrote:

At 11:13 AM -0400 10/12/09, John C Klensin wrote: I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.

Fully agree. I can't speak for Pete, but I am not willing to re-open the document to add characters to a list of optional characters. In fact, I'm not even sure there will be a "we" around in six months: the WG should close down after we have fulfilled our charter.

So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things.

And here we fully disagree. I think defining the rules for registries for any of the optional parts of the document will take at least another year, and we have already hit the exhaustion point. I see absolutely no upside for a registry of optional suggestions for an Informational RFC, with a significant downside of taking a lot of review time that could be better spent elsewhere.

The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority.

I see no "need" to reopen the document, ever. If someone wants to prepare a document with a different view, or with the same view but additional characters, that's fine, and quite easy in the RFC process.

I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMD on that likelihood assessment.

We certainly agree on the likelihood of desired additions, but I think not on the "need" for a revision to the WG's document. _______________________________________________ Idna-update mailing list Idna-update@alvestrand.no http://www.alvestrand.no/mailman/listinfo/idna-update

_______________________________________________ Idna-arabicscript mailing list Arabic Script IDN Working Group (ASIWG) Idna-arabicscript@lists.irnic.ir http://lists.irnic.ir/mailman/listinfo/idna-arabicscript

At 17:16 12/10/2009, Vint Cerf wrote: Folks,

I think we might be able to come to a conclusion this way:

1. it is important that we agree that the canonical, exchange format for domain names contain ONLY U+002E as the label separator 2. UI contexts and practices will likely vary and reasonable choices for local use of "full-stop" characters will vary 3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive 4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.

If ICANN sees some value in (4) perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??) I don't see this as an IETF function, however.

comments?

vint


On Oct 12, 2009, at 12:06 PM, Paul Hoffman wrote:

> At 11:13 AM -0400 10/12/09, John C Klensin wrote: >> I don't either (see above about "harmless"), except that I think >> it would be very unfortunate to have to reopen and re-review >> this document in six months or a year when someone argues that >> one or two of U+0589, U+1632, U+166E (a special case because I'm >> sure Eric can advise us as to whether that request would be >> likely to arise and be plausible), U+1803, and so on should be >> added to the list for the same types of reasons as the ones >> Sarmad describes. > > Fully agree. I can't speak for Pete, but I am not willing to re-open > the document to add characters to a list of optional characters. In > fact, I'm not even sure there will be a "we" around in six months: > the WG should close down after we have fulfilled our charter. > >> So I guess I'm making a recommendation about something I should >> have spotted and made the recommendation about long ago: create >> a registry for these things. > > And here we fully disagree. I think defining the rules for > registries for any of the optional parts of the document will take > at least another year, and we have already hit the exhaustion point. > I see absolutely no upside for a registry of optional suggestions > for an Informational RFC, with a significant downside of taking a > lot of review time that could be better spent elsewhere. > >> The reason for that is precisely >> to prevent our needing to reopen the mapping document to add one >> or more of these characters, not to treat them with more or less >> authority. > > I see no "need" to reopen the document, ever. If someone wants to > prepare a document with a different view, or with the same view but > additional characters, that's fine, and quite easy in the RFC process. 
> >> I think such additions are extremely probable, >> either as new scripts are added to Unicode or as communities >> that are now unrepresented in this WG and probably >> underrepresented on the Internet show up and explain their >> needs. Of course, YMMD on that likelihood assessment. > > We certainly agree on the likelihood of desired additions, but I > think not on the "need" for a revision to the WG's document. > _______________________________________________ > Idna-update mailing list > Idna-update@alvestrand.no > http://www.alvestrand.no/mailman/listinfo/idna-update

_______________________________________________ Idna-update mailing list Idna-update@alvestrand.no http://www.alvestrand.no/mailman/listinfo/idna-update

At 17:16 12/10/2009, Vint Cerf wrote: Folks,

I think we might be able to come to a conclusion this way:

1. it is important that we agree that the canonical, exchange format for domain names contain ONLY U+002E as the label separator 2. UI contexts and practices will likely vary and reasonable choices for local use of "full-stop" characters will vary 3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive 4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.

If ICANN sees some value in (4) perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??) I don't see this as an IETF function, however.

comments?

vint


On Oct 12, 2009, at 12:06 PM, Paul Hoffman wrote:

At 11:13 AM -0400 10/12/09, John C Klensin wrote: I don't either (see above about "harmless"), except that I think it would be very unfortunate to have to reopen and re-review this document in six months or a year when someone argues that one or two of U+0589, U+1632, U+166E (a special case because I'm sure Eric can advise us as to whether that request would be likely to arise and be plausible), U+1803, and so on should be added to the list for the same types of reasons as the ones Sarmad describes.

Fully agree. I can't speak for Pete, but I am not willing to re-open the document to add characters to a list of optional characters. In fact, I'm not even sure there will be a "we" around in six months: the WG should close down after we have fulfilled our charter.

So I guess I'm making a recommendation about something I should have spotted and made the recommendation about long ago: create a registry for these things.

And here we fully disagree. I think defining the rules for registries for any of the optional parts of the document will take at least another year, and we have already hit the exhaustion point. I see absolutely no upside for a registry of optional suggestions for an Informational RFC, with a significant downside of taking a lot of review time that could be better spent elsewhere.

The reason for that is precisely to prevent our needing to reopen the mapping document to add one or more of these characters, not to treat them with more or less authority.

I see no "need" to reopen the document, ever. If someone wants to prepare a document with a different view, or with the same view but additional characters, that's fine, and quite easy in the RFC process.

> I think such additions are extremely probable, either as new scripts are added to Unicode or as communities that are now unrepresented in this WG and probably underrepresented on the Internet show up and explain their needs. Of course, YMMD on that likelihood assessment.

We certainly agree on the likelihood of desired additions, but I think not on the "need" for a revision to the WG's document.

_______________________________________________ Idna-update mailing list Idna-update@alvestrand.no http://www.alvestrand.no/mailman/listinfo/idna-update

_______________________________________________ Idna-arabicscript mailing list Arabic Script IDN Working Group (ASIWG) Idna-arabicscript@lists.irnic.ir http://lists.irnic.ir/mailman/listinfo/idna-arabicscript

At 17:33 12/10/2009, John C Klensin wrote:


--On Monday, October 12, 2009 12:16 -0400 Vint Cerf <vint@google.com> wrote:

> Folks,
>
> I think we might be able to come to a conclusion this way:
> ...
> 3. The mappings document is essentially informational and notional, not prescriptive, so editing it as if it were normative is likely counterproductive

Agreed (and with your first two points as well).

> 4. The idea of documenting UI practices seems useful, but, as I think Paul argues, this need not be the work of the IDNABIS working group.

I agree with you and Paul on this. The key place where I think Paul and I disagree is that I don't see creation of a registry into which candidate label separator characters could be inserted for information (along with whatever explanations are available as to why that is or is not appropriate) as either a big deal or requiring any sort of normative / standards action classification. Neither of us believes it should be part of the effort of this WG (or, I think, any successor WG).

> If ICANN sees some value in (4) perhaps it can formulate a kind of "known mapping practices" registry (I hesitate to say "best practices") for various scripts (languages??) I don't see this as an IETF function, however.

FWIW, I think ICANN knows far less about UIs and similar issues than the IETF and has a tendency to turn any project like this into a big deal. By contrast, the IETF has created several registries that are managed more or less for the convenience of the broader community, without much (or any) normative effect other than public notice that someone is already using a particular identifier and promoting the exchange of information.

If we are not going to create a registry, I'm opposed to fine-tuning "mapping" with additional, last-minute, characters and would argue that the character list should be removed from the dot-oid mapping discussion because the list is not complete and cannot be perfected in a reasonable amount of time.

  john
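John proposes a registry that records candidate label-separator characters for information only, together with whatever explanation is available, and with no normative effect. The sketch below shows what such entries might hold. It assumes a simple in-memory table. The `RegistryEntry` type, its field names, the `status` value, and `lookup` are all hypothetical. The entries are a subset of the code points mentioned in this thread, with notes paraphrasing the discussion. Character names come from Python's `unicodedata` module instead of being typed by hand.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryEntry:
    """One informational record: a candidate separator plus an explanation."""
    code_point: int
    notes: str                     # why local mapping is or is not advised
    status: str = "informational"  # hypothetical value; no normative weight

    @property
    def name(self) -> str:
        # Official Unicode character name from the runtime's Unicode data
        return unicodedata.name(chr(self.code_point), "<unnamed>")

    def label(self) -> str:
        return f"U+{self.code_point:04X} {self.name}"


# Hypothetical contents: not an actual registry.
REGISTRY = [
    RegistryEntry(0x3002, "Listed in the mappings draft; much longer history in IDNA."),
    RegistryEntry(0x06D4, "Requested for Urdu; allowed inside a label under IDNA2003."),
    RegistryEntry(0x0589, "Raised as a plausible future request."),
    RegistryEntry(0x166E, "Raised as a plausible future request (special case)."),
    RegistryEntry(0x1803, "Raised as a plausible future request."),
]


def lookup(code_point: int) -> Optional[RegistryEntry]:
    """Return the entry for a code point, or None if nobody has recorded it."""
    for entry in REGISTRY:
        if entry.code_point == code_point:
            return entry
    return None


for entry in REGISTRY:
    print(entry.label(), "-", entry.notes)
```

Adding a character under this scheme means appending a record. The mapping document itself would not change, which is the outcome John is after.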


At 17:49 12/10/2009, Martin J. Dürst wrote:



On 2009/10/12 1:03, Erik van der Poel wrote:

> In my opinion, it would be premature to include U+06D4 in the IDNAbis mapping draft (apart from the fact that it is rather late in the Last Calls process to be making such a change).

I agree with Paul that it's not late in the IETF Last Call process.

> U+3002 has a much longer history in IDNA and is much more firmly established. If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently.

Was U+06D4 allowed inside a label in IDNA 2003? If yes, then this will indeed be a problem. If not, then the difference is only between being resolved (with appropriate mapping in IDNA 2008) and not resolved (in IDNA 2003), which is an unfortunate but tolerable difference in implementation behavior.

Regards, Martin.

--

#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp



At 17:57 12/10/2009, Erik van der Poel wrote:

On Mon, Oct 12, 2009 at 9:49 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
> On 2009/10/12 1:03, Erik van der Poel wrote:
>> U+3002 has a much longer history in IDNA and is much more firmly established. If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently.
>
> Was U+06D4 allowed inside a label in IDNA 2003? If yes, then this will indeed be a problem. If not, then the difference is only between being resolved (with appropriate mapping in IDNA 2008) and not resolved (in IDNA 2003), which is an unfortunate but tolerable difference in implementation behavior.

U+06D4 is allowed inside a label in IDNA2003. Sorry, I should have been more explicit about that.

Erik
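Erik's point is that U+06D4 can occur inside a label under IDNA2003, so mapping it to U+002E changes how the same string is split. The Python illustration below uses two hypothetical splitters. The first, like IDNA2003, recognizes only U+002E, U+3002, U+FF0E and U+FF61 as separators (the set given in RFC 3490, section 3.1). The second also maps U+06D4. The input string is made up. Only the separator step is modeled here; Nameprep, ToASCII and the rest of IDNA processing are left out.

```python
# Label separators recognized by IDNA2003 (RFC 3490, section 3.1).
IDNA2003_DOTS = {"\u002e", "\u3002", "\uff0e", "\uff61"}

# Hypothetical newer local mapping that also treats ARABIC FULL STOP
# (U+06D4) as a label separator.
MAPPED_DOTS = IDNA2003_DOTS | {"\u06d4"}


def split_on(name: str, separators: set[str]) -> list[str]:
    """Split a domain name on any character in `separators`."""
    labels, current = [], []
    for ch in name:
        if ch in separators:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


# Made-up name in which U+06D4 sits inside the first label.
name = "abc\u06d4def.example"

old = split_on(name, IDNA2003_DOTS)  # 2 labels; U+06D4 stays in the first one
new = split_on(name, MAPPED_DOTS)    # 3 labels: abc, def, example
print(len(old), len(new))            # 2 3
```

The same string reaches the DNS as two labels under one implementation and as three under the other. That mismatch is why using an intra-label character as a separator is considered problematic.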



On Mon, Oct 12, 2009 at 09:57:01AM -0700, Erik van der Poel wrote:
> On Mon, Oct 12, 2009 at 9:49 AM, "Martin J. Dürst"

>> Was U+06D4 allowed inside a label in IDNA 2003? If yes, then this will indeed be a problem. If not, then the difference is only between being resolved (with appropriate mapping in IDNA 2008) and not resolved (in IDNA 2003), which is an unfortunate but tolerable difference in implementation behavior.
>
> U+06D4 is allowed inside a label in IDNA2003. Sorry, I should have been more explicit about that.

Ick. Surely we can't be advocating that things are sometimes in-label and sometimes not?

Do we have any evidence, positive or negative, about the use of U+06D4 intra-label?

A

--
Andrew Sullivan
ajs@shinkuro.com
Shinkuro, Inc.

At 18:28 12/10/2009, Vint Cerf wrote:

If inside the label, then its use as a substitute for U+002E seems highly problematic.


On Oct 12, 2009, at 12:57 PM, Erik van der Poel wrote:

> On Mon, Oct 12, 2009 at 9:49 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
>> On 2009/10/12 1:03, Erik van der Poel wrote:
>>> U+3002 has a much longer history in IDNA and is much more firmly established. If U+06D4 would be mapped to U+002E at the data interchange level (think HTML), there would be a period where IDNA2003 implementations and new implementations would resolve domain names differently.
>>
>> Was U+06D4 allowed inside a label in IDNA 2003? If yes, then this will indeed be a problem. If not, then the difference is only between being resolved (with appropriate mapping in IDNA 2008) and not resolved (in IDNA 2003), which is an unfortunate but tolerable difference in implementation behavior.
>
> U+06D4 is allowed inside a label in IDNA2003. Sorry, I should have been more explicit about that.
>
> Erik

